ℵ-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection
The paper introduces ℵ-IPOMDP, a computational framework that equips model-based reinforcement learning agents with anomaly detection and out-of-belief policies to identify and deter deception from more sophisticated opponents, thereby mitigating exploitation in mixed-motive and zero-sum games.