Joint MDPs and Reinforcement Learning in Coupled-Dynamics Environments
This paper introduces Joint MDPs (JMDPs), a formalism that augments standard MDPs with a multi-action sample transition model specifying the joint distribution of counterfactual one-step outcomes. This structure supports the derivation of Bellman operators and convergent dynamic programming algorithms for environments with coupled dynamics.
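The core idea can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual construction: a single noise draw `w` drives the outcomes of every action at a state, so the one-step counterfactuals across actions are jointly distributed (here via common random numbers), while ordinary value iteration still operates on the induced marginal transition model.

```python
import random

# Hypothetical toy JMDP: names, dynamics, and rewards are illustrative
# assumptions chosen for this sketch only.
# States 0..4; action 0 = "safe", action 1 = "risky". A shared noise draw w
# determines the outcome of BOTH actions, coupling their counterfactuals.

N_STATES = 5
ACTIONS = (0, 1)
GAMMA = 0.9

def sample_step(s, a, w):
    """Sample transition model evaluated for one action: next state and
    reward as deterministic functions of (state, action, noise)."""
    p_up = 0.8 if a == 0 else 0.5          # action-dependent success prob
    s_next = min(s + 1, N_STATES - 1) if w < p_up else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

def joint_counterfactuals(s, w):
    """Multi-action view: because w is shared, the one-step outcomes of all
    actions at state s are jointly distributed (same w, different actions)."""
    return {a: sample_step(s, a, w) for a in ACTIONS}

def value_iteration(n_samples=4000, iters=60, seed=0):
    """Standard value iteration on Monte Carlo estimates of the marginal
    transition model. Expected return depends only on the marginals; the
    joint model above additionally pins down how counterfactuals co-vary."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_samples)]
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [
            max(
                sum(
                    (r + GAMMA * V[s2]) / n_samples
                    for s2, r in (sample_step(s, a, w) for w in draws)
                )
                for a in ACTIONS
            )
            for s in range(N_STATES)
        ]
    return V

outcomes = joint_counterfactuals(2, w=0.6)   # one shared draw, both actions
V = value_iteration()
print(outcomes)   # the two actions land in different states under the SAME w
print([round(v, 3) for v in V])
```

Under `w = 0.6`, the safe action succeeds (`0.6 < 0.8`) while the risky one fails (`0.6 >= 0.5`), so the same draw yields different counterfactual next states; a marginal-only MDP model could not express this correlation between the two actions' outcomes.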