MDP Planning as Policy Inference
This paper reframes planning in episodic Markov decision processes (MDPs) as Bayesian inference over policies. It introduces a variational sequential Monte Carlo method to approximate the posterior distribution of optimal behaviors, and it achieves stochastic control by sampling from the posterior predictive distribution rather than by entropy regularization.
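
To make the idea of inference over policies concrete, the following is a minimal sketch on a toy problem, not the paper's method. It enumerates every deterministic policy of a hypothetical two-state, two-action, two-step MDP and computes the posterior exactly, where the paper instead approximates it with variational sequential Monte Carlo. The reward and transition tables, the uniform prior, the `temperature` parameter, and the likelihood proportional to the exponentiated return are all assumptions chosen for illustration.

```python
import itertools
import math
import random

# Hypothetical toy MDP (illustrative values, not from the paper).
# REWARD[state][action] and NEXT_STATE[state][action]; transitions are deterministic.
REWARD = {0: {0: 0.0, 1: 1.0}, 1: {0: 3.0, 1: 0.0}}
NEXT_STATE = {0: {0: 1, 1: 0}, 1: {0: 1, 1: 0}}
HORIZON = 2
START = 0


def episode_return(policy):
    """Return of a deterministic policy (tuple indexed by state) from START."""
    state, total = START, 0.0
    for _ in range(HORIZON):
        action = policy[state]
        total += REWARD[state][action]
        state = NEXT_STATE[state][action]
    return total


def policy_posterior(temperature=1.0):
    """Exact posterior over all deterministic policies under a uniform prior.

    Assumption: the likelihood of a policy is proportional to
    exp(return / temperature). Exact enumeration stands in for an
    approximate inference method and is only feasible on tiny problems.
    """
    policies = list(itertools.product([0, 1], repeat=2))
    weights = [math.exp(episode_return(p) / temperature) for p in policies]
    normalizer = sum(weights)
    return {p: w / normalizer for p, w in zip(policies, weights)}


def posterior_predictive(posterior, state):
    """Action probabilities at `state`, marginalized over the policy posterior."""
    probs = [0.0, 0.0]
    for policy, prob in posterior.items():
        probs[policy[state]] += prob
    return probs


def sample_action(posterior, state, rng):
    """Stochastic control: draw an action from the posterior predictive."""
    return rng.choices([0, 1], weights=posterior_predictive(posterior, state))[0]


if __name__ == "__main__":
    posterior = policy_posterior()
    for policy, prob in sorted(posterior.items()):
        print(policy, round(prob, 3))
    print(sample_action(posterior, START, random.Random(0)))
```

In this sketch the agent's randomness comes entirely from uncertainty about which policy is optimal: actions are drawn from the posterior predictive, and no entropy bonus is added to the reward.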