Sparse Offline Reinforcement Learning with Corruption Robustness
This paper proposes actor-critic methods equipped with sparse robust estimation oracles, achieving the first non-vacuous guarantees for learning near-optimal policies in high-dimensional sparse offline reinforcement learning under strong data corruption and single-policy concentrability. These methods overcome the limitations that traditional Least-Squares Value Iteration (LSVI) approaches face in this regime.
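To make the algorithmic template concrete, below is a minimal sketch of the general pattern the abstract describes: an offline actor-critic loop whose critic is fit by a robust sparse regression oracle. This is an illustrative sketch, not the paper's algorithm. The oracle shown is a generic trimmed iterative-hard-thresholding estimator standing in for whatever sparse robust estimator the paper uses, and all function names, signatures, and feature-input conventions (`robust_sparse_oracle`, `phi_sa`, `phi_next_all`, etc.) are assumptions made for illustration.

```python
import numpy as np

def robust_sparse_oracle(X, y, sparsity, trim_frac=0.1, n_iters=50):
    """Stand-in oracle: trimmed iterative hard thresholding.

    Fits a k-sparse linear model while dropping the rows with the
    largest residuals each round, so a small fraction of corrupted
    samples cannot dominate the fit.
    """
    n, d = X.shape
    keep = n - int(trim_frac * n)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-8)  # 1 / spectral norm^2
    w = np.zeros(d)
    for _ in range(n_iters):
        r = X @ w - y
        idx = np.argsort(np.abs(r))[:keep]           # trim suspected corruptions
        w -= step * (X[idx].T @ r[idx])              # gradient step on kept rows
        w[np.argsort(np.abs(w))[:-sparsity]] = 0.0   # keep top-k coordinates
    return w

def softmax_policy(theta, phi_all):
    """Linear softmax policy: pi(a|s) proportional to exp(theta . phi(s, a))."""
    logits = phi_all @ theta                         # (n, A)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def actor_critic(phi_sa, phi_next_all, rewards, sparsity,
                 gamma=0.99, lr=0.1, n_rounds=50):
    """Skeleton offline actor-critic with a robust sparse critic.

    Assumed inputs:
      phi_sa:       (n, d) features of the logged (state, action) pairs
      phi_next_all: (n, A, d) features of every action at the next state
      rewards:      (n,) logged rewards
    """
    n, A, d = phi_next_all.shape
    theta = np.zeros(d)   # actor (policy) parameters
    w = np.zeros(d)       # critic (Q-function) parameters
    for _ in range(n_rounds):
        # Critic: robust sparse fit of Q on one-step TD targets.
        pi_next = softmax_policy(theta, phi_next_all)          # (n, A)
        v_next = ((phi_next_all @ w) * pi_next).sum(axis=1)    # (n,)
        targets = rewards + gamma * v_next
        w = robust_sparse_oracle(phi_sa, targets, sparsity)
        # Actor: natural-policy-gradient-style update; for a linear
        # softmax policy with a linear critic this reduces to theta += lr * w.
        theta += lr * w
    return theta, w
```

The design point the sketch illustrates is the division of labor implied by the abstract: corruption robustness is isolated inside the regression oracle (here via residual trimming plus hard thresholding), while the actor-critic outer loop only ever consumes the oracle's sparse estimate, in contrast to LSVI-style methods that regress without such a robust oracle.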