Adaptive Reinforcement for Open-ended Medical Reasoning via Semantic-Guided Reward Collapse Mitigation
This paper introduces ARMed, a novel reinforcement learning framework that mitigates reward collapse through adaptive semantic rewards and chain-of-thought supervision to significantly enhance open-ended medical reasoning in vision-language models.