Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
This paper introduces Fine-grained Group Policy Optimization (FGO), a reinforcement learning algorithm that compresses verbose Chain-of-Thought reasoning in Large Language Models while addressing two limitations of Group Relative Policy Optimization (GRPO): data inefficiency and entropy collapse.