Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning
This paper introduces T2T (Thickening-to-Thinning), a dynamic reward shaping framework for LLM reasoning inspired by human learning dynamics. T2T rewards longer, exploratory trajectories when an attempt is incorrect and penalizes length once the attempt succeeds. With this shaping, it outperforms standard baselines on mathematical benchmarks.
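To make the asymmetry concrete, here is a minimal Python sketch of a length-aware reward of the kind the summary describes. It is an illustration under stated assumptions, not the paper's formula: the function name `shaped_reward`, the coefficients `alpha` and `beta`, the linear form, and the normalization by `max_length` are all hypothetical, and any dependence on training progress that "dynamic" may imply is not modeled.

```python
def shaped_reward(is_correct: bool, length: int, max_length: int,
                  alpha: float = 0.1, beta: float = 0.1) -> float:
    """Hypothetical length-aware reward; not the formula from the paper.

    alpha: assumed bonus scale for long incorrect attempts ("thickening").
    beta:  assumed penalty scale for long correct attempts ("thinning").
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Clamp and normalize length to [0, 1] so the shaping term stays bounded.
    frac = min(max(length, 0), max_length) / max_length
    if is_correct:
        # Thinning: a correct answer loses a little reward as it gets longer.
        return 1.0 - beta * frac
    # Thickening: an incorrect answer earns a small bonus for exploring longer.
    return alpha * frac


if __name__ == "__main__":
    for correct, n in [(True, 100), (True, 900), (False, 100), (False, 900)]:
        print(correct, n, shaped_reward(correct, n, max_length=1000))
```

In this sketch, keeping `alpha < 1 - beta` ensures that every correct response still scores above every incorrect one, so the length term only reorders responses within each group rather than overriding correctness.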