Adaptive Social Learning via Mode Policy Optimization for Language Agents
This paper proposes the Adaptive Social Learning (ASL) framework and its Adaptive Mode Policy Optimization (AMPO) algorithm, which let language agents switch dynamically between intuitive and deliberative reasoning modes depending on context. The authors report better task performance and token efficiency than baselines such as GPT-4o and GRPO.
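To make the idea of context-dependent mode switching concrete, here is a minimal sketch in Python. It is not the paper's method: AMPO learns the switch through policy optimization, whereas this toy uses a fixed threshold. The `difficulty` score, the `threshold`, the `ModeChoice` type, and the token budgets are all hypothetical names and values introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ModeChoice:
    mode: str          # "intuitive" or "deliberative"
    token_budget: int  # hypothetical cap on generated tokens for this mode


def choose_mode(difficulty: float, threshold: float = 0.5) -> ModeChoice:
    """Pick a reasoning mode from a hypothetical difficulty score in [0, 1].

    A fixed threshold stands in for the learned policy here (assumption).
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    if difficulty < threshold:
        # Easy context: answer directly and spend few tokens.
        return ModeChoice("intuitive", token_budget=64)
    # Hard context: allow longer, step-by-step reasoning.
    return ModeChoice("deliberative", token_budget=512)
```

The point of the sketch is only the shape of the decision: easy contexts get a short, direct response, and hard ones get a larger reasoning budget, which is where a gain in token efficiency would come from.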