Interactive Benchmarks
This paper proposes "Interactive Benchmarks," a unified evaluation paradigm that assesses model intelligence by how well a model actively acquires information and reasons under a budget constraint. The paradigm is applied in two settings, interactive proofs and games, and the results show that current models still have significant room for improvement in these dynamic scenarios.
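To make the idea of budget-constrained information acquisition concrete, the following is a minimal sketch of a toy episode, not the paper's actual benchmark or protocol. It assumes a hypothetical number-guessing task in which an agent may ask a judge a limited number of yes/no questions; the names `run_episode`, `EpisodeResult`, and `budget` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    solved: bool        # did the agent pin down the secret within the budget?
    queries_used: int   # number of questions actually asked


def run_episode(secret: int, low: int, high: int, budget: int) -> EpisodeResult:
    """Toy interactive episode (hypothetical, for illustration only).

    The agent must identify `secret` in [low, high] by asking the judge
    questions of the form "is the secret <= x?", with at most `budget`
    questions. The agent here is a simple binary-search strategy.
    """
    def judge(x: int) -> bool:
        # The judge holds the hidden information and answers truthfully.
        return secret <= x

    queries = 0
    while low < high and queries < budget:
        mid = (low + high) // 2
        queries += 1
        if judge(mid):
            high = mid
        else:
            low = mid + 1

    # The episode counts as solved only if the interval collapsed onto the secret.
    return EpisodeResult(solved=(low == high == secret), queries_used=queries)


if __name__ == "__main__":
    print(run_episode(secret=37, low=0, high=127, budget=7))
    print(run_episode(secret=37, low=0, high=127, budget=3))
```

In this toy setting, a sufficient budget lets the agent succeed while a tight budget causes failure, which illustrates why performance under such an evaluation depends on how efficiently information is acquired and not only on the final answer.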