Agentified Assessment of Logical Reasoning Agents
This paper introduces an agentified assessment framework in which an assessor agent carries out reproducible and robust evaluation of logical reasoning systems. The framework's effectiveness is demonstrated by benchmarking an auto-formalization agent, which achieves 86.70% accuracy on a solver-verified FOLIO dataset and significantly outperforms a chain-of-thought baseline.
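The core loop of such an assessment can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the paper's assessor: it assumes FOLIO-style examples (premises, a conclusion, and a gold label of True, False, or Uncertain), and it assumes the agent under test is a plain callable. The names `Example`, `Agent`, and `assess` are hypothetical. Accuracy is the fraction of examples whose predicted label matches the gold label, and an agent that raises an error is scored as wrong instead of aborting the whole run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical record for one FOLIO-style problem; the gold label is
# assumed to be one of "True", "False", or "Uncertain".
@dataclass(frozen=True)
class Example:
    premises: Tuple[str, ...]
    conclusion: str
    label: str

# Hypothetical agent interface: premises and conclusion in, predicted label out.
Agent = Callable[[Tuple[str, ...], str], str]


def assess(agent: Agent, examples: Iterable[Example]) -> float:
    """Run the agent on every example and return accuracy in [0, 1]."""
    total = 0
    correct = 0
    for ex in examples:
        total += 1
        prediction: Optional[str]
        try:
            prediction = agent(ex.premises, ex.conclusion)
        except Exception:
            # A failing agent counts as a wrong answer, so one crash
            # does not invalidate the scores for the remaining examples.
            prediction = None
        if prediction == ex.label:
            correct += 1
    # An empty benchmark yields 0.0 rather than a division-by-zero error.
    return correct / total if total else 0.0
```

For example, an agent that always answers "True" on one True example and one Uncertain example scores 0.5. Keeping the scoring inside the assessor, rather than inside each agent, means every system is judged by the same code against the same gold labels.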