How Well Do AI Systems Solve AP Physics? A Comparative Evaluation of Large Language Models on Algebra-Based Free Response Questions
This study evaluates four large language models on algebra-based AP Physics free-response questions. The models perform well on structured algebraic problem-solving but show significant limitations in spatial reasoning, visual interpretation, and conceptual integration, and their performance varies notably across exam years and exam levels.