ABD: Default-Exception Abduction in Finite First-Order Worlds
This paper introduces ABD, a benchmark for default-exception abduction in finite first-order worlds. It evaluates ten frontier LLMs on their ability to generate sparse, satisfiability-restoring formulas across three observation regimes. The models achieve high validity, but they struggle with parsimony and exhibit distinct generalization failures.
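
To make the task concrete, the following is a minimal toy sketch of default-exception abduction, not the benchmark's actual format or scoring. It assumes a single default rule, "every bird that is not abnormal flies", over a three-object world. The predicate names, the world, and the helpers `is_valid` and `exceptions` are all hypothetical and chosen only for illustration. A candidate abnormality formula counts as valid here if it restores satisfiability of the default in the world, and as parsimonious if it flags few objects as exceptions.

```python
from typing import Callable, Dict, List, Set

# A finite world: predicate name -> set of objects that satisfy it.
# (Hypothetical toy data, not taken from the benchmark.)
World = Dict[str, Set[str]]
Formula = Callable[[World, str], bool]  # candidate definition of Ab(x)

DOMAIN: List[str] = ["tweety", "opus", "polly"]
WORLD: World = {
    "Bird": {"tweety", "opus", "polly"},
    "Penguin": {"opus"},
    "Flies": {"tweety", "polly"},
}


def holds(world: World, pred: str, x: str) -> bool:
    """Truth of the atomic formula pred(x) in the world."""
    return x in world.get(pred, set())


def is_valid(world: World, ab: Formula) -> bool:
    """Check the default: for all x, Bird(x) and not Ab(x) implies Flies(x)."""
    return all(
        (not holds(world, "Bird", x)) or ab(world, x) or holds(world, "Flies", x)
        for x in DOMAIN
    )


def exceptions(world: World, ab: Formula) -> Set[str]:
    """Objects the candidate formula marks as abnormal (fewer is sparser)."""
    return {x for x in DOMAIN if ab(world, x)}


def ab_none(world: World, x: str) -> bool:
    return False  # no exceptions: the default is violated by opus


def ab_penguin(world: World, x: str) -> bool:
    return holds(world, "Penguin", x)  # sparse and valid


def ab_all_birds(world: World, x: str) -> bool:
    return holds(world, "Bird", x)  # valid but not parsimonious


for name, ab in [("none", ab_none), ("penguin", ab_penguin), ("all_birds", ab_all_birds)]:
    print(name, is_valid(WORLD, ab), len(exceptions(WORLD, ab)))
```

In this sketch, `ab_all_birds` illustrates the gap between validity and parsimony described above: it satisfies the default trivially by marking every bird as an exception, whereas `ab_penguin` restores satisfiability while flagging only one object.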