KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
The paper introduces KramaBench, a benchmark of 104 real-world data-to-insight challenges across diverse domains. It shows that current AI systems struggle to orchestrate end-to-end data pipelines over data lakes, reaching at most 55% accuracy despite strong performance on isolated tasks.
Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Om Chabra, Sivaprasad Sudhir, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska
2026-03-09, cs.AI