EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

This paper introduces EvoSchema, a comprehensive benchmark featuring a novel taxonomy of ten schema perturbation types to evaluate and enhance the robustness of text-to-SQL models against real-world database schema evolution, revealing that table-level changes significantly impact performance and demonstrating that training on diverse schema designs improves model resilience.

Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li | Thu, 12 Ma | 💬 cs.CL

Effective Dataset Distillation for Spatio-Temporal Forecasting with Bi-dimensional Compression

The paper introduces STemDist, the first dataset distillation method designed for spatio-temporal forecasting that simultaneously compresses both spatial and temporal dimensions through a hybrid cluster-level and subset-based approach, achieving significantly faster training, reduced memory usage, and lower prediction errors compared to existing methods.

Taehyung Kwon, Yeonje Choi, Yeongho Kim, Kijung Shin | Thu, 12 Ma | 🤖 cs.LG

Trajectory-Informed Memory Generation for Self-Improving Agent Systems

This paper introduces a novel framework for self-improving LLM agents that automatically extracts structured learnings from execution trajectories—categorizing them into strategy, recovery, and optimization tips—and injects them via adaptive memory retrieval to significantly boost task completion rates, particularly on complex scenarios.

Gaodan Fang, Vatche Isahagian, K. R. Jayaram, Ritesh Kumar, Vinod Muthusamy, Punleuk Oum, Gegi Thomas | Thu, 12 Ma | 🤖 cs.AI

Publication and Maintenance of Relational Data in Enterprise Knowledge Graphs (Revised Version)

This paper proposes a formal framework, architecture, and algorithms for constructing and incrementally maintaining materialized RDB2RDF views to enable efficient, semantically integrated access to legacy relational data within Enterprise Knowledge Graphs.

Vânia Maria Ponte Vidal (Departamento de Computação, UFC, Fortaleza, Brazil), Valéria Magalhães Pequeno (TechLab, Departamento de Ciências e Tecnologias, UAL, Lisboa, Portugal), Marco Antonio Casanova (Instituto Tecgraf, Puc-Rio, Rio de Janeiro, Brazil), Narciso Arruda (Departamento de Computação, UFC, Fortaleza, Brazil), Carlos Brito (Departamento de Computação, UFC, Fortaleza, Brazil) | Mon, 09 Ma | 💻 cs

Efficient Query Rewrite Rule Discovery via Standardized Enumeration and Learning-to-Rank (extend)

This paper presents SLER, a scalable system that combines standardized template enumeration with a learning-to-rank model to overcome the exponential search space and redundancy challenges of existing methods, successfully discovering over one million high-quality query rewrite rules for complex query plans.

Yuan Zhang, Yuxing Chen, Yuekun Yu, Jinbin Huang, Rui Mao, Anqun Pan, Lixiong Zheng, Jianbin Qin | Mon, 09 Ma | 💻 cs

Tag-specific Regret Minimization Problem in Outdoor Advertising

This paper introduces the NP-hard Tag-specific Regret Minimization in Outdoor Advertising (TRMOA) problem and shows that standard greedy methods fail because the regret models are non-monotone and non-submodular, whereas fairness-aware greedy round-robin, randomized greedy, and local search algorithms effectively minimize regret and balance allocations on real-world datasets.

Dildar Ali, Abishek Salaria, Ansh Jasrotia, Suman Banerjee | Mon, 09 Ma | 💻 cs

Efficient Vector Search in the Wild: One Model for Multi-K Queries

The paper introduces OMEGA, a K-generalizable learned top-K search method that leverages a base model trained on K=1 with trajectory-based features and a dynamic refinement procedure to achieve high accuracy and low latency for multi-K vector queries while significantly reducing preprocessing time compared to state-of-the-art methods.

Yifan Peng, Jiafei Fan, Xingda Wei, Sijie Shen, Rong Chen, Jianning Wang, Xiaojian Luo, Wenyuan Yu, Jingren Zhou, Haibo Chen | Mon, 09 Ma | 🤖 cs.LG

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes

The paper introduces KramaBench, a comprehensive benchmark featuring 104 real-world data-to-insight challenges across diverse domains, which reveals that current AI systems struggle to orchestrate end-to-end data pipelines over data lakes, achieving a maximum of only 55% accuracy despite strong performance in isolated tasks.

Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Om Chabra, Sivaprasad Sudhir, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska | Mon, 09 Ma | 🤖 cs.AI

HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

This paper introduces HCT-QA, a comprehensive benchmark comprising thousands of real-world and synthetic human-centric tables with natural language question-answer pairs, designed to evaluate and improve the performance of Large Language Models and Vision Language Models in querying complex tabular data.

Mohammad S. Ahmad, Zan A. Naeem, Michaël Aupetit, Ahmed Elmagarmid, Mohamed Eltabakh, Xiaosong Ma, Mourad Ouzzani, Chaoyi Ruan, Hani Al-Sayeh | Mon, 09 Ma | 🤖 cs.AI

Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

This paper examines how the rapid advancement of AI, particularly with foundation models and unstructured data, introduces new challenges in latency, scalability, and interpretability for human-data interaction, arguing for a paradigm shift that redefines human-machine roles and integrates cognitive and perceptual principles to build more effective, human-centered analytical systems.

Jean-Daniel Fekete, Yifan Hu, Dominik Moritz, Arnab Nandi, Senjuti Basu Roy, Eugene Wu, Nikos Bikakis, George Papastefanatos, Panos K. Chrysanthis, Guoliang Li, Lingyun Yu | Mon, 09 Ma | 🤖 cs.AI

Bala-Join: An Adaptive Hash Join for Balancing Communication and Computation in Geo-Distributed SQL Databases

The paper introduces Bala-Join, an adaptive hash join framework for geo-distributed SQL databases that utilizes a Balanced Partition and Partial Replication (BPPR) algorithm and an online skewed key detector to effectively balance communication and computation loads, thereby significantly improving throughput and mitigating performance degradation caused by data skew in Wide Area Network deployments.

Wenlong Song, Hui Li, Bingying Zhai + 5 more | 2026-03-06 | 💻 cs