Enhancing OLAP Resilience at LinkedIn

This paper presents a holistic resiliency framework for Apache Pinot at LinkedIn, featuring Query Workload Isolation, Impact-Free Rebalancing, Maintenance Zone Awareness, and Adaptive Server Selection, which collectively ensure stable subsecond query latency and high availability for petabyte-scale OLAP workloads under failures and load spikes.

Praveen Chaganlal, Jia Guo, Vivek Vaidyanathan, Dino Occhialini, Sonam Mandal, Subbu Subramaniam, Siddharth Teotia, Tianqi Li, Xiaxuan Gao, Florence Zhang · Tue, 10 Ma · cs

LLM-FK: Multi-Agent LLM Reasoning for Foreign Key Detection in Large-Scale Complex Databases

LLM-FK is a novel multi-agent framework that overcomes the limitations of conventional heuristic and naive LLM methods in detecting foreign keys within large-scale complex databases by coordinating specialized agents to prune the search space, enhance reasoning with domain knowledge, and ensure global schema consistency, thereby achieving superior accuracy and scalability.

Zijian Tang, Ying Zhang, Sibo Cai, Ruoxuan Wang · Tue, 10 Ma · cs

Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

This paper introduces TableEG, a framework that leverages fine-tuned large language models to generate authentic, distribution-aligned synthetic errors in tabular data, thereby addressing the scarcity of real-world error datasets and establishing a robust benchmark for evaluating data cleaning techniques.

Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song, Yongxin Tong · Tue, 10 Ma · cs.LG

MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark

This paper introduces MMTU, a large-scale benchmark comprising over 28,000 questions across 25 real-world expert-level table tasks, designed to comprehensively evaluate and reveal the significant limitations of current frontier models in understanding, reasoning, and manipulating structured tabular data.

Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Lingjiao Chen, Dongmei Zhang, Surajit Chaudhuri, H. V. Jagadish · Tue, 10 Ma · cs.LG

SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning

SDFed is a heterogeneous federated prompt learning framework that addresses local-global discrepancies by combining a fixed-length global prompt with variable-length local prompts, enhanced by subspace refinement and divergence control strategies to improve performance and robustness in privacy-sensitive, resource-constrained multi-party settings.

Yicheng Di, Wei Yuan, Tieke He, Yuan Liu, Hongzhi Yin · Tue, 10 Ma · cs.LG

Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases

This paper introduces Rel-MOSS, a novel relation-centric deep learning framework that addresses the critical issue of class imbalance in relational databases by employing a relation-wise gating controller and a relation-guided minority synthesizer to enhance the representation and over-sampling of minority entities, thereby significantly outperforming existing methods in entity classification tasks.

Jun Yin, Peng Huo, Bangguo Zhu, Hao Yan, Senzhang Wang, Shirui Pan, Chengqi Zhang · Tue, 10 Ma · cs.LG

Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

This paper introduces Dial, a knowledge-grounded framework that addresses the challenges of generating executable SQL across heterogeneous database systems by employing dialect-aware logical planning, a hierarchical intent-aware knowledge base, and an execution-driven debugging loop, achieving significant improvements in translation accuracy and dialect feature coverage on the newly constructed DS-NL2SQL benchmark.

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu · Tue, 10 Ma · cs.LG

Approximate Nearest Neighbor Search for Modern AI: A Projection-Augmented Graph Approach

This paper introduces Projection-Augmented Graph (PAG), a novel Approximate Nearest Neighbor Search framework that integrates projection techniques into graph indexing to simultaneously achieve high query efficiency, fast indexing, low memory usage, and robust scalability across modern AI workloads, outperforming existing methods like HNSW by up to 5x in speed while supporting online insertions.

Kejing Lu, Zhenpeng Pan, Jianbin Qin, Yoshiharu Ishikawa, Chuan Xiao · Tue, 10 Ma · cs.LG

A Hypergraph-Based Framework for Exploratory Business Intelligence

This paper introduces ExBI, a novel hypergraph-based framework for Exploratory Business Intelligence that overcomes traditional limitations of static schemas and high computational costs through dynamic schema evolution and sampling-based algorithms, achieving significant speedups over existing systems like Neo4j and MySQL while maintaining high analytical accuracy.

Yunkai Lou, Shunyang Li, Longbin Lai, Jianke Yu, Wenyuan Yu, Ying Zhang · Thu, 12 Ma · cs