cs.LG papers | Gist.Science

MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark

This paper introduces MMTU, a large-scale benchmark comprising over 28,000 questions across 25 real-world expert-level table tasks, designed to comprehensively evaluate and reveal the significant limitations of current frontier models in understanding, reasoning, and manipulating structured tabular data.

Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Lingjiao Chen, Dongmei Zhang, Surajit Chaudhuri, H. V. Jagadish | 2026-03-10 | 🤖 cs.LG

Leveraging chaotic transients in the training of artificial neural networks

This paper demonstrates that utilizing unconventionally large learning rates to induce transient chaotic dynamics during neural network training creates an optimal balance between exploration and exploitation, thereby accelerating convergence to high accuracy across various architectures and tasks.

Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa | 2026-03-10 | 🤖 cs.LG

EROICA: Online Performance Troubleshooting for Large-scale Model Training

This paper presents EROICA, the first online troubleshooting system deployed on production-scale GPU clusters (~100,000 GPUs) that effectively diagnoses complex hardware and software performance issues in large-scale model training through fine-grained profiling and differential observability with minimal impact.

Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai | 2026-03-10 | 🤖 cs.LG

BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation

BemaGANv2 is a GAN-based vocoder for long-term audio generation in Text-to-Music and Text-to-Audio applications. It integrates Anti-aliased Multi-Periodicity composition modules in the generator and systematically evaluates discriminator combination strategies, including the novel Multi-Envelope Discriminator, to achieve high-fidelity and temporally coherent results.

Taesoo Park, Mungwi Jeong, Mingyu Park, Narae Kim, Junyoung Kim, Mujung Kim, Jisang Yoo, Hoyun Lee, Sanghoon Kim, Soonchul Kwon | 2026-03-10 | 🤖 cs.LG

Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients

This paper introduces Co-LoRA, a collaborative personalization framework that addresses both data and model heterogeneity through a task-relevance-aware aggregation strategy and a dimension-invariant module, validated by a new multi-modal benchmark and superior performance over state-of-the-art methods.

Minhyuk Seo, Taeheon Kim, Hankook Lee, Jonghyun Choi, Tinne Tuytelaars | 2026-03-10 | 🤖 cs.LG

Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback

This paper introduces two efficient algorithms, Slate-GLM-OFU and Slate-GLM-TS, for the Logistic Contextual Slate Bandit problem that achieve $\tilde{O}(\sqrt{T})$ regret and $N^{O(1)}$ per-round computational complexity by combining local planning with global learning, demonstrating superior performance in both synthetic benchmarks and practical language model applications.

Tanmay Goyal, Gaurav Sinha | 2026-03-10 | 🤖 cs.LG

Sharpness-Aware Machine Unlearning

This paper characterizes how Sharpness-Aware Minimization (SAM) alters generalization during machine unlearning, showing that SAM abandons its denoising properties when fitting forget signals. Building on this, it proposes "Sharp MinMax", a method that splits the model so that one part learns retain signals via SAM while the other unlearns forget signals via sharpness maximization, achieving superior unlearning performance, reduced feature entanglement, and enhanced privacy.

Haoran Tang, Rajiv Khanna | 2026-03-10 | 🤖 cs.LG

Kolmogorov-Arnold Energy Models: Fast, Interpretable Generative Modeling

The paper introduces the Kolmogorov-Arnold Energy Model (KAEM), a generative framework that leverages the Kolmogorov-Arnold Representation Theorem to impose a univariate latent structure, thereby achieving a unique balance of fast, exact inference, high interpretability, and competitive sample quality compared to traditional VAEs and diffusion models.

Prithvi Raj | 2026-03-10 | 🤖 cs.LG

From Semantic To Instance: A Semi-Self-Supervised Learning Approach

This paper proposes a semi-self-supervised learning approach featuring a novel GLMask representation and a semantic-to-instance pipeline that achieves state-of-the-art instance segmentation performance with minimal manual annotation, demonstrating superior results on both dense agricultural wheat head images and the general-purpose COCO dataset.

Keyhan Najafian, Farhad Maleki, Lingling Jin, Ian Stavness | 2026-03-10 | 🤖 cs.LG

Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization

This paper introduces SamS, an efficient algorithm that adaptively schedules training samples in Direct Preference Optimization based on the model's evolving batch-wise states, significantly improving LLM alignment performance without modifying the core DPO algorithm or incurring substantial computational overhead.

Zixuan Huang, Yikun Ban, Lean Fu, Xiaojie Li, Zhongxiang Dai, Jianxin Li, Deqing Wang | 2026-03-10 | 🤖 cs.LG

DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy

DemoDiffusion is a one-shot imitation learning method that enables robots to perform diverse manipulation tasks from a single human demonstration. It uses kinematic retargeting to derive a rough trajectory from the demonstration, then refines that trajectory with a pre-trained diffusion policy so that it aligns with plausible robot actions. The method achieves significantly higher success rates than baseline approaches without requiring task-specific training or paired data.

Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani | 2026-03-10 | 🤖 cs.LG

Adopting a human developmental visual diet yields robust, shape-based AI vision

By implementing a novel "developmental visual diet" inspired by human visual maturation, this study demonstrates that guiding AI learning processes rather than simply scaling data yields models with superior shape-based recognition, robustness to distortions, and alignment with human vision.

Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann | 2026-03-10 | 🤖 cs.LG

Noisy PDE Training Requires Bigger PINNs

This paper establishes that Physics-Informed Neural Networks (PINNs) require a network size scaling with the number of noisy samples to achieve empirical risk below the noise variance, demonstrating that simply increasing data quantity cannot compensate for insufficient model capacity in noisy PDE training.

Sebastien Andre-Sloan, Anirbit Mukherjee, Matthew Colbrook | 2026-03-10 | 🤖 cs.LG

Towards Practical Benchmarking of Data Cleaning Techniques: On Generating Authentic Errors via Large Language Models

This paper introduces TableEG, a framework that leverages fine-tuned large language models to generate authentic, distribution-aligned synthetic errors in tabular data, thereby addressing the scarcity of real-world error datasets and establishing a robust benchmark for evaluating data cleaning techniques.

Xinyuan Liu, Jiahui Chen, Bocheng Hu, Yu Sun, Xinyang Chen, Shaoxu Song, Yongxin Tong | 2026-03-10 | 🤖 cs.LG

A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition

This paper proposes MCULoRA, a novel parameter-efficient framework featuring modality combination aware low-rank adaptation and dynamic parameter fine-tuning to resolve gradient conflicts and improve performance in incomplete multimodal emotion recognition.

Xinkui Zhao, Jinsong Shu, Yangyang Wu, Guanjie Cheng, Zihe Liu, Naibo Wang, Shuiguang Deng, Zhongle Xie, Jianwei Yin | 2026-03-10 | 💻 cs

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

This paper identifies a pervasive "agreement bias" in Multimodal LLM verifiers that causes them to over-validate agent behavior, and proposes a lightweight Self-Grounded Verification (SGV) method that significantly improves failure detection and task completion across web navigation, computer use, and robotics by decoupling prior generation from trajectory evaluation.

Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira | 2026-03-10 | 🤖 cs.LG

Flow Matching Meets Biology and Life Science: A Survey

This paper presents the first comprehensive survey of flow matching applications in biology and life sciences, systematically reviewing its theoretical foundations and categorizing its recent advancements in biological sequence modeling, molecule design, and protein generation.

Zihao Li, Zhichen Zeng, Xiao Lin, Feihao Fang, Yanru Qu, Zhe Xu, Zhining Liu, Xuying Ning, Tianxin Wei, Ge Liu, Hanghang Tong, Jingrui He | 2026-03-10 | 🤖 cs.LG

Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

This paper proposes a tree-based Weak-to-Strong Generalization framework that leverages Monte Carlo Tree Search to organize both successful and failure trajectories from weak models, thereby significantly enhancing the reasoning and decision-making capabilities of strong models in complex interactive environments.

Ruimeng Ye, Zihan Wang, Yang Xiao, Zinan Ling, Manling Li, Bo Hui | 2026-03-10 | 🤖 cs.LG

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

This paper investigates how malicious auditees can construct fairness-compliant yet representative-looking samples from non-compliant distributions to deceive auditors, formalizes these manipulation strategies using optimal transport and entropic projections, and proposes statistical tests to detect such distributional manipulation attacks.

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes | 2026-03-10 | 🤖 cs.LG

Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models

This paper introduces a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that exposes a critical "Benchmarking Gap" in medical large language models, revealing that despite high static benchmark scores, most models exhibit profound brittleness, privacy leaks, bias, and hallucinations when subjected to continuous, adversarial stress-testing.

Jiazhen Pan, Bailiang Jian, Paul Hager, Yundi Zhang, Che Liu, Friedrike Jungmann, Hongwei Bran Li, Chenyu You, Junde Wu, Jiayuan Zhu, Fenglin Liu, Yuyuan Liu, Niklas Bubeck, Christian Wachinger, Chen (Cherise) Chen, Zhenyu Gong, Cheng Ouyang, Georgios Kaissis, Benedikt Wiestler, Daniel Rueckert | 2026-03-10 | 🤖 cs.LG
