cs.LG papers | Gist.Science

Sparsity and Out-of-Distribution Generalization

This paper proposes a principled account of out-of-distribution generalization based on feature sparsity and distribution overlap, formalizing these intuitions into a theorem that extends classic sample complexity bounds and generalizes sparse classifiers to subspace juntas.

Scott Aaronson, Lin Lin Lee, Jiawei Li · 2026-03-10 · cs.LG

Feed m Birds with One Scone: Accelerating Multi-task Gradient Balancing via Bi-level Optimization

This paper introduces MARIGOLD, a unified bi-level optimization framework that leverages zeroth-order methods to efficiently solve multi-task learning problems by dynamically balancing task gradients without requiring access to all task gradients, thereby overcoming the computational inefficiency of existing MGDA-type approaches.

Xuxing Chen, Yun He, Jiayi Xu, Minhui Huang, Xiaoyi Liu, Boyang Liu, Fei Tian, Xiaohan Wei, Rong Jin, Sem Park, Bo Long, Xue Feng · 2026-03-10 · cs.LG

Deterministic Fuzzy Triage for Legal Compliance Classification and Evidence Retrieval

This paper proposes a deterministic, seed-stable fuzzy triage system using a fine-tuned RoBERTa dual encoder to classify legal compliance and retrieve evidence with high accuracy and transparent error constraints, offering a reproducible middle ground between opaque large language models and rigid hand-crafted rules.

Rian Atri · 2026-03-10 · cs.LG

Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss

This paper introduces Decoupled Expected Quadratic Loss (DEQL) to generalize the EDLAE model, deriving efficient closed-form solutions for the previously unexplored $b > 0$ hyperparameter range that empirically outperform the original $b = 0$ baseline on benchmark datasets.

Ruixin Guo, Xinyu Li, Hao Zhou, Yang Zhou, Ruoming Jin · 2026-03-10 · cs.LG

Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting

This paper introduces the information-theoretic concept of Context Channel Capacity ($C_\mathrm{ctx}$) to explain catastrophic forgetting in continual learning, proving that zero forgetting requires $C_\mathrm{ctx} \geq H(T)$ and demonstrating that architectures with structural context pathways (like HyperNetworks) bypass the Impossibility Triangle to achieve near-perfect retention, whereas methods lacking such capacity inevitably suffer significant forgetting.

Ran Cheng · 2026-03-10 · cs.LG

DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation

DualSpec is a heterogeneous speculation framework that accelerates deep research agents by distinguishing between high-uncertainty Search and low-uncertainty Visit actions to enable a lightweight, confidence-based verification process, achieving up to $3.28\times$ speedup without compromising accuracy.

Shuzhang Zhong, Baotong Lu, Qi Chen, Chuanjie Liu, Fan Yang, Meng Li · 2026-03-10 · cs.LG

OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

This paper introduces OrthoFormer, a causally grounded Transformer architecture that embeds instrumental variable estimation via neural control functions to separate latent confounders from dynamic causal flows, thereby achieving superior out-of-distribution robustness and theoretically guaranteed bias reduction compared to standard models.

Charles Luo · 2026-03-10 · cs.LG

Generalization in Online Reinforcement Learning for Mobile Agents

This paper addresses the underexplored challenge of generalization in online reinforcement learning for mobile GUI agents by introducing the AndroidWorld-Generalization benchmark and a scalable GRPO-based training system, demonstrating that while RL significantly improves zero-shot performance on unseen task instances, generalization to new templates and applications remains difficult and benefits from test-time few-shot adaptation.

Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang · 2026-03-10 · cs.LG

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

The paper proposes Data Agent, an end-to-end dynamic data selection framework that formulates sample selection as a training-aware sequential decision-making problem to accelerate model training while preserving performance across diverse datasets and tasks.

Suorong Yang, Fangjian Su, Hai Gan, Ziqi Ye, Jie Li, Baile Xu, Furao Shen, Soujanya Poria · 2026-03-10 · cs.LG

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

This paper establishes finite-sample guarantees for cost-driven state representation learning in infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control by analyzing two approaches—explicit latent modeling and implicit MuZero-like dynamics—while introducing a key technical proof of persistency of excitation for a novel stochastic process arising from quadratic regression.

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra · 2026-03-10 · cs.LG

Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning

The paper proposes PACT, a fine-tuning framework that preserves LLM safety alignment by selectively constraining the model's confidence on a small subset of safety-related tokens during training, thereby preventing alignment drift without compromising downstream task performance.

Guoli Wang, Haonan Shi, Tu Ouyang, An Wang · 2026-03-10 · cs.LG

Discrete Tokenization Unlocks Transformers for Calibrated Tabular Forecasting

This paper demonstrates that a deliberately simplistic discrete tokenization strategy, combined with adaptive Gaussian smoothing, enables Transformers to outperform tuned gradient boosting models on large-scale tabular forecasting tasks while producing well-calibrated probability distributions.

Yael S. Elmatad · 2026-03-10 · cs.LG

Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

This paper introduces Dial, a knowledge-grounded framework that addresses the challenges of generating executable SQL across heterogeneous database systems by employing dialect-aware logical planning, a hierarchical intent-aware knowledge base, and an execution-driven debugging loop, achieving significant improvements in translation accuracy and dialect feature coverage on the newly constructed DS-NL2SQL benchmark.

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu · 2026-03-10 · cs.LG

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

The paper introduces SLNet, a super-lightweight 3D point cloud recognition network that uses Nonparametric Adaptive Point Embedding (NAPE) and Geometric Modulation Units (GMU) to achieve state-of-the-art accuracy on benchmarks such as ModelNet40 and ScanObjectNN with significantly fewer parameters and lower computational cost than existing models.

Mohammad Saeid, Amir Salarpour, Pedram MohajerAnsari, Mert D. Pesé · 2026-03-10 · cs.LG

The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling

This paper introduces the Dual-Stream Transformer, an architecture that decomposes the residual stream into separate token and context streams with tunable mixing strategies to achieve a balance between high interpretability and minimal performance loss while demonstrating robustness to attention amplification.

J. Clayton Kerce, Alexis Fox · 2026-03-10 · cs.LG

Trusting What You Cannot See: Auditable Fine-Tuning and Inference for Proprietary AI

The paper introduces AFTUNE, a framework that ensures the integrity of cloud-based large language model fine-tuning and inference by employing a lightweight recording and spot-check mechanism to generate verifiable execution traces, thereby enabling clients to practically audit proprietary AI processes without prohibitive overhead.

Heng Jin, Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Ning Zhang, Y. Thomas Hou, Wenjing Lou · 2026-03-10 · cs.LG

Probabilistic Inference and Learning with Stein's Method

This monograph offers a rigorous theoretical and methodological overview of probabilistic inference and learning with Stein's method, detailing the construction and properties of Stein discrepancies and their connection to Stein variational gradient descent, with precise definitions and proofs.

Qiang Liu, Lester Mackey, Chris Oates · 2026-03-10 · cs.LG

Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments

This paper proposes a lightweight, self-supervised framework that augments a frozen speech enhancement backbone with low-rank adapters, enabling efficient on-device adaptation to dynamic real-world noise conditions by updating fewer than 1% of parameters while achieving significant signal quality improvements.

Longbiao Cheng, Shih-Chii Liu · 2026-03-10 · cs.LG

Contact-Guided 3D Genome Structure Generation of E. coli via Diffusion Transformers

This paper introduces a conditional diffusion-transformer framework that generates diverse ensembles of 3D *E. coli* genome conformations guided by Hi-C contact maps, effectively reconstructing heterogeneous structures whose ensemble averages align with experimental data while preserving conformational diversity.

Mingxin Zhang, Xiaofeng Dai, Yu Yao, Ziqi Yin · 2026-03-10 · cs.LG

Interpretable-by-Design Transformers via Architectural Stream Independence

This paper proposes and validates the Late Fusion Architecture (LFA), a transformer variant that enforces interpretability by design through architectural stream independence, which separates symbolic and contextual processing to prevent premature entanglement and significantly improve model stability and functional modularity compared to standard transformers.

Clayton Kerce, Alexis Fox · 2026-03-10 · cs.LG
