cs.LG papers | Gist.Science

CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning

This paper introduces CGL, a continual GUI learning framework that mitigates catastrophic forgetting by dynamically balancing Supervised Fine-Tuning and Reinforcement Learning through an entropy-guided proportion adjustment mechanism and a specialized gradient surgery strategy, validated by a new AndroidControl-CL benchmark.

Zhenquan Yao, Zitong Huang, Yihan Zeng, Jianhua Han, Hang Xu, Chun-Mei Feng, Jianwei Ma, Wangmeng Zuo2026-03-10🤖 cs.LG

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

This paper provides the first theoretical proof that Adam's second-moment normalization yields significantly sharper high-probability convergence guarantees ( $\delta^{-1/2}$ dependence) compared to SGD ( $\delta^{-1}$ dependence) under the classical bounded variance model, thereby explaining its empirical superiority.

Ruinan Jin, Yingbin Liang, Shaofeng Zou2026-03-10🤖 cs.LG

Information Routing in Atomistic Foundation Models: How Task Alignment and Equivariance Shape Linear Disentanglement

This paper introduces Compositional Probe Decomposition (CPD) to demonstrate that linear disentanglement of geometric and compositional information in atomistic foundation models is primarily driven by task alignment rather than architecture, revealing a significant performance gradient where models trained on specific properties like HOMO-LUMO gaps outperform energy-trained models and exhibit symmetry-dependent information routing.

Joshua Steier2026-03-10🤖 cs.LG

XInsight: Integrative Stage-Consistent Psychological Counseling Support Agents for Digital Well-Being

This paper introduces XInsight, a multi-agent framework that aligns psychological support with the Exploration-Insight-Action paradigm through a structured Reason-Intervene-Reflect cycle to enhance interpretability and therapeutic effectiveness in digital well-being applications, accompanied by the XInsight-Bench evaluation protocol.

Fei Wang, Jiangnan Yang, Junjie Chen, Yuxin Liu, Kun Li, Yanyan Wei, Dan Guo, Meng Wang2026-03-10🤖 cs.LG

vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

This paper introduces vLLM Hook, an open-source plug-in that enables the programmable access and manipulation of internal model states within the vLLM inference engine to support advanced test-time alignment techniques such as adversarial prompt detection, enhanced RAG, and activation steering.

Ching-Yun Ko, Pin-Yu Chen2026-03-10🤖 cs.LG

Isotonic Layer: A Universal Framework for Generic Recommendation Debiasing

This paper introduces the Isotonic Layer, a novel differentiable framework that integrates piecewise linear fitting and learnable embeddings into neural architectures to enforce global monotonicity, thereby enabling granular, context-aware debiasing and improved calibration for large-scale recommendation systems.

Hailing Cheng, Yafang Yang, Hemeng Tao, Fengyu Zhang2026-03-10🤖 cs.LG

How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective

This paper identifies a simple, semantics-free "P0 Sink Circuit" that emerges early in training to explain how Large Language Models develop attention sinks on the first token, suggesting this mechanism could serve as a signal for tracking pre-training convergence.

Runyu Peng, Ruixiao Li, Mingshu Chen, Yunhua Zhou, Qipeng Guo, Xipeng Qiu2026-03-10🤖 cs.LG

Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale

This paper demonstrates that hierarchical structures within the data generation process, modeled via probabilistic context-free grammars, serve as a unifying explanation for the emergence of diverse mechanistic phenomena like induction heads, function vectors, and the Hydra effect in Transformer-based language models.

Jonas Rohweder, Subhabrata Dutta, Iryna Gurevych2026-03-10🤖 cs.LG

Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation

This paper introduces Hierarchical Embedding Fusion (HEF), a two-stage framework that compresses repository code into a reusable hierarchy of dense vectors and maps them to learned pseudo-tokens, enabling low-latency, repository-aware code generation with accuracy comparable to traditional retrieval methods while significantly reducing inference costs.

Nikita Sorokin, Ivan Sedykh, Valentin Malykh2026-03-10🤖 cs.LG

FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

The paper introduces FuzzingRL, a framework that combines vision-language fuzzing with adversarial reinforcement fine-tuning to automatically generate diverse, challenging queries that systematically expose and degrade the performance of Vision Language Models.

Jiajun Xu, Jiageng Mao, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, Yue Wang2026-03-10🤖 cs.LG

Switchable Activation Networks

This paper introduces Switchable Activation Networks (SWAN), a framework that equips neural units with input-dependent binary gates to dynamically allocate computation and learn structured activation patterns, thereby unifying sparsity, pruning, and adaptive inference to achieve efficient, accurate, and context-aware deep learning models.

Laha Ale, Ning Zhang, Scott A. King, Pingzhi Fan2026-03-10🤖 cs.LG

Khatri-Rao Clustering for Data Summarization

This paper introduces the Khatri-Rao clustering paradigm, which extends traditional centroid-based methods like k-Means and deep clustering by modeling centroids as interactions of multiple succinct protocentroids, thereby achieving more compact and accurate data summaries with reduced redundancy.

Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila2026-03-10🤖 cs.LG

Scale Dependent Data Duplication

This paper demonstrates that data duplication is scale-dependent, revealing that as model capability and corpus size increase, semantically equivalent documents behave like exact duplicates by producing aligned gradients and causing accelerated semantic collisions, which leads to rapidly increasing training losses for larger models and necessitates new scaling laws to accurately predict performance.

Joshua Kazdan, Noam Levi, Rylan Schaeffer, Jessica Chudnovsky, Abhay Puri, Bo He, Mehmet Donmez, Sanmi Koyejo, David Donoho2026-03-10🤖 cs.LG

Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection

This paper introduces a normalized confidence scoring framework based on output anchor tokens to detect LLM errors without external validation, revealing that while supervised fine-tuning yields well-calibrated confidence, reinforcement learning methods induce overconfidence, and proposing post-RL self-distillation to restore reliability for applications like adaptive retrieval-augmented generation.

Xie Xiaohu, Liu Xiaohu, Yao Benjamin2026-03-10🤖 cs.LG

Structure-Aware Set Transformers: Temporal and Variable-Type Attention Biases for Asynchronous Clinical Time Series

The paper introduces Structure-Aware Set Transformers (STAR), a novel architecture that enhances asynchronous clinical time series modeling by integrating parameter-efficient soft attention biases for temporal locality and variable-type affinity, thereby outperforming existing grid-based and set-based baselines on ICU prediction tasks while providing interpretable insights into temporal and variable interactions.

Joohyung Lee, Kwanhyung Lee, Changhun Kim, Eunho Yang2026-03-10🤖 cs.LG

LegoNet: Memory Footprint Reduction Through Block Weight Clustering

LegoNet is a post-training compression technique that clusters 4x4 weight blocks across entire neural network architectures to achieve memory footprint reductions of up to 128x with negligible accuracy loss, without requiring any retraining or architectural modifications.

Joseph Bingham, Noah Green, Saman Zonouz2026-03-10🤖 cs.LG

Multi-Agent DRL for V2X Resource Allocation: Disentangling Challenges and Benchmarking Solutions

This paper addresses the lack of systematic evaluation in Multi-Agent Deep Reinforcement Learning for C-V2X resource allocation by introducing a disentangled benchmark suite of interference games and diverse datasets to isolate specific challenges, ultimately identifying policy robustness and generalization across vehicular topologies as the primary hurdle and demonstrating the superiority of actor-critic methods over value-based approaches.

Siyuan Wang, Lei Lei, Pranav Maheshwari, Sam Bellefeuille, Kan Zheng, Dusit Niyato2026-03-10🤖 cs.LG

Scaling Strategy, Not Compute: A Stand-Alone, Open-Source StarCraft II Benchmark for Accessible Reinforcement Learning Research

To address the complexity gap between StarCraft II's full game and its mini-games, this paper introduces the Two-Bridge Map Suite, an open-source, lightweight benchmark that isolates tactical navigation and combat skills to enable accessible reinforcement learning research under realistic compute budgets.

Sourav Panda, Shreyash Kale, Tanmay Ambadkar, Abhinav Verma, Jonathan Dodge2026-03-10🤖 cs.LG

Valid Feature-Level Inference for Tabular Foundation Models via the Conditional Randomization Test

This paper introduces a practical method for obtaining finite-sample valid p-values for feature-level hypothesis testing in tabular data by combining the Conditional Randomization Test with the TabPFN foundation model, enabling statistical inference without model retraining or parametric assumptions.

Mohamed Salem2026-03-10🤖 cs.LG

CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training

This paper introduces CapTrack, a capability-centric framework that redefines LLM forgetting as systematic behavioral drift rather than mere knowledge loss, revealing through a large-scale study that post-training significantly degrades robustness and default behaviors, with instruction fine-tuning causing the most pronounced effects.

Lukas Thede, Stefan Winzeck, Zeynep Akata, Jonathan Richard Schwarz2026-03-10🤖 cs.LG

← Previous Next →