cs.LG papers | Gist.Science

A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment

This paper presents a computationally efficient, detection-gated deep learning pipeline that achieves state-of-the-art robustness and cross-dataset generalization in glottal segmentation from high-speed videoendoscopy, enabling reliable extraction of clinical biomarkers for distinguishing healthy from pathological vocal function.

Harikrishnan Unnikrishnan | 2026-03-10 | cs.LG

Leveraging Model Soups to Classify Intangible Cultural Heritage Images from the Mekong Delta

This paper proposes a robust framework combining the hybrid CoAtNet architecture with model soups ensembling to effectively classify Intangible Cultural Heritage images from the Mekong Delta, achieving state-of-the-art performance on the ICH-17 dataset by reducing variance and enhancing generalization in data-scarce, high-similarity settings.

Quoc-Khang Tran, Minh-Thien Nguyen, Nguyen-Khang Pham | 2026-03-10 | cs.LG
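As a rough illustration of the model soups idea named in the summary above, the sketch below averages the parameters of several fine-tuned checkpoints element by element (a "uniform soup"). It is not the authors' code: the `Weights` type, the `uniform_soup` function, and the representation of a checkpoint as a dictionary of flat float lists are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def uniform_soup(checkpoints: List[Weights]) -> Weights:
    """Average several checkpoints of the same architecture, element by element."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    soup: Weights = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(vals) / len(checkpoints) for vals in columns]
    return soup


if __name__ == "__main__":
    a = {"w": [1.0, 2.0], "b": [0.0]}
    b = {"w": [3.0, 4.0], "b": [1.0]}
    print(uniform_soup([a, b]))
```

The result is a single set of weights, so inference costs the same as for one model, unlike an ensemble that runs every member.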

Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation

This paper proposes and analyzes a personalized multi-agent average reward TD-learning algorithm that leverages joint linear approximation to learn shared subspaces and local heads, demonstrating convergence with linear speedup despite the challenges of environmental heterogeneity and Markovian sampling.

Leo Muxing Wang, Pengkun Yang, Lili Su | 2026-03-10 | cs.LG

Embedding interpretable $\ell_1$-regression into neural networks for uncovering temporal structure in cell imaging

This paper proposes a hybrid neural network architecture that embeds an interpretable, $\ell_1$-regularized vector autoregressive model within a convolutional autoencoder to effectively extract and visualize sparse temporal dynamics from two-photon calcium imaging data while preserving non-sparse spatial information.

Fabian Kabus, Maren Hackenberg, Julia Hindel, Thibault Cholvin, Antje Kilias, Thomas Brox, Abhinav Valada, Marlene Bartos, Harald Binder | 2026-03-10 | cs.LG

Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers

This paper introduces GramCol and a motion-feature selection algorithm to generate Interpretable Motion-Attentive Maps (IMAPs) that effectively localize both motion and non-motion concepts in Video Diffusion Transformers without requiring gradient calculations or parameter updates.

Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang | 2026-03-10 | cs.LG

CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning

This paper introduces CGL, a continual GUI learning framework that mitigates catastrophic forgetting by dynamically balancing Supervised Fine-Tuning and Reinforcement Learning through an entropy-guided proportion adjustment mechanism and a specialized gradient surgery strategy, validated by a new AndroidControl-CL benchmark.

Zhenquan Yao, Zitong Huang, Yihan Zeng, Jianhua Han, Hang Xu, Chun-Mei Feng, Jianwei Ma, Wangmeng Zuo | 2026-03-10 | cs.LG

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

This paper provides the first theoretical proof that Adam's second-moment normalization yields significantly sharper high-probability convergence guarantees ($\delta^{-1/2}$ dependence) compared to SGD ($\delta^{-1}$ dependence) under the classical bounded variance model, thereby explaining its empirical superiority.

Ruinan Jin, Yingbin Liang, Shaofeng Zou | 2026-03-10 | cs.LG

Information Routing in Atomistic Foundation Models: How Task Alignment and Equivariance Shape Linear Disentanglement

This paper introduces Compositional Probe Decomposition (CPD) to demonstrate that linear disentanglement of geometric and compositional information in atomistic foundation models is primarily driven by task alignment rather than architecture, revealing a significant performance gradient where models trained on specific properties like HOMO-LUMO gaps outperform energy-trained models and exhibit symmetry-dependent information routing.

Joshua Steier | 2026-03-10 | cs.LG

XInsight: Integrative Stage-Consistent Psychological Counseling Support Agents for Digital Well-Being

This paper introduces XInsight, a multi-agent framework that aligns psychological support with the Exploration-Insight-Action paradigm through a structured Reason-Intervene-Reflect cycle to enhance interpretability and therapeutic effectiveness in digital well-being applications, accompanied by the XInsight-Bench evaluation protocol.

Fei Wang, Jiangnan Yang, Junjie Chen, Yuxin Liu, Kun Li, Yanyan Wei, Dan Guo, Meng Wang | 2026-03-10 | cs.LG

vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM

This paper introduces vLLM Hook, an open-source plug-in that enables the programmable access and manipulation of internal model states within the vLLM inference engine to support advanced test-time alignment techniques such as adversarial prompt detection, enhanced RAG, and activation steering.

Ching-Yun Ko, Pin-Yu Chen | 2026-03-10 | cs.LG

Isotonic Layer: A Universal Framework for Generic Recommendation Debiasing

This paper introduces the Isotonic Layer, a novel differentiable framework that integrates piecewise linear fitting and learnable embeddings into neural architectures to enforce global monotonicity, thereby enabling granular, context-aware debiasing and improved calibration for large-scale recommendation systems.

Hailing Cheng, Yafang Yang, Hemeng Tao, Fengyu Zhang | 2026-03-10 | cs.LG
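To make the phrase "piecewise linear fitting ... to enforce global monotonicity" in the summary above concrete, here is a minimal sketch of one common way to obtain a non-decreasing piecewise-linear function: clamp each segment slope at zero and accumulate the segments. This is an assumption for illustration, not the paper's Isotonic Layer; the function name, the fixed `knots`, and the clamping rule are hypothetical, and the learnable embeddings mentioned in the summary are omitted.

```python
from typing import List


def monotone_piecewise_linear(
    x: float, knots: List[float], raw_slopes: List[float], bias: float = 0.0
) -> float:
    """Evaluate a non-decreasing piecewise-linear function at x.

    knots: sorted breakpoints k_0 < k_1 < ... < k_m.
    raw_slopes: m unconstrained values, one per segment; negative values
        are clamped to zero so the output never decreases in x.
    """
    if len(raw_slopes) != len(knots) - 1:
        raise ValueError("need exactly one slope per segment")
    slopes = [max(0.0, s) for s in raw_slopes]
    # Inputs outside the knot range are clipped to the nearest endpoint.
    x = min(max(x, knots[0]), knots[-1])
    y = bias
    for left, right, slope in zip(knots, knots[1:], slopes):
        if x <= left:
            break
        # Add the part of this segment that lies to the left of x.
        y += slope * (min(x, right) - left)
    return y


if __name__ == "__main__":
    knots = [0.0, 1.0, 2.0]
    raw = [1.0, -5.0]  # the negative slope is clamped to 0
    print([monotone_piecewise_linear(v, knots, raw) for v in (0.0, 0.5, 1.5, 2.0)])
```

Because every clamped slope is non-negative, the output cannot decrease as `x` grows, whatever values the unconstrained parameters take during training.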

How Attention Sinks Emerge in Large Language Models: An Interpretability Perspective

This paper identifies a simple, semantics-free "P0 Sink Circuit" that emerges early in training to explain how Large Language Models develop attention sinks on the first token, suggesting this mechanism could serve as a signal for tracking pre-training convergence.

Runyu Peng, Ruixiao Li, Mingshu Chen, Yunhua Zhou, Qipeng Guo, Xipeng Qiu | 2026-03-10 | cs.LG

Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale

This paper demonstrates that hierarchical structures within the data generation process, modeled via probabilistic context-free grammars, serve as a unifying explanation for the emergence of diverse mechanistic phenomena like induction heads, function vectors, and the Hydra effect in Transformer-based language models.

Jonas Rohweder, Subhabrata Dutta, Iryna Gurevych | 2026-03-10 | cs.LG

Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation

This paper introduces Hierarchical Embedding Fusion (HEF), a two-stage framework that compresses repository code into a reusable hierarchy of dense vectors and maps them to learned pseudo-tokens, enabling low-latency, repository-aware code generation with accuracy comparable to traditional retrieval methods while significantly reducing inference costs.

Nikita Sorokin, Ivan Sedykh, Valentin Malykh | 2026-03-10 | cs.LG

FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

The paper introduces FuzzingRL, a framework that combines vision-language fuzzing with adversarial reinforcement fine-tuning to automatically generate diverse, challenging queries that systematically expose and degrade the performance of Vision Language Models.

Jiajun Xu, Jiageng Mao, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, Yue Wang | 2026-03-10 | cs.LG

Switchable Activation Networks

This paper introduces Switchable Activation Networks (SWAN), a framework that equips neural units with input-dependent binary gates to dynamically allocate computation and learn structured activation patterns, thereby unifying sparsity, pruning, and adaptive inference to achieve efficient, accurate, and context-aware deep learning models.

Laha Ale, Ning Zhang, Scott A. King, Pingzhi Fan | 2026-03-10 | cs.LG

Khatri-Rao Clustering for Data Summarization

This paper introduces the Khatri-Rao clustering paradigm, which extends traditional centroid-based methods like k-Means and deep clustering by modeling centroids as interactions of multiple succinct protocentroids, thereby achieving more compact and accurate data summaries with reduced redundancy.

Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila | 2026-03-10 | cs.LG

Scale Dependent Data Duplication

This paper demonstrates that data duplication is scale-dependent, revealing that as model capability and corpus size increase, semantically equivalent documents behave like exact duplicates by producing aligned gradients and causing accelerated semantic collisions, which leads to rapidly increasing training losses for larger models and necessitates new scaling laws to accurately predict performance.

Joshua Kazdan, Noam Levi, Rylan Schaeffer, Jessica Chudnovsky, Abhay Puri, Bo He, Mehmet Donmez, Sanmi Koyejo, David Donoho | 2026-03-10 | cs.LG

Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection

This paper introduces a normalized confidence scoring framework based on output anchor tokens to detect LLM errors without external validation, revealing that while supervised fine-tuning yields well-calibrated confidence, reinforcement learning methods induce overconfidence, and proposing post-RL self-distillation to restore reliability for applications like adaptive retrieval-augmented generation.

Xie Xiaohu, Liu Xiaohu, Yao Benjamin | 2026-03-10 | cs.LG

Structure-Aware Set Transformers: Temporal and Variable-Type Attention Biases for Asynchronous Clinical Time Series

The paper introduces Structure-Aware Set Transformers (STAR), a novel architecture that enhances asynchronous clinical time series modeling by integrating parameter-efficient soft attention biases for temporal locality and variable-type affinity, thereby outperforming existing grid-based and set-based baselines on ICU prediction tasks while providing interpretable insights into temporal and variable interactions.

Joohyung Lee, Kwanhyung Lee, Changhun Kim, Eunho Yang | 2026-03-10 | cs.LG
