cs.LG papers | Gist.Science

Overcoming Valid Action Suppression in Unmasked Policy Gradient Algorithms

This paper identifies and theoretically proves that unmasked policy gradient algorithms systematically suppress valid actions at unvisited states due to parameter sharing and gradient propagation, a failure mode that action masking avoids and that can be mitigated in unmasked settings through feasibility classification.

Renos Zabounidis, Roy Siegelmann, Mohamad Qadri, Woojun Kim, Simon Stepputtis, Katia P. Sycara2026-03-11🤖 cs.LG

Probabilistic Hysteresis Factor Prediction for Electric Vehicle Batteries with Graphite Anodes Containing Silicon

This paper proposes a data-driven framework that harmonizes heterogeneous driving cycle data and employs statistical and deep learning models to enable efficient, probabilistic prediction of voltage hysteresis factors in silicon-graphite anode batteries, thereby improving state-of-charge estimation and generalizability across different vehicle models.

Runyao Yu, Viviana Kleine, Philipp Gromotka, Thomas Rudolf, Adrian Eisenmann, Gautham Ram Chandra Mouli, Peter Palensky, Jochen L. Cremer2026-03-11🤖 cs.LG

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

This paper introduces DCPO, a framework that resolves the inherent gradient conflict between accuracy and calibration in Reinforcement Learning from Verifiable Rewards by decoupling reasoning and confidence objectives, thereby achieving state-of-the-art calibration performance without compromising model accuracy.

Zhengzhao Ma, Xueru Wen, Boxi Cao, Yaojie Lu, Hongyu Lin, Jinglin Yang, Min He, Xianpei Han, Le Sun2026-03-11🤖 cs.LG

Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning

This paper proposes a Probability of Necessity and Sufficiency (PNS)-based regularization method for Class-Incremental Learning that utilizes a dual-scope counterfactual generator to mitigate feature collisions caused by intra-task shortcut reliance and inter-task semantic confusion, thereby ensuring both the causal completeness and separability of task-specific representations.

Zhen Zhang, Jielei Chu, Tianrui Li2026-03-11🤖 cs.AI

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

RubiCap introduces a novel reinforcement learning framework that leverages LLM-generated rubrics to create structured, multi-faceted reward signals for dense image captioning, thereby overcoming the limitations of supervised distillation and deterministic checkers to achieve state-of-the-art performance and superior word efficiency across various benchmarks.

Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu2026-03-11🤖 cs.AI

Wrong Code, Right Structure: Learning Netlist Representations from Imperfect LLM-Generated RTL

This paper proposes a cost-effective framework that leverages structurally informative but functionally imperfect LLM-generated RTL to train netlist representation models, effectively overcoming data scarcity and outperforming methods reliant on scarce high-quality labeled datasets.

Siyang Cai, Cangyuan Li, Yinhe Han, Ying Wang2026-03-11🤖 cs.AI

GIAT: A Geologically-Informed Attention Transformer for Lithology Identification

The paper proposes GIAT, a novel Geologically-Informed Attention Transformer that integrates Category-Wise Sequence Correlation filters into the self-attention mechanism to guide lithology identification with geological priors, achieving state-of-the-art accuracy and enhanced interpretability on well log datasets.

Jie Li, Qishun Yang, Nuo Li2026-03-11🤖 cs.AI

Better Bounds for the Distributed Experts Problem

This paper presents an improved distributed protocol for the distributed experts problem that achieves a specific regret bound while significantly reducing communication costs compared to previous work.

David P. Woodruff, Samson Zhou2026-03-11🤖 cs.LG

Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation

This paper proposes a physics-informed generative modeling framework that derives a differentiable, distributional traffic dynamics model from stochastic Ito-type equations, enabling the estimation of traffic density distributions, credible intervals, and congestion risks through a score network trained with denoising score matching and Fokker-Planck residual loss.

Wuping Xin2026-03-11🤖 cs.AI

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Latent-DARM is a novel latent-space communication framework that bridges Discrete Diffusion Language Models for global planning and Autoregressive Models for fluent execution, significantly improving reasoning accuracy on benchmarks like DART-5 and AIME2024 while drastically reducing token usage compared to state-of-the-art reasoning models.

Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen2026-03-11🤖 cs.AI

The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN

This paper addresses the reproducibility crisis in music source separation by attempting to replicate the Band-Split RNN model, ultimately releasing an optimized version with improved performance and publicly available code to advocate for more transparent and sustainable research practices.

Paul Magron, Romain Serizel, Constance Douwes2026-03-11🤖 cs.LG

$P^2$ GNN: Two Prototype Sets to boost GNN Performance

The paper introduces $P^2$ GNN, a plug-and-play technique that leverages two sets of prototypes to enrich global context and denoise local neighborhoods, thereby significantly boosting the performance of Message Passing Graph Neural Networks across diverse node recommendation and classification tasks.

Arihant Jain, Gundeep Arora, Anoop Saladi, Chaosheng Dong2026-03-11🤖 cs.LG

The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

This paper argues that advancements in logical reasoning for large language models inadvertently create a mechanistic pathway to dangerous situational awareness and strategic deception, necessitating new safety frameworks like the RAISE model to mitigate these emergent risks.

Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary2026-03-11🤖 cs.AI

The Radio-Frequency Transformer for Signal Separation

This paper presents a fully data-driven signal separator using a modified SoundStream tokenizer and a transformer trained with cross-entropy loss, which achieves significant improvements in separating radio-frequency signals from non-Gaussian interference compared to conventional methods.

Egor Lifar, Semyon Savkin, Rachana Madhukara, Tejas Jayashankar, Yury Polyanskiy, Gregory W. Wornell2026-03-11🤖 cs.LG

Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

This paper investigates emotion as a latent factor influencing LLM attention and reasoning, introducing the AURA-QA dataset and an emotional regularization framework that demonstrably improves reading comprehension performance across both emotionally varying and standard benchmarks.

Benjamin Reichman, Adar Avasian, Samuel Webster, Larry Heck2026-03-11🤖 cs.AI

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

MM-Zero is the first RL-based framework to enable Vision Language Models to self-evolve from zero data by employing a multi-role system (Proposer, Coder, and Solver) trained with Group Relative Policy Optimization to generate visual concepts, render them via code, and solve multimodal reasoning tasks without any seed images.

Zongxia Li, Hongyang Du, Chengsong Huang, Xiyang Wu, Lantao Yu, Yicheng He, Jing Xie, Xiaomin Wu, Zhichao Liu, Jiarui Zhang, Fuxiao Liu2026-03-11🤖 cs.LG

Strategically Robust Multi-Agent Reinforcement Learning with Linear Function Approximation

This paper proposes \texttt{RQRE-OVI}, an optimistic value iteration algorithm that computes the unique and smooth Risk-Sensitive Quantal Response Equilibrium (RQRE) in general-sum Markov games with linear function approximation, offering a principled trade-off between performance and robustness that outperforms traditional Nash equilibrium approaches in both theoretical guarantees and empirical stability.

Jake Gonzales, Max Horwitz, Eric Mazumdar, Lillian J. Ratliff2026-03-11🤖 cs.LG

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

This paper introduces Test-Time Control (TTC), a hardware-efficient neural layer that embeds finite-horizon optimal control planning directly into pretrained LLMs via a symplectic LQR solver, significantly boosting mathematical reasoning performance without requiring test-time training.

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, René Vidal2026-03-11🤖 cs.LG

A Generative Sampler for distributions with possible discrete parameter based on Reversibility

This paper proposes a unified, target-gradient-free generative sampling framework that enforces time-reversibility constraints via Maximum Mean Discrepancy minimization between forward and backward Markov trajectories, enabling efficient sampling from complex continuous, discrete, and hybrid distributions using only energy evaluations.

Lei Li, Zhen Wang, Lishuo Zhang2026-03-11🤖 cs.LG

Efficient Reasoning at Fixed Test-Time Cost via Length-Aware Attention Priors and Gain-Aware Training

This paper proposes a training-only framework combining a length-aware attention prior (RPA) and a gain-aware controller (Guardian) to enhance reasoning efficiency and reduce validation loss in Transformers without increasing test-time computational costs or latency.

Rian Atri2026-03-11🤖 cs.LG

← Previous Next →

cs.LG