cs.LG papers | Gist.Science

Not all tokens are needed (NAT): token efficient reinforcement learning

The paper introduces NAT (Not All Tokens Are Needed), a token-efficient reinforcement learning framework that updates the policy on only a subset of generated tokens. It uses Horvitz-Thompson reweighting to obtain an unbiased partial-token gradient estimate, achieving full-sequence performance at significantly lower compute and memory cost.

Hejian Sang, Yuanda Xu, Zhengze Zhou, Ran He, Zhipeng Wang | 2026-03-10 | 🤖 cs.LG
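
The Horvitz-Thompson reweighting named in the NAT summary above can be illustrated on a toy sum. This is a minimal sketch, not the paper's method: it assumes each token is kept independently with the same probability `keep_prob`, and it estimates a sum of made-up per-token losses rather than a gradient (by linearity, the same inverse-probability weights would apply to per-token gradient terms). The names `ht_estimate` and `mean_ht_estimate` are hypothetical.

```python
import random


def ht_estimate(token_losses, keep_prob, rng):
    """Horvitz-Thompson estimate of sum(token_losses) from a random subset."""
    total = 0.0
    for loss in token_losses:
        # Each token is kept independently with probability keep_prob (an assumption).
        if rng.random() < keep_prob:
            # Reweight by the inverse inclusion probability to stay unbiased.
            total += loss / keep_prob
    return total


def mean_ht_estimate(token_losses, keep_prob, trials, seed=0):
    """Average many independent estimates to show they center on the full sum."""
    rng = random.Random(seed)
    draws = (ht_estimate(token_losses, keep_prob, rng) for _ in range(trials))
    return sum(draws) / trials


token_losses = [0.5, 1.0, 1.5, 2.0]  # made-up per-token values
full_sum = sum(token_losses)
approx = mean_ht_estimate(token_losses, keep_prob=0.5, trials=20000)
print(full_sum, approx)
```

Any single estimate touches only about half of the tokens, yet the average over many draws stays close to the full sum. The price is extra variance, which grows as `keep_prob` shrinks.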

GraphSkill: Documentation-Guided Hierarchical Retrieval-Augmented Coding for Complex Graph Reasoning

GraphSkill is an agentic framework that improves complex graph reasoning by leveraging hierarchical document retrieval and self-debugging with generated test cases, validated on a new comprehensive dataset.

Fali Wang, Chenglin Weng, Xianren Zhang, Siyuan Hong, Hui Liu, Suhang Wang | 2026-03-10 | 🤖 cs.LG

Reward Under Attack: Analyzing the Robustness and Hackability of Process Reward Models

This paper reveals that state-of-the-art Process Reward Models (PRMs) are systematically exploitable by adversarial optimization. They function primarily as fluency detectors rather than reasoning verifiers, which the paper attributes to a critical dissociation between stylistic changes and ground-truth accuracy. The authors release a diagnostic framework and benchmark to address these vulnerabilities.

Rishabh Tiwari, Aditya Tomar, Udbhav Bamba, Monishwaran Maheswaran, Heng Yang, Michael W. Mahoney, Kurt Keutzer, Amir Gholami | 2026-03-10 | 🤖 cs.LG

From ARIMA to Attention: Power Load Forecasting Using Temporal Deep Learning

This paper empirically demonstrates that a Transformer model using self-attention outperforms traditional ARIMA and recurrent neural network approaches (LSTM, BiLSTM) in short-term power load forecasting on PJM data. It achieves a 3.8% MAPE, the best among the compared models, highlighting the effectiveness of attention-based architectures for capturing complex temporal patterns.

Suhasnadh Reddy Veluru, Sai Teja Erukude, Viswa Chaitanya Marella | 2026-03-10 | 🤖 cs.LG
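
For readers unfamiliar with the MAPE metric quoted in the summary above, here is a minimal sketch of how mean absolute percentage error is usually computed. The load values are made up for illustration and are not from the paper. The function name `mape` is hypothetical, and the sketch assumes every actual value is nonzero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the absolute error relative to each actual value, then scale to percent.
    relative_errors = (abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * sum(relative_errors) / len(actual)


actual_load = [100.0, 200.0, 400.0]    # made-up hourly loads
forecast_load = [95.0, 210.0, 400.0]   # made-up forecasts
print(round(mape(actual_load, forecast_load), 2))  # 3.33
```

Because each error is divided by the actual value, MAPE is scale-free, which makes it convenient for comparing forecasters across load levels. It is undefined when an actual value is zero.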

Advances in GRPO for Generation Models: A Survey

This survey comprehensively reviews the methodological advances and diverse applications of Flow-GRPO, a framework that extends Group Relative Policy Optimization to align large-scale flow matching models with human preferences across various generative tasks and modalities.

Zexiang Liu, Xianglong He, Yangguang Li | 2026-03-10 | 🤖 cs.LG

Exploration Space Theory: Formal Foundations for Prerequisite-Aware Location-Based Recommendation

This paper introduces Exploration Space Theory (EST), a formal lattice-theoretic framework that adapts Knowledge Space Theory to location-based recommendation by modeling prerequisite dependencies among points of interest, thereby providing structural guarantees for validity, optimality, and explainability in the Exploration Space Recommender System (ESRS).

Madjid Sadallah | 2026-03-10 | 🤖 cs.LG

Pavement Missing Condition Data Imputation through Collective Learning-Based Graph Neural Networks

This paper proposes a collective learning-based Graph Convolutional Network model that effectively imputes missing pavement condition data by integrating features from adjacent road sections and capturing dependencies between observed conditions, demonstrating promising results in a Texas Department of Transportation case study.

Ke Yu, Lu Gao | 2026-03-10 | 🤖 cs.LG

Grouter: Decoupling Routing from Representation for Accelerated MoE Training

Grouter is a preemptive routing framework that decouples structural optimization from weight updates by distilling high-quality routing policies from fully trained models, thereby significantly accelerating Mixture-of-Experts (MoE) training convergence and throughput while improving data utilization.

Yuqi Xu, Rizhen Hu, Zihan Liu, Mou Sun, Kun Yuan | 2026-03-10 | 🤖 cs.LG

T-REX: Transformer-Based Category Sequence Generation for Grocery Basket Recommendation

The paper proposes T-REX, a novel transformer-based architecture that addresses the unique challenges of online grocery shopping by generating personalized category-level basket recommendations through dynamic sequence splitting, adaptive positional encoding, and causal masking to effectively capture both short-term dependencies and long-term user preferences.

Soroush Mokhtari, Muhammad Tayyab Asif, Sergiy Zubatiy | 2026-03-10 | 🤖 cs.LG

Leakage Safe Graph Features for Interpretable Fraud Detection in Temporal Transaction Networks

This paper proposes a leakage-safe, time-respecting graph feature extraction protocol for temporal transaction networks that, when combined with transaction attributes, significantly enhances the interpretability and performance of illicit entity classification while preventing look-ahead bias.

Hamideh Khaleghpour, Brett McKinney | 2026-03-10 | 🤖 cs.LG
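
The idea of a time-respecting graph feature in the summary above can be sketched with a toy example. This is not the paper's protocol. It assumes transactions are `(timestamp, sender, receiver)` tuples and computes one hypothetical feature: the number of distinct receivers a sender had paid strictly before the current timestamp. Transactions sharing a timestamp cannot see each other, which is one simple way to avoid look-ahead.

```python
from collections import defaultdict


def prior_out_degree(transactions):
    """For each (time, sender, receiver), count the sender's distinct receivers seen strictly earlier."""
    order = sorted(range(len(transactions)), key=lambda i: transactions[i][0])
    seen = defaultdict(set)          # sender -> receivers observed so far
    features = [0] * len(transactions)
    start = 0
    while start < len(order):
        # Collect every transaction that shares the current timestamp.
        stamp = transactions[order[start]][0]
        end = start
        while end < len(order) and transactions[order[end]][0] == stamp:
            end += 1
        batch = order[start:end]
        # First read features from the past only, then reveal this batch.
        for i in batch:
            features[i] = len(seen[transactions[i][1]])
        for i in batch:
            _, sender, receiver = transactions[i]
            seen[sender].add(receiver)
        start = end
    return features


txs = [(1, "a", "b"), (2, "a", "c"), (2, "a", "d"), (3, "a", "b"), (3, "b", "a")]
print(prior_out_degree(txs))  # [0, 1, 1, 3, 0]
```

Reading features before updating the `seen` state is the design choice that matters here: a feature computed over the full graph would let later transactions influence earlier rows.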

A new Uncertainty Principle in Machine Learning

This paper proposes a new "Uncertainty Principle" in machine learning: in polynomial-based problems, the sharpness of a minimum is inversely related to the smoothness of the optimization landscape. The authors attribute this to the degeneracy of Heaviside and sigmoid expansions, which traps gradient descent, and argue that these scientific problems require a physics-based rather than purely computational approach.

V. Dolotin, A. Morozov | 2026-03-10 | 🤖 cs.LG

Graph Property Inference in Small Language Models: Effects of Representation and Inference Strategy

This paper demonstrates that the ability of small language models to infer graph properties depends critically on how relational information is represented and the reasoning strategy employed, rather than solely on model scale.

Michal Podstawski | 2026-03-10 | 🤖 cs.LG

SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts

This paper introduces SmartBench, the first dataset designed to evaluate LLMs on detecting anomalous device states and behavioral contexts in smart homes, revealing that current state-of-the-art models struggle significantly with this critical task.

Qingsong Zou, Zhi Yan, Zhiyao Xu, Kuofeng Gao, Jingyu Xiao, Yong Jiang | 2026-03-10 | 🤖 cs.LG

HEARTS: Benchmarking LLM Reasoning on Health Time Series

The paper introduces HEARTS, a comprehensive benchmark comprising 16 real-world health datasets and 110 tasks across four reasoning capabilities, which reveals that current large language models significantly underperform specialized models in health time series analysis due to struggles with multi-step temporal reasoning and reliance on simple heuristics.

Sirui Li, Shuhan Xiao, Mihir Joshi, Ahmed Metwally, Daniel McDuff, Wei Wang, Yuzhe Yang | 2026-03-10 | 🤖 cs.LG

RECAP: Local Hebbian Prototype Learning as a Self-Organizing Readout for Reservoir Dynamics

RECAP is a bio-inspired image classification method that couples untrained reservoir dynamics with a self-organizing Hebbian prototype readout to achieve robust, backpropagation-free learning capable of generalizing to corrupted inputs without prior exposure.

Heng Zhang | 2026-03-10 | 🤖 cs.LG

Roots Beneath the Cut: Uncovering the Risk of Concept Revival in Pruning-Based Unlearning for Diffusion Models

This paper reveals that pruning-based unlearning in diffusion models is inherently insecure because the locations of pruned weights act as side-channel signals that enable a novel, data-free, and training-free attack to fully revive erased concepts, prompting a call for safer pruning mechanisms that conceal these locations.

Ci Zhang, Zhaojun Ding, Chence Yang, Jun Liu, Xiaoming Zhai, Shaoyi Huang, Beiwen Li, Xiaolong Ma, Jin Lu, Geng Yuan | 2026-03-10 | 🤖 cs.LG

SR-TTT: Surprisal-Aware Residual Test-Time Training

SR-TTT addresses the catastrophic recall failures of Test-Time Training (TTT) language models by introducing a loss-gated sparse memory mechanism that dynamically routes highly surprising tokens to an exact-attention residual cache, thereby preserving O(1) memory efficiency while enabling accurate retrieval of critical information.

Swamynathan V P | 2026-03-10 | 🤖 cs.LG

Quantum Deep Learning: A Comprehensive Review

This comprehensive review defines Quantum Deep Learning (QDL) through a four-paradigm taxonomy, critically assesses its theoretical foundations and experimental implementations across various hardware systems, and outlines a verification-aware roadmap for transitioning from near-term demonstrations to scalable, fault-tolerant applications.

Yanjun Ji, Zhao-Yun Chen, Marco Roth, David A. Kreplin, Christian Schiffer, Martin King, Oliver Anton, M. Sahnawaz Alam, Markus Krutzik, Dennis Willsch, Ludwig Mathey, Frank K. Wilhelm, Guo-Ping Guo | 2026-03-10 | ⚛️ quant-ph

Trust Aware Federated Learning for Secure Bone Healing Stage Interpretation in e-Health

This paper proposes a trust-aware federated learning framework that utilizes an Adaptive Trust Score Scaling and Filtering mechanism to secure bone healing stage interpretation in e-Health by mitigating the impact of unreliable or adversarial participants while maintaining model integrity and predictive performance.

Paul Shepherd, Tasos Dagiuklas, Bugra Alkan, Joaquim Bastos, Jonathan Rodriguez | 2026-03-10 | 🤖 cs.LG

HURRI-GAN: A Novel Approach for Hurricane Bias-Correction Beyond Gauge Stations using Generative Adversarial Networks

The paper introduces HURRI-GAN, a novel TimeGAN-based framework that corrects systemic biases in high-resolution hurricane simulation models like ADCIRC, enabling accurate, near real-time storm surge forecasting and bias extrapolation beyond gauge station locations while significantly reducing computational runtime.

Noujoud Nadera, Hadi Majed, Stefanos Giaremis, Rola El Osta, Clint Dawson, Carola Kaiser, Hartmut Kaiser | 2026-03-10 | 🤖 cs.LG
