Entropy-Aware On-Policy Distillation of Language Models

This paper introduces Entropy-Aware On-Policy Distillation, a method that dynamically combines forward and reverse KL divergence objectives to mitigate the diversity loss and instability caused by high teacher entropy, thereby significantly improving knowledge transfer and reasoning performance across various language model sizes.

Woogyeol Jin, Taywon Min, Yongjin Yang, Swanand Ravindra Kadhe, Yi Zhou, Dennis Wei, Nathalie Baracaldo, Kimin Lee · 2026-03-10 · cs.LG
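The entropy-gated blend of the two KL directions can be illustrated with a minimal sketch. Everything here is a hypothetical reading of the summary, not the paper's formula: the gating weight, the use of normalized teacher entropy, and the function names are all assumptions.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_aware_distill_loss(teacher_probs, student_probs, max_entropy):
    """Blend forward and reverse KL by normalized teacher entropy.

    Hypothetical gating rule: when the teacher is diffuse (high entropy),
    lean on the mode-seeking reverse KL to avoid chasing noise; when the
    teacher is confident, forward KL transfers the full distribution.
    """
    w = entropy(teacher_probs) / max_entropy      # normalized to [0, 1]
    forward = kl(teacher_probs, student_probs)    # KL(teacher || student)
    reverse = kl(student_probs, teacher_probs)    # KL(student || teacher)
    return (1.0 - w) * forward + w * reverse
```

With a uniform (maximum-entropy) teacher the weight saturates at 1 and the loss reduces to pure reverse KL; a near-deterministic teacher drives it toward pure forward KL.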

VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness

VLN-Cache addresses the inference cost of Vision-and-Language Navigation models by introducing a training-free token caching framework that overcomes the limitations of static assumptions through view-aligned remapping for visual dynamics and a saliency filter for semantic dynamics, achieving up to a 1.52x speedup while maintaining navigation performance.

Zihao Zheng, Zhihao Mao, Xingyue Zhou, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen · 2026-03-10 · cs.LG

Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR

This paper introduces Countdown-Code, a novel testbed demonstrating that even minimal contamination of supervised fine-tuning data with reward-hacking trajectories can cause large language models to learn and subsequently generalize misaligned behaviors during reinforcement learning, highlighting the critical need for rigorous validation of synthetic training data.

Muhammad Khalifa, Zohaib Khan, Omer Tafveez, Hao Peng, Lu Wang · 2026-03-10 · cs.LG

Agentic Planning with Reasoning for Image Styling via Offline RL

This paper introduces a tool-based agentic planning framework that leverages structured reasoning and offline reinforcement learning to decompose complex image styling tasks into interpretable primitive transformations, demonstrating superior performance over direct prompting baselines across multiple large-scale vision-language models.

Subhojyoti Mukherjee, Stefano Petrangeli, Branislav Kveton, Trung Bui, Franck Dernoncourt, Arko Mukherjee · 2026-03-10 · cs.LG

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

This paper introduces MSKernelBench, a comprehensive benchmark covering diverse multi-scenario GPU kernels, and CUDAMaster, a multi-agent, hardware-aware system that leverages this benchmark to achieve significant speedups, often matching or surpassing closed-source libraries like cuBLAS, thereby advancing general-purpose automated CUDA kernel optimization beyond current ML-focused methods.

Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu · 2026-03-10 · cs.LG

A Dual-Graph Spatiotemporal GNN Surrogate for Nonlinear Response Prediction of Reinforced Concrete Beams under Four-Point Bending

This paper introduces a dual-graph spatiotemporal GNN surrogate that efficiently and accurately predicts the full-field nonlinear dynamic responses of reinforced concrete beams under varying four-point bending loads by decoupling node-level kinematics and element-level history-dependent variables into separate recurrent graph branches.

Zhaoyang Ren, Qilin Li · 2026-03-10 · cs.LG

Towards Objective Gastrointestinal Auscultation: Automated Segmentation and Annotation of Bowel Sound Patterns

This study presents an automated pipeline using a wearable SonicGuard sensor and a pretrained Audio Spectrogram Transformer to accurately segment and classify bowel sounds, significantly reducing manual labeling time while providing clinicians with an objective, quantitative tool for assessing gastrointestinal function.

Zahra Mansour, Verena Uslar, Dirk Weyhe, Danilo Hollosi, Nils Strodthoff · 2026-03-10 · cs.LG

Margin in Abstract Spaces

This paper establishes that a sufficiently large margin enables learnability in arbitrary metric spaces using only the triangle inequality, reveals a sharp threshold for learnability under linear combinations of distance functions, and shows that not all margin-based learning reduces to linear classification in Banach spaces, by proving that learnability in such spaces implies sample complexity polynomial in the inverse margin.

Yair Ashlagi, Roi Livni, Shay Moran, Tom Waknine · 2026-03-10 · cs.LG
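The core margin intuition in an arbitrary metric space can be sketched with a toy nearest-center classifier; this is a hypothetical illustration of why the triangle inequality alone suffices, not the paper's learner. If every example lies within distance r of its class center and distinct centers are more than 2r apart (the margin), then for any x with d(x, c_A) ≤ r, the triangle inequality gives d(x, c_B) ≥ d(c_A, c_B) − d(x, c_A) > r, so the nearest center is always the correct one.

```python
def hamming(s, t):
    """Hamming distance on equal-length strings: a valid metric
    with no linear or Euclidean structure assumed."""
    return sum(a != b for a, b in zip(s, t))

def nearest_center_classify(x, centers, metric):
    """Predict the label of the closest center under `metric`.

    Uses nothing but the metric axioms: correctness follows from the
    triangle inequality whenever the inter-center margin exceeds twice
    the within-class radius.
    """
    return min(centers, key=lambda label: metric(x, centers[label]))
```

For example, with centers `{"A": "aaaa", "B": "bbbb"}` (distance 4 apart) and examples within Hamming distance 1 of their center, classification is guaranteed correct: d(x, c_B) ≥ 4 − 1 = 3 > 1.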

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

This paper introduces the ODA-Fin datasets and demonstrates that prioritizing high-quality Chain-of-Thought distillation and difficulty-aware sampling in post-training significantly enhances the performance of financial Large Language Models, enabling an 8B-parameter model to surpass state-of-the-art open-source financial LLMs across diverse benchmarks.

Chuxue Cao, Honglin Lin, Zhanping Zhong, Xin Gao, Mengzhang Cai, Conghui He, Sirui Han, Lijun Wu · 2026-03-10 · cs.LG

Conditional Rank-Rank Regression via Deep Conditional Transformation Models

This paper proposes a framework based on deep conditional transformation models and cross-fitting to estimate conditional rank-rank regression for measuring within-group intergenerational mobility. The approach improves accuracy and interpretability over traditional methods for both continuous and discrete outcomes, is backed by rigorous asymptotic theory, and documents significant mobility patterns in the U.S. and India.

Xiaoyi Wang, Long Feng, Zhaojun Wang · 2026-03-10 · cs.LG
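The classical (unconditional) rank-rank regression that the paper generalizes can be sketched in a few lines: rank parent and child outcomes, then take the OLS slope of child rank on parent rank. With evenly spaced percentile ranks this slope equals the Spearman correlation. A minimal sketch, assuming no ties and at least two observations; the conditional, covariate-dependent version is the paper's contribution and is not shown here.

```python
def ranks(xs):
    """Fractional ranks in [0, 1] (no tie handling, for brevity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos / (len(xs) - 1)
    return r

def rank_rank_slope(parent, child):
    """OLS slope of child rank on parent rank: the classical mobility
    coefficient. A slope near 1 means children inherit their parents'
    rank; a slope near 0 means high mobility."""
    rp, rc = ranks(parent), ranks(child)
    mp = sum(rp) / len(rp)
    mc = sum(rc) / len(rc)
    cov = sum((a - mp) * (b - mc) for a, b in zip(rp, rc))
    var = sum((a - mp) ** 2 for a in rp)
    return cov / var
```

Perfectly persistent outcomes (child income strictly increasing in parent income) give a slope of 1; a perfectly reversed ordering gives −1.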