Entropy-Aware On-Policy Distillation of Language Models

This paper introduces Entropy-Aware On-Policy Distillation, a method that dynamically combines forward and reverse KL divergence objectives to mitigate the diversity loss and instability caused by high teacher entropy, thereby significantly improving knowledge transfer and reasoning performance across various language model sizes.

Woogyeol Jin, Taywon Min, Yongjin Yang, Swanand Ravindra Kadhe, Yi Zhou, Dennis Wei, Nathalie Baracaldo, Kimin Lee · 2026-03-10 · cs.LG
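The entropy-gated blend of the two KL directions can be illustrated with a minimal sketch. Everything here is a hypothetical reading of the summary, not the paper's formula: the gating weight, the use of normalized teacher entropy, and the function names are all assumptions.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_aware_distill_loss(teacher_probs, student_probs, max_entropy):
    """Blend forward and reverse KL by normalized teacher entropy.

    Hypothetical gating rule: when the teacher is diffuse (high entropy),
    lean on the mode-seeking reverse KL to avoid chasing noise; when the
    teacher is confident, forward KL transfers the full distribution.
    """
    w = entropy(teacher_probs) / max_entropy      # normalized to [0, 1]
    forward = kl(teacher_probs, student_probs)    # KL(teacher || student)
    reverse = kl(student_probs, teacher_probs)    # KL(student || teacher)
    return (1.0 - w) * forward + w * reverse
```

With a uniform (maximum-entropy) teacher the weight saturates at 1 and the loss reduces to pure reverse KL; a near-deterministic teacher drives it toward pure forward KL.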

VLN-Cache: Enabling Token Caching for VLN Models with Visual/Semantic Dynamics Awareness

VLN-Cache addresses the inference cost of Vision-and-Language Navigation models by introducing a training-free token caching framework that overcomes the limitations of static assumptions through view-aligned remapping for visual dynamics and a saliency filter for semantic dynamics, achieving up to a 1.52x speedup while maintaining navigation performance.

Zihao Zheng, Zhihao Mao, Xingyue Zhou, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Zhaobo Zhang, Xuanzhe Liu, Donggang Cao, Hong Mei, Xiang Chen · 2026-03-10 · cs.LG

Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR

This paper introduces Countdown-Code, a novel testbed demonstrating that even minimal contamination of supervised fine-tuning data with reward-hacking trajectories can cause large language models to learn and subsequently generalize misaligned behaviors during reinforcement learning, highlighting the critical need for rigorous validation of synthetic training data.

Muhammad Khalifa, Zohaib Khan, Omer Tafveez, Hao Peng, Lu Wang · 2026-03-10 · cs.LG

Agentic Planning with Reasoning for Image Styling via Offline RL

This paper introduces a tool-based agentic planning framework that leverages structured reasoning and offline reinforcement learning to decompose complex image styling tasks into interpretable primitive transformations, demonstrating superior performance over direct prompting baselines across multiple large-scale vision-language models.

Subhojyoti Mukherjee, Stefano Petrangeli, Branislav Kveton, Trung Bui, Franck Dernoncourt, Arko Mukherjee · 2026-03-10 · cs.LG

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

This paper introduces MSKernelBench, a comprehensive benchmark covering diverse multi-scenario GPU kernels, and CUDAMaster, a multi-agent, hardware-aware system that leverages this benchmark to achieve significant speedups, often matching or surpassing closed-source libraries like cuBLAS, thereby advancing general-purpose automated CUDA kernel optimization beyond current ML-focused methods.

Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu · 2026-03-10 · cs.LG

A Dual-Graph Spatiotemporal GNN Surrogate for Nonlinear Response Prediction of Reinforced Concrete Beams under Four-Point Bending

This paper introduces a dual-graph spatiotemporal GNN surrogate that efficiently and accurately predicts the full-field nonlinear dynamic responses of reinforced concrete beams under varying four-point bending loads by decoupling node-level kinematics and element-level history-dependent variables into separate recurrent graph branches.

Zhaoyang Ren, Qilin Li · 2026-03-10 · cs.LG

Towards Objective Gastrointestinal Auscultation: Automated Segmentation and Annotation of Bowel Sound Patterns

This study presents an automated pipeline using a wearable SonicGuard sensor and a pretrained Audio Spectrogram Transformer to accurately segment and classify bowel sounds, significantly reducing manual labeling time while providing clinicians with an objective, quantitative tool for assessing gastrointestinal function.

Zahra Mansour, Verena Uslar, Dirk Weyhe, Danilo Hollosi, Nils Strodthoff · 2026-03-10 · cs.LG

Margin in Abstract Spaces

This paper establishes that a sufficiently large margin enables learnability in arbitrary metric spaces using only the triangle inequality, reveals a sharp threshold for learnability under linear combinations of distance functions, and shows that not all margin-based learning reduces to linear classification in Banach spaces, by proving that learnability in such spaces implies sample complexity polynomial in the inverse margin.

Yair Ashlagi, Roi Livni, Shay Moran, Tom Waknine · 2026-03-10 · cs.LG
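The core margin intuition in an arbitrary metric space can be sketched with a toy nearest-center classifier; this is a hypothetical illustration of why the triangle inequality alone suffices, not the paper's learner. If every example lies within distance r of its class center and distinct centers are more than 2r apart (the margin), then for any x with d(x, c_A) ≤ r, the triangle inequality gives d(x, c_B) ≥ d(c_A, c_B) − d(x, c_A) > r, so the nearest center is always the correct one.

```python
def hamming(s, t):
    """Hamming distance on equal-length strings: a valid metric
    with no linear or Euclidean structure assumed."""
    return sum(a != b for a, b in zip(s, t))

def nearest_center_classify(x, centers, metric):
    """Predict the label of the closest center under `metric`.

    Uses nothing but the metric axioms: correctness follows from the
    triangle inequality whenever the inter-center margin exceeds twice
    the within-class radius.
    """
    return min(centers, key=lambda label: metric(x, centers[label]))
```

For example, with centers `{"A": "aaaa", "B": "bbbb"}` (distance 4 apart) and examples within Hamming distance 1 of their center, classification is guaranteed correct: d(x, c_B) ≥ 4 − 1 = 3 > 1.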

Unlocking Data Value in Finance: A Study on Distillation and Difficulty-Aware Training

This paper introduces the ODA-Fin datasets and demonstrates that prioritizing high-quality Chain-of-Thought distillation and difficulty-aware sampling in post-training significantly enhances the performance of financial Large Language Models, enabling an 8B-parameter model to surpass state-of-the-art open-source financial LLMs across diverse benchmarks.

Chuxue Cao, Honglin Lin, Zhanping Zhong, Xin Gao, Mengzhang Cai, Conghui He, Sirui Han, Lijun Wu · 2026-03-10 · cs.LG

Conditional Rank-Rank Regression via Deep Conditional Transformation Models

This paper proposes a framework based on deep conditional transformation models and cross-fitting to estimate conditional rank-rank regression for measuring within-group intergenerational mobility. The approach improves accuracy and interpretability over traditional methods for both continuous and discrete outcomes, is backed by rigorous asymptotic theory, and documents significant mobility patterns in the U.S. and India.

Xiaoyi Wang, Long Feng, Zhaojun Wang · 2026-03-10 · cs.LG
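The classical (unconditional) rank-rank regression that the paper generalizes can be sketched in a few lines: rank parent and child outcomes, then take the OLS slope of child rank on parent rank. With evenly spaced percentile ranks this slope equals the Spearman correlation. A minimal sketch, assuming no ties and at least two observations; the conditional, covariate-dependent version is the paper's contribution and is not shown here.

```python
def ranks(xs):
    """Fractional ranks in [0, 1] (no tie handling, for brevity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos / (len(xs) - 1)
    return r

def rank_rank_slope(parent, child):
    """OLS slope of child rank on parent rank: the classical mobility
    coefficient. A slope near 1 means children inherit their parents'
    rank; a slope near 0 means high mobility."""
    rp, rc = ranks(parent), ranks(child)
    mp = sum(rp) / len(rp)
    mc = sum(rc) / len(rc)
    cov = sum((a - mp) * (b - mc) for a, b in zip(rp, rc))
    var = sum((a - mp) ** 2 for a in rp)
    return cov / var
```

Perfectly persistent outcomes (child income strictly increasing in parent income) give a slope of 1; a perfectly reversed ordering gives −1.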