cs.LG papers | Gist.Science

MJ1: Multimodal Judgment via Grounded Verification

The paper introduces MJ1, a 3B-parameter multimodal judge that leverages reinforcement learning with a structured grounded verification chain and counterfactual consistency rewards to achieve state-of-the-art accuracy on MMRB2, outperforming significantly larger models by effectively grounding decisions in visual evidence.

Bhavesh Kumar, Dylan Feng, Leonard Tang | 2026-03-10 | cs.LG

SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning

SmartThinker is a novel GRPO-based method that dynamically calibrates chain-of-thought length by estimating optimal response lengths and modulating reward coefficients, achieving significant compression of reasoning paths while improving accuracy on complex benchmarks like AIME25.

Chenzhi Hu, Qinzhe Hu, Yuhang Xu, Junyi Chen, Ruijie Wang, Shengzhong Liu, Jianxin Li, Fan Wu, Guihai Chen | 2026-03-10 | cs.LG
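SmartThinker is described as GRPO-based. As a rough illustration of only the GRPO ingredient, the sketch below computes group-relative advantages (each sampled response's reward standardized by the mean and standard deviation of its group) after applying a length-dependent reward term. The `target_len` and `length_coef` parameters and the penalty form are hypothetical assumptions, not the paper's calibration rule.

```python
import statistics


def length_modulated_reward(correct: bool, length: int, target_len: int, length_coef: float) -> float:
    """Hypothetical reward: 1 for a correct answer, minus a penalty for exceeding target_len."""
    base = 1.0 if correct else 0.0
    overshoot = max(0, length - target_len) / target_len
    return base - length_coef * overshoot


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one prompt: (is_correct, token_length).
samples = [(True, 800), (True, 2400), (False, 1200), (False, 3000)]
rewards = [length_modulated_reward(c, n, target_len=1000, length_coef=0.2) for c, n in samples]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

With this toy penalty, the short correct answer receives the largest advantage and the long correct answer a smaller one, which is the kind of pressure toward shorter reasoning that length calibration aims for.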

Amortizing Maximum Inner Product Search with Learned Support Functions

This paper proposes "amortized MIPS," a learning-based framework that leverages the mathematical properties of support functions to train neural networks (SupportNet and KeyNet) that directly predict optimal keys for Maximum Inner Product Search, thereby amortizing computational costs for queries drawn from a fixed distribution.

Theo X. Olausson, João Monteiro, Michal Klein, Marco Cuturi | 2026-03-10 | cs.LG
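The support function of a finite key set K is h(q) = max over k in K of ⟨q, k⟩, and wherever the maximizing key is unique, the gradient of h at q is exactly that key. The sketch below computes both by brute force, which is the per-query cost that learned predictors such as SupportNet and KeyNet aim to amortize; the function names and toy keys are illustrative, not the paper's code.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def support_function(q: Vec, keys: Sequence[Vec]) -> float:
    """h(q) = max over keys k of <q, k>, by brute force (O(|K|) per query)."""
    return max(dot(q, k) for k in keys)


def mips_key(q: Vec, keys: Sequence[Vec]) -> Vec:
    """The maximizing key; where it is unique, it equals the gradient of h at q."""
    return max(keys, key=lambda k: dot(q, k))


def finite_diff_grad(q: Vec, keys: Sequence[Vec], eps: float = 1e-6) -> list:
    """Forward-difference estimate of grad h(q), used only as a sanity check."""
    base = support_function(q, keys)
    grad = []
    for i in range(len(q)):
        shifted = tuple(x + (eps if j == i else 0.0) for j, x in enumerate(q))
        grad.append((support_function(shifted, keys) - base) / eps)
    return grad


keys = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
q = (0.5, 1.0)
print(mips_key(q, keys), finite_diff_grad(q, keys))
```

The finite-difference gradient matches the brute-force argmax key, which is the property that lets a network trained to approximate h also yield the MIPS answer through its gradient.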

FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning

FedMomentum is a novel federated fine-tuning framework that preserves LoRA training momentum and ensures mathematically correct aggregation by using singular value decomposition (SVD) to extract dominant update directions while retaining residual components, thereby achieving faster convergence and higher accuracy than existing methods.

Peishen Yan, Yang Hua, Hao Wang, Jiaru Zhang, Xiaoyu Wu, Tao Song, Haibing Guan | 2026-03-10 | cs.LG
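A LoRA update is the product ΔW = BA. Averaging the B and A factors separately across clients does not, in general, give the average of the clients' products, which is the aggregation error that exact schemes such as FedMomentum's SVD-based approach are meant to avoid. The pure-Python sketch below uses arbitrarily chosen toy rank-1 factors for two clients to show the gap; it does not reproduce FedMomentum's SVD step or its residual handling.

```python
def matmul(X, Y):
    """Plain nested-list matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mean_matrices(Ms):
    """Element-wise average of equally shaped matrices."""
    n = len(Ms)
    return [[sum(M[i][j] for M in Ms) / n for j in range(len(Ms[0][0]))]
            for i in range(len(Ms[0]))]


# Two clients with rank-1 LoRA factors: B is 2x1, A is 1x2 (toy values).
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]

exact = mean_matrices([matmul(B1, A1), matmul(B2, A2)])   # average of the true updates
naive = matmul(mean_matrices([B1, B2]), mean_matrices([A1, A2]))  # average factors, then multiply
print(exact)
print(naive)
```

Here the exact average is diag(0.5, 0.5), which has rank 2 even though each client's update has rank 1. Re-factoring it back to rank 1 (for example, with a truncated SVD) therefore leaves a residual, which illustrates why keeping residual components can matter.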

Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization

This paper introduces CAMEL, a capacity-aware mixture law that enables efficient data mixture optimization for large language models by modeling the nonlinear interplay between model size and data composition, thereby reducing optimization costs by 50% while improving downstream benchmark performance by up to 3%.

Jingwei Li, Xinran Gu, Jingzhao Zhang | 2026-03-10 | cs.LG

GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

The paper proposes GCGNet, a Graph-Consistent Generative Network that integrates a Variational Generator, Graph Structure Aligner, and Graph Refiner to jointly model temporal and channel correlations in a noise-robust manner, thereby outperforming state-of-the-art methods in time series forecasting with exogenous variables.

Zhengyu Li, Xiangfei Qiu, Yuhan Zhu, Xingjian Wu, Jilin Hu, Chenjuan Guo, Bin Yang | 2026-03-10 | cs.LG

CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

CDRRM introduces a contrast-driven framework that generates high-quality, interpretable rubrics from preference pairs to guide reward modeling, achieving state-of-the-art performance and superior data efficiency while mitigating common evaluation biases.

Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin | 2026-03-10 | cs.LG

Stabilized Fine-Tuning with LoRA in Federated Learning: Mitigating the Side Effect of Client Size and Rank via the Scaling Factor

This paper introduces Stabilized Federated LoRA (SFed-LoRA), a novel framework that derives an optimal scaling factor to mitigate the statistical variance and gradient collapse caused by aggregating high-rank LoRA updates across multiple clients in Federated Learning, thereby restoring stability and convergence without altering model architecture.

Jiayu Huang, Xiaohu Wu, Tiantian He, Qicheng Lao | 2026-03-10 | cs.LG

Adversarial Domain Adaptation Enables Knowledge Transfer Across Heterogeneous RNA-Seq Datasets

This study proposes an adversarial deep learning framework that enables effective knowledge transfer across heterogeneous RNA-seq datasets by learning a domain-invariant latent space, thereby significantly improving cancer and tissue type classification accuracy, especially in low-data scenarios.

Kevin Dradjat, Massinissa Hamidi, Blaise Hanczar | 2026-03-10 | cs.LG

Deterministic Differentiable Structured Pruning for Large Language Models

This paper introduces Deterministic Differentiable Pruning (DDP), a method that eliminates the train-test mismatch and stochasticity of prior structured pruning approaches by directly optimizing a deterministic soft surrogate of the ℓ0 objective. DDP achieves faster convergence, greater expressiveness, and superior performance retention (e.g., a performance drop of about 1%) on large language models such as Qwen3 at high sparsity levels.

Weiyu Huang, Pengle Zhang, Xiaolu Zhang, Jun Zhou, Jun Zhu, Jianfei Chen | 2026-03-10 | cs.LG

Hybrid Quantum Neural Network for Multivariate Clinical Time Series Forecasting

This paper proposes a hybrid quantum-classical architecture that integrates a Variational Quantum Circuit within a GRU backbone to forecast multivariate physiological signals, demonstrating competitive accuracy and enhanced robustness to noise and missing data in small-cohort clinical settings.

Irene Iele, Floriano Caprio, Paolo Soda, Matteo Tortora | 2026-03-10 | cs.LG

Tiny Autoregressive Recursive Models

This paper introduces and evaluates the Autoregressive TRM, which adapts the two-step refinement mechanism of Tiny Recursive Models to autoregressive tasks. It finds that although some two-step refinement baselines show promise, the Autoregressive TRM architecture itself offers no reliable performance gains over standard Transformers.

Paulius Rauba, Claudio Fanconi, Mihaela van der Schaar | 2026-03-10 | cs.LG

EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs

EAGLE-Pangu is a reproducible system that adapts EAGLE-3-style tree speculative decoding for Ascend NPUs with Pangu teacher models by introducing an explicit cache manager, accelerator-safe tensorization, and a fused-kernel verification path, achieving up to 2.46x throughput improvement over greedy decoding.

Chang Han, Yijie Hu, Jingling Liu | 2026-03-10 | cs.LG
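In greedy speculative decoding, verification keeps the longest prefix of draft tokens that matches the target model's own argmax predictions, then appends the target's token at the first mismatch. The result is therefore identical to plain greedy decoding with the target model. The sketch below shows this for a single linear draft, with a hypothetical toy `target_next` function standing in for the teacher model. EAGLE-style tree drafts verify many branches in one forward pass, and neither that nor the cache manager and fused kernels described for EAGLE-Pangu is modeled here.

```python
from typing import Callable, List


def greedy_verify(prefix: List[int], draft: List[int],
                  target_next: Callable[[List[int]], int]) -> List[int]:
    """Return tokens accepted this step: the matching draft prefix plus one target token."""
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)  # target model's greedy choice at this position
        if tok != expected:
            accepted.append(expected)    # target's correction at the first mismatch
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token when every draft token is accepted
    return accepted


# Hypothetical toy "target model": next token = (last token + 1) mod 10.
def toy_target(context: List[int]) -> int:
    return (context[-1] + 1) % 10


print(greedy_verify([3], [4, 5, 9], toy_target))
```

In a real system, the target's predictions for all draft positions come from one batched forward pass rather than a Python loop. That batching is where the throughput gain over token-by-token greedy decoding comes from.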

DC-W2S: Dual-Consensus Weak-to-Strong Training for Reliable Process Reward Modeling in Biological Reasoning

This paper introduces the Dual-Consensus Weak-to-Strong (DC-W2S) framework, which enhances the reliability of Process Reward Models in biological reasoning by strategically filtering noisy weak supervision signals through self- and neighborhood-consensus metrics to enable robust training without exhaustive expert annotation.

Chi-Min Chan, Ehsan Hajiramezanali, Xiner Li, Edward De Brouwer, Carl Edwards, Wei Xue, Sirui Han, Yike Guo, Gabriele Scalia | 2026-03-10 | cs.LG

Invisible Safety Threat: Malicious Finetuning for LLM via Steganography

This paper demonstrates a novel safety threat in which large language models are finetuned to use steganography, allowing them to covertly generate harmful content in response to hidden malicious prompts while displaying only benign interactions to human observers and safety classifiers.

Guangnian Wan, Xinyin Ma, Gongfan Fang, Xinchao Wang | 2026-03-10 | cs.LG

Tau-BNO: Brain Neural Operator for Tau Transport Model

The paper introduces Tau-BNO, a deep learning surrogate framework that rapidly and accurately approximates the computationally intensive Network Transport Model of tau propagation in Alzheimer's disease, enabling efficient parameter inference and mechanistic discovery by reducing simulation time from hours to seconds while outperforming existing sequence models.

Nuutti Barron, Heng Rao, Urmi Saha, Yu Gu, Zhenghao Liu, Ge Yu, Defu Yang, Ashish Raj, Minghan Chen | 2026-03-10 | cs.LG

Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting

This paper introduces ROMI, a robust offline reinforcement learning method that replaces RAMBO's unstable model gradient updates with a novel robust value-aware learning approach and implicitly differentiable adaptive weighting to achieve controllable conservatism and superior performance on out-of-distribution datasets.

Zhongjian Qiao, Jiafei Lyu, Boxiang Lyu, Yao Shu, Siyang Gao, Shuang Qiu | 2026-03-10 | cs.LG

SaiVLA-0: Cerebrum-Pons-Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action

SaiVLA-0 introduces a neuroscience-inspired, compute-aware Vision-Language-Action framework featuring a tripartite Cerebrum-Pons-Cerebellum architecture that decouples high-level semantics from real-time control to achieve modular scalability, active foveated vision, and significant improvements in training efficiency and task success rates.

Xiang Shi, Wenlong Huang, Menglin Zou, Xinhai Sun | 2026-03-10 | cs.LG

Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows

Foley-Flow introduces a novel video-to-audio generation framework that achieves superior semantic and rhythmic synchronization by aligning unimodal encoders through masked audio-visual modeling and by employing a dynamic conditional flow that uses temporally varying video features to guide audio synthesis.

Shentong Mo, Yibing Song | 2026-03-10 | cs.LG

TRIAGE: Type-Routed Interventions via Aleatoric-Epistemic Gated Estimation in Robotic Manipulation and Adaptive Perception – Don't Treat All Uncertainty the Same

The paper introduces TRIAGE, a lightweight post-hoc framework that decomposes uncertainty into aleatoric and epistemic components to trigger distinct corrective actions: observation recovery for corrupted data and control moderation for model mismatch. This significantly improves robotic manipulation success rates and enables efficient adaptive perception.

Divake Kumar, Sina Tayebati, Devashri Naik, Patrick Poggi, Amanda Sofie Rios, Nilesh Ahuja, Amit Ranjan Trivedi | 2026-03-10 | cs.LG
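One common way to split predictive uncertainty, though not necessarily the estimator TRIAGE uses, is the ensemble entropy decomposition. The entropy of the averaged prediction (total) equals the average member entropy (aleatoric) plus the mutual information between prediction and model (epistemic). The sketch below computes both parts and routes to one of two corrective actions. The thresholds, action names, and the choice to check epistemic uncertainty first are hypothetical assumptions.

```python
import math
from typing import List, Tuple


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose(member_probs: List[List[float]]) -> Tuple[float, float]:
    """Return (aleatoric, epistemic) from per-member class probabilities."""
    n = len(member_probs)
    num_classes = len(member_probs[0])
    mean_p = [sum(p[c] for p in member_probs) / n for c in range(num_classes)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(p) for p in member_probs) / n  # expected member entropy
    return aleatoric, total - aleatoric                    # epistemic = mutual information


def route(aleatoric: float, epistemic: float,
          a_thresh: float = 0.5, e_thresh: float = 0.2) -> str:
    """Hypothetical routing rule with illustrative thresholds."""
    if epistemic > e_thresh:
        return "moderate_control"      # members disagree: treat as model mismatch
    if aleatoric > a_thresh:
        return "recover_observation"   # members agree the input is ambiguous: re-observe
    return "proceed"


print(route(*decompose([[0.5, 0.5], [0.5, 0.5]])))    # agreed but uncertain
print(route(*decompose([[0.99, 0.01], [0.01, 0.99]])))  # confident but disagreeing
print(route(*decompose([[0.99, 0.01], [0.99, 0.01]])))  # confident and agreeing
```

The point of the split is that the two cases with the same total entropy (the first two examples, both ln 2) lead to different interventions.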
