cs.LG papers | Gist.Science

Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

This paper proposes a data-driven robust Markov decision process framework for Borel spaces with unknown disturbance distributions, utilizing ambiguity sets defined by distance functions to establish finite-sample performance guarantees, probabilistic convergence rates, and out-of-distribution bounds that empirical MDPs fail to provide.

Sivaramakrishnan Ramani | 2026-03-11 | cs.LG

MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

MAPLE introduces a unified training paradigm that enhances medical large language models by integrating Test-Time Reinforcement Learning with expert-aligned Med-RPMs to replace unreliable majority voting with fine-grained process rewards, thereby significantly improving clinical reasoning accuracy and reliability across multiple benchmarks.

Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, Ning Guo | 2026-03-11 | cs.LG

Statistical Inference via Generative Models: Flow Matching and Causal Inference

This book reinterprets generative AI, specifically through flow matching, as a statistical framework for nonparametric distribution learning that enables principled inference for tasks like missing-data imputation and causal analysis by integrating generative models with double/debiased machine learning techniques to ensure inferential validity.

Shinto Eguchi | 2026-03-11 | cs.LG

The Coupling Within: Flow Matching via Distilled Normalizing Flows

This paper introduces Normalized Flow Matching (NFM), a novel method that distills quasi-deterministic couplings from pretrained auto-regressive normalizing flow models to train student flow models, achieving superior performance over both traditional flow matching approaches and the teacher models themselves.

David Berthelot, Tianrong Chen, Jiatao Gu, Marco Cuturi, Laurent Dinh, Bhavik Chandna, Michal Klein, Josh Susskind, Shuangfei Zhai | 2026-03-11 | cs.LG

An accurate flatness measure to estimate the generalization performance of CNN models

This paper proposes an exact, parameterization-aware flatness measure tailored to the geometric structure of convolutional neural networks with global average pooling, demonstrating its effectiveness as a robust proxy for estimating and comparing generalization performance across various CNN architectures.

Rahman Taleghani, Maryam Mohammadi, Francesco Marchetti | 2026-03-11 | cs.LG

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

This paper introduces CALIPER, a data-only, detector-agnostic test that determines the sufficient post-drift data size for stable model retraining by analyzing the trend of a one-step proxy error against a locality parameter, thereby bridging the gap between drift detection and effective adaptation in streaming learning.

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai | 2026-03-11 | cs.LG

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

The paper introduces EPIC, a hardware- and physics-co-guided distributed scientific machine learning framework that significantly reduces communication latency and energy consumption while preserving physical fidelity by performing lightweight local encoding and physics-aware decoding with cross-attention for tasks like full-waveform inversion.

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang | 2026-03-11 | cs.LG

SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding

SCALAR is a bidirectional framework that couples LLM-guided symbolic planning with deep RL to iteratively refine skill specifications through execution feedback, significantly outperforming prior methods in complex environments like Craftax by correcting initial planning errors and improving sample efficiency.

Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara | 2026-03-11 | cs.LG

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

This paper presents FlexServe, a high-performance and secure LLM serving system for mobile devices that leverages a novel Flexible Resource Isolation mechanism to overcome the significant overhead of ARM TrustZone, achieving up to 10.05× faster time-to-first-token and 24.30× faster multi-model workflow execution compared to baseline designs.

Yinpeng Wu, Yitong Chen, Lixiang Wang, Jinyu Gu, Zhichao Hua, Yubin Xia | 2026-03-11 | cs.LG

From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring

The paper introduces Sentinel, an autonomous AI agent that achieves reliable, scalable clinical triage for remote patient monitoring by outperforming individual clinicians in sensitivity and consistency while maintaining a clinically defensible overtriage profile at a negligible cost.

Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation

The paper proposes Sim2Act, a robust simulation-to-decision framework that enhances policy reliability in mission-critical domains by combining an adversarial calibration mechanism to align simulation fidelity with decision impact and a group-relative perturbation strategy to stabilize learning without overly conservative constraints.

Hongyu Cao, Jinghan Zhang, Kunpeng Liu, Dongjie Wang, Feng Xia, Haifeng Chen, Xiaohua Hu, Yanjie Fu | 2026-03-11 | cs.AI

Quality over Quantity: Demonstration Curation via Influence Functions for Data-Centric Robot Learning

This paper introduces Quality over Quantity (QoQ), a systematic framework that leverages influence functions to automatically curate high-quality robot learning demonstrations by quantifying each sample's contribution to reducing validation loss, thereby significantly improving policy performance over manual or heuristic data selection methods.

Haeone Lee, Taywon Min, Junsu Kim, Sinjae Kang, Fangchen Liu, Lerrel Pinto, Kimin Lee | 2026-03-11 | cs.LG

Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics

This paper proposes a novel integrated online reliability prediction framework for satellite electronics that combines a Wiener process-based degradation model with a two-stage adaptive active learning strategy to significantly improve prediction accuracy while reducing data requirements under limited and variable operational conditions.

Shixiang Li, Yubin Tian, Dianpeng Wang, Piao Chen, Mengying Ren | 2026-03-11 | cs.LG

Dynamic Multi-period Experts for Online Time Series Forecasting

This paper introduces DynaME, a novel hybrid framework for online time series forecasting that redefines concept drift into recurring and emergent types, utilizing specialized historical experts for the former and a stable general expert for the latter to significantly outperform existing baselines.

Seungha Hong, Sukang Chae, Suyeon Kim, Sanghwan Jang, Hwanjo Yu | 2026-03-11 | cs.LG

Learning Adaptive LLM Decoding

This paper proposes learning lightweight, reinforcement-trained decoding adapters that dynamically select sampling strategies at both the sequence and token levels based on prompt features and compute budgets, significantly improving the accuracy-efficiency tradeoff on math and coding benchmarks compared to fixed hyperparameter baselines.

Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai | 2026-03-11 | cs.LG

Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems

This paper verifies that persistent observers in causally invariant hypergraph substrates satisfy the Conant-Ashby Good Regulator Theorem, thereby necessitating internal models that lead to natural gradient descent as the unique learning rule and yielding a model-dependent closed-form formula for Vanchurin's regime parameter $\alpha$ with a quantum-classical threshold at $\kappa(F)=2$.

Max Zhuravlev | 2026-03-11 | cs.LG

Exclusive Self Attention

The paper introduces Exclusive Self Attention (XSA), a modification that constrains attention to information orthogonal to a token's own value vector, thereby improving Transformer performance in language modeling tasks, particularly as sequence length increases.

Shuangfei Zhai | 2026-03-11 | cs.LG

PPO-Based Hybrid Optimization for RIS-Assisted Semantic Vehicular Edge Computing

This paper proposes a Reconfigurable Intelligent Surface (RIS)-aided semantic-aware Vehicle Edge Computing framework that utilizes a Proximal Policy Optimization (PPO) and Linear Programming (LP) hybrid scheme to jointly optimize offloading ratios, semantic symbols, and RIS phase shifts, achieving a 40–50% reduction in end-to-end latency compared to existing methods.

Wei Feng, Jingbo Zhang, Qiong Wu, Pingyi Fan, Qiang Fan | 2026-03-11 | cs.LG

Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting

This study demonstrates that integrating sentiment scores derived from a finetuned Qwen3 model analyzing English and Chinese news significantly enhances aluminum price forecasting accuracy and economic utility, particularly during periods of high market volatility, compared to traditional tabular data models.

Alvaro Paredes Amorin, Andre Python, Christoph Weisser | 2026-03-11 | cs.AI

Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

This paper proposes a unified taxonomy and evaluation framework for latent world models in automated driving, organizing design choices by latent representations and structural priors while identifying key internal mechanics and research directions to enhance robustness, generalization, and deployability.

Rongxiang Zeng, Yongqi Dong | 2026-03-11 | cs.AI
