cs.LG papers | Gist.Science

Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

The paper presents "Guardian," an interpretable, three-layer decision-support system that combines Markov chains, reinforcement learning, and LLM-based validation to generate dynamic, probabilistic search plans for missing-child investigations within the critical first 72 hours.

Joshua Castillo, Ravi Mukkamala | 2026-03-11 | cs.AI

BiCLIP: Domain Canonicalization via Structured Geometric Transformation

The paper introduces BiCLIP, a simple and parameter-efficient framework that achieves state-of-the-art few-shot domain adaptation for vision-language models by applying a structured geometric transformation to align multimodal features across disparate domains using a small set of anchor samples.

Pranav Mantini, Shishir K. Shah | 2026-03-11 | cs.AI

Kernel Debiased Plug-in Estimation based on the Universal Least Favorable Submodel

This paper introduces ULFS-KDPE, a novel kernel-based estimator that achieves semiparametric efficiency for pathwise differentiable parameters in nonparametric models by constructing a data-adaptive debiasing flow via a universal least favorable submodel, thereby eliminating the need for explicit efficient influence function derivation while ensuring rigorous theoretical guarantees and computational tractability.

Haiyi Chen, Yang Liu, Ivana Malenica | 2026-03-11 | cs.LG

Towards Reliable Simulation-based Inference

This thesis addresses the problem of overconfident conclusions in simulation-based inference by introducing a "balancing" regularization technique and a novel Bayesian neural network prior to ensure more reliable and calibrated statistical approximations.

Arnaud Delaunoy | 2026-03-11 | cs.LG

A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations

This paper introduces Guardian, a consensus-driven, multi-LLM pipeline enhanced by QLoRA fine-tuning that coordinates specialized models and a consensus engine to perform auditable, structured information extraction for critical missing-person investigations while avoiding unconstrained decision-making.

Joshua Castillo, Ravi Mukkamala | 2026-03-11 | cs.AI

A Survey of Reinforcement Learning For Economics

This survey introduces reinforcement learning to economists as a flexible, sample-based extension of dynamic programming capable of solving high-dimensional economic models, while critically examining its practical limitations such as sample inefficiency, sensitivity to hyperparameters, and reliance on accurate simulators.

Pranjal Rawat | 2026-03-11 | cs.LG

The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference

This paper introduces the $qs$ inequality to demonstrate that Mixture-of-Experts (MoE) models suffer from a structural "double penalty" of routing fragmentation and memory constraints during inference, often rendering them significantly less efficient than quality-matched dense models for long-context serving despite their training-time FLOP advantages.

Vignesh Adhinarayanan, Nuwan Jayasena | 2026-03-11 | cs.LG

Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds

This paper introduces Semantic Level of Detail (SLoD), a framework that utilizes heat kernel diffusion on hyperbolic manifolds to enable continuous, principled control over knowledge abstraction levels in AI memory systems, automatically detecting emergent semantic boundaries in both synthetic and real-world knowledge graphs without manual supervision.

Edward Izgorodin | 2026-03-11 | cs.AI

MAcPNN: Mutual Assisted Learning on Data Streams with Temporal Dependence

This paper proposes MAcPNN, a decentralized Mutual Assisted Learning paradigm inspired by Vygotsky's Sociocultural Theory that enables autonomous IoT devices to collaboratively address concept drifts and temporal dependence in data streams using Continuous Progressive Neural Networks while minimizing communication overhead compared to traditional Federated Learning.

Federico Giannini, Emanuele Della Valle | 2026-03-11 | cs.LG

Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

This paper proposes a data-driven robust Markov decision process framework for Borel spaces with unknown disturbance distributions, utilizing ambiguity sets defined by distance functions to establish finite-sample performance guarantees, probabilistic convergence rates, and out-of-distribution bounds that empirical MDPs fail to provide.

Sivaramakrishnan Ramani | 2026-03-11 | Author reviewed | cs.LG

MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

MAPLE introduces a unified training paradigm that enhances medical large language models by integrating Test-Time Reinforcement Learning with expert-aligned Med-RPMs to replace unreliable majority voting with fine-grained process rewards, thereby significantly improving clinical reasoning accuracy and reliability across multiple benchmarks.

Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, Ning Guo | 2026-03-11 | cs.LG

Statistical Inference via Generative Models: Flow Matching and Causal Inference

This book reinterprets generative AI, specifically through flow matching, as a statistical framework for nonparametric distribution learning that enables principled inference for tasks like missing-data imputation and causal analysis by integrating generative models with double/debiased machine learning techniques to ensure inferential validity.

Shinto Eguchi | 2026-03-11 | cs.LG

The Coupling Within: Flow Matching via Distilled Normalizing Flows

This paper introduces Normalized Flow Matching (NFM), a novel method that distills quasi-deterministic couplings from pretrained auto-regressive normalizing flow models to train student flow models, achieving superior performance over both traditional flow matching approaches and the teacher models themselves.

David Berthelot, Tianrong Chen, Jiatao Gu, Marco Cuturi, Laurent Dinh, Bhavik Chandna, Michal Klein, Josh Susskind, Shuangfei Zhai | 2026-03-11 | cs.LG

An accurate flatness measure to estimate the generalization performance of CNN models

This paper proposes an exact, parameterization-aware flatness measure tailored to the geometric structure of convolutional neural networks with global average pooling, demonstrating its effectiveness as a robust proxy for estimating and comparing generalization performance across various CNN architectures.

Rahman Taleghani, Maryam Mohammadi, Francesco Marchetti | 2026-03-11 | cs.LG

When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency

This paper introduces CALIPER, a data-only, detector-agnostic test that determines the sufficient post-drift data size for stable model retraining by analyzing the trend of a one-step proxy error against a locality parameter, thereby bridging the gap between drift detection and effective adaptation in streaming learning.

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai | 2026-03-11 | cs.LG

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

The paper introduces EPIC, a hardware- and physics-co-guided distributed scientific machine learning framework that significantly reduces communication latency and energy consumption while preserving physical fidelity by performing lightweight local encoding and physics-aware decoding with cross-attention for tasks like full-waveform inversion.

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang | 2026-03-11 | cs.LG

SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding

SCALAR is a bidirectional framework that couples LLM-guided symbolic planning with deep RL to iteratively refine skill specifications through execution feedback, significantly outperforming prior methods in complex environments like Craftax by correcting initial planning errors and improving sample efficiency.

Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara | 2026-03-11 | cs.LG

FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation

This paper presents FlexServe, a high-performance and secure LLM serving system for mobile devices that leverages a novel Flexible Resource Isolation mechanism to overcome the significant overhead of ARM TrustZone, achieving up to 10.05× faster time-to-first-token and 24.30× faster multi-model workflow execution compared to baseline designs.

Yinpeng Wu, Yitong Chen, Lixiang Wang, Jinyu Gu, Zhichao Hua, Yubin Xia | 2026-03-11 | cs.LG

From Days to Minutes: An Autonomous AI Agent Achieves Reliable Clinical Triage in Remote Patient Monitoring

The paper introduces Sentinel, an autonomous AI agent that achieves reliable, scalable clinical triage for remote patient monitoring by outperforming individual clinicians in sensitivity and consistency while maintaining a clinically defensible overtriage profile at a negligible cost.

Sim2Act: Robust Simulation-to-Decision Learning via Adversarial Calibration and Group-Relative Perturbation

The paper proposes Sim2Act, a robust simulation-to-decision framework that enhances policy reliability in mission-critical domains by combining an adversarial calibration mechanism to align simulation fidelity with decision impact and a group-relative perturbation strategy to stabilize learning without overly conservative constraints.

Hongyu Cao, Jinghan Zhang, Kunpeng Liu, Dongjie Wang, Feng Xia, Haifeng Chen, Xiaohua Hu, Yanjie Fu | 2026-03-11 | cs.AI
