Exclusive Self Attention
The paper introduces Exclusive Self Attention (XSA), a modification that constrains each token's attention output to information orthogonal to that token's own value vector, which is reported to improve Transformer performance on language modeling tasks, particularly as sequence length increases.
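A minimal sketch of one plausible reading of this constraint, assuming it is enforced by projecting each token's attended output onto the subspace orthogonal to that token's own value vector; the function name `xsa_attention` and the projection step are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def xsa_attention(q, k, v):
    """Single-head scaled dot-product attention whose output is projected
    orthogonal to each token's own value vector (illustrative sketch).

    q, k, v: (batch, seq_len, d) tensors.
    """
    d = q.size(-1)
    # Standard scaled dot-product attention weights.
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    attn = F.softmax(scores, dim=-1)
    out = attn @ v  # (batch, seq_len, d)

    # Assumed orthogonality step: remove the component of the output that
    # lies along the token's own value vector,
    # out_i <- out_i - ((out_i . v_i) / ||v_i||^2) * v_i
    v_norm_sq = (v * v).sum(dim=-1, keepdim=True).clamp_min(1e-8)
    coeff = (out * v).sum(dim=-1, keepdim=True) / v_norm_sq
    return out - coeff * v

if __name__ == "__main__":
    q = torch.randn(2, 5, 16)
    k = torch.randn(2, 5, 16)
    v = torch.randn(2, 5, 16)
    y = xsa_attention(q, k, v)
    # Each output vector has (near-)zero dot product with its own value vector.
    print((y * v).sum(dim=-1).abs().max())
```

Under this reading, the projection guarantees that whatever a token aggregates from the sequence carries no component along information it already holds in its own value vector, which is one way to interpret "exclusive" self attention.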