SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding

SCALAR is a bidirectional framework that couples LLM-guided symbolic planning with deep RL to iteratively refine skill specifications through execution feedback, significantly outperforming prior methods in complex environments like Craftax by correcting initial planning errors and improving sample efficiency.

Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara · Wed, 11 Mar · cs.LG
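
To make the bidirectional loop concrete, here is a toy sketch of the refine cycle the summary describes: an LLM drafts symbolic skill specifications, an RL step grounds each skill, and execution feedback flows back to repair the specs. Every function, spec, and string below is an invented stand-in, not SCALAR's actual interface.

```python
# A toy, self-contained sketch of the bidirectional refine loop.

def llm_propose_specs(feedback):
    # Hypothetical: the LLM drafts/repairs symbolic skill specs.
    specs = {"chop_wood": {"pre": ["has_axe"], "eff": ["has_wood"]}}
    if "missing precondition: near_tree" in feedback:
        specs["chop_wood"]["pre"].append("near_tree")
    return specs

def rl_ground(spec):
    # Hypothetical: deep RL trains a policy for the skill; returns success rate.
    return 0.9 if "near_tree" in spec["pre"] else 0.2

def execute_and_diagnose(specs):
    # Hypothetical: run the symbolic plan and explain any grounding failure.
    if "near_tree" not in specs["chop_wood"]["pre"]:
        return "missing precondition: near_tree"
    return "ok"

feedback = ""
for _ in range(3):
    specs = llm_propose_specs(feedback)                    # symbolic planning side
    success = {k: rl_ground(v) for k, v in specs.items()}  # RL grounding side
    feedback = execute_and_diagnose(specs)                 # feedback closes the loop
    if feedback == "ok":
        break
print(specs, success)
```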

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

The paper introduces EPIC, a distributed scientific machine learning framework co-guided by hardware and physics, which significantly reduces communication latency and energy consumption while preserving physical fidelity by pairing lightweight local encoding with physics-aware, cross-attention decoding for tasks like full-waveform inversion.

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang · Wed, 11 Mar · cs.LG
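
As a rough illustration of the encode-then-decode split, the sketch below (PyTorch, with invented module names and sizes) shows a lightweight local encoder whose compact latent queries physics-derived context tokens through cross-attention; for full-waveform inversion those tokens might encode wave-equation features. This is an assumption-laden sketch, not EPIC's actual architecture.

```python
import torch
import torch.nn as nn

class PhysicsAwareDecoder(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.encode = nn.Linear(32, d)      # lightweight local encoder
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.head = nn.Linear(d, 1)         # predicts e.g. velocity-field values

    def forward(self, local_obs, physics_feats):
        z = self.encode(local_obs)          # (B, Nq, d): compact latent to transmit
        out, _ = self.attn(query=z, key=physics_feats, value=physics_feats)
        return self.head(out)

dec = PhysicsAwareDecoder()
local_obs = torch.randn(2, 10, 32)          # shard-local measurements
physics_feats = torch.randn(2, 50, 64)      # physics-derived context tokens
print(dec(local_obs, physics_feats).shape)  # torch.Size([2, 10, 1])
```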

The Coupling Within: Flow Matching via Distilled Normalizing Flows

This paper introduces Normalized Flow Matching (NFM), a novel method that distills quasi-deterministic couplings from pretrained auto-regressive normalizing flow models to train student flow models, achieving superior performance over both traditional flow matching approaches and the teacher models themselves.

David Berthelot, Tianrong Chen, Jiatao Gu, Marco Cuturi, Laurent Dinh, Bhavik Chandna, Michal Klein, Josh Susskind, Shuangfei Zhai · Wed, 11 Mar · cs.LG
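
The distilled-coupling idea can be sketched in a few lines: draw noise x0, let the pretrained teacher flow map it to x1, and train the student velocity field on the straight path between them, exactly as in standard flow matching but with the teacher supplying the pairing. The teacher below is a stand-in MLP, not a real pretrained normalizing flow.

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # stand-in
student_v = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(student_v.parameters(), lr=1e-3)

for step in range(100):
    x0 = torch.randn(256, 2)                 # noise sample
    with torch.no_grad():
        x1 = teacher(x0)                     # coupling distilled from the teacher
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1               # straight interpolation path
    target = x1 - x0                         # constant velocity along the path
    pred = student_v(torch.cat([xt, t], dim=1))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```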

MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

MAPLE introduces a unified training paradigm that enhances medical large language models by integrating Test-Time Reinforcement Learning with expert-aligned Med-RPMs to replace unreliable majority voting with fine-grained process rewards, thereby significantly improving clinical reasoning accuracy and reliability across multiple benchmarks.

Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, Ning Guo · Wed, 11 Mar · cs.LG
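
The shift from statistical consensus to process-led selection can be shown with a toy example: majority voting counts only final answers, while a process reward model scores each reasoning step, so a careful minority chain can win. The reward function below is a stub, not the paper's Med-RPMs.

```python
from collections import Counter

samples = [
    {"steps": ["check history", "order ECG", "diagnose MI"], "answer": "MI"},
    {"steps": ["skip history", "guess"],                     "answer": "GERD"},
    {"steps": ["guess"],                                     "answer": "GERD"},
]

def toy_prm(step):  # stub process reward: favor careful clinical steps
    return 0.9 if any(w in step for w in ("check", "order", "diagnose")) else 0.1

majority = Counter(s["answer"] for s in samples).most_common(1)[0][0]
best = max(samples, key=lambda s: sum(toy_prm(st) for st in s["steps"]) / len(s["steps"]))
print(majority, best["answer"])  # GERD (majority vote) vs MI (process-led)
```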

The qs Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference

This paper introduces the qs inequality to demonstrate that Mixture-of-Experts (MoE) models suffer from a structural "double penalty" of routing fragmentation and memory constraints during inference, often rendering them significantly less efficient than quality-matched dense models for long-context serving despite their training-time FLOP advantages.

Vignesh Adhinarayanan, Nuwan Jayasena · Wed, 11 Mar · cs.LG
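
The inequality itself is not reproduced here, but the double penalty is easy to caricature with back-of-envelope arithmetic (all numbers below are invented): under long-context serving the KV cache caps batch size, routing then scatters the few resident tokens across experts, and each expert's weights are read from memory for only a token or two, so bytes moved per token can dwarf a dense model's.

```python
def bytes_per_token(params_active, tokens_sharing_weights):
    BYTES = 2  # fp16 weights
    return params_active * BYTES / tokens_sharing_weights

# Dense model: all 64 resident tokens amortize the same weight reads.
dense = bytes_per_token(params_active=7e9, tokens_sharing_weights=64)

# MoE: 2 of 64 experts (1B params each) per token, but routing fragmentation
# leaves each activated expert serving only ~2 of the resident tokens.
moe = bytes_per_token(params_active=2 * 1e9, tokens_sharing_weights=2)

print(f"dense ~{dense/1e6:.0f} MB/token, MoE ~{moe/1e6:.0f} MB/token")
```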

Quantifying Memorization and Privacy Risks in Genomic Language Models

This paper introduces a comprehensive multi-vector privacy evaluation framework that quantifies memorization risks in Genomic Language Models by integrating perplexity-based detection, canary sequence extraction, and membership inference, revealing that these models exhibit measurable data leakage dependent on architecture and training dynamics.

Alexander Nemecek, Wenbiao Li, Xiaoqian Jiang, Jaideep Vaidya, Erman Ayday · Wed, 11 Mar · cs.LG
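
Of the three vectors, the perplexity-based one is simplest to sketch: memorized training sequences tend to receive unusually low perplexity, so thresholding per-sequence loss yields a membership signal. The losses and cutoff below are invented for illustration; this is a generic scoring rule, not the paper's full framework.

```python
import math

def sequence_perplexity(avg_nll):
    # Perplexity is the exponentiated average negative log-likelihood.
    return math.exp(avg_nll)

# Hypothetical per-sequence average NLLs from a genomic language model.
train_seqs = [1.1, 0.9, 1.0]    # seen in training: lower loss
holdout_seqs = [2.0, 1.8, 2.2]  # unseen: higher loss

threshold = 4.0  # perplexity cutoff, tuned on a calibration split
for nll in train_seqs + holdout_seqs:
    ppl = sequence_perplexity(nll)
    print(f"ppl={ppl:5.2f} -> {'member?' if ppl < threshold else 'non-member'}")
```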

SoftJAX & SoftTorch: Empowering Automatic Differentiation Libraries with Informative Gradients

This paper introduces SoftJAX and SoftTorch, open-source libraries that provide feature-complete, drop-in soft relaxations for hard, non-differentiable primitives in JAX and PyTorch, thereby enabling informative gradients for optimization tasks involving operations like thresholding, sorting, and Boolean logic.

Anselm Paulus, A. René Geist, Vít Musil, Sebastian Hoffmann, Onur Beker, Georg Martius · Wed, 11 Mar · cs.LG
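
The core trick is easy to show in a few generic lines (plain PyTorch, not SoftTorch's actual API): swap a hard primitive, whose gradient is zero almost everywhere, for a temperature-controlled soft surrogate, and gradients flow through the whole pipeline.

```python
import torch

def soft_threshold(x, tau, temp=0.1):
    # Hard version (x > tau).float() has zero gradient almost everywhere.
    return torch.sigmoid((x - tau) / temp)

def soft_and(a, b):
    # Soft Boolean AND of values in [0, 1].
    return a * b

x = torch.tensor([0.2, 0.8], requires_grad=True)
y = soft_and(soft_threshold(x[0], 0.5), soft_threshold(x[1], 0.5))
y.backward()
print(x.grad)  # informative gradients through "threshold then AND"
```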

SPREAD: Subspace Representation Distillation for Lifelong Imitation Learning

The paper introduces SPREAD, a geometry-preserving framework for lifelong imitation learning that utilizes singular value decomposition to align policy representations within low-rank subspaces and a confidence-guided distillation strategy to mitigate catastrophic forgetting while achieving state-of-the-art performance on the LIBERO benchmark.

Kaushik Roy, Giovanni D'urso, Nicholas Lawrance, Brendan Tidd, Peyman Moghadam · Wed, 11 Mar · cs.LG
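
The geometric core can be sketched directly: take the top-k left singular vectors of old- and new-policy feature matrices and penalize the projection-metric distance between the subspaces they span. The loss below illustrates that idea only; it is not necessarily SPREAD's exact objective, which also involves confidence-guided distillation.

```python
import torch

def topk_subspace(feats, k):
    # Orthonormal basis (d x k) for the dominant feature subspace.
    U, S, Vh = torch.linalg.svd(feats, full_matrices=False)
    return U[:, :k]

def subspace_alignment_loss(F_old, F_new, k=8):
    U_old, U_new = topk_subspace(F_old, k), topk_subspace(F_new, k)
    # Projection metric: k - ||U_old^T U_new||_F^2 is 0 iff subspaces coincide.
    return k - (U_old.T @ U_new).pow(2).sum()

F_old = torch.randn(64, 128)  # d x n feature matrix from the old policy
F_new = torch.randn(64, 128)  # d x n feature matrix from the current policy
print(subspace_alignment_loss(F_old, F_new))
```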

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

This paper introduces GOME, a gradient-based MLE agent that outperforms traditional tree search methods on MLE-Bench by mapping diagnostic reasoning to gradient computation, demonstrating that as LLM reasoning capabilities improve, gradient-based optimization becomes increasingly superior to exhaustive enumeration.

Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian · Wed, 11 Mar · cs.AI
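
The "reasoning as gradient" analogy can be rendered as a toy loop: a validation score acts as the loss, an LLM's diagnosis of the failure acts as the gradient, and applying its suggested edit acts as the descent step. All functions below are invented stand-ins, not GOME's implementation.

```python
def evaluate(solution):                  # "forward pass": score on validation data
    return -abs(solution["lr"] - 0.003)  # toy objective: best score at lr = 0.003

def llm_diagnose(solution, score):       # "backward pass": reasoning as gradient
    return "decrease lr" if solution["lr"] > 0.003 else "increase lr"

def apply_edit(solution, diagnosis):     # "descent step": apply the suggested fix
    delta = -0.001 if diagnosis == "decrease lr" else 0.001
    return {"lr": round(solution["lr"] + delta, 4)}

solution = {"lr": 0.01}
for _ in range(10):
    score = evaluate(solution)
    solution = apply_edit(solution, llm_diagnose(solution, score))
print(solution)  # settles near lr=0.003 without enumerating a search tree
```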

Breaking the Factorization Barrier in Diffusion Language Models

The paper introduces Coupled Discrete Diffusion (CoDD), a hybrid framework that overcomes the "factorization barrier" in diffusion language models by replacing fully factorized outputs with a lightweight probabilistic inference layer, thereby enabling efficient parallel generation of coherent, high-quality text without the prohibitive costs of full joint modeling or reinforcement learning.

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu · Wed, 11 Mar · cs.AI
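
The factorization barrier itself is easy to demonstrate: when two positions are unmasked in the same denoising step, a fully factorized output samples them independently and can emit pairs the true joint never produces. The toy distributions below are invented; CoDD's probabilistic inference layer is what restores the missing coupling.

```python
import random
random.seed(0)

# Suppose the true joint over two tokens puts all mass on coherent pairs.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Factorized marginals derived from that joint:
p_first = {"New": 0.5, "Los": 0.5}
p_second = {"York": 0.5, "Angeles": 0.5}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

pairs = [(sample(p_first), sample(p_second)) for _ in range(1000)]
bad = sum(p not in joint for p in pairs)
print(f"{bad/10:.0f}% incoherent pairs, e.g. ('New', 'Angeles')")
```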