cs.AI papers | Gist.Science

A Signal Contract for Online Language Grounding and Discovery in Decision-Making

This paper introduces LUCIFER, an inference-only middleware that decouples online language grounding from decision-making via a Signal Contract, enabling autonomous systems to robustly convert evolving human verbal reports into control-relevant signals that improve safety and information efficiency across diverse planning architectures.

Dimitris Panagopoulos, Adolfo Perrusquia, Weisi Guo2026-03-06💻 cs

HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

This paper introduces Poly2Graph, an automated pipeline for generating HSG-12M, a pioneering 16.7-million-scale dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra, which bridges condensed matter physics and geometry-aware graph learning by preserving vital geometric information often discarded in existing benchmarks.

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi + 2 more2026-03-06🔬 cond-mat.mes-hall

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions

InterActHuman is a novel framework that enables high-quality multi-concept human animation by enforcing strong, region-specific binding of text, image, and audio conditions to individual identities, thereby overcoming the limitations of global-conditioning methods in scenarios involving complex human-human and human-object interactions.

Zhenzhi Wang, Jiaqi Yang, Jianwen Jiang + 7 more2026-03-06💻 cs

Bures-Wasserstein Flow Matching for Graph Generation

This paper introduces BWFlow, a graph generation framework that overcomes the limitations of independent node-edge modeling by utilizing Bures-Wasserstein optimal transport on Markov random fields to construct a smooth, theoretically grounded probability path for the joint evolution of graph components, resulting in improved training convergence and sampling efficiency.

Keyue Jiang, Jiahao Cui, Xiaowen Dong + 1 more2026-03-06💻 cs

Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics

This paper introduces Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a framework that combines structured state-space modeling with Kolmogorov-Arnold Networks to accurately recover interpretable physical latent states and discover compact symbolic governing equations for nonlinear dynamical systems, outperforming black-box neural ODEs and classical identification methods across synthetic and real-world datasets.

Wei Liu, Kiran Bacsa, Loon Ching Tang + 1 more2026-03-06🔬 physics

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective

This paper demonstrates that Reinforcement Fine-Tuning (RFT) outperforms Supervised Fine-Tuning (SFT) in preserving prior knowledge for multimodal large language models by leveraging training data with smaller influence magnitudes and better alignment to the base model's probability landscape, thereby mitigating catastrophic forgetting while enabling effective task adaptation.

Zhihao Zhang, Qiaole Dong, Qi Zhang + 12 more2026-03-06💻 cs

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

MuRating is a scalable framework that transfers high-quality English data-quality signals to a unified multilingual evaluator via pairwise comparisons and translation, enabling the selection of balanced, high-quality datasets that significantly improve the performance of multilingual large language models on both English and non-English benchmarks.

Zhixun Chen, Ping Guo, Wenhan Han + 10 more2026-03-06💻 cs

Design and Experimental Validation of Sensorless 4-Channel Bilateral Teleoperation for Low-Cost Manipulators

This paper presents a sensorless 4-channel bilateral teleoperation framework that enables stable, high-speed force feedback control on low-cost manipulators through disturbance-observer-based estimation and simplified tuning, ultimately demonstrating that such force-enhanced data significantly improves imitation learning performance.

Koki Yamane, Yunhan Li, Masashi Konosu + 4 more2026-03-06💻 cs

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

This paper introduces TreeBench, a diagnostic benchmark for evaluating traceable visual grounded reasoning, and proposes TreeVGR, a reinforcement learning-based training paradigm that significantly enhances model performance by jointly supervising localization and reasoning.

Haochen Wang, Xiangtai Li, Zilong Huang + 9 more2026-03-06💻 cs

Overtone: Cyclic Patch Modulation for Clean, Efficient, and Flexible Physics Emulators

Overtone is a unified framework for transformer-based PDE surrogates that employs cyclic patch size modulation via architecture-agnostic modules to dynamically distribute harmonic errors and adapt computational costs, achieving significantly lower long-term rollout errors and flexible efficiency compared to static-patch baselines.

Payel Mukhopadhyay, Michael McCabe, Ruben Ohana + 1 more2026-03-06💻 cs

In-Training Defenses against Emergent Misalignment in Language Models

This paper presents the first systematic study of in-training safeguards against emergent misalignment in fine-tuned language models, demonstrating that interleaving training examples selected by the perplexity gap between aligned and misaligned models effectively prevents broad misalignment while preserving task performance and coherence.

David Kaczér, Magnus Jørgenvåg, Clemens Vetter + 4 more2026-03-06💻 cs

Vevo2: A Unified and Controllable Framework for Speech and Singing Voice Generation

Vevo2 is a unified framework for controllable speech and singing voice generation that employs novel audio tokenizers and a two-stage modeling approach with specialized training strategies to achieve flexible control over content, prosody, style, and timbre while demonstrating strong generalization across diverse synthesis tasks.

Xueyao Zhang, Junan Zhang, Yuancheng Wang + 5 more2026-03-06💻 cs

LHM-Humanoid: Learning a Unified Policy for Long-Horizon Humanoid Whole-Body Loco-Manipulation in Diverse Messy Environments

The paper introduces LHM-Humanoid, a unified learning framework and benchmark that employs reinforcement learning and policy distillation to enable humanoid agents to perform robust, long-horizon loco-manipulation tasks across diverse, cluttered environments without relying on pre-trained skill libraries or environment resets.

Haozhuo Zhang, Jingkai Sun, Michele Caprio + 4 more2026-03-06💻 cs

A Geometric Perspective on the Difficulties of Learning GNN-based SAT Solvers

This paper attributes the performance degradation of GNN-based SAT solvers on difficult instances to inherent negative graph Ricci curvature in formula representations, which causes oversquashing by creating local connectivity bottlenecks that hinder the compression of long-range dependencies.

Geri Skenderi2026-03-06🔬 physics

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition

This paper proposes TSPC, a novel two-stage phoneme-centric architecture that leverages an extended Vietnamese phoneme set as an intermediate representation to significantly improve Vietnamese-English code-switching speech recognition accuracy while maintaining computational efficiency.

Tran Nguyen Anh, Truong Dinh Dung, Vo Van Nam + 1 more2026-03-06💻 cs

Quadrotor Navigation using Reinforcement Learning with Privileged Information

This paper presents a reinforcement learning-based quadrotor navigation method that utilizes privileged time-of-arrival maps and a yaw alignment loss to successfully navigate around large obstacles in cluttered environments, achieving an 86% success rate in simulation and demonstrating collision-free flight in real-world outdoor conditions.

Jonathan Lee, Abhishek Rathod, Kshitij Goel + 2 more2026-03-06💻 cs

Diffusion-Based Impedance Learning for Contact-Rich Manipulation Tasks

This paper introduces Diffusion-Based Impedance Learning, a framework that combines a Transformer-based diffusion model with energy-consistent impedance control to enable robots to learn and adapt contact-rich manipulation behaviors from teleoperated demonstrations, achieving high-precision performance and robust generalization in tasks like peg-in-hole insertion.

Noah Geiger, Tamim Asfour, Neville Hogan + 1 more2026-03-06💻 cs

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones

This paper reveals that SpeechLLM backbones struggle with conversational disfluencies due to a bias toward semantic abstraction over structural fidelity, with performance varying by architecture and fine-tuning often compromising generalization despite achieving state-of-the-art results.

Maria Teleki, Sai Janjur, Haoran Liu + 11 more2026-03-06💻 cs

Complexity-Regularized Proximal Policy Optimization

This paper introduces Complexity-Regularized Proximal Policy Optimization (CR-PPO), a novel algorithm that replaces standard entropy regularization with a self-regulating complexity term—defined as the product of Shannon entropy and disequilibrium—to maintain beneficial stochasticity while reducing sensitivity to hyperparameter tuning and avoiding the overriding of reward signals.

Luca Serfilippi, Giorgio Franceschelli, Antonio Corradi + 1 more2026-03-06💻 cs

BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving

BridgeDrive introduces a novel anchor-guided diffusion bridge policy that ensures theoretical consistency between forward and reverse processes to transform coarse expert trajectories into refined, safe, and reactive closed-loop plans, achieving state-of-the-art performance on the Bench2Drive benchmark.

Shu Liu, Wenlin Chen, Weihao Li + 7 more2026-03-06💻 cs

← Previous Next →