Enhancing multimodal analogical reasoning with Logic Augmented Generation

This paper introduces a Logic Augmented Generation (LAG) framework that combines semantic knowledge graphs with prompt heuristics to enhance multimodal analogical reasoning, demonstrating superior performance and explainability in metaphor detection tasks compared to existing baselines and human benchmarks, while also highlighting current limitations in domain-specific understanding.

Anna Sofia Lippolis, Andrea Giovanni Nuzzolese, Aldo Gangemi · 2026-03-06 · 💻 cs

Balancing Progress and Safety: A Novel Risk-Aware Objective for RL in Autonomous Driving

This paper proposes a novel, hierarchical, and risk-aware reward function for reinforcement learning in autonomous driving that integrates normalized objectives and an extended Responsibility-Sensitive Safety model, resulting in a 21% reduction in collision rates while maintaining high route progress in unsignalized intersection scenarios.

Ahmed Abouelazm, Jonas Michel, Helen Gremmelmaier + 3 more · 2026-03-06 · 💻 cs
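The Responsibility-Sensitive Safety model the summary mentions is built around a worst-case minimum safe following distance. The sketch below implements the classical RSS longitudinal-distance formula (Shalev-Shwartz et al.), not the paper's extended variant; all parameter defaults (response time, acceleration bounds) are illustrative assumptions.

```python
# Minimal sketch of the classical RSS minimum safe longitudinal distance.
# Parameter defaults here are illustrative, not values from the paper.

def rss_safe_longitudinal_distance(
    v_rear: float,              # rear (ego) vehicle speed [m/s]
    v_front: float,             # front vehicle speed [m/s]
    rho: float = 0.5,           # ego response time [s]
    a_max_accel: float = 3.0,   # max ego acceleration during response [m/s^2]
    b_min_brake: float = 4.0,   # min ego braking deceleration [m/s^2]
    b_max_brake: float = 8.0,   # max front-vehicle braking deceleration [m/s^2]
) -> float:
    """Worst-case gap below which a rear-end collision cannot be ruled out."""
    v_resp = v_rear + rho * a_max_accel  # ego speed at the end of the response time
    d = (
        v_rear * rho                         # distance covered while reacting
        + 0.5 * a_max_accel * rho**2         # extra distance from worst-case acceleration
        + v_resp**2 / (2 * b_min_brake)      # ego braking distance (gentlest brake)
        - v_front**2 / (2 * b_max_brake)     # front vehicle braking distance (hardest brake)
    )
    return max(d, 0.0)
```

A reward term can then penalize the ego vehicle whenever its actual gap to the lead vehicle falls below this value.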

Boundary-Guided Trajectory Prediction for Road Aware and Physically Feasible Autonomous Driving

This paper proposes a novel boundary-guided trajectory prediction framework that leverages HD map constraints and kinematic acceleration profiles to generate physically feasible, on-road, and robust autonomous driving predictions, significantly reducing off-road errors and improving generalization compared to existing baselines.

Ahmed Abouelazm, Mianzhi Liu, Christian Hubschneider + 3 more · 2026-03-06 · 💻 cs

Automatic Curriculum Learning for Driving Scenarios: Towards Robust and Efficient Reinforcement Learning

This paper proposes an automatic curriculum learning framework that employs a "teacher" to dynamically generate driving scenarios with adaptive complexity based on an agent's current capabilities, thereby overcoming the inefficiencies of fixed scenarios and domain randomization to achieve faster convergence and superior generalization in end-to-end autonomous driving reinforcement learning.

Ahmed Abouelazm, Tim Weinstein, Tim Joseph + 2 more · 2026-03-06 · 💻 cs
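The core idea of a curriculum "teacher" that adapts scenario complexity to the student's capability can be sketched with a simple threshold heuristic: raise difficulty when the agent's recent success rate is high, lower it when the agent struggles. This is a generic illustration, not the paper's teacher; the thresholds, window size, and discrete difficulty levels are all assumed.

```python
# Hedged sketch of an adaptive-difficulty curriculum teacher.
# Thresholds and the difficulty scale are illustrative assumptions.

class CurriculumTeacher:
    def __init__(self, n_levels: int = 10, window: int = 20,
                 up: float = 0.8, down: float = 0.4):
        self.level = 0              # current scenario difficulty level
        self.n_levels = n_levels
        self.window = window        # episodes per evaluation window
        self.up, self.down = up, down
        self.results: list[bool] = []

    def report(self, success: bool) -> None:
        """Record one episode outcome; adjust difficulty once per window."""
        self.results.append(success)
        if len(self.results) < self.window:
            return
        rate = sum(self.results) / len(self.results)
        if rate >= self.up and self.level < self.n_levels - 1:
            self.level += 1         # agent is comfortable: harder scenarios
        elif rate <= self.down and self.level > 0:
            self.level -= 1         # agent is struggling: easier scenarios
        self.results.clear()        # start a fresh evaluation window
```

In practice the teacher's `level` would parameterize scenario generation (traffic density, occlusion, intersection geometry) for the next batch of training episodes.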

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

VTool-R1 is a novel framework that leverages reinforcement learning to train vision-language models to generate multimodal chains of thought by strategically interleaving text with intermediate visual reasoning steps using Python-based editing tools, thereby enhancing performance on structured visual tasks without requiring process-based supervision.

Mingyuan Wu, Jingcheng Yang, Jize Jiang + 6 more · 2026-03-06 · 💻 cs

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

The paper introduces SealQA, a new benchmark comprising three challenging flavors (Seal-0, Seal-Hard, and LongSeal) designed to evaluate search-augmented language models on fact-seeking tasks with noisy or conflicting web results, revealing that even frontier models struggle significantly with reasoning accuracy, robustness to noise, and long-context document retrieval.

Thinh Pham, Nguyen Nguyen, Pratibha Zunjare + 3 more · 2026-03-06 · 💻 cs

HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

This paper introduces Poly2Graph, an automated pipeline for generating HSG-12M, a pioneering 16.7-million-scale dataset of spatial multigraphs derived from non-Hermitian crystal energy spectra, which bridges condensed matter physics and geometry-aware graph learning by preserving vital geometric information often discarded in existing benchmarks.

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi + 2 more · 2026-03-06 · 🔬 cond-mat.mes-hall

Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics

This paper introduces Structured Kolmogorov-Arnold Neural ODEs (SKANODEs), a framework that combines structured state-space modeling with Kolmogorov-Arnold Networks to accurately recover interpretable physical latent states and discover compact symbolic governing equations for nonlinear dynamical systems, outperforming black-box neural ODEs and classical identification methods across synthetic and real-world datasets.

Wei Liu, Kiran Bacsa, Loon Ching Tang + 1 more · 2026-03-06 · 🔬 physics
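The "structured state-space" idea behind SKANODEs can be illustrated in miniature: fix the kinematic relation dx/dt = v and learn only the acceleration term. The sketch below uses a hand-written Duffing-type restoring force in place of the learned Kolmogorov-Arnold component, integrated with a standard RK4 step; it shows the structure, not the paper's model.

```python
import numpy as np

# Hedged sketch: structured state-space form with states [x, v], where
# dx/dt = v is fixed and only the acceleration is modeled. The cubic
# restoring force below is an illustrative stand-in for a learned KAN.

def acceleration(x, v, k_lin=1.0, k_cub=0.5, c=0.1):
    # placeholder for the learned component (stiffness + damping)
    return -k_lin * x - k_cub * x**3 - c * v

def dynamics(state):
    x, v = state
    return np.array([v, acceleration(x, v)])  # structured: dx/dt = v enforced

def rk4_step(state, dt):
    k1 = dynamics(state)
    k2 = dynamics(state + 0.5 * dt * k1)
    k3 = dynamics(state + 0.5 * dt * k2)
    k4 = dynamics(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# simulate a lightly damped oscillation from x = 1, v = 0 for 10 s
state = np.array([1.0, 0.0])
for _ in range(1000):
    state = rk4_step(state, 0.01)
```

Because displacement and velocity enter the model with their physical meanings intact, the learned acceleration term can afterwards be read off and regressed into a symbolic expression.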

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective

This paper demonstrates that Reinforcement Fine-Tuning (RFT) outperforms Supervised Fine-Tuning (SFT) in preserving prior knowledge for multimodal large language models by leveraging training data with smaller influence magnitudes and better alignment to the base model's probability landscape, thereby mitigating catastrophic forgetting while enabling effective task adaptation.

Zhihao Zhang, Qiaole Dong, Qi Zhang + 12 more · 2026-03-06 · 💻 cs

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

MuRating is a scalable framework that transfers high-quality English data-quality signals to a unified multilingual evaluator via pairwise comparisons and translation, enabling the selection of balanced, high-quality datasets that significantly improve the performance of multilingual large language models on both English and non-English benchmarks.

Zhixun Chen, Ping Guo, Wenhan Han + 10 more · 2026-03-06 · 💻 cs
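Turning pairwise quality comparisons into scalar document scores, as the summary describes, is commonly done with a Bradley-Terry model. The sketch below is an online Elo-style update under the logistic Bradley-Terry likelihood; it illustrates the general mechanism only, not MuRating's actual training objective, and the step size `k` is an assumption.

```python
import math

# Hedged sketch: one online Bradley-Terry update step for two documents.
# The step size k and logistic scale are illustrative assumptions.

def bt_update(r_a: float, r_b: float, a_wins: bool, k: float = 16.0):
    """Update ratings after a pairwise quality judgment between a and b."""
    p_a = 1.0 / (1.0 + math.exp(r_b - r_a))  # model P(a preferred over b)
    s_a = 1.0 if a_wins else 0.0             # observed outcome
    delta = k * (s_a - p_a)                  # gradient-style correction
    return r_a + delta, r_b - delta          # zero-sum rating transfer
```

Aggregating many such comparisons yields a scalar quality score per document, which can then rank and filter a multilingual pretraining corpus.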

Design and Experimental Validation of Sensorless 4-Channel Bilateral Teleoperation for Low-Cost Manipulators

This paper presents a sensorless 4-channel bilateral teleoperation framework that enables stable, high-speed force feedback control on low-cost manipulators through disturbance-observer-based estimation and simplified tuning, ultimately demonstrating that such force-enhanced data significantly improves imitation learning performance.

Koki Yamane, Yunhan Li, Masashi Konosu + 4 more · 2026-03-06 · 💻 cs
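Sensorless force estimation of the kind this entry describes typically relies on a disturbance observer: the external torque is inferred as the low-pass-filtered difference between the commanded torque and the torque explained by a nominal inertia model. The sketch below is a textbook discrete-time DOB, not the paper's design; the nominal inertia, cutoff frequency, and sign convention (disturbance opposing motion) are assumptions.

```python
import math

# Hedged sketch of a basic discrete-time disturbance observer (DOB).
# Convention assumed here: plant obeys J * accel = tau_cmd - tau_dis,
# i.e. the estimated disturbance opposes the commanded torque.

class DisturbanceObserver:
    def __init__(self, inertia_nominal: float, cutoff_hz: float, dt: float):
        self.J = inertia_nominal
        self.dt = dt
        w = 2.0 * math.pi * cutoff_hz
        self.alpha = (w * dt) / (1.0 + w * dt)  # backward-Euler low-pass gain
        self.d_hat = 0.0                        # filtered disturbance estimate
        self._omega_prev = 0.0

    def update(self, tau_cmd: float, omega: float) -> float:
        """Feed one sample of commanded torque and measured velocity."""
        accel = (omega - self._omega_prev) / self.dt   # numeric differentiation
        self._omega_prev = omega
        residual = tau_cmd - self.J * accel            # torque not explained by inertia
        self.d_hat += self.alpha * (residual - self.d_hat)  # first-order low-pass
        return self.d_hat
```

The resulting estimate can substitute for a force/torque sensor in the bilateral control loop, which is what makes force feedback feasible on low-cost hardware.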