VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs
VIVID-Med is a framework that uses a frozen large language model as a structured semantic teacher to pretrain lightweight, deployable medical Vision Transformers. Through a Unified Medical Schema and a Structured Prediction Decomposition, it achieves state-of-the-art performance across diverse medical imaging tasks while requiring substantially less data than existing vision-language models.
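
To make the setup concrete, here is a minimal sketch of what LLM-supervised structured pretraining could look like: a small ViT-style encoder whose CLS representation feeds one classification head per schema field, trained against per-field labels that a frozen LLM teacher would extract from free-text reports. The schema fields, their class counts, the encoder dimensions, and the `llm_targets` placeholder are all illustrative assumptions, not the paper's actual schema or teacher.

```python
import torch
import torch.nn as nn

# Hypothetical schema: field name -> number of classes (illustrative only).
SCHEMA = {"modality": 5, "anatomy": 12, "finding": 30, "severity": 4}

class LightweightViT(nn.Module):
    """Small ViT-style encoder: patch embedding + transformer encoder."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        return self.encoder(x)[:, 0]                         # CLS token as image feature

class StructuredHeads(nn.Module):
    """Structured prediction decomposition: one classifier per schema field."""
    def __init__(self, dim, schema):
        super().__init__()
        self.heads = nn.ModuleDict({k: nn.Linear(dim, n) for k, n in schema.items()})

    def forward(self, feat):
        return {k: head(feat) for k, head in self.heads.items()}

def llm_targets(reports):
    """Stand-in for the frozen-LLM teacher: in the real pipeline this would
    parse free-text reports into per-field labels; here we return random
    class indices purely for illustration."""
    batch = len(reports)
    return {k: torch.randint(0, n, (batch,)) for k, n in SCHEMA.items()}

# One pretraining step: sum of per-field cross-entropy losses.
vit, heads = LightweightViT(), StructuredHeads(256, SCHEMA)
opt = torch.optim.AdamW(list(vit.parameters()) + list(heads.parameters()), lr=1e-4)

images = torch.randn(4, 3, 224, 224)
targets = llm_targets(["report text"] * 4)

opt.zero_grad()
logits = heads(vit(images))
loss = sum(nn.functional.cross_entropy(logits[k], targets[k]) for k in SCHEMA)
loss.backward()
opt.step()
```

The decomposition into per-field heads is what keeps the student lightweight at deployment: only the ViT encoder and the task-relevant heads ship, while the LLM teacher is used once, offline, to produce structured labels.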