VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs

VIVID-Med introduces a novel framework that leverages a frozen large language model as a structured semantic teacher to pretrain lightweight, deployable medical Vision Transformers via a Unified Medical Schema and Structured Prediction Decomposition, achieving state-of-the-art performance across diverse medical imaging tasks with significantly reduced data requirements compared to existing vision-language models.

Xiyao Wang, Xiaoyu Tan, Yang Dai, Yuxuan Fu, Shuo Li, Xihe Qiu · 2026-03-11 · cs.AI

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

The paper introduces PM-Nav, a novel framework that leverages priori-semantic maps and hierarchical chain-of-thought prompting to overcome the challenges of language-driven navigation in functional buildings with highly similar features, achieving substantial performance improvements over existing methods in both simulation and real-world environments.

Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma · 2026-03-11 · cs.AI

DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

DexHiL is the first integrated human-in-the-loop framework for dexterous Vision-Language-Action models that combines coordinated arm-hand teleoperation with intervention-aware data sampling to significantly improve post-training performance and reliability in complex manipulation tasks.

Yifan Han, Zhongxi Chen, Yuxuan Zhao, Congsheng Xu, Yanming Shao, Yichuan Peng, Yao Mu, Wenzhao Lian · 2026-03-11 · cs.AI

QUSR: Quality-Aware and Uncertainty-Guided Image Super-Resolution Diffusion Model

The paper proposes QUSR, a novel diffusion-based image super-resolution model that combines an Uncertainty-Guided Noise Generation module to adaptively perturb high-uncertainty regions and a Quality-Aware Prior leveraging Multimodal Large Language Models to guide restoration, thereby achieving high-fidelity results in real-world scenarios with unknown and non-uniform degradations.

Junjie Yin, Jiaju Li, Hanfa Xing · 2026-03-11 · cs.AI
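The idea of adaptively perturbing high-uncertainty regions can be illustrated with a minimal sketch. This is a hypothetical illustration only, not the paper's implementation: the function name `uncertainty_guided_noise`, the linear scaling rule, and the `base_sigma` parameter are all assumptions, since the digest gives no formulation.

```python
import numpy as np

def uncertainty_guided_noise(image, uncertainty, base_sigma=0.1, rng=None):
    """Add Gaussian noise whose per-pixel std grows with an uncertainty map in [0, 1].

    High-uncertainty regions (e.g. ambiguous degradations) receive stronger
    perturbation; confident regions are disturbed less.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = base_sigma * (1.0 + uncertainty)  # hypothetical linear schedule
    return image + sigma * rng.standard_normal(image.shape)

# Usage: a flat image with uncertainty concentrated in one quadrant.
img = np.zeros((8, 8))
unc = np.zeros((8, 8))
unc[:4, :4] = 1.0  # this quadrant will be perturbed roughly twice as hard
noisy = uncertainty_guided_noise(img, unc)
```

The scaling rule here is the simplest choice that matches the summary's description; the actual module presumably learns or predicts the uncertainty map rather than taking it as input.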

Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning

This paper proposes a Probability of Necessity and Sufficiency (PNS)-based regularization method for Class-Incremental Learning that utilizes a dual-scope counterfactual generator to mitigate feature collisions caused by intra-task shortcut reliance and inter-task semantic confusion, thereby ensuring both the causal completeness and separability of task-specific representations.

Zhen Zhang, Jielei Chu, Tianrui Li · 2026-03-11 · cs.AI

Deep Tabular Research via Continual Experience-Driven Execution

This paper introduces a novel agentic framework for Deep Tabular Research (DTR) that addresses the challenges of complex, unstructured tables by formalizing tabular reasoning as a closed-loop decision-making process, utilizing hierarchical meta-graphs for path planning, expectation-aware selection policies, and a siamese structured memory for continual experience-driven refinement.

Junnan Dong, Chuang Zhou, Zheng Yuan, Yifei Yu, Siyu An, Di Yin, Xing Sun, Feiyue Huang · 2026-03-11 · cs.AI

DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering

This paper introduces DataFactory, a collaborative multi-agent framework that overcomes the context, hallucination, and reasoning limitations of existing TableQA systems by orchestrating specialized agents for structured and relational reasoning, thereby achieving significant accuracy improvements across multiple benchmarks.

Tong Wang, Chi Jin, Yongkang Chen, Huan Deng, Xiaohui Kuang, Gang Zhao · 2026-03-11 · cs.AI

RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning

RubiCap introduces a novel reinforcement learning framework that leverages LLM-generated rubrics to create structured, multi-faceted reward signals for dense image captioning, thereby overcoming the limitations of supervised distillation and deterministic checkers to achieve state-of-the-art performance and superior word efficiency across various benchmarks.

Tzu-Heng Huang, Sirajul Salekin, Javier Movellan, Frederic Sala, Manjot Bilkhu · 2026-03-11 · cs.AI
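A "structured, multi-faceted reward signal" built from rubrics can be sketched as a weighted aggregation of per-criterion judge scores. Everything below is hypothetical: the `RubricItem` type, the criterion names, and the weighted-mean aggregation are assumptions for illustration, not RubiCap's actual reward design.

```python
from dataclasses import dataclass

@dataclass
class RubricItem:
    criterion: str  # hypothetical LLM-generated criterion, e.g. "object coverage"
    weight: float   # relative importance assigned to this criterion

def rubric_reward(scores: dict, rubric: list) -> float:
    """Collapse per-criterion judge scores (each in [0, 1]) into one scalar reward
    via a weight-normalized mean, usable as an RL reward for a caption."""
    total_weight = sum(item.weight for item in rubric)
    return sum(item.weight * scores[item.criterion] for item in rubric) / total_weight

# Usage with illustrative criteria and scores:
rubric = [RubricItem("object coverage", 2.0),
          RubricItem("attribute accuracy", 1.0),
          RubricItem("conciseness", 1.0)]
scores = {"object coverage": 1.0, "attribute accuracy": 0.5, "conciseness": 1.0}
print(rubric_reward(scores, rubric))  # → 0.875
```

A weighted mean keeps the reward bounded in [0, 1] regardless of rubric size, which is one reason such aggregations are common in rubric-based judging; the paper may use a different combination rule.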

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Latent-DARM is a novel latent-space communication framework that bridges Discrete Diffusion Language Models for global planning and Autoregressive Models for fluent execution, significantly improving reasoning accuracy on benchmarks like DART-5 and AIME2024 while drastically reducing token usage compared to state-of-the-art reasoning models.

Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen · 2026-03-11 · cs.AI

Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents

The paper proposes EvalAct, a framework that converts implicit retrieval quality assessment into an explicit action followed by a structured evaluation score, and leverages these process signals via a novel Process-Calibrated Advantage Rescaling (PCAR) method to significantly improve the reliability and accuracy of retrieval-augmented agents in multi-step reasoning tasks.

Jiangming Shu, Yuxiang Zhang, Ye Ma, Xueyuan Lin, Jitao Sang · 2026-03-11 · cs.AI
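One plausible reading of "rescaling advantages by calibrated process signals" is to standardize the agent's per-step self-evaluation scores and use them as a multiplicative factor on the policy-gradient advantages. This sketch is purely illustrative: the function `pcar_rescale`, the z-score calibration, and the `1 + calibrated` factor are assumptions, not the PCAR formula from the paper.

```python
def pcar_rescale(advantages, eval_scores, eps=1e-8):
    """Rescale per-step advantages by z-score-calibrated self-evaluation scores.

    Steps the agent itself rated as low-quality retrieval (below-average score)
    are down-weighted; above-average steps are amplified.
    """
    n = len(eval_scores)
    mean = sum(eval_scores) / n
    std = (sum((s - mean) ** 2 for s in eval_scores) / n) ** 0.5
    calibrated = [(s - mean) / (std + eps) for s in eval_scores]
    return [a * (1.0 + c) for a, c in zip(advantages, calibrated)]

# Usage: two steps with equal raw advantage but different self-evaluations.
rescaled = pcar_rescale([1.0, 1.0], [0.2, 0.8])
# The poorly-evaluated step's advantage shrinks toward 0; the well-evaluated
# step's advantage roughly doubles.
```

Standardizing the scores before rescaling keeps the adjustment invariant to the judge's score offset and range, which is the usual motivation for calibrating process signals before mixing them into the RL objective.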