LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

LD-RPS proposes a novel, dataset-free, and unified image restoration framework that leverages recurrent posterior sampling on a pretrained latent diffusion model, enhanced by multimodal semantic priors and a lightweight alignment module, to achieve superior performance across various degradation types without task-specific training.

Huaqiu Li, Yong Wang, Tongwen Huang, Hailang Huang, Haoqian Wang, Xiangxiang Chu · 2026-03-10 · 💻 cs

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

This paper identifies a pervasive "agreement bias" in Multimodal LLM verifiers that causes them to over-validate agent behavior, and proposes a lightweight Self-Grounded Verification (SGV) method that significantly improves failure detection and task completion across web navigation, computer use, and robotics by decoupling prior generation from trajectory evaluation.

Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira · 2026-03-10 · 🤖 cs.LG

Unified Medical Image Segmentation with State Space Modeling Snake

The paper proposes Mamba Snake, a novel deep snake framework enhanced by state space modeling and a dual-classification synergy mechanism, which effectively addresses the challenges of multi-scale structural heterogeneity in Unified Medical Image Segmentation by modeling inter-organ topological relationships and refining complex morphologies to achieve superior performance across five clinical datasets.

Ruicheng Zhang, Haowei Guo, Kanghui Tian, Jun Zhou, Mingliang Yan, Zeyu Zhang, Shen Zhao · 2026-03-10 · 💻 cs

InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis

This paper introduces InsightX Agent, a novel Large Multimodal Model-based agentic framework that integrates a Sparse Deformable Multi-Scale Detector with an Evidence-Grounded Reflection tool to achieve reliable, interpretable, and interactive X-ray non-destructive testing analysis, demonstrated by a 96.54% F1-score on the GDXray+ dataset.

Jiale Liu, Huan Wang, Yue Zhang + 4 more · 2026-03-10 · 🤖 cs.AI

Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

This paper proposes a Vision Transformer-based framework that leverages PCA-driven weak supervision to expand limited manual annotations for refining disaster-affected area segmentation using Sentinel-2 and Formosat-5 imagery, thereby enhancing the reliability and scalability of the Taiwan Space Agency's Emergent Value Added Product (EVAP) in scenarios with scarce ground truth.

Yi-Shan Chu, Hsuan-Cheng Wei · 2026-03-10 · 💻 cs

CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

The paper introduces CauKer, a novel algorithm that combines Gaussian Process kernel composition with Structural Causal Models to generate diverse, causally coherent synthetic time series, enabling sample-efficient pre-training of classification foundation models that exhibit clear scaling laws across varying dataset sizes and model capacities.

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko · 2026-03-10 · 🤖 cs.LG

GraphProp: Training the Graph Foundation Models using Graph Properties

GraphProp is a two-phase framework for training graph foundation models that first learns structural generalization by predicting graph invariants and then leverages these representations as positional encodings to enhance cross-domain performance in graph-level tasks, particularly outperforming existing methods in scenarios with limited data or missing node attributes.

Ziheng Sun, Qi Feng, Lehao Lin, Chris Ding, Jicong Fan · 2026-03-10 · 🤖 cs.LG

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding

Video-EM introduces a training-free, event-centric episodic memory framework that enhances long-form video understanding by orchestrating an LLM to localize, segment, and refine query-relevant moments into a compact, temporally coherent event timeline, thereby overcoming the context limitations of existing Video-LLMs without requiring architectural changes.

Yun Wang, Long Zhang, Jingren Liu, Jiaqi Yan, Zhanjie Zhang, Jiahao Zheng, Ao Ma, Run Ling, Xun Yang, Dapeng Wu, Xiangyu Chen, Xuelong Li · 2026-03-10 · 💻 cs

Entropy-Driven Curriculum for Multi-Task Training in Human Mobility Prediction

This paper proposes a unified training framework that combines entropy-driven curriculum learning, which sequences training from simple to complex trajectories based on Lempel-Ziv compression, with multi-task learning to simultaneously optimize location, distance, and direction predictions, thereby achieving state-of-the-art performance and significantly faster convergence in human mobility prediction.

Tianye Fang, Xuanshu Luo, Martin Werner · 2026-03-10 · 🤖 cs.LG
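
The simple-to-complex ordering described in the mobility-prediction entry above can be illustrated with a rough sketch. It is not the authors' implementation: it assumes trajectories are sequences of discrete location IDs, uses an LZ78-style phrase count as a stand-in complexity score, and normalizes by length, all of which are choices made here for illustration. The names `lz78_phrase_count` and `curriculum_order` are hypothetical.

```python
def lz78_phrase_count(seq):
    """Count phrases in an LZ78-style parse of a sequence of hashable symbols.

    Fewer phrases relative to length means a more repetitive, more
    compressible sequence (an assumed proxy for "simple" trajectories).
    """
    phrases = set()
    current = ()
    count = 0
    for symbol in seq:
        current = current + (symbol,)
        if current not in phrases:
            # A new phrase ends here; remember it and start the next one.
            phrases.add(current)
            count += 1
            current = ()
    if current:
        count += 1  # trailing phrase that repeats an earlier one
    return count


def curriculum_order(trajectories):
    """Sort trajectories from most to least compressible (assumed curriculum)."""
    return sorted(
        trajectories,
        key=lambda t: lz78_phrase_count(t) / max(len(t), 1),
    )


if __name__ == "__main__":
    commuter = ["home", "work"] * 4             # highly regular movement
    explorer = ["a", "b", "c", "d", "e", "f", "g", "h"]  # no repetition
    for traj in curriculum_order([explorer, commuter]):
        print(lz78_phrase_count(traj), traj)
```

A training loop built on this idea would feed batches in the returned order, or widen the admitted complexity range over epochs; the scheduling details are left open here.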

Improving the Resilience of Quadrotors in Underground Environments by Combining Learning-based and Safety Controllers

This paper proposes a hybrid control framework that enhances quadrotor resilience in underground environments by using a normalizing flow-based prior as a runtime monitor to dynamically switch between a learning-based controller for efficiency and a safety controller for collision avoidance when encountering out-of-distribution scenarios.

Isaac Ronald Ward, Mark Paral, Kristopher Riordan + 1 more · 2026-03-10 · ⚡ eess