cs.AI papers | Gist.Science

LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling

LD-RPS proposes a novel, dataset-free, and unified image restoration framework that leverages recurrent posterior sampling on a pretrained latent diffusion model, enhanced by multimodal semantic priors and a lightweight alignment module, to achieve superior performance across various degradation types without task-specific training.

Huaqiu Li, Yong Wang, Tongwen Huang, Hailang Huang, Haoqian Wang, Xiangxiang Chu | 2026-03-10 | 💻 cs

Noisy PDE Training Requires Bigger PINNs

This paper establishes that Physics-Informed Neural Networks (PINNs) require a network size scaling with the number of noisy samples to achieve empirical risk below the noise variance, demonstrating that simply increasing data quantity cannot compensate for insufficient model capacity in noisy PDE training.

Sebastien Andre-Sloan, Anirbit Mukherjee, Matthew Colbrook | 2026-03-10 | 🤖 cs.LG

A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition

This paper proposes MCULoRA, a novel parameter-efficient framework featuring modality combination aware low-rank adaptation and dynamic parameter fine-tuning to resolve gradient conflicts and improve performance in incomplete multimodal emotion recognition.

Xinkui Zhao, Jinsong Shu, Yangyang Wu, Guanjie Cheng, Zihe Liu, Naibo Wang, Shuiguang Deng, Zhongle Xie, Jianwei Yin | 2026-03-10 | 💻 cs

Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification

This paper identifies a pervasive "agreement bias" in Multimodal LLM verifiers that causes them to over-validate agent behavior, and proposes a lightweight Self-Grounded Verification (SGV) method that significantly improves failure detection and task completion across web navigation, computer use, and robotics by decoupling prior generation from trajectory evaluation.

Moises Andrade, Joonhyuk Cha, Brandon Ho, Vriksha Srihari, Karmesh Yadav, Zsolt Kira | 2026-03-10 | 🤖 cs.LG

Unified Medical Image Segmentation with State Space Modeling Snake

The paper proposes Mamba Snake, a novel deep snake framework enhanced by state space modeling and a dual-classification synergy mechanism, which effectively addresses the challenges of multi-scale structural heterogeneity in Unified Medical Image Segmentation by modeling inter-organ topological relationships and refining complex morphologies to achieve superior performance across five clinical datasets.

Ruicheng Zhang, Haowei Guo, Kanghui Tian, Jun Zhou, Mingliang Yan, Zeyu Zhang, Shen Zhao | 2026-03-10 | 💻 cs

InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis

This paper introduces InsightX Agent, a novel Large Multimodal Model-based agentic framework that integrates a Sparse Deformable Multi-Scale Detector with an Evidence-Grounded Reflection tool to achieve reliable, interpretable, and interactive X-ray non-destructive testing analysis, demonstrated by a 96.54% F1-score on the GDXray+ dataset.

Jiale Liu, Huan Wang, Yue Zhang + 4 more | 2026-03-10 | 🤖 cs.AI

Post-Disaster Affected Area Segmentation with a Vision Transformer (ViT)-based EVAP Model using Sentinel-2 and Formosat-5 Imagery

This paper proposes a Vision Transformer-based framework that leverages PCA-driven weak supervision to expand limited manual annotations for refining disaster-affected area segmentation using Sentinel-2 and Formosat-5 imagery, thereby enhancing the reliability and scalability of the Taiwan Space Agency's Emergent Value Added Product (EVAP) in scenarios with scarce ground truth.

Yi-Shan Chu, Hsuan-Cheng Wei | 2026-03-10 | 💻 cs

Flow Matching Meets Biology and Life Science: A Survey

This paper presents the first comprehensive survey of flow matching applications in biology and life sciences, systematically reviewing its theoretical foundations and categorizing its recent advancements in biological sequence modeling, molecule design, and protein generation.

Zihao Li, Zhichen Zeng, Xiao Lin, Feihao Fang, Yanru Qu, Zhe Xu, Zhining Liu, Xuying Ning, Tianxin Wei, Ge Liu, Hanghang Tong, Jingrui He | 2026-03-10 | 🤖 cs.LG

Goal Alignment in LLM-Based User Simulators for Conversational AI

This paper introduces User Goal State Tracking (UGST), a novel framework and three-stage methodology that enables LLM-based user simulators to autonomously track goal progression and generate goal-aligned responses, significantly improving performance on the MultiWOZ 2.4 and $\tau$-Bench benchmarks.

Shuhaib Mehri, Xiaocheng Yang, Takyoung Kim, Gokhan Tur, Shikib Mehri, Dilek Hakkani-Tür | 2026-03-10 | 💬 cs.CL

CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data

The paper introduces CauKer, a novel algorithm that combines Gaussian Process kernel composition with Structural Causal Models to generate diverse, causally coherent synthetic time series, enabling sample-efficient pre-training of classification foundation models that exhibit clear scaling laws across varying dataset sizes and model capacities.

Shifeng Xie, Vasilii Feofanov, Ambroise Odonnat, Lei Zan, Marius Alonso, Jianfeng Zhang, Themis Palpanas, Lujia Pan, Keli Zhang, Ievgen Redko | 2026-03-10 | 🤖 cs.LG

GraphProp: Training the Graph Foundation Models using Graph Properties

GraphProp is a two-phase framework for training graph foundation models that first learns structural generalization by predicting graph invariants and then leverages these representations as positional encodings to enhance cross-domain performance in graph-level tasks, particularly outperforming existing methods in scenarios with limited data or missing node attributes.

Ziheng Sun, Qi Feng, Lehao Lin, Chris Ding, Jicong Fan | 2026-03-10 | 🤖 cs.LG

Video-EM: Event-Centric Episodic Memory for Long-Form Video Understanding

Video-EM introduces a training-free, event-centric episodic memory framework that enhances long-form video understanding by orchestrating an LLM to localize, segment, and refine query-relevant moments into a compact, temporally coherent event timeline, thereby overcoming the context limitations of existing Video-LLMs without requiring architectural changes.

Yun Wang, Long Zhang, Jingren Liu, Jiaqi Yan, Zhanjie Zhang, Jiahao Zheng, Ao Ma, Run Ling, Xun Yang, Dapeng Wu, Xiangyu Chen, Xuelong Li | 2026-03-10 | 💻 cs

UniCast: A Unified Framework for Instance-Conditioned Multimodal Time-Series Forecasting

UniCast is a parameter-efficient framework that enhances Time Series Foundation Models through instance-conditioned prompting and dynamic modality routing, enabling effective adaptation to multimodal inputs and instance-level variations without updating the frozen forecasting backbone.

Sehyuk Park, Soyeon Caren Han, Eduard Hovy | 2026-03-10 | 💻 cs

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals

The paper introduces ECHO, a novel foundation model that leverages a band-split architecture and frequency positional embeddings to achieve state-of-the-art performance in anomaly detection and fault classification on machine signals of variable length and arbitrary sampling rate, without requiring padding or cropping.

Yucong Zhang, Juan Liu, Ming Li | 2026-03-10 | 🤖 cs.LG

Entropy-Driven Curriculum for Multi-Task Training in Human Mobility Prediction

This paper proposes a unified training framework that combines entropy-driven curriculum learning, which sequences training from simple to complex trajectories based on Lempel-Ziv compression, with multi-task learning to simultaneously optimize location, distance, and direction predictions, thereby achieving state-of-the-art performance and significantly faster convergence in human mobility prediction.

Tianye Fang, Xuanshu Luo, Martin Werner | 2026-03-10 | 🤖 cs.LG
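
The simple-to-complex ordering mentioned in the mobility-prediction summary above can be illustrated with a minimal sketch. It assumes a trajectory is a sequence of location IDs and uses the number of phrases in an LZ78-style parse, normalized by length, as a stand-in for trajectory complexity. The function names `lz78_phrase_count` and `curriculum_order` are hypothetical, and the paper's actual entropy estimator and scheduling may differ.

```python
def lz78_phrase_count(seq):
    """Count the phrases in an LZ78-style parse of a sequence of hashable symbols.

    Repetitive sequences parse into few phrases; varied ones into many.
    """
    seen = set()
    phrase = ()
    count = 0
    for symbol in seq:
        phrase += (symbol,)
        if phrase not in seen:
            # First time this phrase appears: record it and start a new one.
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:
        count += 1  # trailing phrase that was already in the dictionary
    return count


def curriculum_order(trajectories):
    """Sort trajectories from most to least compressible (assumed 'simple' first)."""
    return sorted(
        trajectories,
        key=lambda t: lz78_phrase_count(t) / max(len(t), 1),
    )


if __name__ == "__main__":
    routine = ["home", "work", "home", "work", "home", "work"]
    varied = ["home", "gym", "cafe", "work", "park", "shop"]
    for traj in curriculum_order([varied, routine]):
        print(lz78_phrase_count(traj), traj)
```

A training loop under this assumption would feed batches in the returned order, starting with the routine trajectory and ending with the varied one.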

Improving the Resilience of Quadrotors in Underground Environments by Combining Learning-based and Safety Controllers

This paper proposes a hybrid control framework that enhances quadrotor resilience in underground environments by using a normalizing flow-based prior as a runtime monitor to dynamically switch between a learning-based controller for efficiency and a safety controller for collision avoidance when encountering out-of-distribution scenarios.

Isaac Ronald Ward, Mark Paral, Kristopher Riordan + 1 more | 2026-03-10 | ⚡ eess

OTESGN: Optimal Transport-Enhanced Syntactic-Semantic Graph Networks for Aspect-Based Sentiment Analysis

The paper proposes OTESGN, a novel aspect-based sentiment analysis model that integrates syntactic graph attention with semantic optimal transport to effectively capture nonlinear associations and suppress noise, achieving state-of-the-art performance on multiple benchmark datasets.

Xinfeng Liao, Xuanqi Chen, Lianxi Wang, Jiahuan Yang, Zhuowei Chen, Ziying Rong | 2026-03-10 | 💬 cs.CL

Classification of Driver Behaviour Using External Observation Techniques for Autonomous Vehicles

This study presents a novel computer vision framework that utilizes external observation techniques, including YOLO-based object detection and lane monitoring, to classify distracted and impaired driver behaviors in non-connected vehicles without relying on inter-vehicular communication.

Ian Nell, Shane Gilroy | 2026-03-10 | ⚡ eess

Synthetic Homes: An Accessible Multimodal Pipeline for Producing Residential Building Data with Generative AI

This paper introduces a modular, multimodal framework that leverages generative AI to synthesize realistic residential building data from public images and information, thereby overcoming data accessibility and privacy barriers to advance energy modeling and machine learning research.

Jackson Eshbaugh, Chetan Tiwari, Jorge Silveyra | 2026-03-10 | 🤖 cs.LG

MICA: Multi-Agent Industrial Coordination Assistant

This paper introduces MICA, a privacy-preserving, speech-interactive multi-agent system that leverages Adaptive Step Fusion and a safety-audited coordination topology to deliver robust, real-time industrial assistance for assembly and maintenance tasks on resource-constrained hardware.

Di Wen, Kunyu Peng, Junwei Zheng, Yufan Chen, Yitian Shi, Jiale Wei, Ruiping Liu, Kailun Yang, Rainer Stiefelhagen | 2026-03-10 | 🤖 cs.LG
