cs.CV papers | Gist.Science

OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

This paper introduces OddGridBench, a benchmark revealing that current multimodal large language models significantly underperform humans in detecting fine-grained visual discrepancies, and proposes OddGrid-GRPO, a reinforcement learning framework that effectively enhances this sensitivity through curriculum learning and distance-aware rewards.

Tengjin Weng, Wenhao Jiang, Jingyi Wang, Ming Li, Lin Ma, Zhong Ming | 2026-03-11 | 💻 cs

Beyond Scaling: Assessing Strategic Reasoning and Rapid Decision-Making Capability of LLMs in Zero-sum Environments

This paper introduces the Strategic Tactical Agent Reasoning (STAR) benchmark, a multi-agent framework for evaluating LLMs in zero-sum environments, which reveals a critical trade-off where reasoning-intensive models excel in turn-based settings but often underperform in real-time scenarios due to latency, highlighting the need to balance strategic depth with rapid execution.

Yang Li, Xing Chen, Yutao Liu, Gege Qi, Yanxian BI, Zizhe Wang, Yunjian Zhang, Yao Zhu | 2026-03-11 | 🤖 cs.AI

Predictive Spectral Calibration for Source-Free Test-Time Regression

This paper proposes Predictive Spectral Calibration (PSC), a simple and model-agnostic source-free framework that enhances test-time adaptation for image regression by extending subspace alignment to block spectral matching, thereby achieving consistent performance improvements over strong baselines, especially under severe distribution shifts.

Nguyen Viet Tuan Kiet, Huynh Thanh Trung, Pham Huy Hieu | 2026-03-11 | 💻 cs

Robust Provably Secure Image Steganography via Latent Iterative Optimization

This paper proposes a robust and provably secure image steganography framework that utilizes latent-space iterative optimization to significantly enhance message extraction accuracy under various compression and processing scenarios while maintaining security guarantees.

Yanan Li, Zixuan Wang, Qiyang Xiao, Yanzhen Ren | 2026-03-11 | 💻 cs

Evidential Perfusion Physics-Informed Neural Networks with Residual Uncertainty Quantification

This paper introduces Evidential Perfusion Physics-Informed Neural Networks (EPPINN), a novel framework that integrates evidential deep learning with physics-informed modeling to quantify both aleatoric and epistemic uncertainties in CT perfusion imaging, thereby achieving superior accuracy and reliability in acute ischemic stroke assessment compared to existing deterministic methods.

Junhyeok Lee, Minseo Choi, Han Jang, Young Hun Jeon, Heeseong Eum, Joon Jang, Chul-Ho Sohn, Kyu Sung Choi | 2026-03-11 | 💻 cs

M3GCLR: Multi-View Mini-Max Infinite Skeleton-Data Game Contrastive Learning For Skeleton-Based Action Recognition

This paper proposes M3GCLR, a game-theoretic contrastive learning framework that addresses limitations in existing skeleton-based action recognition methods by establishing an Infinite Skeleton-data Game model with a mini-max optimization strategy and dual-loss equilibrium optimizer to effectively handle view discrepancies, adversarial mechanisms, and augmentation perturbations, achieving state-of-the-art performance on multiple benchmarks.

Yanshan Li, Ke Ma, Miaomiao Wei, Linhui Dai | 2026-03-11 | 🤖 cs.AI

MIL-PF: Multiple Instance Learning on Precomputed Features for Mammography Classification

The paper proposes MIL-PF, a scalable framework that combines frozen foundation model encoders with a lightweight attention-based Multiple Instance Learning head to achieve state-of-the-art mammography classification while significantly reducing computational costs and training complexity.

Nikola Jovišic, Milica Škipina, Nicola Dall'Asen, Dubravko Culibrk | 2026-03-11 | 🤖 cs.AI

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

SinGeo is a novel framework that achieves robust cross-view geo-localization using a single model by employing a dual discriminative learning architecture and a curriculum learning strategy, thereby overcoming the limitations of existing methods that struggle with unseen fields of view and orientations.

Yang Chen, Xieyuanli Chen, Junxiang Li, Jie Tang, Tao Wu | 2026-03-11 | 💻 cs

EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation

EventVGGT is a novel framework that addresses the scarcity of depth annotations and temporal inconsistency in event-based monocular depth estimation by treating event streams as coherent video sequences and distilling spatio-temporal and multi-view geometric priors from the Visual Geometry Grounded Transformer (VGGT) through a tri-level distillation strategy, achieving state-of-the-art performance and robust zero-shot generalization.

Yinrui Ren, Jinjing Zhu, Kanghao Chen, Zhuoxiao Li, Jing Ou, Zidong Cao, Tongyan Hua, Peilun Shi, Yingchun Fu, Wufan Zhao, Hui Xiong | 2026-03-11 | 💻 cs

Training-Free Coverless Multi-Image Steganography with Access Control

The paper proposes MIDAS, a training-free diffusion-based framework that enables coverless multi-image steganography with user-specific access control through latent-level fusion, demonstrating superior performance in image quality, robustness, and security compared to existing methods.

Minyeol Bae, Si-Hyeon Lee | 2026-03-11 | 💻 cs

ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts

This paper presents the ICDAR 2025 competition on end-to-end document image machine translation, detailing its dual-track structure for small and large models, participation statistics, and findings that highlight large-model approaches as a promising paradigm for handling complex document layouts.

Yaping Zhang, Yupu Liang, Zhiyang Zhang, Zhiyuan Chen, Lu Xiang, Yang Zhao, Yu Zhou, Chengqing Zong | 2026-03-11 | 🤖 cs.AI

YOLO-NAS-Bench: A Surrogate Benchmark with Self-Evolving Predictors for YOLO Architecture Search

This paper introduces YOLO-NAS-Bench, the first surrogate benchmark for YOLO-style object detectors, which employs a self-evolving mechanism to iteratively refine a LightGBM predictor, enabling efficient and accurate discovery of high-performing architectures that surpass official YOLO baselines.

Zhe Li, Xiaoyu Ding, Jiaxin Zheng, Yongtao Wang | 2026-03-11 | 💻 cs

Reviving ConvNeXt for Efficient Convolutional Diffusion Models

This paper introduces the Fully Convolutional Diffusion Model (FCDM), a ConvNeXt-based architecture that achieves competitive generative performance with significantly fewer computational resources and training steps than Transformer-based counterparts, demonstrating that modern convolutional designs remain a highly efficient alternative for scaling diffusion models.

Taesung Kwon, Lorenzo Bianchi, Lennart Wittke, Felix Watine, Fabio Carrara, Jong Chul Ye, Romann Weber, Vinicius Azevedo | 2026-03-11 | 🤖 cs.AI

RiO-DETR: DETR for Real-time Oriented Object Detection

RiO-DETR is the first real-time oriented object detection transformer that addresses challenges in angle estimation, periodicity, and convergence through novel designs like Content-Driven Angle Estimation and Decoupled Periodic Refinement, achieving a new speed-accuracy trade-off on benchmark datasets.

Zhangchi Hu, Yifan Zhao, Yansong Peng, Wenzhang Sun, Xiangchen Yin, Jie Chen, Peixi Wu, Hebei Li, Xinghao Wang, Dongsheng Jiang, Xiaoyan Sun | 2026-03-11 | 💻 cs

PromptDLA: A Domain-aware Prompt Document Layout Analysis Framework with Descriptive Knowledge as a Cue

This paper introduces PromptDLA, a domain-aware framework that leverages descriptive knowledge as cues to customize prompts for integrating domain priors, thereby overcoming the limitations of directly merging diverse datasets and achieving state-of-the-art performance in Document Layout Analysis across multiple benchmarks.

Zirui Zhang, Yaping Zhang, Lu Xiang, Yang Zhao, Feifei Zhai, Yu Zhou, Chengqing Zong | 2026-03-11 | 🤖 cs.AI

CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation

CIGPose introduces a Causal Intervention Graph Neural Network framework that enhances whole-body pose estimation robustness by using a Structural Causal Model to identify and replace context-confounded keypoint representations with invariant embeddings, thereby achieving state-of-the-art performance on COCO-WholeBody without relying on extra training data.

Bohao Li, Zhicheng Cao, Huixian Li, Yangming Guo | 2026-03-11 | 💻 cs

MetaDAT: Generalizable Trajectory Prediction via Meta Pre-training and Data-Adaptive Test-Time Updating

The paper proposes MetaDAT, a trajectory prediction framework that combines meta-learning pre-training with a data-adaptive test-time updating mechanism to achieve robust, fast, and accurate online adaptation under distribution shifts by dynamically adjusting learning rates and focusing on informative hard samples.

Yuning Wang, Pu Zhang, Yuan He, Ke Wang, Jianru Xue | 2026-03-11 | 💻 cs

Open-World Motion Forecasting

This paper introduces "Open-World Motion Forecasting," an end-to-end class-incremental framework that predicts future trajectories directly from camera images while mitigating catastrophic forgetting through pseudo-labeling with vision-language models and a novel query feature variance-based replay strategy, enabling continual adaptation to evolving object taxonomies in real-world autonomous driving.

Nicolas Schischka, Nikhil Gosala, B Ravi Kiran, Senthil Yogamani, Abhinav Valada | 2026-03-11 | 🤖 cs.AI

GIIM: Graph-based Learning of Inter- and Intra-view Dependencies for Multi-view Medical Image Diagnosis

The paper proposes GIIM, a novel graph-based framework that enhances multi-view medical image diagnosis by simultaneously modeling intra-view relationships and inter-view dynamics while effectively handling missing data to improve predictive accuracy and robustness.

Tran Bao Sam, Hung Vu, Dao Trung Kien, Tran Dat Dang, Van Ha Tang, Steven Truong | 2026-03-11 | 💻 cs

A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation

This paper introduces OncoAgent, a novel guideline-aware AI agent that achieves zero-shot, training-free auto-delineation of clinical target volumes by converting textual clinical guidelines into 3D contours, demonstrating superior adaptability and physician preference over traditional supervised deep learning models.

Yoon Jo Kim, Wonyoung Cho, Jongmin Lee, Han Joo Chae, Hyunki Park, Sang Hoon Seo, Noh Jae Myung, Kyungmi Yang, Dongryul Oh, Jin Sung Kim | 2026-03-11 | 🤖 cs.AI
