cs.CV papers | Gist.Science

Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection

This paper introduces the Coherence-Preserving Semantic Injection (CSI) attack, which uses large language models to generate targeted, globally coherent semantic perturbations that bypass content-aware semantic watermarks in AI-generated images, exposing a critical vulnerability in current provenance-tracking designs.

Zheng Gao, Xiaoyu Li, Zhicheng Bao + 2 more | 2026-02-26 | cs.LG

A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers

This paper reveals a hidden semantic bottleneck in Diffusion Transformers, demonstrating that class and continuous conditional embeddings exhibit extreme redundancy and that pruning up to two-thirds of the embedding space preserves or even improves generation quality.

Trung X. Pham, Kang Zhang, Ji Woo Hong + 1 more | 2026-02-26 | cs

Virtual Biopsy for Intracranial Tumors Diagnosis on MRI

To overcome the risks of invasive biopsies and data scarcity in diagnosing deep intracranial tumors, this paper introduces the first public biopsy-verified MRI benchmark (ICT-MRI) and a "Virtual Biopsy" framework that leverages vision-language models and adaptive attention mechanisms to achieve over 90% diagnostic accuracy.

Xinzhe Luo, Shuai Shao, Yan Wang + 3 more | 2026-02-26 | cs.AI

UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling

UniHand presents a unified diffusion-based framework that integrates heterogeneous inputs via a shared latent space to simultaneously address hand motion estimation and generation, thereby enabling robust and accurate 4D hand motion modeling even under severe occlusions and incomplete sequences.

Zhihao Sun, Tong Wu, Ruirui Tu + 2 more | 2026-02-26 | cs

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination

This paper proposes Self-Correcting VLA (SC-VLA), a framework that improves robot manipulation by integrating sparse world imagination for intrinsic progress forecasting with an online action refinement module, achieving state-of-the-art results with higher success rates and fewer steps than existing baselines.

Chenyv Liu, Wentao Tan, Lei Zhu + 4 more | 2026-02-26 | cs.AI

Axial-Centric Cross-Plane Attention for 3D Medical Image Classification

This paper proposes an axial-centric cross-plane attention architecture for 3D medical image classification that aligns with clinical workflows by prioritizing the axial plane while integrating complementary coronal and sagittal information. Built on a frozen MedDINOv3 foundation model and directional transformer encoders, it outperforms existing methods.

Doyoung Park, Jinsoo Kim, Lohendran Baskaran | 2026-02-26 | cs

Lie Flow: Video Dynamic Fields Modeling and Predicting with Lie Algebra as Geometric Physics Principle

LieFlow introduces a dynamic radiance representation framework that leverages SE(3) Lie algebra to enforce physically consistent, unified modeling of translation and rotation, thereby significantly improving view synthesis fidelity, temporal coherence, and physical realism in 4D scene reconstruction compared to existing NeRF-based approaches.

Weidong Qiao, Wangmeng Zuo, Hui Li | 2026-02-26 | cs

Following the Diagnostic Trace: Visual Cognition-guided Cooperative Network for Chest X-Ray Diagnosis

This paper proposes the Visual Cognition-guided Cooperative Network (VCC-Net), which captures radiologists' visual search traces through interactive tools and integrates them with model inference, yielding a transparent, collaborative diagnostic framework that significantly improves chest X-ray classification accuracy and interpretability.

Shaoxuan Wu, Jingkun Chen, Chong Ma + 3 more | 2026-02-26 | cs.AI

HybridINR-PCGC: Hybrid Lossless Point Cloud Geometry Compression Bridging Pretrained Model and Implicit Neural Representation

HybridINR-PCGC is a lossless point cloud geometry compression framework that bridges pretrained models and implicit neural representations: a Pretrained Prior Network accelerates the convergence of a Distribution Agnostic Refiner, yielding better compression rates and encoding efficiency while mitigating the limitations of data dependency and bitstream overhead.

Wenjie Huang, Qi Yang, Shuting Xia + 3 more | 2026-02-26 | cs

Space-Time Forecasting of Dynamic Scenes with Motion-aware Gaussian Grouping

This paper introduces MoGaF, a framework that leverages motion-aware Gaussian grouping and group-wise optimization within a 4D Gaussian Splatting representation to achieve physically consistent, spatially coherent, and temporally stable long-term forecasting of dynamic scenes.

Junmyeong Lee, Hoseung Choi, Minsu Cho | 2026-02-26 | cs

E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought

This paper introduces E-comIQ-ZH, a comprehensive framework comprising the E-comIQ-18k dataset with expert-calibrated Chain-of-Thought rationales, the E-comIQ-M evaluation model, and the E-comIQ-Bench benchmark, designed to provide the first automated, human-aligned, and fine-grained assessment of Chinese e-commerce posters.

Meiqi Sun, Mingyu Li, Junxiong Zhu | 2026-02-26 | cs

SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR

The paper introduces SF3D-RGB, an end-to-end deep learning architecture that fuses monocular RGB images with sparse LiDAR point clouds for accurate and efficient scene flow estimation, outperforming single-modality methods and other fusion approaches while using fewer parameters.

Rajai Alhimdiat, Ramy Battrawy, René Schuster + 2 more | 2026-02-26 | cs

Brain Tumor Segmentation with Special Emphasis on the Non-Enhancing Brain Tumor Compartment

This paper presents a U-Net-based deep learning architecture for segmenting brain tumors across multiple MRI modalities, with specific emphasis on automatically delineating the non-enhancing tumor compartment, which is clinically significant but has been overlooked in recent segmentation challenges.

T. Schaffer, A. Brawanski, S. Wein + 2 more | 2026-02-26 | cs.LG

Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models

This paper proposes Dynamic Multimodal Activation Steering, a training-free method that mitigates hallucinations in Large Vision-Language Models by dynamically selecting and applying context-aware steering vectors to specific attention heads based on semantic similarity.

Jianghao Yin, Qin Chen, Kedi Chen + 3 more | 2026-02-26 | cs.AI

SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video

This paper introduces SurGo-R1, a reinforcement learning-optimized model and accompanying benchmark that significantly outperforms generalist vision-language models in identifying safe operative zones in surgical videos by explicitly integrating phase-dependent contextual reasoning.

Guanyi Qin, Xiaozhen Wang, Zhu Zhuo + 7 more | 2026-02-26 | cs.AI

Learning spatially adaptive sparsity level maps for arbitrary convolutional dictionaries

This paper presents an image reconstruction method that embeds spatially adaptive sparsity-level maps, inferred by a neural network, into a model-based convolutional dictionary framework; compared with purely black-box deep learning approaches, it offers filter-permutation invariance, flexibility to change the dictionary at inference time, and improved robustness to data distribution shifts.

Joshua Schulz, David Schote, Christoph Kolbitsch + 2 more | 2026-02-26 | eess

Assessing airborne laser scanning and aerial photogrammetry for deep learning-based stand delineation

This study demonstrates that deep learning-based forest stand delineation achieves accuracy comparable to that obtained with airborne laser scanning (ALS) data when it instead uses temporally aligned canopy height models derived from digital aerial photogrammetry together with digital terrain models. This suggests that large-scale, consistent datasets can be assembled without relying on ALS data, which are more complex to acquire and often temporally misaligned.

Håkon Næss Sandum, Hans Ole Ørka, Oliver Tomic + 1 more | 2026-02-26 | cs

Innovative Tooth Segmentation Using Hierarchical Features and Bidirectional Sequence Modeling

This paper proposes a tooth segmentation method that combines a three-stage hierarchical encoder with bidirectional sequence modeling to capture multi-scale features and global context without the high computational cost of transformer-based approaches, achieving superior performance on dental datasets.

Xinxin Zhao, Jian Jiang, Yan Tian + 5 more | 2026-02-26 | cs

TranX-Adapter: Bridging Artifacts and Semantics within MLLMs for Robust AI-generated Image Detection

To address the attention dilution caused by high intra-feature similarity in artifact detection, the paper proposes TranX-Adapter, a lightweight fusion module that integrates Task-aware Optimal-Transport and X-Fusion mechanisms to effectively combine semantic and artifact features within MLLMs, significantly boosting AI-generated image detection accuracy.

Wenbin Wang, Yuge Huang, Jianqing Xu + 5 more | 2026-02-26 | cs

SigVLP: Sigmoid Volume-Language Pre-Training for Self-Supervised CT-Volume Adaptive Representation Learning

SigVLP introduces a self-supervised vision-language pre-training framework for CT volumes that utilizes Rotary Position Embeddings to handle variable input sizes without information loss and employs chunkwise volume-text alignment for finer-grained, more precise representation learning across diverse downstream medical imaging tasks.

Jiayi Wang, Hadrien Reynaud, Ibrahim Ethem Hamamci + 4 more | 2026-02-26 | cs
