cs.CV papers | Gist.Science

HeroGS: Hierarchical Guidance for Robust 3D Gaussian Splatting under Sparse Views

HeroGS is a unified framework that makes 3D Gaussian Splatting more robust under sparse-view conditions. It applies hierarchical guidance at the image, feature, and parameter levels to regularize Gaussian distributions, refine high-frequency details, and enforce geometric consistency, and it reports higher reconstruction fidelity than state-of-the-art methods.

Jiashu Li, Xumeng Han, Zhaoyang Wei + 5 more | 2026-03-04 | 💻 cs

Continuous Exposure-Time Modeling for Realistic Atmospheric Turbulence Synthesis

This paper introduces ET-Turb, a large-scale synthetic dataset and a novel exposure-time-dependent modulation transfer function (ET-MTF) framework that models atmospheric turbulence blur as a continuous function of exposure time, thereby enabling more realistic turbulence synthesis and significantly improving the generalization of vision models on real-world data compared to existing methods.

Junwei Zeng, Dong Liang, Sheng-Jun Huang + 2 more | 2026-03-04 | 💻 cs

UETrack: A Unified and Efficient Framework for Single Object Tracking

UETrack is a unified and efficient single object tracking framework that leverages a Token-Pooling-based Mixture-of-Experts mechanism and Target-aware Adaptive Distillation to achieve superior speed-accuracy trade-offs across multiple modalities and hardware platforms.

Ben Kang, Jie Zhao, Xin Chen + 5 more | 2026-03-04 | 💻 cs

FACE: A Face-based Autoregressive Representation for High-Fidelity and Efficient Mesh Generation

FACE introduces a novel face-level autoregressive framework that treats each triangle as a single token to drastically reduce sequence length and computational costs while achieving state-of-the-art reconstruction quality and enabling efficient high-fidelity 3D mesh generation.

Hanxiao Wang, Yuan-Chen Guo, Ying-Tian Liu + 6 more | 2026-03-04 | 💻 cs

InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning

This paper presents InterCoG, a novel text-vision interleaved chain-of-grounding reasoning framework that enhances fine-grained image editing in complex multi-entity scenes by explicitly deducing target locations through text-based spatial reasoning before performing visual grounding and outcome specification, supported by a new dataset and auxiliary training modules to ensure spatial precision.

Yecong Wan, Fan Li, Chunwei Wang + 3 more | 2026-03-04 | 💻 cs

What Helps---and What Hurts: Bidirectional Explanations for Vision Transformers

This paper introduces BiCAM, a bidirectional class activation mapping method that captures both supportive and suppressive contributions in Vision Transformers to enhance explanation faithfulness and enable efficient adversarial detection without retraining.

Qin Su, Tie Luo | 2026-03-04 | 🤖 cs.AI

PromptStereo: Zero-Shot Stereo Matching via Structure and Motion Prompts

This paper introduces PromptStereo, a zero-shot stereo matching method that enhances the iterative refinement stage by integrating monocular structure and stereo motion cues as prompts into a Prompt Recurrent Unit (PRU), thereby achieving state-of-the-art generalization performance while preserving inherent monocular depth priors.

Xianqi Wang, Hao Yang, Hangtian Wang + 4 more | 2026-03-04 | 💻 cs

Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

The paper introduces Nano-EmoX, a compact 2.2B-parameter multimodal language model trained via the Perception-to-Empathy (P2E) curriculum framework, which unifies six core affective tasks across a three-level cognitive hierarchy to achieve state-of-the-art performance in emotional intelligence from low-level perception to high-level empathy.

Jiahao Huang, Fengyan Lin, Xuechao Yang + 4 more | 2026-03-04 | 🤖 cs.AI

SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

SimRecon is a novel framework that achieves high-fidelity, physically plausible compositional scene reconstruction from real videos by integrating a "Perception-Generation-Simulation" pipeline with two specialized bridging modules: Active Viewpoint Optimization for visual fidelity and a Scene Graph Synthesizer for physical plausibility.

Chong Xia, Kai Zhu, Zizhuo Wang + 3 more | 2026-03-04 | 💻 cs

OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution

This paper introduces OnlineX, a feed-forward framework that achieves unified online 3D reconstruction and semantic understanding by employing a decoupled active-to-stable state evolution paradigm to resolve cumulative drift while jointly modeling visual and language fields for real-time, high-fidelity performance.

Chong Xia, Fangfu Liu, Yule Wang + 2 more | 2026-03-04 | 💻 cs

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

The paper proposes HiFi-Inpaint, a novel framework that utilizes Shared Enhancement Attention and a Detail-Aware Loss to overcome data and supervision limitations, achieving state-of-the-art, high-fidelity generation of detail-preserving human-product images.

Yichen Liu, Donghao Zhou, Jie Wang + 9 more | 2026-03-04 | 💻 cs

Forecasting as Rendering: A 2D Gaussian Splatting Framework for Time Series Forecasting

This paper introduces TimeGS, a novel time series forecasting framework that reframes prediction as 2D generative rendering by leveraging adaptive Gaussian kernels and a chronologically continuous rasterization mechanism to overcome the topological mismatches and resolution inefficiencies of existing 2D reshaping methods, thereby achieving state-of-the-art performance.

Yixin Wang, Yifan Hu, Peiyuan Liu + 3 more | 2026-03-04 | 🤖 cs.AI

CamDirector: Towards Long-Term Coherent Video Trajectory Editing

CamDirector is a novel video trajectory editing framework that achieves long-term temporal coherence and precise camera control by combining a hybrid warping scheme with a world cache and a history-guided autoregressive diffusion model, validated by a new benchmark called iPhone-PTZ.

Zhihao Shi, Kejia Yin, Weilin Wan + 5 more | 2026-03-04 | 💻 cs

Social-JEPA: Emergent Geometric Isomorphism

This paper demonstrates that independent agents trained with predictive learning objectives on distinct viewpoints of the same environment naturally develop geometrically isomorphic latent spaces, enabling zero-shot knowledge transfer and efficient interoperability without parameter sharing or coordination.

Haoran Zhang, Youjin Wang, Yi Duan + 6 more | 2026-03-04 | 🤖 cs.AI

From Visual to Multimodal: Systematic Ablation of Encoders and Fusion Strategies in Animal Identification

This study presents a multimodal animal identification framework trained on a dataset of 1.9 million images with synthetic textual descriptions. Through systematic ablation of encoders and a gated fusion strategy, it reaches 84.28% Top-1 accuracy, an 11% improvement over unimodal baselines.

Vasiliy Kudryavtsev, Kirill Borodin, German Berezin + 3 more | 2026-03-04 | 💻 cs

Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection

This paper proposes PDP, a novel prompt-decoupled framework for Incremental Object Detection that utilizes a dual-pool prompting paradigm to separate task-general and task-specific knowledge while employing a prototypical pseudo-label generation module to mitigate prompt drift, thereby achieving state-of-the-art performance on MS-COCO and PASCAL VOC benchmarks.

Yaoteng Zhang, Zhou Qing, Junyu Gao + 1 more | 2026-03-04 | 🤖 cs.AI

AutoFFS: Adversarial Deformations for Facial Feminization Surgery Planning

The paper introduces AutoFFS, a novel data-driven framework that utilizes adversarial free-form deformations to generate quantitative, counterfactual skull morphologies for objective and reproducible preoperative planning in Facial Feminization Surgery.

Paul Friedrich, Florentin Bieder, Florian M. Thieringer + 1 more | 2026-03-04 | ⚡ eess

Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification

This paper presents a systematic evaluation of loss functions, architectures, and post-training strategies for long-tailed multi-label chest X-ray classification on the CXR-LT 2026 benchmark, demonstrating that LDAM-DRW combined with a ConvNeXt-Large backbone and classifier re-training achieves a top-5 ranking with 0.3950 mAP while offering practical insights into the development-to-test performance gap.

Nikhileswara Rao Sulake | 2026-03-04 | ⚡ eess

HAMMER: Harnessing MLLM via Cross-Modal Integration for Intention-Driven 3D Affordance Grounding

HAMMER is a novel framework that leverages multimodal large language models to achieve intention-driven 3D affordance grounding by aggregating interaction intentions into contact-aware embeddings and employing hierarchical cross-modal integration with multi-granular geometry lifting for accurate 3D localization.

Lei Yao, Yong Chen, Yuejiao Su + 3 more | 2026-03-04 | 💻 cs

Preconditioned Score and Flow Matching

This paper identifies that the ill-conditioned covariance of intermediate distributions in flow matching and score-based diffusion causes optimization bias and stagnation, and proposes reversible preconditioning maps to reshape this geometry, thereby enabling continued progress along suppressed directions and yielding better-trained models.

Shadab Ahamed, Eshed Gal, Simon Ghyselincks + 3 more | 2026-03-04 | 🤖 cs.AI
