VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

VGGT-Det is a novel framework for sensor-geometry-free multi-view indoor 3D object detection that integrates a Visual Geometry Grounded Transformer (VGGT) encoder with attention-guided query generation and query-driven feature aggregation to effectively leverage internal semantic and geometric priors, achieving state-of-the-art performance on ScanNet and ARKitScenes without requiring calibrated camera poses.

Yang Cao, Feize Wu, Dave Zhenyu Chen + 3 more · 2026-03-03 · cs

The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors

This paper demonstrates that current vision-language models significantly underperform when analyzing handwritten student work, particularly in identifying and diagnosing errors made by struggling learners, highlighting a critical gap between their problem-solving capabilities and the specific needs of educational applications.

Li Lucy, Albert Zhang, Nathan Anderson + 2 more · 2026-03-03 · cs.CL

Seeing Beyond 8bits: Subjective and Objective Quality Assessment of HDR-UGC Videos

This paper addresses the limitations of existing Standard Dynamic Range (SDR) models in assessing High Dynamic Range (HDR) user-generated content by introducing "Beyond8Bits," a large-scale subjective dataset, and "HDR-Q," a state-of-the-art Multimodal Large Language Model equipped with an HDR-aware vision encoder and a novel reinforcement learning framework to achieve superior quality assessment.

Shreshth Saini, Bowen Chen, Neil Birkbeck + 3 more · 2026-03-03 · cs.AI

StegoNGP: 3D Cryptographic Steganography using Instant-NGP

The paper proposes StegoNGP, a parameter-free 3D cryptographic steganography method that leverages Instant-NGP's hash encoding as a key-controlled mechanism to securely embed a complete hidden 3D scene within a single neural network model indistinguishable from a standard cover scene, while offering high capacity, imperceptibility, and robustness through an enhanced multi-key scheme.

Wenxiang Jiang, Yujun Lan, Shuo Zhao + 3 more · 2026-03-03 · cs
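The summary above mentions leveraging Instant-NGP's hash encoding as a key-controlled mechanism. A minimal sketch of the idea, assuming the standard Instant-NGP spatial hash (XOR of coordinate-prime products) and a hypothetical key XOR to illustrate key-controlled indexing — the paper's actual multi-key scheme is not specified here:

```python
def ngp_hash(x: int, y: int, z: int, table_size: int, key: int = 0) -> int:
    """Instant-NGP style spatial hash over integer grid coordinates.

    P1..P3 are the primes from the Instant-NGP formulation. The `key` XOR
    is a hypothetical illustration of steering which table entry a voxel
    maps to; it is NOT the paper's actual embedding scheme.
    """
    P1, P2, P3 = 1, 2654435761, 805459861
    return ((x * P1) ^ (y * P2) ^ (z * P3) ^ key) % table_size
```

With a power-of-two `table_size`, a nonzero key below `table_size` always remaps the index, so cover and hidden scenes can address disjoint views of the same parameter table without any extra parameters.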

When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning

This paper demonstrates that the saturating similarity clamping used in Contrastive Forward-Forward learning significantly increases training variance on datasets like CIFAR-10 due to gradient truncation at early layers, a dataset-dependent effect that can be eliminated by switching to a gradient-neutral margin subtraction formulation without compromising mean accuracy.

Joshua Steier · 2026-03-03 · cs.LG
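The clamping-versus-subtraction distinction above can be sketched generically. The exact Contrastive Forward-Forward loss is not reproduced here; this hypothetical hinge-style example only illustrates why a saturating clamp truncates gradients (zero gradient once the margin is met) while a plain margin subtraction keeps the gradient constant:

```python
import numpy as np

def clamped_loss(d, margin=1.0):
    # Saturating hinge on a similarity gap d: loss flattens at 0 past the margin.
    return np.maximum(0.0, margin - d)

def clamped_grad(d, margin=1.0):
    # dL/dd is -1 inside the margin and 0 once the clamp saturates (truncation).
    return np.where(np.asarray(d, dtype=float) < margin, -1.0, 0.0)

def subtraction_loss(d, margin=1.0):
    # Gradient-neutral form: linear in d, never saturates.
    return margin - d

def subtraction_grad(d, margin=1.0):
    # dL/dd is -1 everywhere, so no samples are silently dropped from updates.
    return np.full_like(np.asarray(d, dtype=float), -1.0)
```

Under the clamp, samples past the margin contribute zero gradient, so the effective batch size fluctuates sample-to-sample; the subtraction form keeps every sample contributing, which is one plausible reading of the variance reduction described above.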

EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization

EraseAnything++ is a unified framework that enables effective concept erasure in rectified flow-based image and video diffusion models by formulating the task as a constrained multi-objective optimization problem and employing implicit gradient surgery, LoRA-based tuning, and an anchor-and-propagate mechanism to balance removal efficacy with generative quality and temporal consistency.

Zhaoxin Fan, Nanxiang Jiang, Daiheng Gao + 2 more · 2026-03-03 · cs.AI
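"Gradient surgery" in the summary above refers to a known family of multi-objective tricks; the paper's implicit variant is not detailed here, so this sketch shows the standard PCGrad-style projection as a stand-in: when the erasure objective's gradient conflicts with the preservation objective's gradient, the conflicting component is projected away so removal does not degrade generative quality.

```python
import numpy as np

def project_conflict(g_erase: np.ndarray, g_preserve: np.ndarray) -> np.ndarray:
    """PCGrad-style projection (illustrative, not the paper's exact method).

    If the two gradients conflict (negative dot product), subtract the
    component of g_erase along g_preserve, leaving only the update
    direction that is neutral or helpful to the preservation objective.
    """
    dot = float(np.dot(g_erase, g_preserve))
    if dot < 0:
        g_erase = g_erase - (dot / float(np.dot(g_preserve, g_preserve))) * g_preserve
    return g_erase
```

Non-conflicting gradients pass through unchanged, so the projection only intervenes where the two objectives genuinely pull in opposite directions.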

Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation

This paper proposes an Anatomy-Informed Synthetic Supervised Pre-training framework that bridges the semantic gap in formula-driven learning by replacing generic primitives with de-identified anatomical masks and a structure-aware placement strategy, thereby achieving superior performance and scalability in 3D medical segmentation while ensuring privacy compliance.

Jiaqi Tang, Mengyan Zheng, Shu Zhang + 2 more · 2026-03-03 · cs

The Texture-Shape Dilemma: Boundary-Safe Synthetic Generation for 3D Medical Transformers

This paper addresses the limitations of existing formula-driven synthetic data by proposing a Physics-inspired Spatially-Decoupled Synthesis framework that resolves the texture-shape conflict through a gradient-shielded buffer zone and spectral texture injection, thereby significantly enhancing the performance of 3D medical Vision Transformers on BTCV and MSD datasets without relying on real patient data.

Jiaqi Tang, Weixuan Xu, Shu Zhang + 2 more · 2026-03-03 · cs

RaUF: Learning the Spatial Uncertainty Field of Radar

This paper proposes RaUF, a spatial uncertainty field learning framework that addresses the low fidelity and ambiguity of millimeter-wave radar by modeling anisotropic probabilistic uncertainty and employing a bidirectional domain attention mechanism to suppress spurious returns, thereby delivering highly reliable spatial detections with well-calibrated uncertainty for downstream perception tasks.

Shengpeng Wang, Kuangyu Wang, Wei Wang · 2026-03-03 · cs