VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

VGGT-Det is a novel framework for sensor-geometry-free multi-view indoor 3D object detection that integrates a Visual Geometry Grounded Transformer (VGGT) encoder with attention-guided query generation and query-driven feature aggregation to effectively leverage internal semantic and geometric priors, achieving state-of-the-art performance on ScanNet and ARKitScenes without requiring calibrated camera poses.

Yang Cao, Feize Wu, Dave Zhenyu Chen + 3 more · 2026-03-03 · cs

The Aftermath of DrawEduMath: Vision Language Models Underperform with Struggling Students and Misdiagnose Errors

This paper demonstrates that current vision-language models significantly underperform when analyzing handwritten student work, particularly in identifying and diagnosing errors made by struggling learners, highlighting a critical gap between their problem-solving capabilities and the specific needs of educational applications.

Li Lucy, Albert Zhang, Nathan Anderson + 2 more · 2026-03-03 · cs.CL

Seeing Beyond 8bits: Subjective and Objective Quality Assessment of HDR-UGC Videos

This paper addresses the limitations of existing Standard Dynamic Range (SDR) models in assessing High Dynamic Range (HDR) user-generated content by introducing "Beyond8Bits," a large-scale subjective dataset, and "HDR-Q," a state-of-the-art Multimodal Large Language Model equipped with an HDR-aware vision encoder and a novel reinforcement learning framework to achieve superior quality assessment.

Shreshth Saini, Bowen Chen, Neil Birkbeck + 3 more · 2026-03-03 · cs.AI

StegoNGP: 3D Cryptographic Steganography using Instant-NGP

The paper proposes StegoNGP, a parameter-free 3D cryptographic steganography method that leverages Instant-NGP's hash encoding as a key-controlled mechanism to securely embed a complete hidden 3D scene within a single neural network model indistinguishable from a standard cover scene, while offering high capacity, imperceptibility, and robustness through an enhanced multi-key scheme.

Wenxiang Jiang, Yujun Lan, Shuo Zhao + 3 more · 2026-03-03 · cs
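The summary above mentions leveraging Instant-NGP's hash encoding as a key-controlled mechanism. A minimal sketch of the idea, assuming the standard Instant-NGP spatial hash (XOR of coordinate-prime products) and a hypothetical key XOR to illustrate key-controlled indexing — the paper's actual multi-key scheme is not specified here:

```python
def ngp_hash(x: int, y: int, z: int, table_size: int, key: int = 0) -> int:
    """Instant-NGP style spatial hash over integer grid coordinates.

    P1..P3 are the primes from the Instant-NGP formulation. The `key` XOR
    is a hypothetical illustration of steering which table entry a voxel
    maps to; it is NOT the paper's actual embedding scheme.
    """
    P1, P2, P3 = 1, 2654435761, 805459861
    return ((x * P1) ^ (y * P2) ^ (z * P3) ^ key) % table_size
```

With a power-of-two `table_size`, a nonzero key below `table_size` always remaps the index, so cover and hidden scenes can address disjoint views of the same parameter table without any extra parameters.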

When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning

This paper demonstrates that the saturating similarity clamping used in Contrastive Forward-Forward learning significantly increases training variance on datasets like CIFAR-10 due to gradient truncation at early layers, a dataset-dependent effect that can be eliminated by switching to a gradient-neutral margin subtraction formulation without compromising mean accuracy.

Joshua Steier · 2026-03-03 · cs.LG
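The clamping-versus-subtraction distinction above can be sketched generically. The exact Contrastive Forward-Forward loss is not reproduced here; this hypothetical hinge-style example only illustrates why a saturating clamp truncates gradients (zero gradient once the margin is met) while a plain margin subtraction keeps the gradient constant:

```python
import numpy as np

def clamped_loss(d, margin=1.0):
    # Saturating hinge on a similarity gap d: loss flattens at 0 past the margin.
    return np.maximum(0.0, margin - d)

def clamped_grad(d, margin=1.0):
    # dL/dd is -1 inside the margin and 0 once the clamp saturates (truncation).
    return np.where(np.asarray(d, dtype=float) < margin, -1.0, 0.0)

def subtraction_loss(d, margin=1.0):
    # Gradient-neutral form: linear in d, never saturates.
    return margin - d

def subtraction_grad(d, margin=1.0):
    # dL/dd is -1 everywhere, so no samples are silently dropped from updates.
    return np.full_like(np.asarray(d, dtype=float), -1.0)
```

Under the clamp, samples past the margin contribute zero gradient, so the effective batch size fluctuates sample-to-sample; the subtraction form keeps every sample contributing, which is one plausible reading of the variance reduction described above.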

EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization

EraseAnything++ is a unified framework that enables effective concept erasure in rectified flow-based image and video diffusion models by formulating the task as a constrained multi-objective optimization problem and employing implicit gradient surgery, LoRA-based tuning, and an anchor-and-propagate mechanism to balance removal efficacy with generative quality and temporal consistency.

Zhaoxin Fan, Nanxiang Jiang, Daiheng Gao + 2 more · 2026-03-03 · cs.AI
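"Gradient surgery" in the summary above refers to a known family of multi-objective tricks; the paper's implicit variant is not detailed here, so this sketch shows the standard PCGrad-style projection as a stand-in: when the erasure objective's gradient conflicts with the preservation objective's gradient, the conflicting component is projected away so removal does not degrade generative quality.

```python
import numpy as np

def project_conflict(g_erase: np.ndarray, g_preserve: np.ndarray) -> np.ndarray:
    """PCGrad-style projection (illustrative, not the paper's exact method).

    If the two gradients conflict (negative dot product), subtract the
    component of g_erase along g_preserve, leaving only the update
    direction that is neutral or helpful to the preservation objective.
    """
    dot = float(np.dot(g_erase, g_preserve))
    if dot < 0:
        g_erase = g_erase - (dot / float(np.dot(g_preserve, g_preserve))) * g_preserve
    return g_erase
```

Non-conflicting gradients pass through unchanged, so the projection only intervenes where the two objectives genuinely pull in opposite directions.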

Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation

This paper proposes an Anatomy-Informed Synthetic Supervised Pre-training framework that bridges the semantic gap in formula-driven learning by replacing generic primitives with de-identified anatomical masks and a structure-aware placement strategy, thereby achieving superior performance and scalability in 3D medical segmentation while ensuring privacy compliance.

Jiaqi Tang, Mengyan Zheng, Shu Zhang + 2 more · 2026-03-03 · cs

The Texture-Shape Dilemma: Boundary-Safe Synthetic Generation for 3D Medical Transformers

This paper addresses the limitations of existing formula-driven synthetic data by proposing a Physics-inspired Spatially-Decoupled Synthesis framework that resolves the texture-shape conflict through a gradient-shielded buffer zone and spectral texture injection, thereby significantly enhancing the performance of 3D medical Vision Transformers on BTCV and MSD datasets without relying on real patient data.

Jiaqi Tang, Weixuan Xu, Shu Zhang + 2 more · 2026-03-03 · cs

RaUF: Learning the Spatial Uncertainty Field of Radar

This paper proposes RaUF, a spatial uncertainty field learning framework that addresses the low fidelity and ambiguity of millimeter-wave radar by modeling anisotropic probabilistic uncertainty and employing a bidirectional domain attention mechanism to suppress spurious returns, thereby delivering highly reliable spatial detections with well-calibrated uncertainty for downstream perception tasks.

Shengpeng Wang, Kuangyu Wang, Wei Wang · 2026-03-03 · cs