NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
NOVA3R is a feed-forward approach to 3D reconstruction from unposed images. It uses a scene-token mechanism and a diffusion-based decoder to learn a global, view-agnostic representation, which overcomes the limitations of pixel-aligned methods and yields complete, physically plausible amodal reconstructions.