Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
This paper introduces Phys4D, a three-stage training pipeline that turns appearance-driven video diffusion models into physics-consistent 4D world representations. The pipeline combines pseudo-supervised pretraining, simulation-grounded fine-tuning, and reinforcement learning to achieve fine-grained spatiotemporal and physical consistency.