Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
The paper proposes DHVAE, a disentangled hierarchical variational autoencoder that combines contrastive learning with DDIM-based diffusion to generate realistic 3D human-human interactions. It explicitly separates global context from individual motion patterns, with the aim of keeping the generated interactions physically plausible and semantically aligned.
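To make the DDIM component concrete, the following is a minimal sketch of a single deterministic DDIM update (denoising diffusion implicit models, with the stochasticity parameter set to zero) on a plain vector. It illustrates the generic sampling rule only and is not the paper's implementation: the function name `ddim_step`, its arguments, and the use of a flat Python list are assumptions made for illustration, and the summary does not state which representation DHVAE denoises or how its noise predictor is conditioned.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a flat vector.

    x_t            : current noisy sample (list of floats)
    eps_pred       : noise predicted by a denoiser for x_t (hypothetical input)
    alpha_bar_t    : cumulative signal rate at the current step, in (0, 1]
    alpha_bar_prev : cumulative signal rate at the target (less noisy) step
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_noise_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_noise_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0_hat = (x - sqrt_noise_t * e) / sqrt_a_t
        # Re-noise the estimate to the target step, reusing the same noise.
        x_prev.append(sqrt_a_prev * x0_hat + sqrt_noise_prev * e)
    return x_prev


# Usage: noise a known vector, then take one step back with the true noise.
x0 = [1.0, -2.0]
eps = [0.5, 0.25]
a_t = 0.5
x_t = [math.sqrt(a_t) * x + math.sqrt(1.0 - a_t) * e for x, e in zip(x0, eps)]
x_less_noisy = ddim_step(x_t, eps, a_t, 0.9)
```

Because the update is deterministic, a sampler built on it can skip timesteps and run with far fewer denoising iterations than the original training schedule, which is the usual reason for choosing DDIM.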