S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation
The paper introduces S2DiT, a novel Streaming Sandwich Diffusion Transformer that leverages efficient attention mechanisms, a budget-aware sandwich architecture, and a 2-in-1 distillation framework to achieve high-fidelity, real-time video generation on mobile devices with performance comparable to server-grade models.