Accelerating Video Generation Inference with Sequence-Parallel 3D Positional Encoding Using a Global Time Index
This paper introduces a system-level inference optimization for Diffusion Transformer-based video generation. It combines a sequence-parallel Causal-RoPE mechanism with operator fusion to overcome memory and latency bottlenecks, achieving near-real-time generation speed and sub-second first-frame latency on an eight-GPU cluster.
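
To make the idea of a global time index concrete, the following is a minimal sketch of how rotary position encoding might be applied when the frame axis is split across devices. It is not the paper's implementation: the function names (`rope_angles`, `apply_rope`, `shard_frames`, `rope_on_shard`), the contiguous even split of frames, the base of 10000, and the restriction to the temporal axis only (spatial axes omitted) are all assumptions made for illustration. The point it shows is that each rank offsets its local frame index by its shard start, so the rotations agree with an unsharded computation.

```python
import math

def rope_angles(position, dim, base=10000.0):
    """Rotary angles for one position: theta_i = position / base^(2i/dim)."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]

def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by the position's angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def shard_frames(num_frames, world_size, rank):
    """Contiguous split of the frame axis (assumed, hypothetical layout).

    Returns the (start, stop) global frame indices owned by `rank`.
    Assumes num_frames is evenly divisible by world_size.
    """
    per_rank = num_frames // world_size
    start = rank * per_rank
    return start, start + per_rank

def rope_on_shard(local_vecs, rank, num_frames, world_size):
    """Rotate a rank's local frames using the global time index, not the local one."""
    start, _ = shard_frames(num_frames, world_size, rank)
    return [apply_rope(v, start + t) for t, v in enumerate(local_vecs)]

if __name__ == "__main__":
    num_frames, world_size = 8, 4  # toy sizes, not from the paper
    vecs = [[float(t + 1), 0.5, -1.0, 2.0] for t in range(num_frames)]

    # Reference: one device rotates every frame by its own index.
    full = [apply_rope(v, t) for t, v in enumerate(vecs)]

    # Sequence-parallel: each rank rotates only its shard, then results are gathered.
    gathered = []
    for rank in range(world_size):
        start, stop = shard_frames(num_frames, world_size, rank)
        gathered.extend(rope_on_shard(vecs[start:stop], rank, num_frames, world_size))

    print("sharded result matches unsharded:", gathered == full)
```

In this sketch no positional information has to be exchanged between ranks: a rank needs only its own rank number and the shard size to recover the global index. If each rank instead used its local index starting from zero, frames on different devices would receive identical rotations and the temporal ordering would be lost.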