DREAM: Where Visual Understanding Meets Text-to-Image Generation
DREAM is a unified framework that jointly optimizes discriminative (visual understanding) and generative (text-to-image) objectives. It does so through two techniques, Masking Warmup and Semantically Aligned Decoding. Trained on the CC12M dataset, it achieves state-of-the-art performance in both visual understanding and text-to-image generation.