Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Uni-CoT introduces a unified Chain-of-Thought framework that combines a two-level reasoning paradigm with structured training to enable efficient, coherent multimodal reasoning across text and vision. It achieves state-of-the-art performance on image generation and editing benchmarks with limited computational resources.