Guiding Diffusion-based Reconstruction with Contrastive Signals for Balanced Visual Representation
This paper proposes Diffusion Contrastive Reconstruction (DCR), a method that injects contrastive signals derived from reconstructed images into the diffusion process. The goal is to resolve the gradient conflicts that arise when discriminative and detail-perceptive abilities are optimized together, so that both can be improved jointly. In this way, DCR addresses the limitations of CLIP's visual encoder and yields a more balanced visual representation.
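To make the notion of a gradient conflict concrete, the sketch below uses a common working definition: two objectives conflict on shared parameters when their gradients have negative cosine similarity. This is an illustrative diagnostic under that assumption, not DCR's procedure; the function names and the toy gradient vectors are hypothetical.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    # A zero gradient has no direction, so treat it as non-conflicting.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def gradients_conflict(g1, g2):
    """Assumed definition: conflict means the gradients point in opposing directions."""
    return cosine_similarity(g1, g2) < 0.0


# Toy gradients (hypothetical values) of two losses w.r.t. the same parameters.
grad_discriminative = [1.0, 0.0]
grad_reconstruction = [-1.0, 0.5]
print(gradients_conflict(grad_discriminative, grad_reconstruction))
```

When the check returns true, a step that lowers one loss tends to raise the other, which is the kind of trade-off a joint objective has to manage.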