CORAL: Correspondence Alignment for Improved Virtual Try-On

This paper introduces CORAL, a DiT-based virtual try-on framework that improves garment detail preservation by explicitly aligning person-garment query-key matching through correspondence distillation and entropy minimization losses, addressing the limitations of existing unpaired methods.

Jiyoung Kim, Youngjin Shin, Siyoon Jin, Dahyun Chung, Jisu Nam, Tongmin Kim, Jongjae Park, Hyeonwoo Kang, Seungryong Kim

Published 2026-02-20

Imagine you are trying on a beautiful, intricate sweater in a virtual dressing room. You upload a photo of yourself and a photo of the sweater. The goal is for the computer to "paint" the sweater onto your body perfectly, keeping your pose, your face, and your hands exactly as they are, while making the sweater look like it was actually worn by you.

For a long time, these virtual dressing rooms have been a bit clumsy. They often get the details wrong: the sweater might look stretched, the logo might be blurry, or the computer might accidentally paint the sweater over your face or hands.

Enter CORAL.

Think of CORAL not just as a new computer program, but as a super-precise GPS system for fabric.

Here is how it works, broken down into simple concepts:

1. The Problem: The "Lost in Translation" Garment

In the old methods, the computer was like a painter who had a photo of a sweater and a photo of a person but didn't really understand how the two connected.

  • They knew the sweater had a collar, but they weren't sure which part of your neck it should sit on.
  • They knew the sweater had a hem, but they didn't know exactly where your waist was.
  • The Result: The computer would guess. Sometimes it guessed right, but often it would duplicate parts of the sweater (like two hems), stretch the fabric weirdly, or lose the small details like a brand logo.

2. The Insight: The "Secret Handshake"

The researchers behind CORAL discovered something fascinating about the new generation of AI (called Diffusion Transformers). These AIs work by comparing every small patch of the person with every small patch of the sweater and asking, "Which patch of the sweater belongs to which patch of the person?"

They realized that for the AI to do a good job, it needs a "Secret Handshake" between the person and the garment.

  • Imagine the AI is holding a map. The "Query" is the person's body, and the "Key" is the sweater.
  • If the AI's map is blurry, it might think the sweater's sleeve belongs to your leg.
  • CORAL's job is to sharpen that map so the "handshake" is perfect.
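The "map" above is just an attention matrix. As a rough illustration (not the paper's actual implementation; the function name and toy data are made up here), each person patch acts as a query, each garment patch as a key, and a softmax over their dot products tells the model which garment patch each body patch should look at:

```python
import numpy as np

def attention_map(person_tokens, garment_tokens, temperature=1.0):
    """Toy query-key matching: each person patch (query) scores every
    garment patch (key); a softmax turns the scores into a 'map' that
    says which garment patch each body patch attends to."""
    scores = person_tokens @ garment_tokens.T / temperature  # (P, G) raw scores
    scores -= scores.max(axis=1, keepdims=True)              # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)      # each row sums to 1

# 3 person patches, 4 garment patches, 8-dim features (random toy data)
rng = np.random.default_rng(0)
person = rng.standard_normal((3, 8))
garment = rng.standard_normal((4, 8))
attn = attention_map(person, garment)   # attn[i, j]: how much body patch i looks at garment patch j
```

A "blurry map" in this picture is a row of `attn` spread evenly across many garment patches; a "sharp map" is a row that concentrates almost all of its weight on the one correct patch.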

3. The Solution: The "Trusty Guide" (DINOv3)

How does CORAL fix the map? It uses a Trusty Guide (a pre-trained AI model called DINOv3).

  • Think of DINOv3 as an expert tailor who has seen millions of clothes and knows exactly how a sleeve aligns with an arm, or how a collar sits on a neck.
  • CORAL asks this expert tailor: "Hey, look at this person and this sweater. Which parts should match?"
  • The tailor points and says, "This patch of the sweater goes here on the arm, and this logo goes right here on the chest."
  • CORAL then forces the main AI to listen to this expert and adjust its "Secret Handshake" to match the tailor's advice.
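In practice, the "expert tailor's advice" is a correspondence map computed from a pretrained encoder's features. A minimal sketch of the idea, assuming we already have per-patch feature vectors from a frozen encoder such as DINOv3 (the function name and inputs here are illustrative, not the paper's code): cosine similarity between person and garment features picks the best-matching garment patch for each body patch.

```python
import numpy as np

def teacher_correspondence(person_feats, garment_feats):
    """Cosine-similarity matching on frozen encoder features: for each
    person patch, find the most similar garment patch. This plays the
    role of the 'expert tailor' map that guides the main model."""
    p = person_feats / np.linalg.norm(person_feats, axis=1, keepdims=True)
    g = garment_feats / np.linalg.norm(garment_feats, axis=1, keepdims=True)
    sim = p @ g.T                 # (P, G) cosine similarities in [-1, 1]
    return sim.argmax(axis=1)     # index of best garment patch per person patch
```

Because the encoder is pretrained and frozen, this map is available "for free" during training and serves as the target the main model's attention is pulled toward.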

4. The Two Magic Tools

To make this work, CORAL uses two specific techniques, which we can think of as The Magnet and The Laser.

  • The Magnet (Correspondence Distillation): This pulls the AI's attention toward the correct spots. If the AI was trying to match a sweater sleeve to a person's knee, the Magnet pulls it back to the person's arm, guided by the expert tailor's map.
  • The Laser (Entropy Minimization): Sometimes, even when the AI knows the right spot, it gets "wishy-washy." It might think, "Maybe the sleeve goes here, or maybe a little bit there?" This makes the image blurry. The Laser forces the AI to be confident. It says, "No, the sleeve goes exactly here, and nowhere else." This creates sharp, crisp edges and clear logos.
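The Magnet and the Laser correspond to two standard training losses. A hedged sketch (generic formulations, not the paper's exact equations): the distillation term is a KL divergence pulling the model's attention rows toward the teacher's correspondence rows, and the entropy term penalizes "wishy-washy" rows that spread weight over many patches.

```python
import numpy as np

def kl_distillation_loss(student_attn, teacher_attn, eps=1e-8):
    """'Magnet': KL divergence that pulls each row of the model's
    attention toward the teacher's correspondence distribution."""
    return np.sum(teacher_attn * (np.log(teacher_attn + eps)
                                  - np.log(student_attn + eps))) / len(teacher_attn)

def entropy_loss(attn, eps=1e-8):
    """'Laser': mean row entropy; minimizing it forces each person patch
    to commit to one garment patch instead of hedging across several."""
    return -np.sum(attn * np.log(attn + eps)) / len(attn)
```

A confident row like `[0.98, 0.01, 0.01]` has much lower entropy than a uniform row `[1/3, 1/3, 1/3]`, which is exactly the behavior the Laser rewards.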

5. The Result: A Perfect Fit

When you combine the GPS (CORAL), the Expert Tailor (DINOv3), the Magnet, and the Laser, you get a virtual try-on that feels magical.

  • No more double hems: The bottom of the shirt stops exactly where it should.
  • No more blurry logos: The brand name on the shirt is readable and sharp.
  • No more weird stretching: The fabric looks like it naturally drapes over your specific body shape.

Why Does This Matter?

Before CORAL, virtual try-ons were like wearing a costume that didn't quite fit. With CORAL, it's like having a tailor who can instantly sew a perfect outfit onto your digital self, no matter how you are posing or what the background looks like. It bridges the gap between a flat picture of a shirt and a real, 3D experience of wearing it.

In short: CORAL teaches the computer to stop guessing and start knowing exactly where every piece of fabric belongs.
