ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

ProFashion is a novel framework for fashion video generation that leverages multiple reference images and introduces a Pose-aware Prototype Aggregator and Flow-enhanced Prototype Instantiator to overcome the limitations of single-image inputs and improve spatiotemporal consistency in generated videos.

Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Biaolong Chen, Aixi Zhang, Anyi Rao

Published 2026-04-01

Imagine you are a fashion designer trying to show off a new jacket to a customer. You have a single photo of the jacket from the front. If you try to make a video of a model wearing it and turning around, a standard AI might get confused. It might try to guess what the back looks like, but since it only has the front photo, it might "hallucinate" (make things up), turning the back of the jacket into a weird, blurry mess or copying the front pattern onto the back where it doesn't belong.

This is the problem ProFashion tackles. It's a new AI system designed to create fashion videos that look real and consistent, even when the model spins, turns, or moves around.

Here is how it works, explained with simple analogies:

1. The Problem: The "One-Photo" Blind Spot

Think of existing AI video generators like a painter who is only allowed to look at one single photograph of a shirt.

  • The Issue: If the shirt has a cool pattern on the front but a plain color on the back, the painter has no idea what the back looks like. When the model turns around in the video, the AI panics and paints something random.
  • The Result: The video looks glitchy and fake.

2. The Solution: The "Photo Album" Approach

ProFashion changes the rules. Instead of giving the AI just one photo, it gives it a small photo album (multiple reference images) showing the outfit from the front, the back, and the side.

  • The Goal: Now the AI has all the information it needs to know exactly what the outfit looks like from every angle.

3. The Secret Sauce: How ProFashion Uses the Album

Just throwing all the photos at the AI isn't enough; it needs a smart way to pick the right details at the right time. ProFashion uses two special tools to do this:

A. The "Pose-Aware Librarian" (Pose-aware Prototype Aggregator)

Imagine you are reading a book, and you need to find a specific fact. You don't read the whole book every time; you use the index to jump to the right page.

  • How it works: As the model in the video starts to turn, ProFashion looks at the model's pose (is it facing left? right? back?).
  • The Magic: It acts like a librarian who instantly grabs the exact reference photo that matches that pose. If the model turns to the back, the librarian grabs the "back view" photo and hands it to the AI. If the model faces front, it grabs the "front view."
  • Why it's cool: It pulls in only the details that are relevant to the current pose instead of processing every photo in full for every frame. This keeps the computing cost reasonable even with multiple photos.
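
To make the librarian analogy concrete, here is a minimal sketch of pose-based selection and blending. It assumes each reference photo is tagged with a single facing angle (0 = front, 180 = back) and has already been reduced to a small feature vector. `Reference`, `aggregate_prototype`, the top-k cut and the softmax weighting are hypothetical illustrations and do not reproduce ProFashion's actual aggregator.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    """One reference photo (hypothetical): a facing angle plus a feature vector."""
    yaw_deg: float          # assumed pose tag: 0 = front, 90 = side, 180 = back
    features: list[float]   # stand-in for features extracted from the photo


def angular_distance(a: float, b: float) -> float:
    """Smallest difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def aggregate_prototype(frame_yaw: float, refs: list[Reference],
                        k: int = 2, temperature: float = 30.0) -> list[float]:
    """Select the k references whose pose is closest to the frame's pose, then
    blend their features, giving closer poses larger weights (softmax)."""
    # Select: keep only the k most pose-relevant photos, which bounds the cost.
    nearest = sorted(refs, key=lambda r: angular_distance(frame_yaw, r.yaw_deg))[:k]
    # Aggregate: softmax over negative angular distance (numerically stable).
    scores = [-angular_distance(frame_yaw, r.yaw_deg) / temperature for r in nearest]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(nearest[0].features)
    return [sum(w * r.features[i] for w, r in zip(weights, nearest))
            for i in range(dim)]


# Toy album: front, side and back photos with 2-D "features".
refs = [Reference(0.0, [1.0, 0.0]), Reference(90.0, [0.5, 0.5]),
        Reference(180.0, [0.0, 1.0])]
for yaw in (0.0, 170.0):
    print(yaw, [round(x, 3) for x in aggregate_prototype(yaw, refs)])
```

A frame facing the camera draws almost entirely on the front photo. A frame turned to 170 degrees draws mostly on the back photo, with a small contribution from the side view. Blending the nearest views rather than picking exactly one is a design choice in this sketch. It lets in-between poses, such as a three-quarter turn, borrow from both neighbouring photos.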

B. The "Flow-Enhanced Motion Guide" (Flow-enhanced Prototype Instantiator)

Imagine you are animating a stick figure. If you just draw the arm in a new spot for the next frame, it might look like the arm is teleporting or jittering.

  • How it works: ProFashion looks at the "motion flow", the invisible path each body part travels along. It knows that if a hand moves from left to right, the pixels should follow a smooth path, not jump.
  • The Magic: It uses this motion map to "warp" the image smoothly. It ensures that the fabric of the shirt stretches and folds naturally as the model moves, keeping the video fluid and realistic.
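
The warping idea in the bullet above can be illustrated with generic backward warping. Each pixel of the new frame looks up where it came from in the previous frame and samples that spot with bilinear interpolation, so content can move by fractions of a pixel instead of jumping. This is a minimal sketch on a tiny grayscale grid that assumes a dense per-pixel flow is already available. `backward_warp` and `bilinear_sample` are hypothetical helpers, and the sketch does not describe how ProFashion itself applies motion flow.

```python
import math

Grid = list[list[float]]


def bilinear_sample(img: Grid, x: float, y: float) -> float:
    """Sample img at fractional (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate horizontally on the two rows, then vertically between them.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(prev: Grid, flow: list[list[tuple[float, float]]]) -> Grid:
    """Build the next frame: output pixel (x, y) copies prev at (x + dx, y + dy),
    where flow[y][x] = (dx, dy) points back to that pixel's source location."""
    return [[bilinear_sample(prev, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(prev[0]))]
            for y in range(len(prev))]


frame = [[0.0, 1.0, 0.0, 0.0]]  # one bright "pattern" pixel at x = 1
print(backward_warp(frame, [[(-1.0, 0.0)] * 4]))  # [[0.0, 0.0, 1.0, 0.0]]
print(backward_warp(frame, [[(-0.5, 0.0)] * 4]))  # [[0.0, 0.5, 0.5, 0.0]]
```

With a half-pixel shift, the bright pixel spreads evenly over two neighbours rather than teleporting. This is the kind of smooth, in-between motion the bullet describes.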

4. The Result: A Perfect Fashion Show

When you put these tools together, ProFashion can generate a video where:

  • Fewer Hallucinations: The back of the jacket matches the back view in the reference photos instead of turning into a made-up mess.
  • Smooth Motion: The model turns and walks without the clothes glitching or warping strangely.
  • High Detail: You can see the texture and patterns of the fabric clearly, just like in a real high-end fashion show.

Why Does This Matter?

Most online stores rely on static photos. If you want to see how a dress moves, you have to imagine it. ProFashion could let stores turn a few photos of an outfit into a video of a model turning and moving. That would give shoppers a more accurate sense of how clothes look in motion, potentially saving them from buying clothes that don't look right.

In short: ProFashion is like giving an AI a complete photo album and a smart guidebook, so it can paint a convincing, moving fashion show without having to guess what it has never seen.
