ProjFlow: Projection Sampling with Flow Matching for Zero-Shot Exact Spatial Motion Control

ProjFlow is a training-free, zero-shot sampler for exact spatial motion control. It enforces linear constraints through a novel kinematics-aware metric that preserves motion naturalness, and it handles tasks such as motion inpainting and 2D-to-3D lifting without task-specific training.

Akihisa Watanabe, Qing Yu, Edgar Simo-Serra, Kent Fujiwara

Published 2026-02-27

Imagine you are a director trying to choreograph a dance for a virtual character. You have a very specific vision: you want the character's hand to trace a perfect heart shape in the air, or you want their feet to follow a specific path on the floor.

The problem is, current AI tools are like enthusiastic but clumsy dancers. If you tell them, "Move your hand like this," they might get the general idea but miss the exact shape, or they might contort their body in weird, unnatural ways to try and hit the mark. They treat your instructions as "suggestions" rather than strict rules.

ProjFlow is a new tool that changes the game. It's like giving the AI a pair of invisible, flexible training wheels that force it to follow your exact rules without breaking its natural rhythm.

Here is how it works, broken down into simple concepts:

1. The "Hard Rule" vs. The "Soft Suggestion"

Most AI motion generators treat your instructions as a soft suggestion. It's like telling a dog, "Please sit," but the dog decides to lie down instead because it thinks that's close enough. The result is a motion that is almost right but not exact.

ProjFlow treats your instructions as hard rules. It's like a strict dance instructor who says, "Your hand must be exactly here." If the AI's hand drifts even a millimeter off your line, ProjFlow pulls the motion back onto it. As long as your rule can be written as a linear constraint (the kind the method is built for), it is met exactly, not approximately.
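For readers who like to see the idea in code, here is a minimal sketch of what a "hard rule" can look like when the rule is a linear equation A x = b. It is not the paper's implementation: the function name, the toy three-joint pose, and the use of plain Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def project_onto_constraints(x, A, b):
    """Return the point closest to x (in Euclidean distance) with A @ x == b.

    Illustrative helper, not from the paper. Assumes A has full row rank.
    """
    residual = A @ x - b                      # how far x is from obeying the rule
    lam = np.linalg.solve(A @ A.T, residual)  # Lagrange multipliers
    return x - A.T @ lam                      # smallest correction that fixes it

# Toy "pose": three joints on a line; the rule pins the last joint (the "hand") to 1.0.
x = np.array([0.0, 0.2, 0.5])
A = np.array([[0.0, 0.0, 1.0]])
b = np.array([1.0])

x_proj = project_onto_constraints(x, A, b)
print(x_proj)
```

After the projection the constraint holds up to floating-point error, which is the difference between a rule and a suggestion. Notice that only the constrained joint moves here; that is exactly the weakness the next section addresses.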

2. The "Skeleton Spine" (The Secret Sauce)

Here is the tricky part: if you force a character's hand to a specific spot, a naive system might just twist the arm into a pretzel to get there, which looks unnatural.

ProjFlow has a special "spine" in its brain called a Kinematics-Aware Metric. Think of it as a set of rubber bands connecting all the joints of the skeleton.

  • Old way: If you pull the hand, the arm gets pulled, but the shoulder and elbow might snap into weird angles because the AI doesn't understand how a body is connected.
  • ProjFlow way: When it corrects the hand, it feels the tension travel through the rubber bands (the skeleton). It adjusts the shoulder, elbow, and torso together in a way that feels like a real human moving. It spreads the correction out so the whole body moves naturally, not just the one part you asked for.
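The rubber-band picture can be sketched as a projection measured with a skeleton-aware distance instead of the plain one. The code below is an assumption-heavy toy: the summary does not say how the paper builds its metric, so `M_skel` (a small identity term plus the graph Laplacian of a four-joint chain) is a hypothetical stand-in, and all function names are made up for illustration.

```python
import numpy as np

def chain_laplacian(n):
    """Graph Laplacian of a chain of n joints (joint i is linked to joint i + 1)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0
        L[i + 1, i] -= 1.0
    return L

def metric_projection(x, A, b, M):
    """Closest point to x under the norm sqrt(d^T M d), subject to A @ x == b."""
    Minv_At = np.linalg.solve(M, A.T)              # M^{-1} A^T
    lam = np.linalg.solve(A @ Minv_At, A @ x - b)  # Lagrange multipliers
    return x - Minv_At @ lam

n = 4                          # torso, shoulder, elbow, hand (toy chain)
x = np.zeros(n)                # current 1-D joint positions
A = np.zeros((1, n))
A[0, -1] = 1.0                 # the rule only mentions the hand
b = np.array([1.0])            # the hand must be at 1.0

M_plain = np.eye(n)                              # no knowledge of the skeleton
M_skel = 0.1 * np.eye(n) + chain_laplacian(n)    # hypothetical skeleton-aware metric

x_plain = metric_projection(x, A, b, M_plain)
x_skel = metric_projection(x, A, b, M_skel)
print("plain:   ", x_plain)
print("skeleton:", x_skel)
```

With the plain metric only the hand moves. With the skeleton-aware stand-in, moving one joint without its neighbors is expensive, so the correction is shared along the chain and tapers off toward the torso, while the hand still lands exactly on target.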

3. Filling in the Blanks (Motion Inpainting)

Imagine you have a video of a person dancing, but half the video is missing (like a torn film reel). You know where they started and where they ended, but the middle is gone.

  • Old way: The AI guesses the middle, but it might look jerky or disconnected.
  • ProjFlow way: It acts like a smart editor. It draws a faint, temporary line between the start and end points to guide the AI. As the AI "draws" the missing frames, it slowly fades out this guide line, letting the AI's own natural dance instincts take over. This fills the gap smoothly, making the missing motion look like it was filmed perfectly.
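The "fading guide line" can be sketched in a few lines. This is only a toy under stated assumptions: the straight-line guide follows the description above, but the linear fade schedule, the function names, and the random array standing in for the AI's own estimate are hypothetical and not taken from the paper.

```python
import numpy as np

def linear_guide(start, end, n_frames):
    """Straight-line path from start to end, one row per frame."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - t) * start + t * end

def guide_weight(step, n_steps):
    """Hypothetical fade schedule: 1.0 at the first step, 0.0 at the last."""
    return 1.0 - step / (n_steps - 1)

def blend_with_guide(estimate, guide, known_mask, known_values, step, n_steps):
    """Mix the model's estimate with the guide, then restore the known frames."""
    w = guide_weight(step, n_steps)
    out = (1.0 - w) * estimate + w * guide
    out[known_mask] = known_values[known_mask]   # known frames are never altered
    return out

n_frames, n_steps = 5, 10
start, end = np.array([0.0, 0.0]), np.array([4.0, 2.0])
guide = linear_guide(start, end, n_frames)

known_mask = np.array([True, False, False, False, True])  # only the ends survive
known_values = np.zeros((n_frames, 2))
known_values[0], known_values[-1] = start, end

rng = np.random.default_rng(0)
estimate = rng.normal(size=(n_frames, 2))   # stand-in for the AI's own guess

first = blend_with_guide(estimate, guide, known_mask, known_values, 0, n_steps)
last = blend_with_guide(estimate, guide, known_mask, known_values, n_steps - 1, n_steps)
```

At the first step the missing frames sit on the guide line; by the last step the guide has faded to zero and the missing frames are entirely the model's own, with the known start and end frames untouched throughout.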

4. 2D to 3D Magic (Lifting)

Imagine you draw a stick figure on a piece of paper (2D) and want to turn it into a 3D movie.

  • Old way: The AI might guess the depth, but the hand might end up floating in the air or passing through the floor because it's just guessing.
  • ProjFlow way: It treats your 2D drawing as a mathematical "shadow" that the 3D character must cast. It forces the 3D character to move in a way that, if you took a photo of it, would perfectly match your 2D drawing. The match to the drawing is exact, while the missing depth is filled in by the AI's sense of natural motion.
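The "shadow" idea becomes a linear constraint if we assume a simple camera. The sketch below assumes an orthographic camera looking straight down the z axis, so the shadow of a joint is just its x and y coordinates; a real perspective camera would need more care. The function name and the three-joint pose are illustrative and not from the paper.

```python
import numpy as np

def project_onto_constraints(x, A, b):
    """Closest point to x (Euclidean) with A @ x == b. Assumes full row rank."""
    lam = np.linalg.solve(A @ A.T, A @ x - b)
    return x - A.T @ lam

n_joints = 3
# Assumed orthographic camera: keep x and y, drop z.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
# One copy of P per joint, acting on the flattened (x, y, z, x, y, z, ...) pose.
A = np.kron(np.eye(n_joints), P)

pose_3d = np.array([[0.0, 0.0, 1.0],     # the model's current 3D guess
                    [0.5, 1.0, 1.2],
                    [1.0, 2.0, 0.8]])
target_2d = np.array([[0.0, 0.0],        # the 2D drawing the pose must match
                      [1.0, 1.0],
                      [2.0, 2.0]])

lifted = project_onto_constraints(pose_3d.reshape(-1), A, target_2d.reshape(-1))
lifted = lifted.reshape(n_joints, 3)
print(lifted)
```

In this toy setup the projection snaps x and y onto the drawing and leaves the depth column as the model guessed it, which is why the quality of the result still depends on the underlying motion model.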

Why is this a big deal?

Usually, to get an AI to do something this specific, you have to do one of two things:

  1. Train it separately for each specific task (like "walking with a cane").
  2. Run slow, heavy calculations every time you want to generate a motion.

ProjFlow is zero-shot and training-free. This means it works right away on a new control task, as long as the rule can be written as a linear constraint, without needing to be retrained. It's like having a universal remote control that works on every TV in the house without you ever needing to buy a new remote.

In summary:
ProjFlow is a magic wand for animators. It lets you draw a line, pick a spot, or sketch a shape, and it generates a full-body, realistic 3D animation that follows your rules exactly, while still looking like a natural, human movement. It combines the precision of a robot with the soul of a dancer.
