Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

This paper proposes a novel text-to-sketch-animation method that leverages a pre-trained text-to-video diffusion model guided by Score Distillation Sampling (SDS) loss, while introducing length-area regularization for temporal consistency and an As-Rigid-As-Possible loss to preserve sketch topology, thereby outperforming state-of-the-art approaches in both quantitative and qualitative evaluations.

Gaurav Rai, Ojaswa Sharma

Published 2026-02-27

Imagine you have a simple, hand-drawn sketch of a horse on a piece of paper. Now, imagine you want that horse to gallop, but you don't want to spend hours drawing every single frame of the movement like a traditional animator. You just want to type "A galloping horse" into a computer, and poof—the sketch comes to life.

That is the goal of this paper, but the researchers found that previous attempts at doing this were a bit like trying to make a clay puppet dance: the movements were jerky, the horse's legs would stretch like rubber bands, or the whole shape would melt into a blob.

Here is how the authors fixed it, explained through simple analogies:

The Problem: The "Melting Puppet"

Previous AI methods tried to animate sketches by guessing how the lines should move. But they had two big issues:

  1. The "Jittery" Effect: The animation would look like a strobe light, where the horse's legs would jump from one spot to another without a smooth path.
  2. The "Rubber Band" Effect: As the horse moved, its body would stretch, squash, and twist until it looked nothing like the original drawing. The topology (the way the lines connect) would break.

The Solution: A Smart Puppeteer with Rules

The authors built a new system that acts like a very strict, highly skilled puppeteer. They didn't just let the AI guess; they gave it two specific "rules of the road" to follow.

1. The "Ruler and Spool" Rule (Length-Area Regularization)

The Analogy: Imagine your sketch is drawn with a piece of string. If you pull the string to make the horse move, the string shouldn't suddenly get longer or shorter, and it shouldn't leave a giant, messy trail of string behind it as it moves.

How it works:

  • Length: The AI checks that the "string" (the stroke) stays the same length from frame to frame. If a leg was 5 inches long in the first frame, it must be 5 inches long in the next. This stops the "rubber band" stretching.
  • Area: The AI also checks the "swept area." Imagine the leg moving from point A to point B. It shouldn't sweep out a huge, weird shape. It should move cleanly. This ensures the motion is smooth and not jerky.
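The two checks above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: the function names, the `(frames, points, 2)` array layout, and the quadrilateral approximation of the swept area are all assumptions made for clarity.

```python
import numpy as np

def length_loss(frames):
    # frames: (T, N, 2) array of stroke control points per frame
    # (hypothetical layout). Compute per-segment lengths in every frame.
    seg = np.linalg.norm(np.diff(frames, axis=1), axis=-1)  # (T, N-1)
    # Penalize any deviation from the first frame's segment lengths,
    # which is what stops the "rubber band" stretching.
    return np.mean((seg - seg[0]) ** 2)

def swept_area_loss(frames):
    # Approximate the area each segment sweeps between consecutive
    # frames by the quadrilateral (p_i, p_{i+1}, q_{i+1}, q_i).
    total = 0.0
    for t in range(len(frames) - 1):
        p, q = frames[t], frames[t + 1]
        for i in range(p.shape[0] - 1):
            quad = np.array([p[i], p[i + 1], q[i + 1], q[i]])
            x, y = quad[:, 0], quad[:, 1]
            # Shoelace formula for the quadrilateral's area.
            total += 0.5 * abs(np.dot(x, np.roll(y, -1))
                               - np.dot(y, np.roll(x, -1)))
    return total / (len(frames) - 1)
```

A stroke that does not move at all scores zero on both terms; a stroke that jumps far between frames sweeps a large area and is penalized, which pushes the optimizer toward small, smooth per-frame motion.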

The Result: The animation flows like water, not like a glitching video game.

2. The "Stiff Skeleton" Rule (ARAP Loss)

The Analogy: Think of your sketch as a character made of stiff cardboard cutouts connected by hinges. When the character runs, the cardboard pieces (the body parts) can rotate and slide, but they cannot bend, warp, or turn into jelly.

How it works:

  • The system treats the sketch like a mesh (a net) of triangles.
  • It uses a mathematical concept called "As-Rigid-As-Possible" (ARAP). This tells the AI: "Move the character, but keep every little triangle in the net as close to its original shape as possible — it may rotate and translate, but not stretch or shear."
  • This prevents the horse's head from turning into a blob or its tail from twisting into a spiral.
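The ARAP idea can be sketched as a small energy function: for each triangle, find the rotation that best maps its rest shape onto its deformed shape (a Procrustes fit via SVD), and charge whatever distortion the rotation cannot explain. This is a minimal 2D sketch under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def arap_energy(rest, deformed, triangles):
    # rest, deformed: (N, 2) vertex positions; triangles: index triples.
    energy = 0.0
    for tri in triangles:
        P = rest[tri] - rest[tri].mean(axis=0)         # rest shape, centred
        Q = deformed[tri] - deformed[tri].mean(axis=0) # deformed, centred
        U, _, Vt = np.linalg.svd(Q.T @ P)              # covariance SVD
        R = U @ Vt                                     # best-fit rotation
        if np.linalg.det(R) < 0:                       # disallow reflections
            U[:, -1] *= -1
            R = U @ Vt
        # Residual after removing the rigid part: pure distortion.
        energy += np.sum((Q - P @ R.T) ** 2)
    return energy
```

A triangle that is merely rotated or translated contributes zero energy; one that is stretched or sheared contributes a positive penalty, so minimizing this term keeps the mesh — and hence the sketch — close to its original shape.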

The Result: The sketch keeps its original identity. It moves, but it still looks exactly like the drawing you started with.

The Magic Ingredient: The "Dream Guide"

To make the horse actually run (and not just wiggle), the system uses a pre-trained "Dream Guide" (a Text-to-Video Diffusion Model).

  • You tell the AI: "Run!"
  • The Dream Guide says, "Okay, here is what running looks like in a video."
  • The AI then tries to make your sketch match that video, but it uses the Ruler and Stiff Skeleton rules to make sure the sketch doesn't break while trying to match the video.
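The guidance loop can be sketched as a toy Score Distillation Sampling (SDS) update. Everything here is a stand-in: `toy_denoiser` replaces the real text-to-video diffusion model (which would condition on a prompt embedding, not a target array), and the single fixed noise level is a simplification of the real noise schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(noisy, sigma, target):
    # Stand-in for the diffusion prior: predicts the noise that was added,
    # as if the "text prompt" were encoded by a target video. A real model
    # would take a prompt embedding and a timestep instead.
    return (noisy - target) / sigma

def sds_gradient(frames, target, sigma=0.5):
    # SDS in a nutshell: noise the current rendering, ask the prior what
    # noise it sees, and use (predicted - true) noise as the gradient.
    eps = rng.normal(size=frames.shape)
    noisy = frames + sigma * eps
    eps_hat = toy_denoiser(noisy, sigma, target)
    return eps_hat - eps

def optimize(frames, target, steps=200, lr=0.1):
    # In the paper's setting, the length-area and ARAP losses would be
    # added to this update so the sketch cannot break while it learns.
    for _ in range(steps):
        frames = frames - lr * sds_gradient(frames, target)
    return frames
```

With this toy prior the noise term cancels and the update simply pulls the frames toward the target, which is the intuition: the diffusion model "knows" what running looks like, and its predicted-noise error tells the sketch which way to move.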

The Outcome

The paper shows that this new method is the best at:

  • Keeping the sketch looking like a sketch (no melting blobs).
  • Making the movement smooth (no jittery jumps).
  • Listening to your text (if you say "dolphin jumping," it jumps; if you say "horse running," it runs).

The One Catch

Like any new invention, it's not perfect yet.

  • The "One-Actor" Limit: It works great for one object (a single horse or a single dancer). But if you draw a horse and a rider, the AI sometimes gets confused and separates them, making the rider float away from the horse. It struggles to understand how two objects interact with each other.

In a Nutshell

This paper teaches an AI how to animate a drawing without ruining the drawing. It does this by forcing the AI to respect the length of the lines (so they don't stretch) and the stiffness of the shape (so it doesn't melt), all while following a text description to create a smooth, realistic motion. It's like giving a clay puppet a rigid skeleton and a ruler, so it can dance without falling apart.
