InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

The paper presents InfinityStory, a novel framework, dataset, and model that overcome key limitations in long-form video generation by ensuring background and character consistency across shots while enabling seamless multi-subject transitions for hour-long narratives.

Mohamed Elmoghany, Liangbing Zhao, Xiaoqian Shen, Subhojyoti Mukherjee, Yang Zhou, Gang Wu, Viet Dac Lai, Seunghyun Yoon, Ryan Rossi, Abdullah Rashwan, Puneet Mathur, Varun Manjunatha, Daksh Dangi, Chien Nguyen, Nedim Lipka, Trung Bui, Krishna Kumar Singh, Ruiyi Zhang, Xiaolei Huang, Jaemin Cho, Yu Wang, Namyong Park, Zhengzhong Tu, Hongjie Chen, Hoda Eldardiry, Nesreen Ahmed, Thien Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt

Published 2026-03-05

Imagine you are a director trying to film a full-length movie, but you only have a magic camera that can take 5-second clips. You want to tell a story about a knight, a dragon, and a princess, but every time you try to stitch these clips together, two terrible things happen:

  1. The Background Glitch: In one clip, the castle is made of stone with a blue sky. In the very next clip, the castle suddenly looks like it's made of jelly with a purple sky. The world keeps changing its mind.
  2. The Teleporting Actors: The knight walks into the frame in one clip, but in the next clip, he just poofs into existence in the middle of the room, or vanishes instantly without walking out. It looks like a magic trick gone wrong, not a movie.

InfinityStory is a new AI system designed to fix exactly these two problems. It's like a super-smart production team that ensures the world stays consistent and the actors move naturally from scene to scene.

Here is how it works, broken down into simple concepts:

1. The "Location Library" (Fixing the Background)

Most AI video generators treat every scene as a brand-new painting. If you ask for a "forest," the AI might draw a forest with pine trees in one shot and oak trees in the next.

InfinityStory's Solution:
Think of the AI as having a digital "Location Library." Before the movie starts, it creates a few permanent "sets" (like a specific castle, a specific forest, a specific room).

  • Once the "Castle" set is built, the AI locks it in.
  • Every time the story returns to the castle, the AI doesn't redraw it; it pastes the characters onto the exact same background image.
  • The Analogy: Imagine you are making a stop-motion movie. Instead of rebuilding the set every time you move the camera, you build the set once and keep it there. You just move the clay figures around. This ensures the castle never changes its shape or color, no matter how many scenes you film.
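
To make the "build the set once, then reuse it" idea concrete, here is a minimal Python sketch. It is not the paper's implementation: `LocationLibrary`, `LocationSet`, and the `build_background` callback are hypothetical names, and a plain string stands in for the stored background image. The only point it illustrates is that a location is generated on first request and returned unchanged afterwards.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LocationSet:
    """A locked-in set. Frozen so it cannot be modified after creation."""
    name: str
    background_id: str  # hypothetical stand-in for a stored background image


class LocationLibrary:
    """Builds each location once and reuses it for every later scene."""

    def __init__(self, build_background: Callable[[str], str]):
        # build_background is an assumed callback that would call an image
        # generator; here it only needs to return an identifier string.
        self._build = build_background
        self._sets: Dict[str, LocationSet] = {}

    def get(self, name: str) -> LocationSet:
        key = name.strip().lower()  # "Castle" and "castle " are the same set
        if key not in self._sets:
            # First visit: build the set and lock it in.
            self._sets[key] = LocationSet(key, self._build(key))
        # Later visits: return the stored set without rebuilding it.
        return self._sets[key]
```

In this sketch, asking for "Castle" in scene 1 and again in scene 40 returns the same `LocationSet` object, so the background cannot drift between the two scenes.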

2. The "Choreographer" (Fixing the Actors)

Usually, when AI switches from one clip to another, it doesn't know how to handle characters entering or leaving. It's like a play where actors run off stage and the next scene starts with them already standing in the center, looking confused.

InfinityStory's Solution:
The system uses a special "Choreographer" (a smart AI agent) that plans the movement before the video is made.

  • It creates a special dataset of 10,000 practice videos showing exactly how characters should walk in, walk out, or swap places.
  • It teaches the AI a specific rule: "No teleporting allowed." If a character leaves the frame, they must walk out. If they enter, they must walk in.
  • The Analogy: Think of a dance instructor. A normal AI just snaps its fingers and the dancers appear. InfinityStory's AI is the instructor who says, "Okay, the knight needs to walk from the left door to the center, and the dragon needs to fly in from the right." It fills the gap between the clips so the movement feels smooth and continuous.
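
The "no teleporting" rule can be sketched as a small planning step. This is an illustrative assumption about how such a rule might be expressed, not the system's actual agent: `plan_transition` is a hypothetical helper that compares the characters in two consecutive shots and lists who has to walk out, who has to walk in, and who simply stays.

```python
from typing import List, Set


def plan_transition(prev_cast: Set[str], next_cast: Set[str]) -> List[str]:
    """Return plain-text movement instructions for the gap between two shots."""
    leaving = sorted(prev_cast - next_cast)   # in the old shot only
    entering = sorted(next_cast - prev_cast)  # in the new shot only
    staying = sorted(prev_cast & next_cast)   # in both shots

    # Exits come first so the frame is cleared before newcomers arrive.
    steps = [f"{name} walks out of frame" for name in leaving]
    steps += [f"{name} walks into frame" for name in entering]
    steps += [f"{name} stays in frame" for name in staying]
    return steps
```

For example, moving from a shot with the knight and the princess to a shot with the knight and the dragon yields three instructions: the princess walks out, the dragon walks in, and the knight stays. Every change in the cast is tied to an explicit movement, so nobody appears or vanishes without one.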

3. The "Assembly Line" (How the Movie is Made)

The system uses a team of AI "agents" (specialized robots) that work together like a movie studio:

  • The Screenwriter: Breaks the story into chapters and scenes.
  • The Location Manager: Assigns a specific "set" to every scene so the background never drifts.
  • The Director: Decides who is in the shot and what they are doing.
  • The Editor: Uses two different tools to stitch the movie together:
    • Tool A (The Storyteller): Generates the main action clips (e.g., the knight fighting the dragon).
    • Tool B (The Bridge Builder): Generates the transition clips. This is the secret sauce. It takes the last frame of the previous clip and the first frame of the next clip and creates a smooth video that connects them, ensuring the characters move naturally between the two moments.
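
The Editor's two tools can be sketched as a loop that interleaves main clips with bridge clips. Everything here is a simplified assumption: `make_shot` and `make_bridge` are hypothetical stand-ins for the two video generators, and frames are represented as strings rather than images. The sketch only shows the wiring, where each bridge starts on the last frame of the previous clip and ends on the first frame of the next one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    kind: str         # "shot" or "bridge"
    label: str
    first_frame: str  # placeholder for an image
    last_frame: str   # placeholder for an image


def make_shot(description: str) -> Clip:
    """Stand-in for Tool A: generates a main action clip."""
    return Clip("shot", description, f"{description}:first", f"{description}:last")


def make_bridge(prev: Clip, nxt: Clip) -> Clip:
    """Stand-in for Tool B: connects the end of one clip to the start of the next."""
    return Clip("bridge", f"{prev.label} -> {nxt.label}", prev.last_frame, nxt.first_frame)


def assemble(descriptions: List[str]) -> List[Clip]:
    """Build a timeline of shots with a bridge between each adjacent pair."""
    shots = [make_shot(d) for d in descriptions]
    timeline: List[Clip] = []
    for i, shot in enumerate(shots):
        if i > 0:
            timeline.append(make_bridge(shots[i - 1], shot))
        timeline.append(shot)
    return timeline
```

With three shot descriptions, the resulting timeline alternates shot, bridge, shot, bridge, shot, and each bridge's boundary frames match its neighbours exactly.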

Why Does This Matter?

Before this, making a long AI video was like trying to build a house out of sand; the walls would crumble and shift as you added more rooms.

InfinityStory is the first system that can build a stable, hour-long movie where:

  • The world feels real and consistent (the castle stays a castle).
  • The characters feel alive (they don't teleport; they walk in and out of frame).
  • The transitions are smooth, like a real Hollywood movie, not a jarring slideshow.

In short, it takes the "magic" out of the glitches and replaces it with the "logic" of a real film production, allowing us to finally tell long, coherent stories with AI.