Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation

The paper proposes FlyThinker, an efficient "think-while-generating" framework that employs a parallel latent token-level reasoning model to dynamically guide personalized long-form generation, thereby overcoming the limitations of static one-shot reasoning while maintaining training and inference efficiency.

Chengbing Wang, Yang Zhang, Wenjie Wang, Xiaoyan Zhao, Fuli Feng, Xiangnan He, Tat-Seng Chua

Published 2026-03-06

Imagine you are asking a very smart, but slightly generic, AI assistant to write a long story or a detailed review for you. You want it to sound exactly like you, with your specific humor, your unique way of thinking, and your personal preferences.

The problem is that most AI assistants today are like tour guides who memorize a single script for the whole tour. They might get the general facts right, but they often miss the little details that make your experience special. If you start the tour with a joke, they might forget it by the time you reach the end of the long path.

This paper introduces a new system called FlyThinker that changes how the AI "thinks" while it writes. Here is how it works, broken down into simple concepts:

1. The Old Way: "Think, Then Write" (The Tour Guide with a Script)

Previously, if an AI wanted to be personalized, it would try to "think" about your preferences once at the very beginning, write down a long plan, and then start writing the story based on that plan.

  • The Flaw: Imagine a tour guide writing a 10-page plan before starting the tour. By the last stop, they may have forgotten the first page of the plan, and if you want to change the route halfway through, they are stuck with the old one. Writing the whole plan up front also delays the start, and the AI tends to lose your personal "voice" as the text gets longer.

2. The New Way: "Think While Generating" (The Co-Pilot)

FlyThinker changes the game. Instead of thinking once and then writing, the AI now thinks and writes at the same time, like a co-pilot flying a plane.

  • The Analogy: Imagine you are writing a novel with a brilliant editor sitting right next to you.
    • You (The Generator): You write one sentence.
    • The Editor (The Reasoner): While you are writing that sentence, the editor is already thinking about the next one. They whisper, "Hey, remember how the user likes dark humor? Let's make sure the next line has a little twist."
    • The Magic: The editor doesn't wait for you to finish the whole book to give advice. They give a tiny piece of advice for every single word you write.
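
For readers who like code, here is a minimal Python sketch of the "advice for every word" idea. Everything in it is a hypothetical stand-in: the candidate words, the style tags, the scores, and the `reason` and `generate` functions are invented for illustration. The paper's reasoner works with latent vectors inside a neural model, which this toy does not attempt to reproduce.

```python
from typing import Dict, List, Tuple

# (word, style tag, base score the generator would assign on its own)
Candidate = Tuple[str, str, float]


def reason(profile: Dict[str, float], written_tags: List[str]) -> Dict[str, float]:
    """Hypothetical reasoner: turn the user profile and the text so far
    into a per-style bonus for the next word."""
    guidance: Dict[str, float] = {}
    for tag, weight in profile.items():
        # Toy rule: halve the bonus if this style was used on the previous
        # word, so the advice reacts to what has already been written.
        just_used = bool(written_tags) and written_tags[-1] == tag
        guidance[tag] = weight * (0.5 if just_used else 1.0)
    return guidance


def generate(steps: List[List[Candidate]], profile: Dict[str, float]) -> List[str]:
    """Pick one word per step, asking the reasoner for fresh advice each time."""
    words: List[str] = []
    tags: List[str] = []
    for candidates in steps:
        guidance = reason(profile, tags)  # recomputed at every step
        word, tag, _ = max(
            candidates, key=lambda c: c[2] + guidance.get(c[1], 0.0)
        )
        words.append(word)
        tags.append(tag)
    return words


# Made-up candidates for a three-word text.
STEPS: List[List[Candidate]] = [
    [("The", "neutral", 1.0), ("Gloomy", "dark_humor", 0.6)],
    [("movie", "neutral", 1.0), ("corpse", "dark_humor", 0.3)],
    [("ended", "neutral", 1.0), ("died", "dark_humor", 0.7)],
]

print(generate(STEPS, {"dark_humor": 0.5}))  # ['Gloomy', 'movie', 'died']
print(generate(STEPS, {}))                   # ['The', 'movie', 'ended']
```

With an empty profile the generator falls back to its generic choices. With a dark-humor preference, the advice tips the first and third words, and it backs off on the second word because the style was just used.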

3. How It's Different (The "Parallel" Trick)

The paper solves a major speed problem. Usually, if an AI has to "think" before it "writes," it has to stop and wait.

  • Naive Method: Write Word 1 → Stop & Think → Write Word 2 → Stop & Think. Each step waits for the one before it, so this is slow.
  • FlyThinker: While the AI is writing Word 1, a second, smaller AI is simultaneously calculating the thought for Word 2.
  • The Result: It's like a factory assembly line where one worker is painting a car while another worker is already polishing the next one. You get the high-quality "thinking" without the slow waiting time.
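
The difference between the two schedules can be sketched in Python with a thread pool. This is a toy under stated assumptions, not the paper's implementation: `think` and `write` are hypothetical placeholders that sleep to imitate model compute, and the sketch assumes the thought for the next word only needs the words finished before the current step began.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def think(seen: List[str]) -> str:
    """Hypothetical reasoner: derive a hint from the words finished so far."""
    time.sleep(0.01)  # stand-in for model compute
    return f"hint{len(seen)}"


def write(seen: List[str], hint: str) -> str:
    """Hypothetical generator: produce the next word using the given hint."""
    time.sleep(0.01)  # stand-in for model compute
    return f"word{len(seen)}+{hint}"


def stop_and_think(n: int) -> List[str]:
    """Serial schedule: every word waits for its own hint first."""
    words: List[str] = []
    for _ in range(n):
        hint = think(words)
        words.append(write(words, hint))
    return words


def think_while_writing(n: int) -> List[str]:
    """Overlapped schedule: the next hint is computed while a word is written."""
    words: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        hint = think(words)  # only the very first hint is waited for up front
        for _ in range(n):
            # The reasoner gets a copy of the words finished before this step.
            future = pool.submit(think, list(words))
            words.append(write(words, hint))  # runs while the reasoner thinks
            hint = future.result()            # ready for the next word
    return words


print(stop_and_think(3))       # ['word0+hint0', 'word1+hint1', 'word2+hint2']
print(think_while_writing(3))  # ['word0+hint0', 'word1+hint0', 'word2+hint1']
```

The printed lists show the trade-off in this toy: the overlapped version hides the thinking time behind the writing time, but each hint is based on text that is one word older. Whether FlyThinker uses exactly this dependency is not stated in this summary.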

4. Why It Matters for Long Texts

When writing a short email, a generic AI is fine. But when writing a long movie review or a complex story, the AI tends to "drift." It starts sounding like a robot again, forgetting your specific style.

  • FlyThinker's Superpower: Because it checks in with your preferences at every step of the way, it is much less likely to lose track of who you are. Even at the very end of a long story, it remembers, "Oh right, this user loves describing the weather," and keeps that style alive.

Summary

FlyThinker is like giving the AI a personalized, real-time GPS.

  • Old AI: Sets a destination and drives blindly, hoping to stay on course.
  • FlyThinker: Checks the map and adjusts the steering wheel continuously as it drives, keeping it on the path that matches your driving style, all without slowing down the car.

The result is an AI that stays efficient while sounding much more like "you" when it writes long, complex things.