MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Imagine you are trying to walk from your front door to a park across a large, foggy field. This journey represents generating an image or video using a modern AI. The AI doesn't just "snap" the picture into existence; it has to take many tiny, careful steps, adjusting its path at every single moment to make sure the final result looks right.

The problem? This walk is slow. It takes a long time and uses a lot of computer power, making it hard to use in real-time apps (like chatting with an AI that draws pictures instantly).

The Old Way: "The Instantaneous Step"

To speed things up, previous methods tried to take shortcuts. They looked at the direction the AI was walking right now (the "instantaneous velocity") and said, "Okay, let's just guess the next few steps based on this exact direction."

The Analogy: Imagine you are driving a car on a winding mountain road. You look at the steering wheel right now, see it's turned slightly left, and decide to hold that exact angle for the next mile.
The Problem: The road curves! If you only look at the steering wheel for one split second, you will miss the curve, drive off the cliff, and crash. In AI terms, this causes the image to get blurry, distorted, or completely wrong. This is called "error accumulation."

The New Way: "MeanCache" (The Average Pace)

The authors of this paper, MeanCache, realized that looking at just one split-second direction is too shaky. Instead, they decided to look at the average speed and direction over a short stretch of the road.

The Analogy: Instead of guessing the next mile based on the steering wheel right now, you look at the last 100 meters you drove. You calculate your average path. Even if you wobbled a bit in the last second, your average path over the last 100 meters is much smoother and more accurate.

How it works:

The "Cache" (The Memory Bank): The AI remembers where it was a little while ago.
The "Math Trick" (JVP): It uses a clever mathematical shortcut to figure out the "average direction" between where it was and where it is now, without having to do all the heavy calculations again.
The Result: The AI takes bigger, safer steps. It skips the boring, repetitive parts of the calculation because it knows the "average" path is stable.

The "Traffic Controller" (Scheduling)

There's a catch: You can't skip steps everywhere. If you skip too many steps at the beginning of the journey (when the AI is figuring out the basic shape of the image), you'll get lost. If you skip too many at the end, the details will be fuzzy.

MeanCache includes a smart Traffic Controller.

The Analogy: Imagine a GPS that knows exactly which parts of the road are straight and safe to speed through, and which parts are sharp curves where you must slow down.
The Strategy: It looks at the "stability" of the path. If the path is smooth, it skips steps aggressively. If the path is wiggly and dangerous, it forces the AI to slow down and calculate carefully. It finds the perfect balance to get you to the park as fast as possible without crashing.

Why is this a big deal?

The paper tested this on some of the most powerful AI models in the world (FLUX.1, Qwen-Image, HunyuanVideo).

Speed: They made these models roughly 3.6 to 4.6 times faster.
Quality: Unlike the old shortcuts that made images look like a blurry mess, MeanCache kept the images sharp and beautiful.
No Training Needed: Usually, making an AI faster means retraining or distilling it, which is slow and expensive. MeanCache is like putting a turbocharger on a car that's already built. It works immediately without changing the engine.

In a Nutshell

MeanCache is like giving an AI a pair of smart glasses and a GPS. Instead of stumbling through the fog step-by-step, it looks at the "average" path ahead, skips the safe parts, and slows down only when the road gets tricky. The result? You get your high-quality image or video in a fraction of the time, without the AI getting lost.

Here is a detailed technical summary of the paper "MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference".

1. Problem Statement

Flow Matching (FM) has emerged as a powerful paradigm for generative modeling (image, video, multi-modal), offering continuous transport paths between noise and data distributions. However, commercial-scale models (e.g., FLUX.1, Qwen-Image, HunyuanVideo) suffer from high inference latency due to the large number of sequential steps required for denoising.

Existing acceleration methods face two primary limitations:

Training-Heavy Methods: Techniques like distillation, pruning, and quantization require significant architectural modifications and large-scale retraining, making them costly and difficult to deploy on pre-trained commercial models.
Training-Free Caching Limitations: Current caching strategies (e.g., DeepCache, TeaCache) rely on instantaneous velocity or feature reuse. The paper argues that instantaneous velocities fluctuate sharply along the denoising trajectory. Reusing these unstable signals leads to severe trajectory deviations and error accumulation, especially at high acceleration ratios (skipping many steps), resulting in degraded generation quality (blurring, structural distortion).
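The error-accumulation argument can be illustrated with a toy problem. The sketch below is not from the paper: it assumes a one-dimensional ODE with velocity `-z` as a stand-in for the model, and a hypothetical `refresh_every` parameter that controls how often the velocity is recomputed. Between refreshes the stale instantaneous velocity is reused, which is the simplest form of caching.

```python
import math

def velocity(z, t):
    # Toy stand-in for an expensive model call; the exact flow is z(t) = exp(-t).
    return -z

def euler_with_naive_cache(n_steps, refresh_every):
    """Euler integration on [0, 1] that recomputes the velocity only every
    `refresh_every` steps and reuses the stale value in between."""
    h = 1.0 / n_steps
    z, t, cached_v = 1.0, 0.0, None
    model_calls = 0
    for step in range(n_steps):
        if step % refresh_every == 0:
            cached_v = velocity(z, t)  # full "model" evaluation
            model_calls += 1
        z += h * cached_v              # stale velocity on skipped steps
        t += h
    return z, model_calls

exact = math.exp(-1.0)
errors = {}
for k in (1, 2, 5, 10):
    z_end, calls = euler_with_naive_cache(50, k)
    errors[k] = abs(z_end - exact)
    print(f"refresh_every={k:2d}  model_calls={calls:2d}  error={errors[k]:.4f}")
```

In this toy setting the final error grows as fewer velocities are recomputed, which mirrors the degradation the paper attributes to instantaneous-velocity reuse at high acceleration ratios.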

2. Methodology: MeanCache

MeanCache is a training-free caching framework that shifts the perspective from instantaneous velocity to average velocity to ensure trajectory stability. It consists of two core components:

A. Instantaneous to Average Velocity Transformation

Instead of caching the instantaneous velocity $v(z_t, t)$, MeanCache approximates the interval-average velocity $u(z_t, t, s)$ over a time interval $[s, t]$.
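To make the two quantities concrete, here is a short derivation under stated assumptions: $s < t$, and the state $z_\tau$ follows the model's ODE $\frac{dz_\tau}{d\tau} = v(z_\tau, \tau)$. The argument order and sign convention are chosen to match the notation above and may differ from the paper's. The average velocity over the interval can then be written as

$$u(z_t, t, s) = \frac{1}{t - s} \int_s^t v(z_\tau, \tau)\, d\tau .$$

Multiplying both sides by $(t - s)$ and differentiating with respect to $t$ with $s$ held fixed gives

$$u(z_t, t, s) = v(z_t, t) - (t - s)\, \frac{d}{dt} u(z_t, t, s), \qquad \frac{d}{dt} u(z_t, t, s) = \partial_z u \cdot v(z_t, t) + \partial_t u .$$

The total derivative on the right is a Jacobian-vector product of $u$ along the direction in which the state and time move, which is why the correction term is described in terms of JVPs.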

Theoretical Basis: Leveraging the MeanFlow Identity, the paper establishes a relationship between instantaneous velocity and average velocity involving a derivative term.
JVP Approximation: The derivative term in the identity corresponds to a Jacobian-Vector Product (JVP). Since exact JVPs are unavailable during inference, MeanCache constructs a cache estimator by reusing cached states from a previous timestep $r$.
- It approximates the JVP between $t$ and $s$ using the displacement and velocity information from the interval $[r, t]$.
- This allows the construction of a smoother, more stable average velocity signal $\hat{u}(z_t, t, s)$ that corrects trajectory drift and mitigates local error accumulation.
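The idea of correcting the current velocity with information cached at an earlier time $r$ can be sketched on a toy problem. The code below is an illustration, not the paper's estimator: it assumes a one-dimensional flow with velocity `-z`, lets time run forward for simplicity, and replaces the JVP term with a plain finite difference of velocities over $[r, t]$. The helper name `average_velocity_estimate` is hypothetical.

```python
import math

def velocity(z, t):
    # Toy stand-in for the model; the exact solution is z(t) = z(0) * exp(-t).
    return -z

def average_velocity_estimate(v_t, v_r, t, r, s):
    """Estimate the average velocity over the jump t -> s from the current
    velocity v_t and a cached velocity v_r from an earlier time r.
    A finite difference over [r, t] stands in for the derivative (JVP) term."""
    dv_dt = (v_t - v_r) / (t - r)        # slope of the velocity along the path
    return v_t + 0.5 * (s - t) * dv_dt   # extrapolated velocity at the jump's midpoint

r, t, s = 0.0, 0.1, 0.3
z_r = 1.0
z_t = math.exp(-t)                       # assume the state at t is known exactly
v_r, v_t = velocity(z_r, r), velocity(z_t, t)

z_s_instant = z_t + (s - t) * v_t        # reuse the instantaneous velocity
z_s_average = z_t + (s - t) * average_velocity_estimate(v_t, v_r, t, r, s)
z_s_exact = math.exp(-s)

err_instant = abs(z_s_instant - z_s_exact)
err_average = abs(z_s_average - z_s_exact)
print(f"instantaneous reuse error: {err_instant:.4f}")
print(f"average-velocity error:    {err_average:.4f}")
```

On this toy flow the average-velocity jump lands noticeably closer to the exact state than the jump that reuses the instantaneous velocity, at the cost of one subtraction and one cached value rather than an extra model call.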

B. Trajectory-Stability Scheduling Strategy

Determining when to cache and how far back to look (the cache span $K$) is non-trivial, as stability varies across timesteps. MeanCache introduces a graph-based scheduling tool:

Multigraph Construction: Timesteps are modeled as nodes. Directed edges represent potential caching transitions ($t \to s$) with weights defined by the stability deviation (the error between the true average velocity and the cached approximation).
Peak-Suppressed Shortest Path: To find the optimal caching schedule under a budget constraint (maximum number of function evaluations, $B$), the method solves a constrained shortest-path problem.
- It employs a peak-suppression objective (using a power-weighted cost with parameter $\gamma$ ) to prevent the solution from concentrating errors into a few "bad" edges. This ensures that error spikes are avoided, leading to a more uniform and stable generation process.
- This strategy dynamically determines the optimal cache placement and span $K$ without retraining the model.
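A budget-constrained shortest path with a power-weighted cost can be sketched with a small dynamic program. This is an illustration under assumptions, not the paper's implementation: nodes are timestep indices `0..n_steps`, each chosen edge `i -> j` stands for one full evaluation at `i` followed by cached steps up to `j`, and the `roughness` list and `edge_cost` function are made-up stand-ins for the measured stability deviation.

```python
from math import inf

def peak_suppressed_schedule(edge_cost, n_steps, budget, gamma=2.0):
    """Go from node 0 to node n_steps with at most `budget` jumps, minimising
    the sum of edge_cost(i, j) ** gamma over the chosen jumps."""
    # best[b][j]: cheapest way to reach node j using exactly b jumps
    best = [[inf] * (n_steps + 1) for _ in range(budget + 1)]
    prev = [[None] * (n_steps + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for b in range(1, budget + 1):
        for j in range(1, n_steps + 1):
            for i in range(j):
                if best[b - 1][i] == inf:
                    continue
                total = best[b - 1][i] + edge_cost(i, j) ** gamma
                if total < best[b][j]:
                    best[b][j], prev[b][j] = total, i
    # Choose the best number of jumps within the budget, then backtrack.
    b = min(range(1, budget + 1), key=lambda k: best[k][n_steps])
    total, path, j = best[b][n_steps], [n_steps], n_steps
    while b > 0:
        j = prev[b][j]
        path.append(j)
        b -= 1
    return total, path[::-1]

# Hypothetical per-step roughness: the first and last steps are the least stable.
roughness = [5, 1, 2, 1, 1, 5]

def edge_cost(i, j):
    # Assumed deviation of a jump: total roughness of the steps it skips.
    return sum(roughness[i + 1:j])

total, path = peak_suppressed_schedule(edge_cost, n_steps=6, budget=3)
print(path, total)  # [0, 2, 5, 6] 5.0
```

With `gamma` greater than 1, a single large edge cost is penalised more than several small ones, so the schedule spends its three evaluations at nodes 0, 2 and 5 and avoids skipping the unstable final step. Setting `gamma=1.0` reduces the objective to an ordinary sum of deviations.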

3. Key Contributions

Average-Velocity Perspective: Redefines the caching problem by moving from the unstable instantaneous-velocity domain to the more stable average-velocity domain, providing a theoretical foundation for high-acceleration inference.
JVP-Based Cache Construction: Introduces a novel method to estimate average velocities using cached Jacobian-Vector Products, effectively correcting trajectory drift without retraining.
Trajectory-Stability Scheduling: Develops a practical, graph-based scheduling tool (Peak-Suppressed Shortest Path) that optimizes cache timing and span to minimize error accumulation dynamically.
State-of-the-Art Performance: Demonstrates that MeanCache achieves significant speedups while maintaining generation quality superior to existing baselines.

4. Experimental Results

The authors evaluated MeanCache on three state-of-the-art models: FLUX.1 (text-to-image), Qwen-Image (text-to-image), and HunyuanVideo (text-to-video).

Acceleration Speedups:
- FLUX.1: Achieved 4.12× acceleration.
- Qwen-Image: Achieved 4.56× acceleration.
- HunyuanVideo: Achieved 3.59× acceleration.
Quality Metrics:
- MeanCache consistently outperformed baselines (TeaCache, DiCache, TaylorSeer, DBCache) in perceptual quality (ImageReward, CLIP Score) and reconstruction fidelity (LPIPS, SSIM, PSNR).
- Notably, at high acceleration ratios (e.g., >3.5×), baseline methods suffered from severe blurring and structural collapse, whereas MeanCache preserved fine details and content consistency.
- Rare-Word Consistency: In tests with rare-word prompts (e.g., "Matutinal"), MeanCache maintained semantic consistency where other methods exhibited severe content drift.
Efficiency: The method significantly reduced FLOPs and latency while avoiding the need for model retraining or architecture changes.

5. Significance and Impact

Commercial Viability: MeanCache offers a practical, training-free solution for accelerating large-scale generative models, making them more viable for interactive and resource-constrained applications.
Theoretical Insight: By validating that average velocities provide a more stable foundation for inference than instantaneous velocities, the paper opens new avenues for research in flow matching and generative modeling.
Generalizability: The trajectory-stability scheduling strategy is a general tool that can be adapted to other caching or acceleration scenarios in diffusion and flow-based models.

In conclusion, MeanCache successfully bridges the gap between theoretical stability (average velocity) and practical efficiency (caching), setting a new benchmark for high-speed, high-fidelity generative inference.

