1. Problem Statement
Continuous-time generative models, including Diffusion Models (DMs), Flow Matching (FM), and Rectified Flow, learn time-dependent vector fields to transport a noise distribution to a data distribution. However, standard training objectives treat timesteps independently.
- The Core Issue: In standard Flow Matching, velocity predictions at different timesteps (t and t′) along the same probability path are trained as independent regression tasks, despite sharing the same endpoint randomness (the source noise x0 and target data x1).
- Consequences:
- High Estimator Variance: The stochastic gradients at different timesteps are strongly correlated due to shared randomness but are treated as independent noise. This inflates the variance of the gradient estimator, leading to unstable optimization.
- Inefficient Sampling: The lack of temporal coherence induces curved trajectories in the marginal flow. This increases numerical error during inference, requiring finer time discretization, i.e., a higher number of function evaluations (NFE), to achieve high-quality samples.
- Limitations of Prior Work: Existing solutions (e.g., path-length penalties, Jacobian constraints, or modified solvers) often require changing the model architecture, the probability path, or the inference procedure, adding complexity.
2. Methodology: Temporal Pair Consistency (TPC)
The authors propose Temporal Pair Consistency (TPC), a lightweight, variance-reduction principle that operates entirely at the estimator level without modifying the model architecture, probability path, or solver.
Core Mechanism
TPC couples velocity predictions at paired timesteps sampled along the same probability path. Instead of minimizing the loss for each timestep independently, it enforces consistency between the predicted velocities at time t and a paired time t′=ψ(t), given the same endpoints (x0,x1).
The augmented loss function for a single sample is:
L_TPC(t, t′) = ∥vθ(xt, t) − ut∥₂² + ∥vθ(xt′, t′) − ut′∥₂² + λ ∥vθ(xt, t) − vθ(xt′, t′)∥₂²
Where:
- The first two terms are the standard Flow Matching objectives.
- The third term is the Temporal Pair Consistency penalty, which minimizes the difference between velocity predictions at paired times.
- λ is a weighting hyperparameter.
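The per-sample objective above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: `v` stands in for the network vθ, and all argument names are assumptions.

```python
import numpy as np

def tpc_loss(v, x_t, t, u_t, x_tp, tp, u_tp, lam=0.1):
    """Augmented Flow Matching loss with the TPC penalty.

    v     : callable (x, t) -> predicted velocity, stands in for v_theta
    x_t   : point on the path at time t;  u_t  its target velocity
    x_tp  : point at the paired time t' = psi(t);  u_tp its target velocity
    lam   : weighting hyperparameter lambda
    """
    pred_t  = v(x_t, t)
    pred_tp = v(x_tp, tp)
    fm_t  = np.sum((pred_t  - u_t)  ** 2)    # standard FM term at t
    fm_tp = np.sum((pred_tp - u_tp) ** 2)    # standard FM term at t'
    tpc   = np.sum((pred_t - pred_tp) ** 2)  # temporal pair consistency penalty
    return fm_t + fm_tp + lam * tpc
```

Setting `lam=0` recovers the plain two-timestep Flow Matching loss, which is what makes TPC a drop-in modification.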
Pairing Strategies
The paper introduces two mechanisms for constructing the pair t′:
- Fixed Antithetic Pairing: Uses a deterministic map ψ(t)=1−t. This pairs early and late timesteps symmetrically. Analogous to antithetic sampling in Monte Carlo methods, this induces negative correlation in errors, reducing variance.
- Learned Monotone Pairing: Uses a learnable function ϕ(t) (parameterized as a small neural network) to discover optimal temporal correspondences. A monotonicity constraint (ϕ′(t)≥0) is enforced via a regularizer to preserve the temporal order of the path.
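Both pairing mechanisms are simple to write down. The sketch below is a hedged illustration: the antithetic map follows the summary directly, while `monotonicity_penalty` is one plausible way to realize the ϕ′(t) ≥ 0 regularizer via finite differences on a time grid (`phi` stands in for the small pairing network; the exact regularizer in the paper may differ).

```python
import numpy as np

def psi_antithetic(t):
    """Fixed antithetic pairing: t' = 1 - t."""
    return 1.0 - t

def monotonicity_penalty(phi, grid):
    """Penalize negative slopes of phi on a time grid (finite differences).

    Returns zero iff phi is nondecreasing on the grid, approximating
    the constraint phi'(t) >= 0 as a soft regularizer.
    """
    vals = phi(grid)
    slopes = np.diff(vals) / np.diff(grid)
    return float(np.sum(np.clip(-slopes, 0.0, None) ** 2))
```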
Stochastic Gating
To prevent over-regularization and ensure the method acts as a variance reducer rather than a hard constraint, TPC is applied stochastically. A Bernoulli variable b∼Bernoulli(ptpc) gates the consistency term. This ensures the model is still exposed to the unregularized gradient while benefiting from the variance reduction when the gate is active.
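The gate amounts to multiplying the consistency term by a Bernoulli draw. A minimal sketch, assuming the gate is sampled independently per training example (function and argument names are illustrative):

```python
import numpy as np

def gated_tpc_term(pred_t, pred_tp, lam, p_tpc, rng):
    """Consistency penalty gated by b ~ Bernoulli(p_tpc).

    When the gate is off, the sample contributes only the plain FM
    gradient; when on, the variance-reducing coupling term is added.
    """
    b = rng.random() < p_tpc  # Bernoulli(p_tpc) draw
    if not b:
        return 0.0
    return lam * float(np.sum((pred_t - pred_tp) ** 2))
```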
3. Theoretical Analysis
The paper provides a rigorous theoretical justification for TPC:
- Control Variate Estimator: TPC is shown to induce a control-variate effect: coupling gradients from paired timesteps that share the same endpoint randomness increases the correlation between the gradient estimates, which the consistency term can then exploit.
- Variance Reduction: Under mild regularity assumptions (Lipschitz continuity of the vector field), the paper proves that the optimal control variate coefficient leads to a strict reduction in gradient variance: Var(g − α∗g′) = Var(g)(1 − ρ²), where ρ is the correlation between paired gradients.
- Numerical Stability: The consistency penalty acts as a Tikhonov regularizer that suppresses temporal oscillations in the learned vector field. This reduces the "temporal roughness" of the trajectory, which directly improves the numerical stability of ODE solvers (reducing discretization error for a fixed step size).
- Contraction: The regularized objective selects velocity fields with reduced temporal variation among near-minimizers of the original Flow Matching risk.
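The variance-reduction identity can be checked numerically. The simulation below is illustrative only (it models paired gradients as correlated Gaussians, not actual network gradients): with correlation ρ and the optimal coefficient α∗ = Cov(g, g′)/Var(g′), the residual variance shrinks by the factor (1 − ρ²).

```python
import numpy as np

# Simulate paired gradient estimates g, g' with Corr(g, g') = rho.
rng = np.random.default_rng(0)
rho, n = 0.8, 200_000
z = rng.standard_normal((2, n))
g  = z[0]
gp = rho * z[0] + np.sqrt(1.0 - rho**2) * z[1]

# Optimal control-variate coefficient and residual variance.
a_star = np.cov(g, gp)[0, 1] / np.var(gp)
ratio = np.var(g - a_star * gp) / np.var(g)
print(ratio)  # close to 1 - rho**2 = 0.36
```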
4. Key Contributions
- Novel Variance-Reduction Principle: Introduction of TPC, which enforces temporal coherence in Flow Matching without altering the underlying generative model, probability path, or solver.
- Theoretical Guarantees: Formal proof that TPC acts as a trajectory-coupled regularizer that strictly reduces gradient variance and improves ODE solver stability.
- Practical Instantiations: Development of both fixed (antithetic) and learnable pairing mechanisms, demonstrating that temporal coupling can be integrated into existing training loops with minimal overhead.
- Broad Applicability: Demonstration that TPC is compatible with various frameworks, including standard Flow Matching, Rectified Flow, and modern SOTA pipelines involving noise-augmented training and score-based denoising.
5. Experimental Results
The authors evaluated TPC on CIFAR-10 and ImageNet (resolutions up to 128×128) across unconditional and conditional generation tasks.
- Sample Quality & Efficiency: TPC consistently improves the Fréchet Inception Distance (FID) at identical or lower computational costs (NFE) compared to baselines.
- Example (CIFAR-10): Reduced FID from 6.35 (standard FM with Optimal Transport) to 3.19 at the same NFE.
- Example (ImageNet 128x128): Improved FID from 20.9 to 18.6.
- Rectified Flow: In Rectified Flow settings, TPC complements trajectory straightening, improving performance in both one-step generation and full-simulation regimes.
- SOTA Pipelines: TPC was successfully applied to modern, noise-augmented training pipelines (similar to Diffusion models) on ImageNet-64 and ImageNet-128, achieving FID scores competitive with state-of-the-art diffusion and GAN models (e.g., improving FID from 6.8 to 4.9 on ImageNet-128 with noise augmentation).
- Ablation Studies: Results show that moderate temporal coupling yields the best performance. Learned pairing strategies generally outperform fixed ones, and the method is robust to hyperparameter choices.
6. Significance
This work addresses a fundamental inefficiency in continuous-time generative modeling: standard training objectives discard the temporal correlation shared across timesteps of the same probability path.
- Efficiency: It allows for high-quality generation with fewer sampling steps (lower NFE), making generative models more practical for real-world applications.
- Simplicity: Unlike previous methods that require complex architectural changes or new solvers, TPC is a "plug-and-play" modification to the loss function.
- Generalizability: It bridges the gap between Flow Matching and Diffusion Models, showing that variance reduction via temporal consistency is a universal principle applicable across different continuous-time frameworks.
In summary, Temporal Pair Consistency offers a theoretically grounded, lightweight, and highly effective method to stabilize training and improve sampling efficiency in flow-based generative models by leveraging the inherent temporal structure of probability paths.