Imagine you are trying to predict the weather for next week. You have a lot of historical data (temperature, humidity, wind), but the future is uncertain. You want to give a forecast that isn't just a single number (like "it will be 70°F"), but a range of possibilities with confidence levels (like "it's likely 70°F, but could be 65°F or 75°F").
This is what StaTS does, but for any kind of time-series data (stock prices, electricity usage, traffic flow). It uses a type of AI called a Diffusion Model.
Here is the simple breakdown of how StaTS works, using a few creative analogies.
1. The Problem: The "Blurry Photo" Analogy
Imagine you have a clear photo of a landscape (your clean data). To train an AI to "un-blur" photos, you usually take the photo and slowly add static noise to it until it's just white fuzz. Then, you teach the AI to reverse the process: starting from white fuzz, it learns to remove the noise step-by-step to get the photo back.
The issue with old methods:
Most AI models use a fixed recipe for adding that noise: a predetermined schedule of "static" that is applied identically to every dataset, regardless of what the photo actually looks like.
- The Flaw: Sometimes, this fixed recipe makes the photo look so weird in the middle steps that the AI gets confused and can't un-blur it properly. It's like trying to un-mix a smoothie where the blender was set to the wrong speed; the ingredients get smashed in a way that's impossible to separate back into fruit and yogurt.
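To make the "fixed recipe" concrete, here is a minimal sketch of the standard DDPM-style forward process that most diffusion models use. All names and the linear schedule below are illustrative defaults, not specifics from the paper:

```python
import numpy as np

def linear_schedule(num_steps: int, beta_min: float = 1e-4, beta_max: float = 0.02):
    """A fixed per-step noise recipe -- identical for every dataset."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal surviving at step t
    return betas, alpha_bars

def add_noise(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
              rng=np.random.default_rng(0)):
    """Jump straight to noisy step t: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

betas, alpha_bars = linear_schedule(1000)
x0 = np.sin(np.linspace(0, 4 * np.pi, 64))  # a clean toy series
x_noisy = add_noise(x0, t=999, alpha_bars=alpha_bars)
# by the final step almost no signal survives -- it's "white fuzz"
```

The key point: `betas` is baked in before training ever sees the data, which is exactly the one-size-fits-all recipe the analogy criticizes.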
2. The Solution: StaTS (The Smart Chef)
StaTS is like a Smart Chef who doesn't just follow a recipe book. Instead, the Chef learns the perfect way to scramble the ingredients for this specific dish so they can be un-scrambled perfectly later.
StaTS has two main parts that work together:
Part A: The Spectral Trajectory Scheduler (STS) – "The Custom Noise Recipe"
Instead of using a fixed noise recipe, STS learns the best way to add noise.
- The Analogy: Imagine you are trying to hide a secret message in a song. A fixed method might just turn up the volume of static equally across all frequencies. But a smart method (STS) knows that the "bass" (low frequencies) is important for the rhythm, and the "treble" (high frequencies) holds the melody.
- What it does: STS looks at your specific data and figures out exactly how much noise to add to the "bass" and how much to the "treble" at every single step. It ensures that even when the data is very noisy, the AI can still see the underlying structure (the rhythm and melody) clearly enough to recover it later. It creates a "smooth path" for the AI to walk back from chaos to clarity.
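The "bass vs. treble" idea can be sketched in a few lines. This is an illustration of the concept only: in StaTS the per-frequency schedule is learned, whereas here the band split and the two noise levels are hand-set for clarity:

```python
import numpy as np

def add_frequency_noise(x0: np.ndarray, low_band_scale: float,
                        high_band_scale: float,
                        rng=np.random.default_rng(0)) -> np.ndarray:
    """Corrupt low and high frequencies by different amounts."""
    spectrum = np.fft.rfft(x0)
    cut = spectrum.shape[0] // 4                     # illustrative low/high split
    noise = np.fft.rfft(rng.standard_normal(x0.shape))
    spectrum[:cut] += low_band_scale * noise[:cut]   # gently noise the "bass"
    spectrum[cut:] += high_band_scale * noise[cut:]  # heavily noise the "treble"
    return np.fft.irfft(spectrum, n=x0.size)

x0 = np.sin(np.linspace(0, 4 * np.pi, 128))          # a slow "rhythm" component
x_noisy = add_frequency_noise(x0, low_band_scale=0.1, high_band_scale=1.0)
# The slow sine survives almost intact while fast detail is drowned out.
```

Because the low band is only lightly corrupted, the underlying structure stays visible even in a noisy sample, which is what gives the denoiser a "smooth path" back.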
Part B: The Frequency Guided Denoiser (FGD) – "The Frequency Detective"
Once the noise is added, the AI needs to remove it. Most AIs look at the data as a whole. FGD is different; it looks at the frequencies (the different "notes" in the data).
- The Analogy: Imagine you are trying to clean a muddy window. A normal cleaner wipes the whole window. FGD is like a detective who knows exactly where the mud is. It knows, "Oh, the mud is mostly on the bottom left, and it's mostly on the high-frequency scratches."
- What it does: FGD estimates how much the "noise recipe" (from Part A) damaged the different parts of the signal. If the noise messed up the "rhythm" (low frequency) more than the "melody" (high frequency), FGD focuses its cleaning power there. It adjusts its strength dynamically, ensuring it doesn't over-clean one part and under-clean another.
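A classic way to express "clean each frequency in proportion to its damage" is a Wiener-style gain per frequency bin. The sketch below is a stand-in for the idea, not the paper's learned denoiser architecture; here the per-bin noise power is simply given, where in StaTS it would come from the noise schedule:

```python
import numpy as np

def frequency_guided_denoise(x_noisy: np.ndarray,
                             noise_power: np.ndarray) -> np.ndarray:
    """Attenuate each frequency bin in proportion to its estimated noise."""
    spectrum = np.fft.rfft(x_noisy)
    signal_power = np.maximum(np.abs(spectrum) ** 2 - noise_power, 0.0)
    # Heavily-damaged bins get a small gain (strong cleaning),
    # lightly-damaged bins pass through almost untouched.
    gain = signal_power / (signal_power + noise_power + 1e-12)
    return np.fft.irfft(gain * spectrum, n=x_noisy.size)

# Demo: a slow "rhythm" buried in broadband noise.
rng = np.random.default_rng(1)
x0 = np.sin(np.linspace(0, 4 * np.pi, 128))
x_noisy = x0 + 0.5 * rng.standard_normal(128)
noise_power = np.full(65, 0.25 * 128)  # flat noise: E|noise bin|^2 = sigma^2 * N
x_hat = frequency_guided_denoise(x_noisy, noise_power)
```

The dynamic-strength behavior falls out of the gain formula: it never over-cleans a bin that is mostly signal, and never under-cleans a bin that is mostly noise.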
3. How They Work Together: The Dance
The paper uses a two-stage training dance:
- Step 1: The Chef (STS) tries out a noise recipe. The Detective (FGD) tries to clean it up.
- Step 2: If the Detective struggles, the Chef changes the recipe to make it easier to clean.
- Step 3: They repeat this until they find the perfect partnership where the noise is added in a way that is easy to remove, leading to a super-accurate forecast.
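The alternating loop above can be sketched with deliberately tiny stand-ins: the "chef" is a single scalar noise level and the "detective" a single shrinkage weight, with a closed-form error so the loop is deterministic. This shows only the alternation pattern, not the paper's actual two-stage training of STS and FGD:

```python
import numpy as np

x0 = np.sin(np.linspace(0, 4 * np.pi, 64))  # a clean toy series
signal_power = float(np.mean(x0 ** 2))

def expected_error(noise_scale: float, shrink: float) -> float:
    """E[(shrink * (x0 + noise) - x0)^2] for unit-variance noise."""
    return (shrink - 1.0) ** 2 * signal_power + (shrink * noise_scale) ** 2

noise_scale, shrink = 2.0, 0.5
for _ in range(20):
    # Step 1: the detective adapts to the current recipe
    # (the optimal linear shrinkage has a closed form here).
    shrink = signal_power / (signal_power + noise_scale ** 2)
    # Step 2: the chef nudges the recipe toward whatever is easiest to
    # clean -- but is not allowed to cheat by removing the noise entirely,
    # since the forward process must still end in near-pure noise.
    candidates = [noise_scale * 0.9, noise_scale, noise_scale * 1.1]
    noise_scale = max(0.8, min(candidates, key=lambda n: expected_error(n, shrink)))

print(noise_scale)  # settles at the floor: 0.8
```

The floor on `noise_scale` stands in for the constraints a real diffusion schedule must satisfy; without some such constraint, "make it easy to clean" would trivially collapse to adding no noise at all.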
4. Why is this a big deal?
- Better Uncertainty: It doesn't just guess a number; it gives you a reliable range of what might happen. This is crucial for things like managing electricity grids or financial risks.
- Faster: Because the "noise recipe" is optimized, the AI doesn't need to take as many steps to clean the data. It can get a great answer in fewer steps, saving time and computer power.
- Adaptable: It works well on very different types of data (from traffic jams to solar power) because it learns the specific "personality" of each dataset rather than forcing a one-size-fits-all approach.
In summary: StaTS is a time-series forecasting AI that stops using a "one-size-fits-all" noise recipe. Instead, it learns a custom noise pattern for your data and uses a frequency-savvy detective to clean it up, resulting in faster, more accurate, and more reliable predictions.