Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models

This paper pioneers the application of sparse autoencoders to the Chronos-T5 time series foundation model, revealing a depth-dependent causal hierarchy where mid-encoder features responsible for change detection are more critical to forecasting accuracy than the semantically rich but less causally influential features in the final encoder layer.

Anurag Mishra

Published Thu, 12 Ma

Imagine you have a super-smart robot chef named Chronos. This chef is famous for predicting what the weather, stock market, or electricity usage will look like in the future. It's so good at its job that it's used in high-stakes situations, like managing power grids or financial trading.

But here's the problem: No one knows how the chef thinks. It's a "black box." You give it data, and it gives you a prediction, but if you ask why it made that choice, it just shrugs.

This paper is like hiring a team of microscopic detectives (called Sparse Autoencoders) to sneak inside the chef's brain, open up its drawers, and see exactly what ingredients it's using to cook up those predictions.
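To make the "detectives" less magical: a sparse autoencoder is a small helper network that takes one of the model's internal hidden states, expands it into many feature activations, forces most of them to zero, and then reconstructs the original hidden state. The nonzero features are the "tools" the detectives inspect. Here is a minimal sketch of that encode/decode step in plain numpy; the dimensions and random weights are made up for illustration and are not the paper's actual SAE or Chronos-T5's real hidden size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the real Chronos-T5 hidden size is much larger.
d_model, d_features = 8, 32  # the SAE widens the hidden state into more features

# Randomly initialised SAE weights (illustrative only, not trained).
W_enc = rng.normal(size=(d_model, d_features)) * 0.1
b_enc = np.zeros(d_features)
W_dec = rng.normal(size=(d_features, d_model)) * 0.1

def sae_encode(h):
    """Map a hidden state to feature activations; ReLU zeroes many of them."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def sae_decode(f):
    """Reconstruct the original hidden state from the sparse features."""
    return f @ W_dec

h = rng.normal(size=d_model)   # stand-in for one Chronos hidden state
f = sae_encode(h)              # the sparse "tools" the detectives can read off
h_hat = sae_decode(f)          # reconstruction of the hidden state

print("active features:", int((f > 0).sum()), "of", d_features)
```

In a trained SAE the decoder reconstruction `h_hat` is close to `h`, so each active feature corresponds to an interpretable ingredient the model is actually using.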

Here is what they discovered, broken down into simple stories:

1. The "Brain Scan" Experiment

The researchers didn't just guess; they performed surgery on the robot's brain. They looked at six different "rooms" (layers) inside the chef's mind. In each room, they found thousands of tiny, specialized tools (features) that the robot uses to process time.

To test if these tools were actually important, the detectives played a game of "What if we remove this tool?"

  • They took out one tiny tool at a time.
  • The Result: Layer after layer, removing a tool made the chef's cooking worse.
  • The Lesson: Almost every tool the robot uses is pulling its weight. This brain has very few "useless" parts — though, as we'll see, the top floor holds a surprising exception.
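The "what if we remove this tool?" game is a standard ablation loop: zero out one feature at a time, re-run the forecast, and measure how much the error grows. The toy below mimics that procedure with a made-up linear "forecast head" over 16 fake feature activations; none of the numbers or names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: 16 SAE feature activations and a linear "forecast head".
n_features = 16
features = np.abs(rng.normal(size=n_features))   # sparse activations (all >= 0)
readout = rng.normal(size=n_features)            # maps features -> forecast
target = features @ readout                      # the unablated forecast

def forecast_error(mask):
    """Squared forecast error after keeping only the features flagged in mask."""
    pred = (features * mask) @ readout
    return (pred - target) ** 2

baseline = forecast_error(np.ones(n_features))   # nothing ablated -> zero error

# Ablate one feature at a time and record the damage, mirroring the paper's test.
damage = []
for i in range(n_features):
    mask = np.ones(n_features)
    mask[i] = 0.0
    damage.append(forecast_error(mask))

print("baseline error:", baseline)
print("worst single ablation:", max(damage))
```

Ranking features by `damage` is how the detectives decide which tools are load-bearing and which floor of the building matters most.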

2. The Three-Story Building Analogy

The researchers found that the robot's brain is organized like a three-story building, where each floor does something very different:

  • The Basement (Early Layers): The Raw Material Sorters

    • What they do: These layers are busy sorting basic ingredients. They look for simple things like "How fast is it wiggling?" (frequency) or "How jumpy is it?" (volatility).
    • Analogy: Like a grocery store clerk just checking if an apple is red or green. It's basic, but necessary.
  • The Middle Floor (Mid-Encoder): The Alarm System

    • What they do: This is the most critical floor. It doesn't care about the boring, repeating patterns. Instead, it screams when something sudden happens. It's looking for "Level Shifts"—like when the temperature suddenly spikes or the stock market crashes.
    • The Surprise: This floor is the boss. If you break a tool here, the robot's predictions go from "okay" to "disaster" immediately. It's the heart of the robot's ability to handle surprises.
  • The Penthouse (Final Encoder): The Encyclopedia

    • What they do: This floor is full of fancy, complex knowledge. It knows about seasons, long-term trends, and every possible pattern in history. It's the "smartest" looking floor.
    • The Twist: Here is the weird part. The researchers found that if they started removing tools from this fancy Penthouse, the robot actually got better at its job!
    • Why? It seems the Penthouse is so full of "general knowledge" from its training that it sometimes gets confused by the specific task at hand. Removing some of that "noise" helped the robot focus.
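The Penthouse twist sounds paradoxical, but it has a simple mechanical reading: if a feature injects off-task "general knowledge" into the forecast, zeroing it removes noise rather than signal, and the error drops. The toy below makes that concrete with an invented example (a sine-wave target plus a pure-noise feature); it is a sketch of the intuition, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy forecast that mixes a useful signal feature with a "general knowledge"
# feature that is irrelevant (pure noise) for this particular series.
t = np.arange(100)
signal = np.sin(0.3 * t)              # the pattern we actually want to predict
noise_feature = rng.normal(size=100)  # off-task "encyclopedia" activity

target = signal
pred_full = signal + 0.5 * noise_feature   # forecast with the noisy feature on
pred_ablated = signal                      # forecast with that feature zeroed

def mse(pred):
    return float(np.mean((pred - target) ** 2))

print("error with noisy feature:", round(mse(pred_full), 3))
print("error after ablating it: ", mse(pred_ablated))
```

Here ablation strictly helps, which is exactly the signature the researchers saw in the final encoder layer.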

3. The Big "Aha!" Moment

The most important discovery is a paradox:

The "smartest" part of the brain (the Penthouse) isn't the most important for making good predictions. The "alarm system" in the middle (the Mid-Encoder) is.

Most people assume that the final, most complex part of an AI is where the magic happens. But this paper shows that for time series (predicting the future based on the past), the magic happens when the AI detects sudden changes, not when it memorizes complex patterns.

The Takeaway for Everyone

Think of this robot like a survivalist rather than a historian.

  • A historian studies all the old books (the Penthouse) to guess what happens next.
  • A survivalist watches for the sudden crack of a twig or a shift in the wind (the Mid-Encoder) to know a storm is coming.

This paper proves that the robot Chronos is a survivalist. It relies on spotting sudden changes in the data to make its predictions, not on reciting a history book.

Why does this matter?
Now that we know how the robot thinks, we can:

  1. Trust it more because we know it's looking for real changes, not just guessing.
  2. Fix it better if it makes a mistake (we know exactly which "alarm" to check).
  3. Build even better robots by focusing on those "alarm" mechanisms rather than just making them "smarter" with more data.

In short: We finally opened the black box, and we found that the robot's superpower is its ability to spot the unexpected, not its ability to memorize the past.