SEA-TS: Self-Evolving Agent for Autonomous Code Generation of Time Series Forecasting Algorithms

Imagine you are trying to teach a robot how to predict the weather, but instead of giving it a textbook, you just say, "Go figure it out."

Most current AI tools are like junior interns. They read your instructions, write some code, run it, and if the numbers look good, they stop. If the numbers look bad, they try again, but they often forget why they failed. They might accidentally cheat (like peeking at tomorrow's weather to predict today's) and get a high score, only to fail miserably in the real world.

SEA-TS is different. It's not just an intern; it's a self-evolving master chef who is constantly training in a high-tech kitchen.

Here is how it works, broken down into simple concepts:

1. The "Taste Test" That Gets Smarter (Metric-Advantage MCTS)

Imagine a cooking competition. In a normal contest, a judge gives a score of 1 to 10. If you get a 9, you are happy. If you get a 9.1, you are slightly happier. But is 9.1 a huge breakthrough or just a tiny tweak?

SEA-TS uses a special judge called MA-MCTS. Instead of just giving a score, this judge looks at all the dishes made so far.

If you make a dish that is slightly better than the average, the judge gives you a tiny nudge.
If you make a dish that is a genuine masterpiece (a huge leap forward), the judge gives you a massive boost.
This helps the AI focus on the "home runs" rather than wasting time on tiny, useless tweaks. It's like a coach who knows exactly when to push an athlete harder because they are on the verge of a record.

2. The "Strict Editor" Who Never Forgets (Code Review & Running Prompt)

This is the most important part. Imagine the AI writes a recipe. Before it's allowed to cook again, a Strict Editor (another AI) reads the recipe line by line.

The Catch: The Editor doesn't just say "Good job" or "Bad job." It finds why a recipe failed. Did the chef peek at the future? Did they mix up the salt and sugar?
The Magic: Once the Editor finds a mistake, it doesn't just fix that one recipe. It updates the Master Cookbook (the "Running Prompt").
From that moment on, every future recipe the AI writes automatically includes a note saying: "Remember: Never peek at the future, and always add salt before sugar."
The AI literally learns from its own mistakes and never makes the same logical error twice. It's like a student who, after failing a math test, writes a rule on their wall: "Never divide by zero," and never forgets it again.

3. The "Global Tour Guide" (Global Steerable Reasoning)

Usually, AI agents only look at their immediate neighbors (what their "sibling" code looked like). SEA-TS is different. It keeps a map of the Best Solution ever found and the Worst Solution ever found.

When the AI is stuck, it asks the Tour Guide: "Hey, look at the Best Solution over there. It used a special spice. Look at the Worst Solution here. It burned the food. How can I mix the best ideas from the winner and avoid the loser's mistakes?"
This allows the AI to jump across the map, borrowing brilliant ideas from completely different branches of its own thinking process.

4. The "Diversity Garden" (MAP-Elites Archive)

If you only plant one type of flower, your garden is boring and fragile. SEA-TS maintains a Garden of Diversity.

It forces the AI to try different "styles" of cooking: some with heavy spices (complex math), some with simple ingredients (simple logic), some using different pots (different algorithms).
Even if a specific style isn't the absolute winner right now, it's kept in the garden in case the weather changes and that style becomes the best one later.

The Result: What Did It Actually Do?

The researchers tested this "Master Chef" on predicting Solar Energy (how much power the sun will generate) and Home Electricity (how much power people will use).

The results were shocking:

It beat the experts: On a public solar-energy benchmark, it cut prediction error (MAE) by about 40% compared with the best existing human-designed model.
It invented new things: The AI didn't just copy existing recipes. It invented new architectural patterns that humans hadn't thought of.
- Example: It created a "Monotonic Decay Head." This is a fancy way of saying the AI realized: "Hey, sunlight always fades after midday. Let's build a part of the brain that mathematically forces the prediction to go down smoothly after noon, just like physics says it should."
- Nobody told it about physics. The rule emerged purely from the AI trying to minimize its prediction errors.

The Bottom Line

SEA-TS is a framework where an AI acts as its own teacher, editor, and coach. It writes code, checks its own work, learns from its mistakes, remembers the best ideas, and constantly evolves. It suggests that AI can do more than carry out the work: it can invent new ways of doing the work that, in these experiments, beat what human engineers built.

1. Problem Statement

The paper addresses the limitations of conventional Machine Learning (ML) development pipelines for time series forecasting, particularly in scenarios characterized by:

Data Scarcity: New deployment scenarios (e.g., emerging markets, rare events) often lack sufficient historical data for reliable training.
Distribution Shift: Real-world time series are non-stationary due to environmental changes, policy shifts, or equipment degradation, causing models trained on historical data to degrade rapidly.
Diminishing Returns: Manual iteration yields diminishing marginal returns as models approach performance ceilings, making further engineering efforts economically unsustainable.

While Large Language Model (LLM) agents have shown promise in automating ML engineering (MLE), existing frameworks suffer from reward hacking (generating code with logical flaws like data leakage that artificially inflate metrics), simplistic reward mechanisms (binary/fixed rewards), limited reasoning context (lack of global awareness), and static prompts that cannot adapt to discovered failure modes.

2. Methodology: The SEA-TS Framework

The authors propose SEA-TS (Self-Evolving Agent for Time Series Algorithms), a closed-loop framework that autonomously generates, validates, and optimizes forecasting code. The system operates through an iterative self-evolution loop comprising five phases:

A. Metric-Advantage Monte Carlo Tree Search (MA-MCTS)

Instead of using fixed rewards, SEA-TS employs a Metric-Advantage mechanism:

Standardization: Raw metrics ($M_j$) are converted into a statistically normalized advantage score ($A_j$) based on the historical distribution of metrics (mean $\mu$, standard deviation $\sigma$).
- $A_j = (\mu - M_j) / \sigma$ (for minimization tasks).
Reward Logic: The final reward ($R_j$) incorporates a code review flag ($b_j$). If the review flags the code as logically flawed, $R_j = -1$; otherwise, $R_j = A_j$.
Backpropagation: This reward updates the cumulative value ( $Q_j$ ) of ancestor nodes. This allows the search to distinguish between marginal gains and significant breakthroughs, naturally intensifying exploitation as the search converges.
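To make the reward logic concrete, here is a minimal sketch in Python, assuming the metric is minimized (e.g., MAE), that the code-review verdict arrives as a boolean, and that "historical distribution" means the metrics of all previously evaluated nodes. The `Node` class, its fields, and the function names are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical search-tree node; field names are illustrative, not the paper's."""
    parent: Optional["Node"] = None
    visits: int = 0
    q_value: float = 0.0  # cumulative value Q_j


def advantage(metric: float, history: List[float], eps: float = 1e-8) -> float:
    """A_j = (mu - M_j) / sigma for a metric that is minimized (e.g. MAE).

    `history` holds the metrics of previously evaluated nodes and must be
    non-empty; `eps` guards against sigma == 0 early in the search.
    """
    mu = mean(history)
    sigma = pstdev(history)
    return (mu - metric) / (sigma + eps)


def reward(metric: float, history: List[float], flawed: bool) -> float:
    """R_j = -1 if code review flags the solution (b_j), otherwise A_j."""
    return -1.0 if flawed else advantage(metric, history)


def backpropagate(node: Optional[Node], r: float) -> None:
    """Add the reward to the evaluated node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.q_value += r
        node = node.parent


# Usage: MAE values of earlier solutions, then two new candidates.
history = [2.9, 2.7, 2.8, 2.6]
root = Node()
child = Node(parent=root)
big_leap = reward(2.2, history, flawed=False)    # about 4.92: far below the mean
small_step = reward(2.7, history, flawed=False)  # about 0.45: a marginal gain
backpropagate(child, big_leap)
```

Because the advantage is divided by $\sigma$, the same absolute improvement earns a larger reward when earlier metrics are tightly clustered, which is one way the signal separates breakthroughs from small tweaks.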

B. Code Review with Running Prompt Refinement

To prevent reward hacking (e.g., data leakage):

Automated Review: Every successfully executed solution undergoes an LLM-based logical review checking for data leakage, incorrect normalization, and train-test contamination.
Dynamic Prompting: Findings from the review and global comparisons are distilled into a Running Prompt ( $P_{run}$ ). This prompt is continuously updated to encode corrective patterns (e.g., "Apply .shift(1) before rolling windows") and successful design patterns, ensuring subsequent iterations do not repeat past mistakes.
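The corrective pattern quoted above ("Apply .shift(1) before rolling windows") guards against a common leak: a rolling feature at time $t$ that already contains the value being predicted. The sketch below shows that leak in pure Python and a running-prompt accumulator that stores the lesson once found. The perturbation probe `uses_current_value` is a crude programmatic stand-in for the LLM reviewer, and `RunningPrompt`, `rolling_mean`, and the prompt wording are hypothetical names for illustration only.

```python
from typing import Callable, List, Optional

Feature = List[Optional[float]]


def rolling_mean(values: List[float], window: int, shift: int = 0) -> Feature:
    """Rolling mean over `window` points, lagged by `shift` steps.

    shift=0 includes values[t] in the feature at time t (leaks the target);
    shift=1 uses only values[t-window .. t-1], like pandas'
    series.shift(1).rolling(window).mean().
    """
    out: Feature = []
    for t in range(len(values)):
        end = t - shift + 1  # exclusive end of the window
        start = end - window
        out.append(None if start < 0 else sum(values[start:end]) / window)
    return out


def uses_current_value(feature_fn: Callable[[List[float]], Feature],
                       values: List[float], t: int) -> bool:
    """Crude leak probe: does perturbing values[t] change feature[t]?"""
    perturbed = list(values)
    perturbed[t] += 1000.0
    return feature_fn(values)[t] != feature_fn(perturbed)[t]


class RunningPrompt:
    """Hypothetical store of lessons distilled from code review (P_run)."""

    def __init__(self, base: str) -> None:
        self.base = base
        self.lessons: List[str] = []

    def add_lesson(self, lesson: str) -> None:
        if lesson not in self.lessons:  # keep each rule only once
            self.lessons.append(lesson)

    def render(self) -> str:
        rules = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"{self.base}\n\nLessons from earlier iterations:\n{rules}"


series = [1.0, 2.0, 3.0, 4.0, 5.0]
prompt = RunningPrompt("Write a forecasting script for the target series.")
if uses_current_value(lambda v: rolling_mean(v, 2), series, t=3):
    prompt.add_lesson("Apply .shift(1) before rolling windows.")
```

Here `rolling_mean(series, 2)` gives `[None, 1.5, 2.5, 3.5, 4.5]`, where each value includes the current point, while `rolling_mean(series, 2, shift=1)` gives `[None, None, 1.5, 2.5, 3.5]`, which only looks backwards.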

C. Global Steerable Reasoning

Unlike standard MCTS agents that only reference local parent/sibling nodes, SEA-TS compares every evaluated node against the Global Best ( $N^*$ ) and Global Worst ( $N_\perp$ ) solutions found so far.

An auxiliary LLM generates a structured comparison summary identifying successful strategies to emulate and failure patterns to avoid.
This enables cross-trajectory knowledge transfer, allowing the agent to "jump" across the search tree rather than relying solely on incremental local improvements.
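A minimal sketch of the bookkeeping behind global steerable reasoning, assuming a single minimized metric and a plain-text context handed to the auxiliary LLM. `Solution`, `GlobalTracker`, the example solution names, and the prompt wording are all hypothetical; the actual LLM call is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Solution:
    """Hypothetical record of one evaluated candidate."""
    name: str
    metric: float  # lower is better (e.g. MAE)
    summary: str   # short description of the approach


class GlobalTracker:
    """Keeps the global best (N*) and global worst (N_perp) across all branches."""

    def __init__(self) -> None:
        self.best: Optional[Solution] = None
        self.worst: Optional[Solution] = None

    def update(self, sol: Solution) -> None:
        if self.best is None or sol.metric < self.best.metric:
            self.best = sol
        if self.worst is None or sol.metric > self.worst.metric:
            self.worst = sol

    def comparison_context(self, candidate: Solution) -> str:
        """Text an auxiliary LLM could turn into a structured comparison."""
        if self.best is None or self.worst is None:
            raise ValueError("no solutions evaluated yet")
        return (
            f"Candidate {candidate.name} ({candidate.metric:.3f}): {candidate.summary}\n"
            f"Global best {self.best.name} ({self.best.metric:.3f}): {self.best.summary}\n"
            f"Global worst {self.worst.name} ({self.worst.metric:.3f}): {self.worst.summary}\n"
            "List strategies to emulate from the best and failure patterns to avoid from the worst."
        )


tracker = GlobalTracker()
for sol in [
    Solution("gbm_lags", 2.4, "gradient boosting with lag features"),
    Solution("naive_mlp", 3.1, "MLP on raw inputs"),
    Solution("decomp_linear", 1.9, "trend/seasonal decomposition with linear heads"),
]:
    tracker.update(sol)
context = tracker.comparison_context(Solution("attn_v2", 2.2, "small attention model"))
```

Since the best and worst are tracked across the whole tree rather than per parent, the comparison can pull in ideas from a branch the current node never descended from.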

D. MAP-Elites Quality-Diversity Archive

To prevent convergence to a narrow set of architectures, the framework maintains a MAP-Elites archive indexing solutions across three phenotypic dimensions:

Architecture Type: (e.g., Tree-based, Decomposition, Attention, Hybrid).
Feature Engineering Sophistication: (e.g., None vs. Extensive lags/Fourier features).
Training Sophistication: (e.g., Basic vs. Advanced scheduling/regularization).
This ensures a diverse collection of elite solutions is preserved.
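The archive logic can be sketched in a few lines, assuming each solution has already been labeled with one value per dimension and that a lower metric is better. The bin values are a hypothetical simplification built from the examples above (the paper's exact bins are not given here), and `MapElitesArchive` and `Elite` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical bins along the three phenotypic dimensions named in the text.
ARCH_TYPES = ("tree", "decomposition", "attention", "hybrid")
FEATURE_LEVELS = ("none", "extensive")
TRAINING_LEVELS = ("basic", "advanced")

Cell = Tuple[str, str, str]


@dataclass
class Elite:
    solution_id: str
    metric: float  # lower is better (e.g. MAE)


class MapElitesArchive:
    """One elite per (architecture, features, training) cell."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Elite] = {}

    def add(self, cell: Cell, candidate: Elite) -> bool:
        """Insert if the cell is empty or the candidate beats its incumbent."""
        arch, feats, training = cell
        if arch not in ARCH_TYPES or feats not in FEATURE_LEVELS or training not in TRAINING_LEVELS:
            raise ValueError(f"unknown descriptor: {cell}")
        incumbent = self.cells.get(cell)
        if incumbent is None or candidate.metric < incumbent.metric:
            self.cells[cell] = candidate
            return True
        return False

    def elites(self) -> List[Elite]:
        """All preserved elites, best first."""
        return sorted(self.cells.values(), key=lambda e: e.metric)


archive = MapElitesArchive()
archive.add(("tree", "extensive", "basic"), Elite("gbm_v1", 2.5))
archive.add(("attention", "none", "advanced"), Elite("attn_v1", 2.0))
archive.add(("tree", "extensive", "basic"), Elite("gbm_v2", 2.8))  # rejected: worse than gbm_v1
archive.add(("tree", "extensive", "basic"), Elite("gbm_v3", 2.3))  # replaces gbm_v1
```

The tree-based elite survives even though the attention model scores better overall; competition happens only within a cell, which is what keeps the collection diverse.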

3. Key Contributions

Unified Framework: A general-purpose self-evolving MLE agent combining MA-MCTS, automated code review with dynamic prompt refinement, and global steerable reasoning.
Metric-Advantage Reward: A statistically grounded reward mechanism that provides discriminative signals, significantly improving search efficiency over binary/fixed rewards.
Discovery of Novel Patterns: Demonstration that autonomous agents can discover genuinely novel architectural patterns not previously reported in literature, surpassing human-engineered baselines.

4. Experimental Results

The framework was evaluated on public benchmarks and industry proprietary datasets (Solar PV and Residential Load forecasting).
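The results mix an absolute error (MAE, in target units) with percentage errors (WAPE, MAPE). A reduction in a percentage metric can therefore be quoted either as a difference in percentage points or relative to the baseline. The sketch below uses the standard textbook definitions of these metrics (the paper's exact handling, e.g. of zero-output night hours, is an assumption here) and applies them to the MAE and Solar PV WAPE figures reported for SEA-TS.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def wape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted absolute percentage error in percent: sum|y - y_hat| / sum|y|."""
    return 100.0 * sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any y is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline: float, new: float) -> float:
    """Improvement as a percentage of the baseline value."""
    return 100.0 * (baseline - new) / baseline


y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 33.0]
# mae about 2.33, wape about 11.67, mape about 13.33

mae_cut = relative_reduction(2.929, 1.757)        # about 40.0 (% of baseline MAE)
wape_points = 25.75 - 17.12                       # about 8.63 percentage points
wape_relative = relative_reduction(25.75, 17.12)  # about 33.5 (% of baseline WAPE)
```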

Public Benchmark (Solar-Energy):
- SEA-TS achieved a 40% reduction in MAE compared to the state-of-the-art (SOTA) model TimeMixer (1.757 vs. 2.929).
- Evolved Architecture: The agent discovered DiurnalMultiScaleGatedLinear, a six-head gated network with multi-scale decomposition and physics-informed constraints.
Proprietary Solar PV Forecasting:
- SEA-TS reduced WAPE by 8.6 percentage points compared to human-engineered baselines (17.12% vs. 25.75%).
- Novel Discovery: The agent autonomously invented a Monotonic Decay Head, which encodes the physical law that solar irradiance declines monotonically after noon, using learnable parameters and regularization. This was not prompted.
Residential Load Forecasting:
- SEA-TS reduced WAPE by 7.7% and MAPE by 3.17% compared to SOTA baselines (Timer and TimeMixer).
- Novel Discovery: The agent discovered a Learnable Hourly Bias Correction mechanism where the bias is proportional to the prediction magnitude itself, a technique not found in standard literature.
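The two discovered mechanisms are described here in one sentence each, so the sketch below is only one plausible reading, not the architecture SEA-TS generated. It assumes the decay head receives a noon-level forecast plus one raw parameter per afternoon hour, and that the hourly bias is a per-hour multiplier on the prediction. All function names are hypothetical. In a trained model, `raw_steps` and `alpha` would be learnable parameters (e.g., tensors in a deep-learning framework), and the code here only shows the forward computation.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x); returns a non-negative value."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def monotonic_decay_head(noon_value: float, raw_steps: List[float]) -> List[float]:
    """Afternoon forecast that can only stay flat or fall.

    softplus turns each raw step into a non-negative decrement, so the output
    never increases; clipping at zero keeps generation non-negative.
    """
    level, out = noon_value, []
    for step in raw_steps:
        level = max(level - softplus(step), 0.0)
        out.append(level)
    return out


def monotonicity_penalty(preds: List[float]) -> float:
    """Soft regularizer alternative: total increase between consecutive hours."""
    return sum(max(b - a, 0.0) for a, b in zip(preds, preds[1:]))


def hourly_bias_correction(preds: List[float], hours: List[int],
                           alpha: List[float]) -> List[float]:
    """Magnitude-proportional bias: y_hat + alpha[h] * y_hat for hour-of-day h."""
    return [p * (1.0 + alpha[h]) for p, h in zip(preds, hours)]


afternoon = monotonic_decay_head(100.0, [0.0, 1.0, -2.0, 5.0])
corrected = hourly_bias_correction([10.0, 20.0], hours=[0, 1], alpha=[0.1, -0.05])
```

Enforcing the shape by construction (the head) guarantees monotonicity, while the penalty only discourages violations; the source mentions both learnable parameters and regularization, so a real implementation might combine the two.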

5. Significance and Future Work

Significance:
The paper demonstrates that autonomous ML engineering can move beyond mere hyperparameter tuning to generating genuinely novel algorithmic ideas. The discovery of physics-informed constraints (monotonic decay) and novel calibration techniques (magnitude-proportional bias) without explicit human prompting suggests that LLM agents can act as creative scientific partners, uncovering patterns that human experts might overlook.

Future Directions:

Multi-objective Optimization: Balancing accuracy with inference latency and model size.
Context Pruning: Reducing token consumption and API costs.
Automated Dimension Discovery: Automatically defining the MAP-Elites archive dimensions based on task characteristics.
Hybrid Agents: Integrating deep research agents (to fetch papers) with coding agents to inject domain knowledge more effectively.

In conclusion, SEA-TS represents a significant step toward fully autonomous scientific discovery in time series forecasting, showing on the benchmarks tested that self-evolving agents can outperform both human engineers and existing SOTA models while generating novel, interpretable, and physically grounded solutions.