Gym-TORAX: Open-source software for integrating RL with plasma control simulators

Imagine trying to conduct a symphony orchestra, but the musicians are made of super-hot gas (plasma), the instruments are giant magnetic fields, and the conductor is a computer program that has never seen a musical score before. If the conductor makes a mistake, the music stops, the gas cools down, and the experiment fails. This is the daily challenge of running a tokamak, a machine designed to produce clean, abundant energy through the same fusion process that powers the sun.

For decades, controlling these machines has been like trying to balance a broom on your finger while riding a rollercoaster. It requires incredibly complex math and expert intuition. But what if we could teach a computer to learn how to balance that broom through trial and error, just like a video game character learns to jump over obstacles?

That is exactly what the paper "Gym-TORAX" is about.

The Problem: A Language Barrier

Think of the existing tools for simulating plasma physics as a high-end, professional racing simulator. It's incredibly accurate and powerful, but it's built for professional race car drivers (plasma physicists). If you are a video game developer (a Reinforcement Learning expert) who wants to build a new AI driver, you can't just plug your game controller into this simulator. The interfaces don't match, and the instructions are written in a language you don't speak.

Furthermore, many of these simulators require restrictive licenses, making it hard for new researchers to get started.

The Solution: Gym-TORAX (The "Universal Adapter")

The authors created Gym-TORAX, which acts like a universal adapter or a translator.

The Engine (TORAX): Under the hood, the software still uses the powerful, open-source "engine" called TORAX. This engine simulates the physics of the plasma: how the heat moves, how the magnetic fields shift, and how the gas behaves.
The Interface (Gymnasium): Gym-TORAX wraps this engine in a simple, standard "steering wheel and pedals" interface known as Gymnasium. This is the standard language that all modern AI learning algorithms speak.

Now, an AI researcher doesn't need to be a nuclear physicist. They just need to know how to drive a car (or play a video game). They can tell the AI: "Here is the dashboard (what the plasma looks like), here are the pedals (what controls we can touch), and here is the goal (keep the plasma stable and hot)."

How It Works: The Video Game Analogy

Imagine you are playing a video game where you control a spaceship (the plasma).

The State (Observation): The game shows you a dashboard with temperature, speed, and fuel levels.
The Action: You have a joystick. You can push it left, right, up, or down to adjust the magnetic coils or inject energy.
The Reward:
- If you keep the ship stable and fast, you get points.
- If the ship explodes or crashes, you get negative points (and the game ends).
The Learning: The AI plays the game millions of times. At first, it crashes constantly. But slowly, it learns: "Oh, when I push the joystick up too hard, the ship spins out. But if I push it gently, I get more points."

Gym-TORAX turns the complex physics of a fusion reactor into this exact kind of "video game" environment.

The "Training Wheels" Example

In the paper, the authors tested their software with a specific scenario: the ITER Ramp-Up.

The Scenario: Imagine a car starting from a stoplight and accelerating to highway speed, then cruising. In a Tokamak, this is "ramping up" the plasma from cold to super-hot.
The Test: They let three different "drivers" try this:
1. The Open-Loop Driver: Follows a pre-written script (like a GPS with no traffic updates).
2. The Random Driver: Spins the steering wheel randomly (like a toddler playing with the wheel).
3. The PI-Controller Driver: A standard, rule-based driver.
A fourth driver, the Future AI, has not taken the wheel yet: the paper sets the stage for a Reinforcement Learning AI to eventually beat all three.

The results showed that the rule-based PI driver scored best (3.79), the scripted driver came close behind (3.40), and the random driver did far worse (-10.79). The goal now is to let an AI learn a better way to drive that no human has thought of yet.

Why This Matters

Before Gym-TORAX, if you wanted to use AI to control a simulated fusion reactor, you had to wire a simulator into your training code yourself or obtain access to restricted tools. It was like trying to build a house without being allowed to buy bricks.

Gym-TORAX is open-source (free for everyone) and easy to use. It bridges the gap between two worlds:

Physicists who understand the plasma.
AI Experts who know how to train smart agents.

Now, these two groups can work together. The physicists can focus on the "physics of the car," while the AI experts focus on "teaching the car to drive itself." This collaboration could be the key to unlocking the secret of infinite, clean energy, turning the dream of a fusion-powered future into a reality.

In short: Gym-TORAX is the tool that lets us teach computers to pilot the stars, one simulation at a time.

1. Problem Statement

The optimization of stability and performance in fusion reactors, specifically tokamaks, is a critical challenge in fusion energy research. Controlling these devices is difficult due to:

High Dimensionality: The plasma state involves numerous variables (temperatures, densities, magnetic flux).
Nonlinearities: Plasma dynamics are governed by complex, nonlinear Partial Differential Equations (PDEs).
Accessibility Barriers: Existing simulators (e.g., RAPTOR, JOREK) often require restrictive licenses or are designed primarily for plasma physicists rather than control engineers. They lack standardized interfaces for Reinforcement Learning (RL), making it difficult for RL researchers to apply their algorithms to plasma control without deep domain expertise.
Limitations of Open-Source Tools: While the TORAX simulator is open-source and fast, it operates as an open-loop system (predefined inputs), lacking the closed-loop interface required for RL training.

2. Methodology

The authors developed Gym-TORAX, a Python package that bridges the gap between the TORAX physics simulator and the Gymnasium RL framework.

Core Architecture

Wrapper Design: Gym-TORAX wraps the TORAX simulator (which uses JAX for fast auto-differentiation) to create a closed-loop control environment.
MDP Formulation: The control problem is modeled as a finite-time deterministic Markov Decision Process (MDP) defined by the tuple $(\mathcal{S}, \mathcal{A}, f, r, s_0, \gamma, T)$:
- State Space ($\mathcal{S}$): Includes plasma temperatures ($T_{i,e}$), densities ($n_{i,e,imp}$), poloidal magnetic flux ($\psi$), and derived metrics like the safety factor ($q$) and fusion gain ($Q$).
- Action Space ($\mathcal{A}$): Agents control variables such as loop voltage ($V_{loop}$), total current ($I_p$), and energy/particle sources (NBI, ECRH).
- Transition Function ($f$): The agent selects an action, which is applied to TORAX. TORAX solves the transport equations (heat and particle transport) for $K$ internal time steps to compute the next state.
- Reward Function ($r$): Task-specific (e.g., stability, power generation) and defined by the user. The agent's objective is to maximize the expected return under this reward.
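
A minimal sketch of how this tuple turns into a number to optimize, assuming the usual discounted-return definition $J = \sum_{t=0}^{T-1} \gamma^t r(s_t, a_t)$ with $s_{t+1} = f(s_t, a_t)$. The scalar dynamics, reward, and policy below are toy stand-ins chosen for illustration; they are not the TORAX equations.

```python
from typing import Callable

State = float
Action = float


def rollout_return(
    f: Callable[[State, Action], State],
    r: Callable[[State, Action], float],
    policy: Callable[[State], Action],
    s0: State,
    gamma: float,
    horizon: int,
) -> float:
    """Discounted return of a deterministic policy over a finite horizon."""
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        total += discount * r(s, a)  # reward for the current state and action
        s = f(s, a)                  # deterministic transition
        discount *= gamma
    return total


# Toy stand-ins (hypothetical, not plasma physics).
def toy_f(s: State, a: Action) -> State:
    return s + a                     # the state moves by the action


def toy_r(s: State, a: Action) -> float:
    return -abs(s - 1.0)             # penalize distance from a target of 1.0


def toy_policy(s: State) -> Action:
    return 0.5 * (1.0 - s)           # move halfway toward the target


J = rollout_return(toy_f, toy_r, toy_policy, s0=0.0, gamma=1.0, horizon=3)
```

Comparing policies then amounts to comparing their values of `J` from the same initial state, which is how the case study below ranks its three baselines.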

Implementation Details

Two-Level Discretization:
1. RL Interaction Cycle: The agent observes the state and selects an action at discrete RL time steps.
2. Physics Simulation: Each RL transition involves running the TORAX PDE solver for $K$ internal steps. Users can choose between auto (dynamic steps) or fixed (constant steps) discretization.
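
The two time scales can be sketched as a nested loop, with a toy relaxation equation standing in for the TORAX PDE solver. The function names, the `dt` value, and the dynamics are assumptions for illustration. Only the fixed-step variant is shown, and holding the action constant across the inner steps is also an assumption of this sketch.

```python
def solver_step(temp: float, heating: float, dt: float) -> float:
    """One explicit-Euler step of a toy model dT/dt = heating - T (not TORAX)."""
    return temp + dt * (heating - temp)


def rl_transition(temp: float, heating: float, k_inner: int, dt: float) -> float:
    """One RL step: keep the action fixed while the solver advances K steps."""
    for _ in range(k_inner):
        temp = solver_step(temp, heating, dt)
    return temp


temp = 0.0
history = [temp]
for action in [1.0, 1.0, 2.0]:   # one action per RL time step
    temp = rl_transition(temp, action, k_inner=10, dt=0.1)
    history.append(temp)         # the agent only sees these coarse samples
```

The agent sees one state per RL step, while the solver takes `k_inner` smaller steps in between.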
Environment Creation: Users extend a BaseEnv class, implementing four abstract methods:
- _get_torax_config(): Defines initial conditions and physics models.
- _define_action_space(): Specifies controllable variables and ramp-rate limits.
- _define_observation_space(): Selects observable variables (allowing for partial observability).
- _compute_reward(): Defines the objective function.
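
As a rough sketch of this pattern, the block below defines a stand-in `BaseEnv` abstract class with the four method names listed above, plus a toy subclass. The stand-in class, the return types (plain dictionaries and lists), the `ToyRampUpEnv` name, and all numeric values except the 15 MA current limit are assumptions for illustration. The real Gym-TORAX base class and its signatures should be taken from the package itself.

```python
from abc import ABC, abstractmethod


class BaseEnv(ABC):
    """Stand-in for the package's base class (hypothetical, simplified)."""

    @abstractmethod
    def _get_torax_config(self) -> dict: ...

    @abstractmethod
    def _define_action_space(self) -> dict: ...

    @abstractmethod
    def _define_observation_space(self) -> list: ...

    @abstractmethod
    def _compute_reward(self, state: dict) -> float: ...


class ToyRampUpEnv(BaseEnv):
    def _get_torax_config(self) -> dict:
        # Initial conditions and model choices (placeholder values).
        return {"t_final": 150.0, "initial_Ip": 3.0e6}

    def _define_action_space(self) -> dict:
        # Each controllable variable maps to (min, max, max change per step).
        return {"Ip": (0.0, 15.0e6, 0.2e6)}

    def _define_observation_space(self) -> list:
        # Exposing only a subset of variables gives partial observability.
        return ["Ip", "Q"]

    def _compute_reward(self, state: dict) -> float:
        # Example objective: reward the fusion gain Q.
        return float(state["Q"])


env = ToyRampUpEnv()
reward = env._compute_reward({"Ip": 10.0e6, "Q": 2.5})
```

Because the four methods are abstract, a subclass that forgets one of them cannot be instantiated, so the mistake surfaces early.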
Safety Mechanisms: The system handles simulation errors or infeasible states by terminating the episode and assigning a large negative reward ($-1000$). Actions that violate constraints are clipped to the allowed range.
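
The two behaviors described here, clipping and failure termination, could look roughly like the following. `safe_step`, `SimulationError`, the toy simulator, and its feasibility limit are hypothetical; only the $-1000$ penalty comes from the text.

```python
FAILURE_REWARD = -1000.0  # penalty stated in the text


class SimulationError(RuntimeError):
    """Raised by the toy simulator when the state becomes infeasible."""


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def toy_simulate(state: float, action: float) -> float:
    new_state = state + action
    if new_state > 10.0:  # arbitrary feasibility limit for this sketch
        raise SimulationError("state left the feasible region")
    return new_state


def safe_step(state: float, action: float, low: float, high: float):
    """Return (next_state, reward, terminated) with clipping and error handling."""
    action = clip(action, low, high)
    try:
        next_state = toy_simulate(state, action)
    except SimulationError:
        # End the episode and penalize instead of propagating the error.
        return state, FAILURE_REWARD, True
    return next_state, -abs(next_state - 5.0), False


ok = safe_step(0.0, 99.0, low=-1.0, high=1.0)     # action is clipped to 1.0
failed = safe_step(9.5, 1.0, low=-1.0, high=1.0)  # toy simulator raises
```

Turning a failure into a terminal transition with a fixed penalty keeps the training loop running and gives the agent a clear signal to avoid those regions.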

3. Key Contributions

Open-Source Framework: Gym-TORAX is the first open-source, Gymnasium-compatible interface specifically designed for tokamak plasma control, lowering the barrier to entry for RL researchers.
Standardization: It abstracts complex plasma physics behind a standard API, allowing researchers to focus on control strategy optimization rather than physics implementation.
Flexibility: The modular design allows users to easily define new scenarios (e.g., ramp-up, steady-state) and modify action/observation spaces.
Baseline Environment: The package includes a pre-implemented ITER Hybrid Ramp-Up environment, serving as a ready-to-use benchmark.

4. Results (Case Study)

The authors validated the package using the ITER Hybrid Ramp-Up scenario (100 s ramp-up in L-mode, 50 s nominal in H-mode). They compared three policies:

Open-Loop ($\pi_{OL}$): Follows a predefined trajectory.
Random ($\pi_{R}$): Selects actions uniformly at random.
PI Controller ($\pi_{PI}$): Uses a Proportional-Integral controller to track a target current density, with gains optimized via grid search.
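
For readers unfamiliar with PI control, here is a minimal discrete-time sketch acting on a toy first-order plant. The plant, gains, and setpoint are made-up values for illustration. The paper's controller tracks a target current density with gains found by grid search, which are not reproduced here.

```python
class PIController:
    """Discrete-time proportional-integral controller."""

    def __init__(self, kp: float, ki: float, dt: float) -> None:
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def act(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt  # accumulate the tracking error
        return self.kp * error + self.ki * self.integral


def simulate(kp: float, ki: float, steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a toy plant dx/dt = -x + u toward a setpoint of 1.0; return final x."""
    controller = PIController(kp, ki, dt)
    x = 0.0
    for _ in range(steps):
        u = controller.act(target=1.0, measured=x)
        x += dt * (-x + u)  # explicit-Euler update of the toy plant
    return x


final_pi = simulate(kp=2.0, ki=1.0)  # integral term removes steady-state error
final_p = simulate(kp=2.0, ki=0.0)   # proportional only settles near 2/3
```

A grid search over `kp` and `ki` would repeat `simulate` for each candidate pair and keep the pair with the best score.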

Performance Metrics (Expected Return $J$):

Policy	Expected Return ($J$)
Random ($\pi_{R}$)	$-10.79$
Open-Loop ($\pi_{OL}$)	$3.40$
PI Controller ($\pi_{PI}$)	$3.79$

Key Findings:

The random policy scored far below both other policies ($-10.79$), confirming the difficulty of the control task.
The PI controller outperformed the reference open-loop scenario by 11.5%.
The PI policy successfully increased the total current to the maximum allowable limit (15 MA), correlating with improved confinement. This demonstrates that Gym-TORAX can be used to implement and evaluate closed-loop control strategies.
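
The 11.5% figure follows from the returns reported in the table, taking the open-loop return as the baseline:

$$\frac{J(\pi_{PI}) - J(\pi_{OL})}{J(\pi_{OL})} = \frac{3.79 - 3.40}{3.40} \approx 0.115$$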

5. Significance and Future Impact

Cross-Disciplinary Collaboration: Gym-TORAX fosters collaboration between the fusion physics and machine learning communities by providing a common language (Gymnasium API).
Accelerated Research: It enables rapid prototyping of control strategies without the need for expensive hardware experiments or complex physics coding.
Scalability: While currently based on TORAX (which uses simplified 1D transport models), the framework is designed to integrate with more advanced simulators as they become available.
Future Directions: The authors plan to add tools for parameterizing tokamak geometry directly within the RL loop and handling specific physics events like the L-H transition (Low to High confinement mode), which is critical for real-world reactor performance.

In conclusion, Gym-TORAX represents a significant step toward applying advanced AI control techniques to fusion energy, providing a robust, accessible, and extensible platform for developing the next generation of plasma control systems.

