Gym-TORAX: Open-source software for integrating RL with plasma control simulators

This paper introduces Gym-TORAX, an open-source Python package that bridges Reinforcement Learning algorithms with the TORAX plasma control simulator by automatically generating Gymnasium-compatible environments for optimizing tokamak performance and stability, currently featuring an ITER ramp-up scenario.

Antoine Mouchamps, Arthur Malherbe, Adrien Bolland, Damien Ernst

Published 2026-03-05

Imagine trying to conduct a symphony orchestra, but the musicians are made of super-hot gas (plasma), the instruments are giant magnetic fields, and the conductor is a computer program that has never seen a musical score before. If the conductor makes a mistake, the music stops, the gas cools down, and the experiment fails. This is the daily challenge of operating a Tokamak, a machine designed to produce clean energy through nuclear fusion, the same process that powers the sun.

For decades, controlling these machines has been like trying to balance a broom on your finger while riding a rollercoaster. It requires incredibly complex math and expert intuition. But what if we could teach a computer to learn how to balance that broom through trial and error, just like a video game character learns to jump over obstacles?

That is exactly what the paper "Gym-TORAX" is about.

The Problem: A Language Barrier

Think of the existing tools for simulating plasma physics as a high-end, professional racing simulator. It's incredibly accurate and powerful, but it's built for professional race car drivers (plasma physicists). If you are a video game developer (a Reinforcement Learning expert) who wants to build a new AI driver, you can't just plug your game controller into this simulator. The interfaces don't match, and the instructions are written in a language you don't speak.

Furthermore, many of these simulators are locked behind expensive "paywalls" or require special licenses, making it hard for new researchers to get started.

The Solution: Gym-TORAX (The "Universal Adapter")

The authors created Gym-TORAX, which acts like a universal adapter or a translator.

  1. The Engine (TORAX): Under the hood, the software still uses the powerful, open-source "engine" called TORAX. This engine simulates the physics of the plasma—how the heat moves, how the magnetic fields shift, and how the gas behaves.
  2. The Interface (Gymnasium): Gym-TORAX wraps this engine in a simple, standard "steering wheel and pedals" interface known as Gymnasium. This is the standard language that all modern AI learning algorithms speak.

Now, an AI researcher doesn't need to be a plasma physicist. They just need to know how to drive a car (or play a video game). They can tell the AI: "Here is the dashboard (what the plasma looks like), here are the pedals (what controls we can touch), and here is the goal (keep the plasma stable and hot)."
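To make the adapter idea concrete, here is a minimal Python sketch. It is not the Gym-TORAX or TORAX API: ToySimulator, ToyPlasmaEnv, the one-line heating model, and every number in it are hypothetical stand-ins. The only thing borrowed from Gymnasium is its calling convention, in which reset() returns (observation, info) and step(action) returns (observation, reward, terminated, truncated, info). The sketch mirrors that convention in plain Python so it runs without installing anything.

```python
class ToySimulator:
    """Hypothetical stand-in for a physics code. This is NOT the TORAX API."""

    def __init__(self):
        self.temperature = 1.0  # arbitrary units

    def advance(self, heating_power):
        # Toy dynamics: temperature drifts toward the applied heating power.
        self.temperature += 0.1 * (heating_power - self.temperature)
        return self.temperature


class ToyPlasmaEnv:
    """Wraps the simulator behind Gymnasium-style reset() and step() methods."""

    def __init__(self, target=8.0, max_steps=50):
        self.target = target
        self.max_steps = max_steps
        self.sim = None
        self.steps = 0

    def reset(self, seed=None):
        # Gymnasium convention: return (observation, info).
        self.sim = ToySimulator()
        self.steps = 0
        return [self.sim.temperature], {}

    def step(self, action):
        # Gymnasium convention:
        # return (observation, reward, terminated, truncated, info).
        temperature = self.sim.advance(heating_power=action)
        self.steps += 1
        reward = -abs(temperature - self.target)     # closer to target is better
        terminated = temperature > 2 * self.target   # toy "crash" condition
        truncated = self.steps >= self.max_steps     # time limit reached
        return [temperature], reward, terminated, truncated, {}


env = ToyPlasmaEnv()
obs, info = env.reset()
done = False
total_reward = 0.0
while not done:
    action = 8.0  # a fixed guess; a learning agent would pick this from obs
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```

The point of the design is that the loop at the bottom never touches the simulator directly. Anything that speaks reset() and step() can drive it, which is what lets off-the-shelf learning algorithms plug in.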

How It Works: The Video Game Analogy

Imagine you are playing a video game where you control a spaceship (the plasma).

  • The State (Observation): The game shows you a dashboard with temperature, speed, and fuel levels.
  • The Action: You have a joystick. You can push it left, right, up, or down to adjust the magnetic coils or inject energy.
  • The Reward:
    • If you keep the ship stable and fast, you get points.
    • If the ship explodes or crashes, you get negative points (and the game ends).
  • The Learning: The AI plays the game millions of times. At first, it crashes constantly. But slowly, it learns: "Oh, when I push the joystick up too hard, the ship spins out. But if I push it gently, I get more points."

Gym-TORAX turns the complex physics of a fusion reactor into this exact kind of "video game" environment.
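The trial-and-error idea can be shown with a much smaller game than a plasma. The sketch below uses an epsilon-greedy bandit, a deliberately simple technique and not the method of the paper: the agent mostly repeats the joystick push that has scored best so far, and occasionally tries another one at random. The three push strengths, their scores, and the noise level are invented for illustration.

```python
import random

# Hypothetical toy game: each push strength has a fixed average score,
# and pushing too hard "crashes the ship" (large negative score).
SCORES = {"gentle": 1.0, "medium": 0.5, "hard": -5.0}


def play(action, rng):
    # A little noise makes any single try slightly misleading.
    return SCORES[action] + rng.uniform(-0.2, 0.2)


def learn(episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in SCORES}  # running average score per action
    counts = {a: 0 for a in SCORES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(list(SCORES))            # explore
        else:
            action = max(estimates, key=estimates.get)   # exploit best guess
        score = play(action, rng)
        counts[action] += 1
        # Incremental update of the running average.
        estimates[action] += (score - estimates[action]) / counts[action]
    return estimates


estimates = learn()
best = max(estimates, key=estimates.get)
print("Learned favourite:", best)
```

Real environments differ in one important way: the best action depends on what the dashboard currently shows, so the agent has to learn a mapping from observations to actions instead of a single favourite move.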

The "Training Wheels" Example

In the paper, the authors tested their software with a specific scenario: the ITER Ramp-Up.

  • The Scenario: Imagine a car starting from a stoplight and accelerating to highway speed, then cruising. In a Tokamak, this is "ramping up" the plasma from cold to super-hot.
  • The Test: They let three different baseline "drivers" try this:
    1. The Open-Loop Driver: Follows a pre-written script (like a GPS with no traffic updates).
    2. The Random Driver: Spins the steering wheel randomly (like a toddler playing with the wheel).
    3. The PI-Controller Driver: A standard, rule-based driver that keeps correcting its inputs based on how far it is from the target (PI stands for proportional-integral).
  • The Future AI: The paper sets the stage for a Reinforcement Learning AI to eventually beat all of them.

The results showed that even a simple rule-based driver could do better than the random one, and the "scripted" driver was okay. The goal, though, is to let an AI learn a better way to drive, one that no human has thought of yet.
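For readers who want to see what these three kinds of driver look like in code, here is a toy comparison. It does not use Gym-TORAX or the ITER scenario: the one-line heating model, the target of 8.0, the 0 to 10 power range, and the PI gains are all assumptions chosen to keep the example tiny. Each driver is just a function that returns a heating power at every step.

```python
import random

TARGET = 8.0  # hypothetical target temperature, arbitrary units


def simulate(controller, steps=100):
    """Run one episode on a toy heating model and return the total reward."""
    temperature = 1.0
    total = 0.0
    for t in range(steps):
        power = controller(t, temperature)
        temperature += 0.1 * (power - temperature)  # toy dynamics, not TORAX
        total -= abs(TARGET - temperature)          # penalty for missing target
    return total


def open_loop(t, temperature):
    # Scripted: ramp the power up over 20 steps, then hold it.
    # The measured temperature is ignored.
    return TARGET * min(1.0, t / 20)


def make_random(seed=0):
    rng = random.Random(seed)
    # Picks any power in the assumed allowed range, ignoring everything.
    return lambda t, temperature: rng.uniform(0.0, 10.0)


def make_pi(kp=1.0, ki=0.1):
    integral = 0.0

    def controller(t, temperature):
        nonlocal integral
        error = TARGET - temperature
        integral += error  # accumulated error removes steady-state offset
        return kp * error + ki * integral

    return controller


results = {
    "open-loop": simulate(open_loop),
    "random": simulate(make_random()),
    "PI": simulate(make_pi()),
}
for name, total in results.items():
    print(f"{name:>9}: total reward {total:.1f}")
```

In this toy setup both the scripted and the PI driver finish well ahead of the random one. The numbers say nothing about the paper's actual results; they only illustrate the difference between following a script, acting at random, and reacting to measurements.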

Why This Matters

Before Gym-TORAX, if you wanted to use AI to control a fusion reactor, you had to build the entire simulation from scratch or beg for access to restricted tools. It was like trying to build a house without being allowed to buy bricks.

Gym-TORAX is open-source (free for everyone) and easy to use. It bridges the gap between two worlds:

  • Physicists who understand the plasma.
  • AI Experts who know how to train smart agents.

Now, these two groups can work together. The physicists can focus on the "physics of the car," while the AI experts focus on "teaching the car to drive itself." This collaboration could be the key to unlocking the secret of infinite, clean energy, turning the dream of a fusion-powered future into a reality.

In short: Gym-TORAX is the tool that lets us teach computers to pilot the stars, one simulation at a time.
