Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling

This paper proposes a unified co-elliptic maneuver framework for multi-debris removal in Low Earth Orbit and demonstrates, through comparative analysis, that a Masked Proximal Policy Optimization (PPO) deep reinforcement learning approach significantly outperforms greedy heuristics and Monte Carlo Tree Search in both mission efficiency and computational speed.

Original authors: Agni Bandyopadhyay, Gunther Waxenegger-Wilfing

Published 2026-02-23

This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine the Low Earth Orbit (LEO) around our planet as a massive, chaotic highway. Instead of cars, this highway is filled with thousands of pieces of space junk: dead satellites, broken rocket parts, and tiny metal shards. If left alone, these pieces can collide, and each collision creates more fragments, triggering a runaway chain reaction that could make space travel impossible for decades. This cascade is known as the "Kessler Syndrome."

To stop this, we need "Space Janitors": special spacecraft designed to fly out, grab these pieces of junk, and drag them down to burn up in the atmosphere. But here's the problem: There are too many pieces of junk, and the Space Janitor has limited fuel and time.

This paper is about teaching a Space Janitor how to be the most efficient cleaner possible using Artificial Intelligence (AI).

The Challenge: The Ultimate Road Trip

Imagine you are a delivery driver in a giant city. You have a truck with a limited gas tank, and you need to drop off packages at 50 different houses scattered across the city.

  • The Goal: Visit as many houses as possible before you run out of gas or time.
  • The Catch: You can't just drive straight to the next house. You have to follow specific traffic rules (orbital mechanics), sometimes you need to stop at a gas station (refueling), and you must avoid crashing into other cars (safety zones).

The paper compares three different "drivers" (algorithms) to see who can clean up the most trash:

  1. The "Greedy" Driver: This driver looks only at the house right next door. They pick the closest one, go there, then look for the next closest one. They don't think about the future.
    • Result: They are fast, but they often get stuck in a corner or run out of gas because they didn't plan the route ahead.
  2. The "Super-Planner" (MCTS): This driver sits down and simulates millions of different possible routes in their head before making a single move. They think, "If I go here, then there, then maybe I should gas up..."
    • Result: They find a great route, but it takes them so long to think that by the time they decide where to go, the mission time is almost up. They are too slow for real-time use.
  3. The "AI Learner" (Masked PPO): This is the star of the show. This driver has been trained by playing thousands of virtual versions of this game. They don't just look at the next house; they have "learned" the patterns of the city. They know when to take a shortcut, when to refuel, and how to chain trips together efficiently.
    • Result: They are almost as smart as the Super-Planner but move as fast as the Greedy driver.
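The "masking" in Masked PPO refers to forbidding choices that are currently infeasible, for example a piece of junk the remaining fuel could never reach, before the policy picks its next move. Here is a minimal sketch of that idea using a hand-rolled masked softmax; the function name and toy numbers are illustrative, not the paper's actual network or environment:

```python
import math

def masked_softmax(logits, mask):
    """Zero out infeasible actions (mask=False) before normalizing,
    so the policy can never sample a forbidden move."""
    exps = [math.exp(l) if feasible else 0.0
            for l, feasible in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate debris targets; target index 1 would cost more fuel
# than remains, so it is masked out of the policy's choices.
probs = masked_softmax([1.0, 2.0, 0.5], [True, False, True])
print(probs)  # the masked action always gets probability 0.0
```

Because infeasible actions get exactly zero probability, the agent never wastes training time (or real fuel) learning that they fail; it only explores among moves that are actually allowed.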

The Secret Sauce: How They Move

The paper introduces a special way of moving called "Co-Elliptic Transfers."

Think of it like a race car on a track. If you want to catch a car in the next lane, you don't swerve wildly (which wastes gas). Instead, you speed up or slow down slightly to drop into a "shadow lane" (a co-elliptic orbit) that runs parallel to the target's. You drift along this lane until you are right beside the target, then gently merge.
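The "speed up or slow down" trick works because a slightly lower orbit circles the Earth faster, so the chaser naturally drifts toward the target without constant thrusting. A toy estimate of how long such a drift takes, using the standard near-circular approximation (drift rate ≈ 1.5 · n · Δa, where n is the mean motion); this is a back-of-the-envelope sketch, not the paper's maneuver model:

```python
import math

MU = 3.986004418e14      # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6_378_137.0    # equatorial radius, m

def phasing_time(alt_target_m, delta_a_m, along_track_gap_m):
    """Time to close an along-track gap by flying delta_a below the
    target, using the near-circular drift approximation 1.5 * n * delta_a."""
    r = R_EARTH + alt_target_m            # target orbit radius, m
    n = math.sqrt(MU / r**3)              # mean motion, rad/s
    drift = 1.5 * n * delta_a_m           # relative along-track speed, m/s
    return along_track_gap_m / drift      # seconds

# Example: flying 10 km below a target at 700 km altitude,
# closing a 100 km along-track gap.
t = phasing_time(700e3, 10e3, 100e3)
print(f"{t / 3600:.1f} hours")  # roughly 1.7 hours
```

This is why planning matters so much: each transfer between debris pieces takes hours of drifting, so a 7-day mission only has room for a few dozen pickups, and a bad ordering wastes irreplaceable time.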

The AI also uses "Safety Ellipses." Imagine approaching a fragile vase. You don't just grab it; you circle it slowly in a safe oval path to make sure you don't knock it over. This paper teaches the AI to do this with space junk, ensuring it doesn't accidentally crash into the debris it's trying to clean up.
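One common way to realize such a safety ellipse (an assumption here; the paper may construct it differently) is the bounded natural motion of the Clohessy–Wiltshire relative-motion equations: with the drift term removed, the chaser traces a closed 2:1 ellipse around the target, twice as long along-track as it is radial, using no fuel at all.

```python
import math

def cw_safety_ellipse(n, amp_radial, phase, t):
    """In-plane position on a drift-free Clohessy-Wiltshire ellipse.
    n: mean motion (rad/s); amp_radial: radial semi-axis (m).
    The along-track semi-axis is always twice the radial one."""
    x = amp_radial * math.sin(n * t + phase)        # radial offset, m
    y = 2.0 * amp_radial * math.cos(n * t + phase)  # along-track offset, m
    return x, y

n = 1.06e-3  # mean motion at roughly 700 km altitude, rad/s
# Sample one loop around the target: 50 m radial, 100 m along-track.
pts = [cw_safety_ellipse(n, 50.0, 0.0, k * 600.0) for k in range(10)]
print(pts[0])  # starts 100 m ahead of the target: (0.0, 100.0)
```

Because this loop is a natural orbit solution, the chaser can circle the debris indefinitely for inspection without burning fuel, and a small error never sends it through the target itself.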

The Results: Who Won?

The researchers ran 100 different "cleaning missions" with random junk locations. Here is what happened:

  • The Greedy Driver cleaned up about 15–18 pieces of junk. They were too short-sighted.
  • The Super-Planner cleaned up about 25–29 pieces. They were smart but took forever to think (sometimes hours to plan a 7-day mission).
  • The AI Learner cleaned up 29–32 pieces. They were the most efficient!

The Big Win: The AI Learner visited twice as many pieces of junk as the simple Greedy driver, and it did it in just 1 or 2 seconds of computer time. The Super-Planner took thousands of seconds to do a slightly worse job.

Why This Matters

Space is getting crowded, and we can't afford to waste fuel or time. This paper shows that Deep Reinforcement Learning (a type of AI that learns by trial and error) can plan these cleanup missions more effectively than the alternatives, and fast enough to run in real time.

It's like upgrading from a human driver who gets tired and confused, to a self-driving car that has "seen" every possible traffic jam before it even happens. This technology could soon allow autonomous spacecraft to clean up our orbit, keeping space safe for future generations without needing a human to press every button.

In a nutshell: The paper teaches a robot how to be the ultimate space janitor, cleaning up the most trash in the least amount of time by learning from experience rather than just guessing or over-thinking.
