Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

This paper proposes RL-CMSA, a hybrid algorithm that integrates reinforcement learning-guided probabilistic clustering with exact MILP optimization and local search to effectively solve the min-max Multiple Traveling Salesman Problem, demonstrating superior performance over state-of-the-art methods on large-scale instances.

Guillem Rodríguez-Corominas, Maria J. Blesa, Christian Blum

Published 2026-03-02

Imagine you are the manager of a fleet of delivery trucks. You have a central warehouse (the depot) and hundreds of customers scattered across a city. Your goal isn't just to get everything delivered; it's to make sure no single driver is overworked. You want to minimize the time the longest route takes, ensuring a fair balance for everyone. This is the Min-Max Multiple Traveling Salesman Problem.

The paper introduces a new, smart way to solve this puzzle called RL-CMSA. It's a hybrid method that mixes old-school math with modern "learning" AI. Here is how it works, explained through a simple story.

The Problem: The "Fairness" Puzzle

Imagine you have 10 drivers and 500 houses to visit. If you just let them drive randomly, one driver might end up with a 10-hour shift while another finishes in 2 hours. That's bad. You need to split the city into 10 perfect loops so that the longest loop is as short as possible.

Doing this by hand is impossible. Even computers struggle, because the number of ways to partition the houses and order each route grows astronomically with the number of cities (like trying to find a specific grain of sand on all the beaches on Earth).
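The "fairness" objective itself is simple to state in code, even though optimizing it is hard. The sketch below (illustrative only, not from the paper) computes each driver's closed loop length and takes the maximum, which is exactly the quantity the min-max mTSP asks you to minimize:

```python
import math

def tour_length(depot, stops):
    """Length of one closed tour: depot -> stops -> depot."""
    path = [depot] + stops + [depot]
    return sum(math.dist(path[i], path[i + 1]) for i in range(len(path) - 1))

def makespan(depot, routes):
    """The min-max objective: the length of the longest single route."""
    return max(tour_length(depot, r) for r in routes)

depot = (0.0, 0.0)
# Two drivers: a balanced split vs. dumping everything on one driver.
balanced = [[(1.0, 0.0)], [(-1.0, 0.0)]]
unbalanced = [[(1.0, 0.0), (-1.0, 0.0)], []]
print(makespan(depot, balanced))    # → 2.0 (each loop has length 2)
print(makespan(depot, unbalanced))  # → 4.0 (one idle driver, one long loop)
```

Note that total driving distance is the same in both cases; only the min-max objective distinguishes the fair split from the unfair one.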

The Solution: The "Smart Workshop" (RL-CMSA)

The authors built a digital workshop that solves this problem in four repeating steps. Think of it as a team of chefs trying to create the perfect menu, but they keep learning from their mistakes.

1. Construct: The "Clustering Chef"

First, the computer tries to guess how to group the houses.

  • The Old Way: Just throw dice to decide which house goes to which driver.
  • The New Way (RL-CMSA): The computer uses a "memory bank" (Reinforcement Learning). It remembers which pairs of houses usually end up in the same route in good solutions.
    • Analogy: Imagine a chef who knows that "Pizza" and "Garlic Bread" always go well together. If the chef sees a new order, they automatically put those two items in the same box. The computer does this with cities, using "Q-values" (a score of how good a pair is) to guide the grouping.
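The "Pizza and Garlic Bread" idea can be sketched as a roulette-wheel choice: when extending a group, candidates that share high Q-values with cities already in the group are proportionally more likely to be picked. This is a toy illustration; the pair-score representation (`frozenset({a, b})`) and the fallback score of 1.0 are assumptions, not the paper's exact scheme:

```python
import random

def pick_next_city(current_group, candidates, q):
    """Sample the next city for a partially built group.
    q maps frozenset({a, b}) -> learned pair score (assumed representation);
    unseen pairs default to a neutral score of 1.0."""
    scores = []
    for c in candidates:
        s = sum(q.get(frozenset({c, g}), 1.0) for g in current_group)
        scores.append(s or 1.0)  # never let a candidate have zero weight
    r = random.uniform(0, sum(scores))
    acc = 0.0
    for c, s in zip(candidates, scores):
        acc += s
        if acc >= r:
            return c
    return candidates[-1]
```

Because the choice is probabilistic rather than greedy, the construction step still explores: high-scoring pairs are favored, but low-scoring groupings are occasionally tried too.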

2. Merge: The "Recipe Book"

The computer generates many different groupings (like trying 20 different ways to split the city). It takes the best routes from all these attempts and puts them into a Pool (a recipe book).

  • It's picky: If a route is too long or visits the same houses as a better route already in the book, it gets thrown out. The pool stays compact and high-quality.
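The "picky recipe book" can be sketched as a dictionary keyed by the set of houses a route visits, keeping only the shortest route seen for each set. The `max_len` filter and the `(length, route)` representation are illustrative assumptions, not the paper's exact data structure:

```python
def merge_into_pool(pool, new_routes, max_len=None):
    """Merge newly constructed routes into the pool, keeping for each
    set of visited cities only the shortest route seen so far.
    Routes are (length, tuple_of_city_ids) pairs (assumed format)."""
    best = {frozenset(r): (length, r) for length, r in pool}
    for length, r in new_routes:
        if max_len is not None and length > max_len:
            continue  # too long: discard outright
        key = frozenset(r)
        if key not in best or length < best[key][0]:
            best[key] = (length, r)  # shorter route for same houses wins
    return sorted(best.values())
```
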

3. Solve: The "Mathematical Architect"

Now, the computer stops guessing and starts using a powerful, exact math solver (like a super-precise architect).

  • It looks at the "Recipe Book" (the pool of good routes) and asks: "If I pick exactly 10 routes from this book, which combination covers every house exactly once and keeps the longest route the shortest?"
  • This is a strict math problem, formulated as a MILP (Mixed-Integer Linear Program), which guarantees the best possible combination from the options available.
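To make the selection logic concrete, here is a toy version of the Solve step. The paper uses a MILP solver; this sketch substitutes brute-force enumeration (fine for a handful of routes, hopeless at real scale) to show what "pick exactly k routes covering every house once, minimizing the longest" means:

```python
from itertools import combinations

def solve_subinstance(routes, cities, k):
    """Brute-force stand-in for the MILP: choose exactly k routes from the
    pool so every city is covered exactly once, minimizing the longest route.
    Routes are (length, frozenset_of_cities) pairs (assumed format)."""
    best = None
    for combo in combinations(routes, k):
        covered = [c for _, cs in combo for c in cs]
        # Valid only if cities are covered with no overlaps and none missing.
        if len(covered) == len(cities) and set(covered) == cities:
            span = max(length for length, _ in combo)
            if best is None or span < best[0]:
                best = (span, combo)
    return best
```

A quick check: with routes covering {a,b}, {c}, {a,b,c}, {a}, and {b,c}, picking k = 2 routes, the valid partitions are {a,b}+{c} and {a}+{b,c}, and the solver returns whichever has the smaller longest route.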

4. Adapt & Learn: The "Coach"

This is the magic part. After the math solver finds a great solution, the computer acts like a coach:

  • Adapt: It keeps the good routes in the pool and deletes the old, bad ones (like pruning a garden).
  • Learn: It looks at the winning solution and says, "Hey, these two cities were neighbors in the winner! Let's give them a high score so we group them together next time."
  • If a pair of cities was never together in a good solution, the computer lowers their score.
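The reward-and-decay loop above can be sketched in a few lines. The additive reward and multiplicative decay constants here are illustrative placeholders, not the paper's tuned update rule:

```python
from itertools import combinations

def update_q(q, winning_routes, all_pairs, reward=0.1, decay=0.95):
    """After the solver picks a winning solution, raise the Q-value of
    every city pair that shared a route in it, and decay all other pairs.
    Pairs are frozensets; unseen pairs start at a neutral score of 1.0."""
    winners = set()
    for route in winning_routes:
        for a, b in combinations(sorted(route), 2):
            winners.add(frozenset({a, b}))
    for pair in all_pairs:
        if pair in winners:
            q[pair] = q.get(pair, 1.0) + reward   # "these two belong together"
        else:
            q[pair] = q.get(pair, 1.0) * decay    # slowly forget unused pairs
    return q
```

The decay is what lets the method escape bad early habits: a pair that stops appearing in winning solutions gradually loses its score instead of keeping it forever.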

Why is this better than the competition?

The paper compares this new method (RL-CMSA) against the current "champion" algorithm (a Genetic Algorithm, which is like evolution: it breeds solutions and keeps the fittest).

  • The Genetic Algorithm (HGA): It's like a wild explorer. It tries many different paths, but it can get lost or wander aimlessly. It finds good solutions, but sometimes it takes a long time and isn't consistent.
  • RL-CMSA: It's like a guided tour. Because it "learns" from its own successes, it knows exactly where to look.
    • Result: On big, difficult problems (lots of cities and many drivers), RL-CMSA finds better, fairer routes faster. It is more consistent, meaning if you run it 40 times, it finds the best answer almost every time, whereas the old method might only find it a few times.

The One Weakness

The new method shines when there are many drivers (many short routes). When there are very few drivers (meaning each driver has a huge, long route), the "Mathematical Architect" step gets a bit stuck because there are fewer ways to mix and match the pieces. But for most real-world scenarios (like delivery fleets with many trucks), the new method is the clear winner.

In a Nutshell

The paper presents a system that builds potential solutions, merges the best parts, solves the math perfectly, and learns from the results to get smarter every time. It's a team of a creative builder, a strict mathematician, and a smart coach working together to ensure every delivery driver gets a fair, efficient shift.
