Learning to Solve Orienteering Problem with Time Windows and Variable Profits

This paper proposes DeCoST, a learning-based two-stage framework that effectively decouples discrete and continuous variables to solve the Orienteering Problem with Time Windows and Variable Profits, achieving superior solution quality and significant inference speedups compared to state-of-the-art methods.

Songqun Gao, Zanxi Ruan, Patrick Floor, Marco Roveri, Luigi Palopoli, Daniele Fontanelli

Published 2026-03-09
📖 4 min read☕ Coffee break read

Imagine you are the manager of a fleet of delivery robots in a busy factory. Your goal is simple: get the most value done in the least amount of time.

But here's the catch:

  1. The Route: You can't visit every single machine. You have to pick the best ones to visit.
  2. The Time Windows: Some machines are only available for a few minutes (like a meeting room that's booked). If you miss the window, you can't go in.
  3. The "Variable Profit": This is the tricky part. The more time you spend fixing a machine, the more value you get. But spending too much time on one machine means you might miss out on visiting three other machines entirely.

This complex puzzle is called the Orienteering Problem with Time Windows and Variable Profits (OPTWVP). It's a nightmare for computers because they have to make two types of decisions at once:

  • Discrete decisions: "Which machines do I visit?" (Yes or No).
  • Continuous decisions: "How long do I spend at each machine?" (1.2 minutes? 4.5 minutes?).

Most computer programs struggle with this because they try to solve the "which" and the "how long" at the same time, getting tangled in a mess of possibilities.

Enter DeCoST: The "Two-Step Dance"

The authors of this paper propose a new method called DeCoST (Decoupled discrete-Continuous optimization with Service-time-guided Trajectory). Think of DeCoST not as a single brain trying to do everything, but as a two-step dance between a "Strategist" and a "Tactician."

Step 1: The Strategist (The "Rough Draft")

First, the AI acts like a Strategist. It looks at the map and quickly sketches out a route.

  • It picks the machines to visit.
  • It guesses a rough amount of time to spend at each one.

Crucially, this Strategist is trained to be smart. It doesn't just guess randomly; it learns from experience (using a technique called Reinforcement Learning) to understand that spending too much time here might mean missing a huge opportunity there. It creates a "feasible" plan that respects the time windows.

Step 2: The Tactician (The "Fine-Tuner")

Once the route is drawn, the AI hands the plan to the Tactician.

  • The Tactician doesn't change the route. The "which machines" decision is locked in.
  • Instead, the Tactician solves a pure math problem (Linear Programming) to figure out the perfect amount of time to spend at each machine to maximize the total reward.

Why is this brilliant?
It's like writing a novel.

  • Old way: You try to write the plot, the dialogue, the character arcs, and the grammar all in one sentence. It's chaotic.
  • DeCoST way: First, you write the plot outline (Step 1). Then, you go back and polish the sentences and dialogue to perfection (Step 2). By separating the "big picture" from the "details," the computer solves the problem much faster and better.

The Secret Sauce: "The Repulsive Coach"

There's a clever trick in Step 1. Usually, if you teach a computer to solve a math problem, it gets lazy and just copies the answer from the math solver (Step 2). This is bad because it stops learning how to make good guesses.

The authors added a "Repulsive Coach" (called pTAR).

  • Imagine a coach who yells, "Don't just copy the final answer! Try to guess the spirit of the answer!"
  • This coach pushes the Strategist to make guesses that are different from the perfect math solution, forcing it to learn the underlying patterns of the problem. This ensures the Strategist gets really good at making the initial route, not just copying the math.

The Results: Fast and Accurate

The paper tested DeCoST against the best existing methods (both human-designed algorithms and other AI models).

  • Speed: DeCoST is incredibly fast. For a problem with 500 machines, it found a great solution in 1.3 seconds. The best competing method took 8.8 seconds (and that was with a time limit!). That's a 6x speedup.
  • Quality: It found better routes than the others, getting closer to the theoretical "perfect" score.

The Bottom Line

DeCoST is like a master chef who knows that you can't chop vegetables and bake a cake at the exact same time.

  1. First, you prep the ingredients and decide the menu (The Route).
  2. Then, you focus entirely on the perfect cooking time for each dish (The Service Time).

By separating these tasks but keeping them connected, DeCoST solves a very difficult logistics problem faster and more accurately than ever before, making it perfect for real-world applications like robot factories, delivery drones, and emergency response teams.