Dynamic Vehicle Routing Problem with Prompt Confirmation of Advance Requests

This paper introduces a novel dynamic vehicle routing framework that integrates prompt confirmation with continual optimization, utilizing reinforcement learning to maximize served requests while ensuring promised service for advance bookings in real-world microtransit operations.

Amutheezan Sivagnanam, Ayan Mukhopadhyay, Samitha Samaranayake, Abhishek Dubey, Aron Laszka

Published 2026-03-10

Imagine you are the manager of a fleet of small, shared taxis (like a modern, high-tech version of a school bus) that pick people up and drop them off all over a city. This is called Microtransit.

Your job is tricky because you have to deal with two conflicting needs:

  1. The Passenger: "Hey, I need a ride in 2 hours. Can you promise me right now that you'll pick me up?" They need an answer immediately.
  2. The Manager: "Wait, if I promise that ride, will I have enough time to fit it in with everyone else's rides? Maybe if I wait 10 seconds to look at the whole map, I can fit in two more people instead of just one."

The Problem: The "Yes/No" Dilemma

In the past, computer systems for these services had to choose one of two bad options:

  • Option A (The Fast Promise): They answer the passenger instantly. "Yes, you're in!" But once they say yes, they are stuck with that plan. They can't rearrange the routes later to make room for better trips. This leads to a lot of "No" answers later because the schedule fills up too fast.
  • Option B (The Perfect Planner): They wait a long time, looking at all the requests to build the perfect bus route. This is great for efficiency, but the passenger has to wait forever for an answer. In the real world, people get impatient and leave if you don't say "Yes" or "No" quickly.

The Gap: No one had a system that could say "Yes" instantly and still keep rearranging the bus to make it even better later.

The Solution: The "Instant Promise, Continuous Rearrangement" System

The authors of this paper built a new system that acts like a super-smart, two-brained conductor.

1. Brain One: The "Quick-Thinker" (Prompt Confirmation)

When a passenger asks for a ride, this brain acts like a fast-food cashier. It looks at the current bus schedule and asks, "Can I squeeze this new order in without breaking the rules?"

  • It doesn't try to solve the whole day's puzzle. It just checks: "If I put this person here, does it fit?"
  • It gives an answer in a fraction of a second (0.2 seconds!).
  • The Magic: It uses a special "gut feeling" (trained by AI) to judge whether saying "Yes" to this person now is likely to ruin the chance of serving other people later.
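
The "does it fit?" check above can be sketched as a simple insertion test. This is a minimal illustration, not the authors' implementation: it assumes a single vehicle driving along a 1-D street, measures travel time as distance, and gives every stop a latest arrival time. The names `Stop`, `is_feasible`, and `try_insert` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stop:
    position: float  # location on a 1-D street (assumed: 1 unit = 1 minute of driving)
    deadline: float  # latest allowed arrival time, in minutes


def is_feasible(route: List[Stop], start_position: float = 0.0,
                start_time: float = 0.0) -> bool:
    """Drive the route in order and check that every stop is reached by its deadline."""
    time, position = start_time, start_position
    for stop in route:
        time += abs(stop.position - position)
        if time > stop.deadline:
            return False
        position = stop.position
    return True


def try_insert(route: List[Stop], pickup: Stop, dropoff: Stop) -> Optional[List[Stop]]:
    """Try every pickup/dropoff position (pickup first); return the first route that fits."""
    for i in range(len(route) + 1):
        with_pickup = route[:i] + [pickup] + route[i:]
        for j in range(i + 1, len(with_pickup) + 1):
            candidate = with_pickup[:j] + [dropoff] + with_pickup[j:]
            if is_feasible(candidate):
                return candidate
    return None  # no position works, so the request would be declined


existing = [Stop(position=5, deadline=10)]
accepted = try_insert(existing, Stop(2, 5), Stop(4, 8))
declined = try_insert(existing, Stop(20, 5), Stop(21, 30))
```

The check only asks whether the promises already made still hold after the insertion. It says nothing about which feasible insertion is best for future requests, which is where the learned "gut feeling" comes in.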

2. Brain Two: The "Master Planner" (Continual Optimization)

Once the passenger gets their "Yes," the Master Planner wakes up. Imagine a chess player who has just made a move. While the opponent is thinking, the chess player is already looking 10 moves ahead.

  • Between the time one passenger asks for a ride and the next one arrives, this brain is constantly shuffling the bus routes.
  • It tries to swap passengers, change pickup orders, and move buses around to make the whole system more efficient.
  • It uses a technique called "Simulated Annealing" (think of it like shaking a box of puzzle pieces to see if they fit better). It keeps shaking the puzzle until a new request comes in, at which point it stops and locks in the best arrangement it found so far.
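
The "keep shaking until the next request arrives" idea can be sketched as an interruptible simulated annealing loop. This is a toy under stated assumptions, not the paper's algorithm: the route is a list of 1-D positions, the cost is total driving distance, the only move is swapping two stops, and `should_stop` is a hypothetical callback standing in for "a new request has arrived". The temperature and cooling values are arbitrary.

```python
import math
import random
from typing import Callable, List


def route_length(order: List[float]) -> float:
    """Total driving distance when visiting 1-D positions in order, starting at 0."""
    total, position = 0.0, 0.0
    for stop in order:
        total += abs(stop - position)
        position = stop
    return total


def anneal(order: List[float], should_stop: Callable[[], bool],
           temperature: float = 10.0, cooling: float = 0.995,
           seed: int = 0) -> List[float]:
    """Swap stops at random until interrupted; return the best order seen so far."""
    rng = random.Random(seed)
    current = list(order)
    current_cost = route_length(current)
    best, best_cost = list(current), current_cost
    while not should_stop():
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = route_length(current)
        delta = new_cost - current_cost
        # Always keep improvements; keep worse moves with a probability
        # that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temperature = max(temperature * cooling, 1e-6)
    return best


def make_budget(steps: int) -> Callable[[], bool]:
    """Stand-in for 'next request arrived': stop after a fixed number of steps."""
    remaining = [steps]

    def should_stop() -> bool:
        remaining[0] -= 1
        return remaining[0] < 0

    return should_stop


stops = [9.0, 1.0, 7.0, 3.0, 5.0]
improved = anneal(stops, make_budget(2000))
```

Because the loop tracks the best arrangement separately from the current one, it can be interrupted at any moment and still hand back a plan that is no worse than the one it started with.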

The Secret Sauce: The "Crystal Ball" (Reinforcement Learning)

How does the "Quick-Thinker" know that saying "Yes" now is a good idea? It doesn't just look at the current bus; it looks into the future.

The authors trained the system using Reinforcement Learning.

  • The Analogy: Imagine training a dog. If the dog sits, you give it a treat. If it jumps on the couch, you say "No."
  • In this paper: The computer played a simulation game millions of times. Every time it made a decision (Accept/Reject) that led to serving more people in the long run, it got a "digital treat."
  • Over time, the computer learned a non-myopic (far-sighted) strategy. It learned that sometimes, taking a slightly "messy" route now is actually better because it saves space for a huge rush of requests coming in an hour.
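
The "digital treat" loop can be illustrated with tabular Q-learning on a toy problem. This is a swapped-in textbook technique chosen for brevity, not the learning method used in the paper. Every detail is an assumption made for illustration: one vehicle with 2 units of spare capacity, 4 requests per simulated day, each request equally likely to need 1 or 2 units, and a reward of 1 per accepted request.

```python
import random
from collections import defaultdict

REJECT, ACCEPT = 0, 1


def legal_actions(capacity_left, size):
    """A request can only be accepted if it still fits."""
    return [REJECT, ACCEPT] if size <= capacity_left else [REJECT]


def train(episodes=20000, requests_per_day=4, capacity=2,
          alpha=0.05, epsilon=0.2, seed=0):
    """Learn Q[(requests_left, capacity_left, size, action)] by simulated play."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        capacity_left = capacity
        size = rng.choice([1, 2])
        for requests_left in range(requests_per_day, 0, -1):
            state = (requests_left, capacity_left, size)
            legal = legal_actions(capacity_left, size)
            if rng.random() < epsilon:          # explore occasionally
                action = rng.choice(legal)
            else:                               # otherwise act on current estimates
                action = max(legal, key=lambda a: q[state + (a,)])
            reward = 1.0 if action == ACCEPT else 0.0  # the "digital treat"
            if action == ACCEPT:
                capacity_left -= size
            if requests_left > 1:
                next_size = rng.choice([1, 2])
                next_state = (requests_left - 1, capacity_left, next_size)
                future = max(q[next_state + (a,)]
                             for a in legal_actions(capacity_left, next_size))
            else:
                next_size, future = size, 0.0   # the day is over
            key = state + (action,)
            # Nudge the estimate toward "treat now + best estimated future".
            q[key] += alpha * (reward + future - q[key])
            size = next_size
    return q


q = train()
first_big_request = (4, 2, 2)  # 4 requests left, 2 units free, request needs 2
prefers_waiting = q[first_big_request + (REJECT,)] > q[first_big_request + (ACCEPT,)]
```

In this toy setup, accepting a large request first thing uses up all the capacity for a single treat, while declining it leaves room for two smaller requests later. The learned values capture that trade-off, which is the far-sighted behavior described above, only on a much smaller scale.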

The Results: Why It Matters

The team tested this on real data from a US city and on New York City taxi data.

  • Speed: It answers passengers almost instantly (under 1 second).
  • Efficiency: It rejected far fewer requests than the old systems. While other systems might say "No" to 10% of people, this new system said "No" to only about 1%.

The Big Picture

Think of this system as a traffic controller for a busy airport.

  • Old systems were like controllers who either gave a landing slot immediately and never reordered the planes again (causing delays later), or controllers who waited 20 minutes to calculate the perfect landing sequence (leaving pilots circling without an answer).
  • This new system says, "You have a landing slot!" (Instantly). Then, while the plane is on approach, the controller is already reorganizing the other arrivals to make sure everyone lands smoothly and on time.

In short: This paper gives us a way to promise rides instantly without sacrificing the efficiency of the whole fleet, making on-demand public transport actually viable for everyday use.