Excess demand in public transportation systems: The case of Pittsburgh's Port Authority

This paper proposes a framework that uses Poisson regression with censored-data filtering to estimate excess demand in public transportation systems. It addresses the common underestimation caused by unrecorded passengers left behind by full buses, and validates the approach on simulated data and real-world data from Pittsburgh's Port Authority.

Tianfang Ma, Robizon Khubulashvili, Sera Linardi, Konstantinos Pelechrinis

Published Wed, 11 Ma

Imagine you are running a popular coffee shop. Every morning, a long line of people waits to buy a latte. You have a barista who can only serve 50 cups an hour.

If 60 people show up, 50 get their coffee, and 10 go home empty-handed. Now, imagine you are trying to figure out how popular your coffee shop is. You look at your sales log, and it says, "We sold 50 cups."

The Problem:
If you only look at that log, you might think, "Oh, only 50 people wanted coffee today." You wouldn't know that 10 people were turned away because the barista could not serve them. In the world of public buses, this is exactly what happens. The bus company's computer records how many people got on the bus, but it has no way of knowing how many people were left standing at the bus stop because the bus was already full.

This paper is about figuring out how many people are being left behind at bus stops in Pittsburgh, even though the official records don't show them.

The "Censored" Mystery

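The authors call this "censored data." Think of it like a movie where the screen goes black right before the hero gets shot. You know something happened, but you don't see the details. In statistical terms, the recorded number is capped: it tells you demand was at least that high, not what it actually was.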

In the bus system:

  • The Reality: 100 people want to get on. The bus holds 50.
  • The Record: The computer sees "50 people got on." It assumes only 50 people wanted to ride.
  • The Consequence: If the bus company uses this bad data to plan for the future, they might think, "Hey, we only need 50 seats!" and send a tiny bus next time. But actually, they need a big bus for 100 people. This leads to more people being left behind, and eventually, people stop taking the bus altogether because it's unreliable.
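The gap between reality and the record can be written down in a few lines. This is only a minimal sketch: the function names and the idea of a single "free space" number per bus are assumptions made for illustration, not something taken from the paper.

```python
def recorded_boardings(true_demand: int, free_space: int) -> int:
    """Boardings the counter can record: demand capped by the space left on the bus."""
    return min(true_demand, free_space)


def left_behind(true_demand: int, free_space: int) -> int:
    """Riders who wanted to board but could not (invisible in the records)."""
    return max(true_demand - free_space, 0)


# The example from the list above: 100 people waiting, room for 50.
print(recorded_boardings(100, 50))  # 50
print(left_behind(100, 50))         # 50
```

The record only ever contains the first number. The second one is what the paper tries to recover.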

The Detective Work: How They Solved It

The researchers (Tianfang, Robizon, Sera, and Konstantinos) built a framework to act like a detective. They couldn't see the people left behind, but they could look for clues.

The Clue:
They looked at two things:

  1. How full was the bus when it arrived? (Was it packed to the brim?)
  2. How many people got on? (Did the number drop to zero or stay low?)

The Analogy:
Imagine a bus arriving at a stop.

  • Scenario A: The bus is half-empty, and nobody gets on. Conclusion: Nobody wanted to ride. (No excess demand).
  • Scenario B: The bus is half-empty, and 10 people get on. Conclusion: 10 people wanted to ride. (Normal demand).
  • Scenario C: The bus is completely full, and zero people get on. Conclusion: This is suspicious! Maybe 20 people were waiting, but the bus was too full to let them on. This is "Excess Demand."

The researchers created a filter to spot these "Suspicious Scenarios" (Scenario C). They realized that if they taught their computer model using these suspicious scenarios, the model would get confused and think, "Oh, nobody wants to ride during rush hour!" because it sees zero people getting on.

The Fix:
They told the computer: "Ignore the times when the bus was full and nobody got on. Don't use those to learn how many people usually want to ride." By removing these "censored" data points, the computer could learn the true demand.
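In code, the filter might look something like the sketch below. The `StopVisit` record, its field names, and the exact rule (full on arrival and zero boardings) are hypothetical, chosen to mirror Scenarios A, B, and C; the paper's actual filter may use different fields or thresholds.

```python
from dataclasses import dataclass


@dataclass
class StopVisit:
    load_on_arrival: int  # riders already on board when the bus pulls in
    capacity: int         # assumed maximum load of the bus
    boardings: int        # riders the counter recorded getting on


def is_censored(visit: StopVisit) -> bool:
    """Scenario C: the bus arrived full and nobody boarded, so the zero says nothing about demand."""
    return visit.load_on_arrival >= visit.capacity and visit.boardings == 0


def filter_training_data(visits):
    """Keep only the visits whose recorded boardings can stand in for demand."""
    return [v for v in visits if not is_censored(v)]


visits = [
    StopVisit(load_on_arrival=25, capacity=50, boardings=0),   # Scenario A
    StopVisit(load_on_arrival=25, capacity=50, boardings=10),  # Scenario B
    StopVisit(load_on_arrival=50, capacity=50, boardings=0),   # Scenario C
]
kept = filter_training_data(visits)
print(len(kept))  # 2
```

Scenarios A and B survive the filter; Scenario C is set aside so it cannot teach the model that nobody rides at rush hour.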

The Simulation: Testing the Theory

Before looking at real Pittsburgh buses, they built a fake bus system in a computer (a simulation). They knew the exact truth: "We made 1,000 people wait."

They tested three ways to teach the computer:

  1. The Perfect Teacher: Showed the computer the truth (1,000 people waited).
  2. The Filtered Teacher: Showed the computer the data, but removed the "full bus" moments (like they planned to do with real data).
  3. The Naive Teacher: Showed the computer everything, including the "full bus" moments.

The Result:
The Naive Teacher failed miserably. It thought demand was low because it saw few or no boardings during rush hour, when the buses were in fact too full to take anyone.
The Filtered Teacher did a great job. It was almost as good as the Perfect Teacher. This proved that their method of "filtering out the full bus moments" actually works to find the hidden demand.
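A toy version of this experiment fits in a short script. Everything here is an assumption made for illustration: demand at one stop is Poisson with a made-up rate, a bus either arrives completely full (with a made-up probability) or has plenty of room, and each "teacher" simply takes an average. For a Poisson model with no other inputs, the best-fitting rate is exactly that average, so this is the simplest possible stand-in for the regression, not the authors' actual setup.

```python
import math
import random


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n_visits=20000, true_rate=6.0, p_full=0.3, seed=0):
    """Return true demand, recorded boardings, and a full-on-arrival flag per visit."""
    rng = random.Random(seed)
    demand, recorded, arrived_full = [], [], []
    for _ in range(n_visits):
        d = sample_poisson(rng, true_rate)
        full = rng.random() < p_full
        demand.append(d)
        arrived_full.append(full)
        recorded.append(0 if full else d)  # a full bus records zero boardings
    return demand, recorded, arrived_full


def mean(values):
    return sum(values) / len(values)


demand, recorded, arrived_full = simulate()
perfect = mean(demand)    # sees the true demand
filtered = mean([r for r, f in zip(recorded, arrived_full) if not f])  # drops full-bus visits
naive = mean(recorded)    # treats every recorded zero as real
print(f"perfect={perfect:.2f} filtered={filtered:.2f} naive={naive:.2f}")
```

In this simplified world the filtered estimate should land close to the perfect one, while the naive estimate is pulled down by all the zeros recorded on full buses.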

The Real-World Findings in Pittsburgh

They applied this method to real data from Pittsburgh's Port Authority (PPA) for a whole year. Here is what they found:

  • The "Hidden" Passengers: On average, about 1% of all passengers on the top 10 busiest routes were left behind at the stop because the bus was full.
  • Rush Hour Reality: During the morning and evening rush, that number jumps to 8%. That means on a busy morning, roughly 1 out of every 12 people waiting for the bus might be left behind.
  • Seasonal Spikes: The problem gets worse in the Fall (when students return to school) and better in the Summer (when students are on break).
  • The "Full Bus" Myth: They found that most of the time (98%), buses are not full when they arrive at a stop. The problem is very specific to certain stops, certain times, and certain routes. It's not a city-wide disaster, but a "pinch point" issue.
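Percentages like these come from comparing how many people the model thinks wanted to ride with how much room each bus actually had. One plausible way to do that arithmetic is sketched below; the function name and the numbers are made up for illustration and are not taken from the paper or the Port Authority data.

```python
def share_left_behind(predicted_demand, free_space):
    """Fraction of would-be riders who cannot board, summed over stop visits."""
    excess = sum(max(d - s, 0) for d, s in zip(predicted_demand, free_space))
    total = sum(predicted_demand)
    return excess / total if total else 0.0


# Made-up numbers for four stop visits; the third bus arrives with no room.
predicted = [10, 12, 8, 20]
space = [30, 25, 0, 40]
print(f"{share_left_behind(predicted, space):.0%}")  # 16%
```

Grouping the same calculation by route, hour, or season would give breakdowns like the ones listed above.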

Why This Matters

This isn't just about math; it's about fairness and efficiency.

  • For the City: If they know exactly where and when people are being left behind, they can send bigger buses or more frequent buses to those specific spots.
  • For the Riders: It means a more reliable system. If you can count on the bus having room, you won't waste time waiting for one you can't board.
  • For the Environment: If the system is reliable, more people will choose the bus over driving their own cars, reducing traffic and pollution.

The Takeaway

The authors built a clever "X-ray" for bus data. Even though the official records are blind to the people left behind, this method allows us to see the invisible crowd. By filtering out the confusing data points where buses are full, they can predict the true demand and help Pittsburgh (and other cities) build a better, more reliable public transportation system.

As the former mayor of Bogotá said, "An advanced city is not a place where the poor move about in cars, rather it's where even the rich use public transportation." But for the rich (and everyone else) to use the bus, the bus has to be able to fit them all. This paper helps make sure there's enough room.