Phys-3D: Physics-Constrained Real-Time Crowd Tracking and Counting on Railway Platforms

This paper presents Phys-3D, a real-time physics-constrained tracking framework that integrates a transfer-learned YOLOv11m detector with a 3D motion model to achieve robust crowd counting on railway platforms despite challenges like camera motion, occlusions, and perspective distortions.

Bin Zeng, Johannes Künzel, Anna Hilsmann, Peter Eisert

Published 2026-02-27

Imagine you are sitting on a train as it slowly pulls into a busy station. The platform is packed with people. Your job is to count exactly how many people are waiting there, in real-time, without missing anyone or counting the same person twice.

This sounds easy, but doing it with a camera on a moving train is a nightmare for computers. Here is why:

  • The Train is Moving: As the train approaches, the people on the platform look like they are zooming toward the camera, even if they are standing still. It's like looking out a car window; the trees seem to rush past, but they aren't moving.
  • The Crowd is Dense: People are shoulder-to-shoulder. Their bodies block each other, making it hard to see where one person ends and another begins.
  • The Perspective is Weird: People far away look tiny, and people close up look huge. Standard detectors and trackers struggle when a person's apparent size changes this quickly.
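Both effects above come from the pinhole camera model. A head of real height H at distance Z from the camera appears about f·H/Z pixels tall, where f is the focal length in pixels. The sketch below uses made-up values for f and H (assumptions, not values from the paper). It shows that a person who stands perfectly still doubles in apparent size each time the train halves the distance.

```python
# Minimal pinhole-camera sketch with illustrative numbers (not from the paper):
# an object of real height H at depth Z appears h = f * H / Z pixels tall.


def apparent_height_px(focal_px: float, real_height_m: float, depth_m: float) -> float:
    """Projected height in pixels under an ideal pinhole camera."""
    if depth_m <= 0:
        raise ValueError("depth must be positive (object in front of the camera)")
    return focal_px * real_height_m / depth_m


FOCAL_PX = 1000.0     # assumed focal length in pixels
HEAD_HEIGHT_M = 0.25  # assumed real head height in metres

# The person stands still; only the camera (on the train) gets closer.
for depth in (40.0, 20.0, 10.0, 5.0):
    h = apparent_height_px(FOCAL_PX, HEAD_HEIGHT_M, depth)
    print(f"depth {depth:5.1f} m -> head about {h:.1f} px tall")
```

Because size scales with 1/Z, the growth speeds up as the train closes in. This is why a tracker that assumes smooth 2D motion falls behind.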

The paper "Phys-3D" proposes a clever solution to this problem. Instead of relying on a "smart camera" alone, the authors built a system that also models physics and camera geometry.

Here is how their system works, broken down into simple parts:

1. The Detective: "Head Hunting"

Most security cameras try to spot a whole person (head to toe). But on a crowded platform, legs and torsos get hidden behind other people.

  • The Analogy: Imagine trying to spot a flock of birds in a tree. If you look for the whole bird, the branches hide them. But if you just look for the heads, they are usually visible above the branches.
  • The Solution: The system ignores bodies and looks only for heads. It uses a deep-learning detector (YOLOv11m), adapted through transfer learning to find heads even when they are squished together or blurry.
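YOLO-family detectors typically finish with non-maximum suppression (NMS). NMS discards any box that overlaps a higher-scoring box too much. In a dense crowd this step matters, because heads sit so close together that an overly aggressive overlap threshold can merge two real people into one. The sketch below is a generic greedy NMS, not the paper's code. The box coordinates, scores, and thresholds are invented for illustration.

```python
# Generic greedy non-maximum suppression (NMS) over head boxes.
# Boxes are (x1, y1, x2, y2) in pixels; scores are detector confidences.
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


# Box 1 is a duplicate detection of head 0; box 2 is a neighbouring head
# that overlaps head 0 only slightly.
heads = [(100, 50, 130, 80), (102, 52, 131, 81), (125, 50, 155, 80)]
conf = [0.9, 0.6, 0.8]
print("kept (threshold 0.5):", nms(heads, conf, iou_thresh=0.5))
print("kept (threshold 0.05):", nms(heads, conf, iou_thresh=0.05))
```

With a moderate threshold, the duplicate is removed and both real heads survive. With a very low threshold, the neighbouring head is suppressed too, and one person disappears from the count.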

2. The Tracker: The "Physics Coach"

Once the camera spots a head, it needs to follow that person as the train moves.

  • The Problem: Standard tracking software assumes the camera is standing still. When the train moves, the software gets confused. It thinks the people are running toward the train because they are getting bigger in the picture. It loses track of them or swaps their identities (thinking Person A is now Person B).
  • The Solution (Phys-3D): The authors created a new tracker called Phys-3D. Think of this tracker as a physics coach who knows how trains work.
    • Instead of just watching the 2D picture on the screen, the coach imagines the people in 3D space.
    • It knows: "The train is slowing down. The people are standing still. Therefore, if they look like they are zooming toward us, it's because we are moving, not them."
    • By combining a motion model with camera geometry (the pinhole camera model, which describes how points in 3D space project onto the image), it separates the motion of the train from the motion of the people. This keeps the "ID tag" on each person stable, even when they are hidden for a moment.
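The core idea is to separate the train's own motion from the people's motion. A simplified model can illustrate it. This hypothetical sketch assumes a forward-facing pinhole camera, a known train speed, and perfectly static pedestrians. Phys-3D may estimate these quantities differently, and the function names here are invented. The sketch compares a naive 2D constant-velocity prediction with a prediction made in 3D and then projected into the image.

```python
# Hypothetical sketch: ego-motion-aware prediction under a pinhole camera.
# Assumptions (not from the paper): the camera looks along the direction of
# travel (+Z), the train's forward speed is known, and pedestrians stand still.
from dataclasses import dataclass


@dataclass
class Camera:
    f: float   # focal length in pixels
    cx: float  # principal point, x (pixels)
    cy: float  # principal point, y (pixels)

    def project(self, X: float, Y: float, Z: float) -> tuple[float, float]:
        """Pinhole projection of a camera-frame 3D point (metres) to pixels."""
        return (self.cx + self.f * X / Z, self.cy + self.f * Y / Z)


def predict_static_head(cam: Camera, X: float, Y: float, Z: float,
                        train_speed: float, dt: float) -> tuple[float, float]:
    """Predict a static head's pixel position one time step ahead.

    The head does not move in the world; only its depth relative to the
    forward-moving camera shrinks by train_speed * dt.
    """
    return cam.project(X, Y, Z - train_speed * dt)


def predict_2d_constant_velocity(prev: tuple[float, float],
                                 curr: tuple[float, float]) -> tuple[float, float]:
    """Naive 2D tracker: assume the last pixel displacement repeats."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


cam = Camera(f=1000.0, cx=960.0, cy=540.0)
X, Y = 2.0, -0.5       # head 2 m right of, 0.5 m above the optical axis (image y points down)
speed, dt = 10.0, 0.2  # the train moves 2 m per step

p0 = cam.project(X, Y, 20.0)
p1 = cam.project(X, Y, 18.0)
truth = cam.project(X, Y, 16.0)

physics = predict_static_head(cam, X, Y, 18.0, speed, dt)
naive = predict_2d_constant_velocity(p0, p1)
print("true next position:      ", truth)
print("3D physics prediction:   ", physics)
print("2D const-velocity guess: ", naive)
```

The 2D guess lags behind because a static head's pixel motion speeds up as the depth shrinks. The 3D model accounts for this. A prediction of this kind can also be rolled forward for a few frames while a head is occluded. That is one way a tracker can re-attach the correct ID when the head reappears.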

3. The Counter: The "Virtual Hallway"

Even with a good tracker, counting is tricky. If a person is blocked by a pole for a split second, a simple counter might think they left and then re-entered, counting them twice.

  • The Analogy: Imagine a bouncer at a club. If the bouncer counts everyone who crosses the doorway, a guest who steps out and back in is counted twice. Now suppose the club has a short hallway (a "virtual counting band") and the bouncer counts a guest only after they have stayed in it for a moment. That back-and-forth no longer inflates the count.
  • The Solution: The system defines an invisible counting band on the platform. A person is counted only after they have stayed inside the band for a few frames. This filters out jitter and ensures that someone who is briefly hidden is not counted twice.
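Here is a minimal sketch of such a dwell rule. It assumes the tracker delivers {track_id: head y-coordinate} for each frame and that the band is a horizontal strip of image rows. The class name, band limits, and `min_frames` value are hypothetical. The paper's band may be defined differently, for example in platform coordinates.

```python
# Hypothetical dwell-based counter for a virtual counting band.
# Assumption: the band is the image-row interval [y_min, y_max], and a track is
# counted once after spending min_frames consecutive observed frames inside it.
from collections import defaultdict


class BandCounter:
    def __init__(self, y_min: float, y_max: float, min_frames: int = 3):
        self.y_min, self.y_max = y_min, y_max
        self.min_frames = min_frames
        self.streak = defaultdict(int)  # track_id -> consecutive frames inside band
        self.counted = set()            # track IDs that have already been counted

    def update(self, tracks: dict[int, float]) -> int:
        """Process one frame of {track_id: head_y}; return the running count.

        Tracks absent from a frame (e.g. briefly occluded) keep their streak.
        """
        for tid, y in tracks.items():
            if self.y_min <= y <= self.y_max:
                self.streak[tid] += 1
                if self.streak[tid] >= self.min_frames:
                    self.counted.add(tid)  # a set, so a track is never counted twice
            else:
                self.streak[tid] = 0       # leaving the band resets the streak
        return len(self.counted)


counter = BandCounter(y_min=200.0, y_max=400.0, min_frames=3)
frames = [
    {1: 250.0, 2: 150.0},  # track 1 enters the band; track 2 is outside
    {1: 260.0, 2: 210.0},  # track 2 flickers into the band
    {2: 450.0},            # track 1 briefly occluded; track 2 leaves again
    {1: 270.0, 2: 390.0},  # track 1 reaches 3 frames inside -> counted
    {1: 280.0},            # track 1 stays, but is still counted only once
]
for frame in frames:
    total = counter.update(frame)
print("people counted:", total)
```

Keeping the streak through a missing frame is a design choice in this sketch. It means a short occlusion neither resets nor double-counts a person.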

The Results: Why Does This Matter?

The team tested the system on a real-world dataset of railway platforms, which they built themselves because no suitable dataset existed.

  • Accuracy: Their system achieved a counting error of only 2.97%. With 100 people on a platform, that corresponds to being off by about three people.
  • Speed: It runs in real-time, meaning it can be used on the train while it is arriving.
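To make the percentage concrete, the sketch below computes a relative counting error, which compares the predicted count with the true count. It assumes the metric is abs(predicted - true) / true, averaged over clips. The paper's exact definition may differ, and the clip numbers are invented for illustration.

```python
# Illustrative relative counting error; the metric definition is an assumption.


def relative_count_error(predicted: int, true_count: int) -> float:
    """Absolute counting error as a fraction of the true count."""
    if true_count <= 0:
        raise ValueError("true count must be positive")
    return abs(predicted - true_count) / true_count


def mean_relative_error(pairs: list[tuple[int, int]]) -> float:
    """Average relative error over (predicted, true) pairs, e.g. one per clip."""
    return sum(relative_count_error(p, t) for p, t in pairs) / len(pairs)


# Invented example clips: (predicted count, true count).
clips = [(97, 100), (52, 50), (200, 206)]
for p, t in clips:
    print(f"predicted {p:3d}, true {t:3d} -> error {relative_count_error(p, t):.2%}")
print(f"mean error: {mean_relative_error(clips):.2%}")
```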

The Big Picture

This isn't just about counting heads. It's about safety and efficiency.

  • Safety: If the platform is too crowded, the train can be delayed to let people off first, reducing the risk of accidents.
  • Efficiency: Station managers can see exactly how many people are waiting and decide if they need to send a bigger train or more staff.

In summary: The paper teaches a computer to be a smart observer on a moving train. It focuses on heads, models the physics of the train's movement, and uses a virtual counting band. Together, these address a problem that has been too messy for standard tracking methods to handle, and they turn chaotic, blurry video into an accurate count.
