Velocity and stroke rate reconstruction of canoe sprint team boats based on panned and zoomed video recordings

This paper presents an automated video-based framework that leverages YOLOv8, U-Net calibration, and optical flow to accurately reconstruct the velocity and stroke rate of canoe sprint team boats from panned and zoomed recordings, achieving high agreement with GPS data without requiring on-boat sensors.

Julian Ziegler, Daniel Matthes, Finn Gerdts, Patrick Frenzel, Torsten Warnke, Matthias Englert, Tina Koevari, Mirco Fuchs

Published 2026-02-27

Imagine you are watching a high-speed canoe race on TV. The boats are moving incredibly fast, the water is splashing, and the camera is zooming and panning to follow the action. As a coach or a fan, you want to know two things: How fast are they going? and How hard are they paddling?

Usually, to get this data, you'd need to strap a GPS tracker and a motion sensor onto every single boat. But that's expensive, logistically a nightmare, and sometimes against the rules.

This paper introduces a "magic trick" that lets you get that same super-accurate data just by watching the video footage. It's like having a super-powered computer vision system that can look at a shaky, zoomed-in video and say, "Ah, that boat is moving at exactly 4.2 meters per second, and the crew is paddling at 110 strokes per minute."

Here is how they did it, broken down into simple concepts:

1. The Problem: The "Ferris Wheel" Distortion

When a camera on the shore films a race, it doesn't look like a flat map. It looks like a distorted, angled view. If you just measure how many pixels a boat moves in the video, you get the wrong speed because the camera is zooming in and out and moving side-to-side.

The Analogy: Imagine watching a race from a Ferris wheel. As the wheel turns, the runners look like they are speeding up or slowing down just because of your angle, even if they are running at a constant speed. The researchers needed a way to "flatten" that Ferris wheel view into a perfect, flat map.

2. The Solution: The "Virtual Grid"

The race course has buoys (floating markers) spaced out perfectly every 25 meters. The researchers taught their computer to spot these buoys and the athletes.

  • The Trick: Because the computer knows exactly where the buoys should be in real life, it can calculate the camera's angle and distortion for every single frame of the video. It essentially draws a "virtual grid" over the video, turning the messy footage into a precise 3D map.
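The core of that "virtual grid" idea is a homography: a 3×3 matrix that maps image pixels to flat course coordinates, estimated from known landmarks like the buoys. The paper's actual calibration pipeline is more involved (and is refined per frame as the camera pans and zooms), but the underlying geometry can be sketched with the classic Direct Linear Transform. Everything below — function names, the point layout — is an illustrative assumption, not the authors' code.

```python
import numpy as np

def estimate_homography(img_pts, world_pts):
    """Estimate the 3x3 homography mapping image pixels to course
    coordinates via the Direct Linear Transform (DLT).
    Needs at least 4 non-collinear correspondences (e.g. buoys)."""
    A = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    # The smallest singular vector of A holds the homography entries.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_world(H, x, y):
    """Project an image point into flat course coordinates (metres)."""
    X, Y, W = H @ np.array([x, y, 1.0])
    return X / W, Y / W
```

Once `H` is known for a frame, the pixel position of a boat tip converts directly to metres on the course, and differences between frames give true speed regardless of how the camera was panned or zoomed at that moment.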

3. The New Challenge: Team Boats (The "Tetris" Problem)

Previous versions of this tech only worked for single-person boats. But what about boats with 2 or 4 people?

  • The Issue: In a 4-person boat, the athletes are sitting in a line. Sometimes the camera angle hides the person in the back, or they overlap. If the computer loses track of who is who, it can't figure out where the front of the boat is.
  • The Fix: They built a special "AI Detective" (called a U-Net) that acts like a super-precise ruler. Instead of guessing where the boat tip is based on a rough average, this AI looks at a tiny patch of the video and says, "I see the very tip of the boat right here." It learns the specific shape of the boat tip for every single race.
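A common way such a network reports "the tip is right here" is as a heatmap: one probability-like value per pixel, peaking at the tip. The U-Net itself is far too large to reproduce, but the final step — collapsing its heatmap into a single sub-pixel coordinate — can be sketched with a softmax-weighted centre of mass (a "soft argmax"). The temperature value and function name are assumptions for illustration.

```python
import numpy as np

def tip_from_heatmap(heatmap, temperature=10.0):
    """Convert a tip-probability heatmap (e.g. a U-Net output) into a
    sub-pixel (x, y) coordinate via a softmax-weighted centre of mass."""
    h = heatmap - heatmap.max()            # shift for numerical stability
    w = np.exp(temperature * h)            # sharpen mass around the peak
    w /= w.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return float((w * xs).sum()), float((w * ys).sum())
```

Unlike a hard `argmax`, this stays differentiable and can land between pixels, which matters when a one-pixel error corresponds to several centimetres of boat position.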

4. Keeping the Team Together: The "Optical Flow" Rope

What if the computer misses a frame where an athlete is hidden by a splash?

  • The Fix: They used a technique called "Optical Flow." Imagine the athletes are tied together with an invisible rope. If the computer sees the person in the front, it knows exactly where the person in the back must be, even if they are temporarily hidden. It tracks their movement like a dance partner, ensuring the order of the team never gets mixed up.
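Optical flow answers "how far did this patch of pixels move between frames?", which is what lets a briefly hidden athlete's position be carried forward. The paper's specific optical-flow method isn't detailed here; below is a minimal single-patch Lucas-Kanade sketch — one standard way to estimate such a shift — written in plain numpy as an assumption-laden illustration.

```python
import numpy as np

def lucas_kanade_shift(prev_patch, curr_patch):
    """Estimate the (dx, dy) translation of a small image patch
    between two frames by solving the Lucas-Kanade least-squares
    system built from brightness constancy: Ix*dx + Iy*dy = -It."""
    prev = prev_patch.astype(float)
    curr = curr_patch.astype(float)
    Iy, Ix = np.gradient(prev)          # spatial gradients (rows, cols)
    It = curr - prev                    # temporal brightness change
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (dx, dy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return dx, dy
```

In the tracking pipeline, a shift estimated from a visible teammate's patch can be applied to the occluded teammate too, keeping the bow-to-stern ordering of the crew intact until everyone reappears.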

5. Counting the Strokes: The "Heartbeat" of the Race

Finally, they needed to count how many times the paddles hit the water (Stroke Rate).

  • Method A (The Simple Way): They looked at the brightness changes inside the box surrounding the athlete. When the paddle moves, the pixels change brightness. It's like listening to a drumbeat by watching the drum skin vibrate.
  • Method B (The Smart Way): They used a pose-estimation AI (ViTPose) to find the athlete's shoulder and wrist. It calculates the distance between them. As the athlete paddles, this distance stretches and shrinks rhythmically.
  • The Result: The "Smart Way" (Method B) was much more accurate, almost as good as having a sensor on the athlete's wrist.
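Both methods boil down to the same signal-processing step: take a per-frame value that oscillates once per stroke (patch brightness, or the shoulder-to-wrist distance) and find its dominant frequency. A minimal sketch of that last step, assuming a clean periodic signal and a known camera frame rate:

```python
import numpy as np

def stroke_rate(signal, fps):
    """Estimate strokes per minute as the dominant frequency of a
    periodic per-frame signal (e.g. shoulder-to-wrist distance)."""
    x = np.asarray(signal, dtype=float)
    x -= x.mean()                        # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spectrum)]
```

Real footage would need a little more care (detrending, restricting the search to plausible stroke frequencies, windowing for rates that change mid-race), but the principle is exactly this: the stroke rate is the "heartbeat" frequency of the pixel signal.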

Why This Matters

This system is a game-changer for coaches and athletes because:

  1. No Sensors Needed: You don't need to buy expensive gear or worry about regulations. Just film the race.
  2. Instant Feedback: Coaches can see exactly how the race was paced (fast start? slow finish?) and how the team synchronized their paddling.
  3. Universal: It works for single boats, double boats, and four-person boats, whether it's a 200m sprint or a 500m race.

In a nutshell: The researchers turned a standard video camera into a high-tech motion lab. They taught a computer to understand the geometry of a race, track a team of paddlers even when they hide from the camera, and count their strokes just by watching the pixels dance. It's like giving a coach X-ray vision, but using only a smartphone and some clever math.
