UNet-Based Keypoint Regression for 3D Cone Localization in Autonomous Racing

This paper presents a UNet-based neural network, trained on a large custom-labeled dataset, that achieves accurate real-time 3D cone localization and color prediction. It outperforms traditional methods and is validated within an end-to-end autonomous racing system.

Mariia Baidachna, James Carty, Aidan Ferguson, Joseph Agrane, Varad Kulkarni, Aubrey Agub, Michael Baxendale, Aaron David, Rachel Horton, Elliott Atkinson

Published 2026-02-26

Imagine you are teaching a robot car how to race around a track made entirely of traffic cones. The goal is simple: drive fast without hitting the cones. But for the car, this is incredibly hard. The cones are small, they get dirty, the lighting changes, and the car is moving at high speeds. If the car can't see exactly where the cones are, it will crash or drive too slowly.

This paper is about building a "super-eye" for that robot car to spot the cones perfectly. Here is the story of how they did it, explained simply.

1. The Problem: The "Needle in a Haystack"

In a normal race, the track is painted on the ground. In autonomous racing (like Formula Student), the track is defined by blue cones on the left and yellow cones on the right.

The car needs to know two things instantly:

  1. Where the cone is in 3D space (how far away and to the side).
  2. What color it is (blue or yellow).

Old methods were like trying to find a specific grain of sand on a beach using a magnifying glass. They worked okay in perfect weather, but if a cone was muddy, cracked, or the sun was glaring, the computer got confused. Newer methods using deep learning were better but often too slow to run on the car's computer while driving fast.

2. The Solution: A "Digital X-Ray" (The UNet)

The researchers built a new AI model based on something called a UNet.

Think of a standard camera as just taking a photo. A UNet is like a digital X-ray that doesn't just see the cone; it sees the skeleton of the cone.

  • Instead of just saying "There is a cone here," the AI looks at the cone and marks six specific points on it (like the top corners, the bottom corners, and the middle of the stripe).
  • Imagine you are drawing a stick figure on a cone. The AI learns to draw that stick figure perfectly, even if the cone is dirty or half-hidden.
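Keypoint networks like this typically output one small "heat" image per keypoint, and the hottest pixel in each image is taken as that keypoint's location. Here is a minimal sketch of that decoding step (the function name and the peak-picking approach are illustrative assumptions, not the paper's exact head design):

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Recover (x, y) pixel coordinates from per-keypoint heatmaps.

    heatmaps: array of shape (6, H, W) -- one channel per keypoint,
    where each channel peaks at its keypoint's predicted location.
    This is an illustrative sketch: it simply takes the argmax of
    each channel, with no sub-pixel refinement.
    """
    coords = []
    for hm in heatmaps:
        # np.argmax flattens the 2D map; unravel_index restores (row, col)
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(x), int(y)))
    return coords
```

In practice, sub-pixel refinement (e.g. fitting a small parabola around the peak) is often layered on top of this argmax step, since keypoint precision directly drives the 3D accuracy described below.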

3. The Secret Sauce: A Massive Training Library

To teach this AI, you need a lot of practice. The researchers created the largest dataset of its kind:

  • They took 25,000 photos of cones from every angle imaginable.
  • They manually labeled the "stick figure" points on every single cone.
  • They used this massive library to train the AI until it became an expert at spotting those six points, no matter how messy the cone looked.

4. How It Works in Real Life (The 3D Magic)

Once the AI spots the six points on the cone in the camera image, how does it know how far away the cone is?

The car has two cameras (like human eyes).

  1. The AI finds the six points on the cone in the left camera and the six points in the right camera.
  2. It measures the tiny difference between where the points appear in the left eye versus the right eye (this is called disparity).
  3. Just like your brain uses the difference between your two eyes to judge depth, the car uses this math to calculate exactly how far away the cone is in 3D space.

Because the AI is so good at finding those six points, the 3D calculation is incredibly accurate.
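The depth step above is standard pinhole stereo geometry: depth is focal length times camera baseline divided by disparity. A minimal sketch, assuming a calibrated stereo rig (the focal length and baseline values below are hypothetical, not taken from the paper):

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Classic stereo triangulation: Z = f * B / d.

    focal_px   -- camera focal length in pixels
    baseline_m -- distance between the two cameras in meters
    x_left     -- horizontal pixel position of a keypoint in the left image
    x_right    -- horizontal pixel position of the same keypoint on the right

    The disparity d = x_left - x_right shrinks as the cone gets
    farther away, so small keypoint errors matter most at range.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a visible point must sit further left in the left image")
    return focal_px * baseline_m / disparity
```

For example, with an 800-pixel focal length, a 12 cm baseline, and an 8-pixel disparity, the cone sits at 800 * 0.12 / 8 = 12 meters. Averaging the depth over all six matched keypoints, rather than relying on one, is one way such systems damp per-point noise.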

5. The Bonus: Reading the Color

Since the AI marks the specific "stripe" band on the cone, it can also easily check whether that band is white or black, which helps confirm whether the cone is blue or yellow (in Formula Student, blue cones carry a white stripe and yellow cones a black one). The AI doesn't just see a blob of color; it reads the pattern on the cone.

6. The Results: Fast and Accurate

The team tested this on a real racing car simulator and on the actual car hardware.

  • Accuracy: It was much better than previous methods. It made fewer mistakes, even when cones were dirty or partially hidden.
  • Speed: They were worried the AI would be too slow for a racing car. They found that while it did use a little more computer power, it was still fast enough to run in real-time. It's like adding a turbocharger to the car's brain; it uses a bit more fuel (electricity), but the car drives much safer and faster.

The Big Picture

This paper shows that by teaching a robot to look at the specific details of an object (the key points) rather than just the general shape, we can make autonomous cars much safer and faster.

The Analogy:
Imagine playing a game of "Pin the Tail on the Donkey" while blindfolded, but you have a friend whispering exactly where the tail should go.

  • Old methods: The friend guesses vaguely ("It's somewhere near the middle").
  • This new method: The friend says, "It's exactly 2 inches left and 1 inch up."
  • The result: The car doesn't just guess where the track is; it knows exactly where the track is, allowing it to race at top speed without crashing.

This technology is a big step forward for making self-driving race cars that can compete with human drivers.
