Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels

This paper addresses the challenge of LiDAR-based 3D semantic segmentation under noisy labels and domain shifts by introducing the DGLSS-NL task, establishing a new benchmark, and proposing DuNe, a dual-view framework that achieves state-of-the-art robustness across multiple datasets.

Weitong Kong, Zichao Zeng, Di Wen, Jiale Wei, Kunyu Peng, June Moh Goo, Jan Boehm, Rainer Stiefelhagen

Published Wed, 11 Ma

Imagine you are teaching a robot to drive a car. To do this safely, the robot needs a "3D map" of the world, created by a laser scanner called LiDAR. This scanner shoots out thousands of laser beams to see cars, pedestrians, trees, and roads.

However, there are two big problems with teaching this robot:

  1. The "New Neighborhood" Problem (Domain Generalization): You train the robot in a sunny city in Germany. But what happens when you send it to rainy London or a snowy town in Japan? The robot gets confused because the lighting, weather, and road shapes are different. It needs to learn how to drive anywhere, not just where it was trained.
  2. The "Messy Teacher" Problem (Noisy Labels): To teach the robot, humans have to label every single point in the laser data. But humans get tired, the lasers sometimes miss things, and the data is messy. So, the robot is often taught by a teacher who makes mistakes. If you tell a student, "That's a dog," but it's actually a cat, the student gets confused. In the real world, these "mistakes" in the training data are called noisy labels.

The Paper's Big Idea

This paper says: "We need a robot that can handle both new environments and a teacher who makes mistakes."

Until now, most research focused on just one of these problems. Some tried to make the robot adapt to new cities, while others tried to fix the messy teacher. But nobody really figured out how to do both at the same time for 3D laser data.

The Solution: "DuNe" (The Dual-View Framework)

The authors created a new system called DuNe. To explain how it works, let's use a cooking analogy.

Imagine you are trying to teach a student (the AI) how to identify ingredients in a soup, but the recipe card (the label) has typos.

  • The Old Way: You show the student the soup exactly as it is, read the messy recipe, and say, "This is chicken." If the recipe is wrong, the student learns the wrong thing.
  • The DuNe Way (Dual-View): You give the student two different perspectives of the same soup:
    1. The "Strong" View (The Chef's Eye): You take the soup, mix in some extra ingredients from another bowl, and rotate it. This is a very complex, detailed view. It helps the student see the shape and structure of the ingredients better, even if the recipe is messy.
    2. The "Weak" View (The Casual Eye): You look at the soup simply, just as it sits in the bowl. This view is cleaner and less confusing.
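In code, the two "views" above are just two different augmentations of the same point cloud. Here is a minimal sketch of that idea; the specific transforms (coordinate jitter for the weak view, a random rotation plus points mixed in from another scan for the strong view) are illustrative stand-ins, not the paper's exact augmentation recipe.

```python
import numpy as np

def weak_view(points):
    """Weak view: the scan almost as-is, with only tiny coordinate jitter."""
    return points + np.random.normal(scale=0.01, size=points.shape)

def strong_view(points, other_points, mix_ratio=0.3):
    """Strong view: rotate the scan around the vertical axis and mix in
    points from another scan (illustrative stand-in for the paper's
    stronger augmentation)."""
    theta = np.random.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    rotated = points @ rot.T
    n_mix = int(len(other_points) * mix_ratio)
    idx = np.random.choice(len(other_points), size=n_mix, replace=False)
    return np.concatenate([rotated, other_points[idx]], axis=0)
```

Both views feed the same network; because the strong view is harder, it pushes the model to rely on geometry and structure rather than memorizing the (possibly wrong) labels.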

How they work together:
The system forces the student to look at both views and agree on what they see.

  • If the "Strong" view sees a car, and the "Weak" view also sees a car, the system says, "Okay, we are confident this is a car."
  • If the "Strong" view sees a car but the "Weak" view is confused, the system says, "Wait, the recipe might be wrong. Let's ignore the confusing parts and focus on what both views agree on."

This "agreement" process is called consistency. It acts like a safety net. Even if the teacher (the noisy label) is wrong, the two views of the data help the robot figure out the truth by looking at the geometry and structure of the world.
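The agreement check can be sketched as a simple mask over the per-point class probabilities from each view: a point's (possibly noisy) label is trusted for training only where both views are confident and predict the same class. This is a generic consistency-filtering sketch in the spirit of the paper, not its exact loss; the 0.9 threshold is an assumed example value.

```python
import numpy as np

def consistency_mask(probs_strong, probs_weak, threshold=0.9):
    """Return a boolean mask over points: True where the strong and weak
    views agree on the class AND both are confident. Points outside the
    mask are treated as suspect and down-weighted or ignored."""
    pred_s = probs_strong.argmax(axis=1)   # class chosen by the strong view
    pred_w = probs_weak.argmax(axis=1)     # class chosen by the weak view
    conf_s = probs_strong.max(axis=1)      # how sure the strong view is
    conf_w = probs_weak.max(axis=1)        # how sure the weak view is
    agree = pred_s == pred_w
    confident = (conf_s > threshold) & (conf_w > threshold)
    return agree & confident
```

For example, a point where both views say "car" with >90% confidence passes the filter, while a point where the views disagree is excluded, so a mislabeled point there cannot mislead the model.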

What Did They Find?

The researchers tested this on three different datasets (like three different cities). They simulated a teacher who was wrong 10%, 20%, and even 50% of the time.

  • The Result: When the teacher was very messy (50% wrong), old methods completely failed. The robot stopped recognizing cars and started calling them trees.
  • DuNe's Performance: Even with a very messy teacher, DuNe kept the robot smart. It maintained high accuracy and could still drive safely in new, unseen cities.
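The "messy teacher" in these experiments can be simulated by randomly flipping a fraction of the ground-truth labels to a different class. A minimal sketch of symmetric label-noise injection, assuming (since the paper's exact noise model isn't detailed here) that each corrupted point gets a uniformly random wrong class:

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip roughly `noise_rate` of the labels to a different random class,
    mimicking a teacher who is wrong that fraction of the time."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    # Adding an offset in [1, num_classes) modulo num_classes guarantees
    # every flipped label becomes a *different* class.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (noisy[flip] + offsets) % num_classes
    return noisy
```

Running this with `noise_rate=0.5` reproduces the hardest setting in the paper's benchmark, where half of the training labels are wrong.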

Why This Matters

This paper is like building a super-robust training manual for self-driving cars.

  1. It creates a new standard: They set up a "test" where robots are trained with messy data to see which ones are truly tough.
  2. It saves money: In the real world, fixing every single mistake in a dataset costs a fortune. This method allows us to use "imperfect" data without losing performance.
  3. It makes safety better: By teaching cars to ignore bad data and adapt to new weather or cities, we get closer to self-driving cars that won't crash just because it's raining or they are in a new country.

In short: The authors built a "dual-brain" system for self-driving cars that can learn from a messy teacher and still figure out how to drive anywhere in the world.