Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels

This paper addresses the challenge of LiDAR-based 3D semantic segmentation under noisy labels and domain shifts by introducing the DGLSS-NL task, establishing a new benchmark, and proposing DuNe, a dual-view framework that achieves state-of-the-art robustness across multiple datasets.

Weitong Kong, Zichao Zeng, Di Wen, Jiale Wei, Kunyu Peng, June Moh Goo, Jan Boehm, Rainer Stiefelhagen

Published Wed, 11 Ma

Imagine you are teaching a robot to drive a car. To do this safely, the robot needs a "3D map" of the world, created by a laser scanner called LiDAR. This scanner shoots out thousands of laser beams to see cars, pedestrians, trees, and roads.

However, there are two big problems with teaching this robot:

  1. The "New Neighborhood" Problem (Domain Generalization): You train the robot in a sunny city in Germany. But what happens when you send it to rainy London or a snowy town in Japan? The robot gets confused because the lighting, weather, and road shapes are different. It needs to learn how to drive anywhere, not just where it was trained.
  2. The "Messy Teacher" Problem (Noisy Labels): To teach the robot, humans have to label every single point in the laser data. But humans get tired, the lasers sometimes miss things, and the data is messy. So, the robot is often taught by a teacher who makes mistakes. If you tell a student, "That's a dog," but it's actually a cat, the student gets confused. In the real world, these "mistakes" in the training data are called noisy labels.

The Paper's Big Idea

This paper says: "We need a robot that can handle both new environments and a teacher who makes mistakes."

Until now, most research focused on just one of these problems. Some tried to make the robot adapt to new cities, while others tried to fix the messy teacher. But nobody really figured out how to do both at the same time for 3D laser data.

The Solution: "DuNe" (The Dual-View Framework)

The authors created a new system called DuNe. To explain how it works, let's use a cooking analogy.

Imagine you are trying to teach a student (the AI) how to identify ingredients in a soup, but the recipe card (the label) has typos.

  • The Old Way: You show the student the soup exactly as it is, read the messy recipe, and say, "This is chicken." If the recipe is wrong, the student learns the wrong thing.
  • The DuNe Way (Dual-View): You give the student two different perspectives of the same soup:
    1. The "Strong" View (The Chef's Eye): You take the soup, mix in some extra ingredients from another bowl, and rotate it. This is a very complex, detailed view. It helps the student see the shape and structure of the ingredients better, even if the recipe is messy.
    2. The "Weak" View (The Casual Eye): You look at the soup simply, just as it sits in the bowl. This view is cleaner and less confusing.
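In code, the two "views" above are just two different augmentations of the same point cloud. Here is a minimal sketch of that idea; the specific transforms (coordinate jitter for the weak view, a random rotation plus points mixed in from another scan for the strong view) are illustrative stand-ins, not the paper's exact augmentation recipe.

```python
import numpy as np

def weak_view(points):
    """Weak view: the scan almost as-is, with only tiny coordinate jitter."""
    return points + np.random.normal(scale=0.01, size=points.shape)

def strong_view(points, other_points, mix_ratio=0.3):
    """Strong view: rotate the scan around the vertical axis and mix in
    points from another scan (illustrative stand-in for the paper's
    stronger augmentation)."""
    theta = np.random.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
    rotated = points @ rot.T
    n_mix = int(len(other_points) * mix_ratio)
    idx = np.random.choice(len(other_points), size=n_mix, replace=False)
    return np.concatenate([rotated, other_points[idx]], axis=0)
```

Both views feed the same network; because the strong view is harder, it pushes the model to rely on geometry and structure rather than memorizing the (possibly wrong) labels.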

How they work together:
The system forces the student to look at both views and agree on what they see.

  • If the "Strong" view sees a car, and the "Weak" view also sees a car, the system says, "Okay, we are confident this is a car."
  • If the "Strong" view sees a car but the "Weak" view is confused, the system says, "Wait, the recipe might be wrong. Let's ignore the confusing parts and focus on what both views agree on."

This "agreement" process is called consistency. It acts like a safety net. Even if the teacher (the noisy label) is wrong, the two views of the data help the robot figure out the truth by looking at the geometry and structure of the world.
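The agreement check can be sketched as a simple mask over the per-point class probabilities from each view: a point's (possibly noisy) label is trusted for training only where both views are confident and predict the same class. This is a generic consistency-filtering sketch in the spirit of the paper, not its exact loss; the 0.9 threshold is an assumed example value.

```python
import numpy as np

def consistency_mask(probs_strong, probs_weak, threshold=0.9):
    """Return a boolean mask over points: True where the strong and weak
    views agree on the class AND both are confident. Points outside the
    mask are treated as suspect and down-weighted or ignored."""
    pred_s = probs_strong.argmax(axis=1)   # class chosen by the strong view
    pred_w = probs_weak.argmax(axis=1)     # class chosen by the weak view
    conf_s = probs_strong.max(axis=1)      # how sure the strong view is
    conf_w = probs_weak.max(axis=1)        # how sure the weak view is
    agree = pred_s == pred_w
    confident = (conf_s > threshold) & (conf_w > threshold)
    return agree & confident
```

For example, a point where both views say "car" with >90% confidence passes the filter, while a point where the views disagree is excluded, so a mislabeled point there cannot mislead the model.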

What Did They Find?

The researchers tested this on three different datasets (like three different cities). They simulated a teacher who was wrong 10%, 20%, and even 50% of the time.

  • The Result: When the teacher was very messy (50% wrong), old methods completely failed. The robot stopped recognizing cars and started calling them trees.
  • DuNe's Performance: Even with a very messy teacher, DuNe kept the robot smart. It maintained high accuracy and could still drive safely in new, unseen cities.
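The "messy teacher" in these experiments can be simulated by randomly flipping a fraction of the ground-truth labels to a different class. A minimal sketch of symmetric label-noise injection, assuming (since the paper's exact noise model isn't detailed here) that each corrupted point gets a uniformly random wrong class:

```python
import numpy as np

def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip roughly `noise_rate` of the labels to a different random class,
    mimicking a teacher who is wrong that fraction of the time."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    # Adding an offset in [1, num_classes) modulo num_classes guarantees
    # every flipped label becomes a *different* class.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (noisy[flip] + offsets) % num_classes
    return noisy
```

Running this with `noise_rate=0.5` reproduces the hardest setting in the paper's benchmark, where half of the training labels are wrong.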

Why This Matters

This paper is like building a super-robust training manual for self-driving cars.

  1. It creates a new standard: They set up a "test" where robots are trained with messy data to see which ones are truly tough.
  2. It saves money: In the real world, fixing every single mistake in a dataset costs a fortune. This method allows us to use "imperfect" data without losing performance.
  3. It makes safety better: By teaching cars to ignore bad data and adapt to new weather or cities, we get closer to self-driving cars that won't crash just because it's raining or they are in a new country.

In short: The authors built a "dual-brain" system for self-driving cars that can learn from a messy teacher and still figure out how to drive anywhere in the world.