Imagine you are teaching a robot to drive a car, but instead of on a smooth highway with clear white lines, you are sending it into a wild, untamed forest. There are no road signs, no painted lanes, and the ground changes from soft mud to sharp rocks, tall grass, and deep puddles. How does the robot know where it can drive and where it will get stuck?
This is the problem the STONE dataset solves. Think of STONE not just as a collection of data, but as a massive, super-smart training manual for off-road robots.
Here is the breakdown of what makes this paper special, explained with some everyday analogies:
1. The Problem: The "Blind" Robot
Most robots today are like drivers who only have a windshield. They look straight ahead. But in the wild, you need to see everything: behind you, to your sides, and above you.
- The Old Way: Previous datasets were like giving a robot a single, low-quality photo of the road ahead. They often missed obstacles on the side or couldn't see through fog and rain.
- The STONE Solution: STONE gives the robot a 360-degree "God's Eye View." It's like equipping the robot with:
  - Six high-definition eyes (Cameras): To see colors and textures.
  - One super-accurate laser scanner (LiDAR): To measure exact distances and shapes, even in the dark.
  - Three "X-ray" eyes (4D Radars): These are the secret sauce. Just like radar on a ship sees through fog, these radars let the robot "see" through rain, dust, and darkness where cameras fail.
2. The Magic Trick: No Human Labeling Needed
Usually, to teach a robot what is "drivable," humans have to sit for hours drawing lines on thousands of photos saying, "Here is grass (good)," and "Here is a rock (bad)." This is slow, expensive, and impossible to scale.
STONE uses a smart, automated detective instead of human painters. Here is how it works:
- The "Safe Path" Clue: The researchers drove the robot through the terrain. Wherever the robot successfully drove, the system marked that area as "Safe."
- The "Geometric Fingerprint": The system analyzes the ground the robot drove on. It measures three things:
  - Height: Is it too high to climb?
  - Slope: Is it so steep that the robot would slide or tip over?
  - Roughness: Is it too bumpy to drive over safely?
- The "Look-Alike" Logic: The system creates a mathematical "fingerprint" of all the safe ground. Then, it looks at the rest of the world. If a patch of grass looks like the safe ground (similar height, slope, and roughness), the robot automatically labels it as "Safe." If it looks different (like a dense bush or a deep hole), it labels it "Danger."
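The three steps above can be sketched in code. This is an illustrative sketch, not the paper's actual pipeline: the patch format (N x 3 point arrays), the plane-fit features, and the z-score threshold `k` are all assumptions made here for the example.

```python
import numpy as np

def geometric_features(patch):
    """Summarize a ground patch (N x 3 array of points) by height, slope, roughness."""
    z = patch[:, 2]
    height = z.max() - z.min()
    # Fit a plane via SVD: the direction of least variance is the plane normal.
    centered = patch - patch.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # Slope = angle between the plane normal and vertical (degrees).
    slope = np.degrees(np.arccos(np.clip(abs(normal[2]), 0.0, 1.0)))
    # Roughness = spread of the points around the fitted plane.
    roughness = np.abs(centered @ normal).std()
    return np.array([height, slope, roughness])

def label_patches(driven_patches, unknown_patches, k=3.0):
    """Label unknown patches Safe/Danger by similarity to the driven 'fingerprint'."""
    # The fingerprint: mean and spread of the features of ground the robot drove on.
    safe_feats = np.array([geometric_features(p) for p in driven_patches])
    mu, sigma = safe_feats.mean(axis=0), safe_feats.std(axis=0) + 1e-6
    labels = []
    for p in unknown_patches:
        z_scores = np.abs((geometric_features(p) - mu) / sigma)
        # "Look-alike" logic: within k standard deviations on every feature = Safe.
        labels.append("Safe" if np.all(z_scores < k) else "Danger")
    return labels
```

A patch that looks like the driven ground (similar height, slope, and roughness) gets labeled "Safe" with no human in the loop; anything that deviates strongly, like a tall bush, gets "Danger."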
Analogy: Imagine you are teaching a child to walk. Instead of pointing at every single safe step in the park, you let them walk on the safe path. Then, you tell them, "If the ground feels and looks like the path you just walked on, it's safe. If it feels like a swamp or a wall, avoid it." STONE does this automatically for robots.
3. Why This Matters: The "Vegetation" Trap
The paper shows a great example of why this is necessary.
- The Trap: To a camera, a low patch of grass and a tall, dense bush might look the same color (green). A simple robot might think, "Green means go!" and crash into the bush.
- The STONE Fix: Because STONE uses 3D geometry, it knows the low grass is flat and easy to drive over, while the bush is tall and blocking the path. It teaches the robot to understand physics, not just colors.
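A toy sketch of the trap: the two patches below are identical by color, so only a geometric check separates them. The dictionary fields, the height values, and the clearance threshold are hypothetical numbers invented for this example, not figures from the paper.

```python
# Two hypothetical "green" patches: color alone cannot tell them apart.
grass = {"color": "green", "max_height_m": 0.10}
bush = {"color": "green", "max_height_m": 1.20}

ROBOT_CLEARANCE_M = 0.25  # assumed ground clearance, for illustration only

def color_says_go(patch):
    # The naive "green means go" rule: both patches pass.
    return patch["color"] == "green"

def geometry_says_go(patch):
    # The geometric rule: only obstacles shorter than the clearance pass.
    return patch["max_height_m"] < ROBOT_CLEARANCE_M

print(color_says_go(grass), color_says_go(bush))        # both True: the trap
print(geometry_says_go(grass), geometry_says_go(bush))  # True, False: the fix
```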
4. The Result: A New Benchmark
The authors didn't just collect data; they built a gymnasium for robot researchers. They created a standard test (a benchmark) where different robot brains can compete.
- They tested robots using just cameras, just lasers, and the full "super-suit" (cameras + lasers + radars).
- The Winner: As you might guess, the robot with the full "super-suit" (multi-modal) performed the best, proving that having all those different sensors working together is the key to surviving the wild.
Summary
STONE is a giant, open-source library that helps robots learn to drive off-road without needing humans to draw maps for them. It uses a robot's own successful drives to automatically teach it what the ground feels like, giving it a 360-degree view and the ability to see through bad weather. It's a huge step toward robots that can truly explore the wild on their own.