Imagine you are trying to identify every single person in a massive, crowded stadium just by looking at a blurry, black-and-white photo taken from a helicopter. That is essentially the challenge forest managers face when trying to count and identify different types of trees from the air.
This paper is a grand experiment (a "benchmark") to see which computer brain is best at solving this puzzle. The researchers set up a competition between old-school computer programs and brand-new, super-smart "Deep Learning" AI to see who can best tell a Pine from a Spruce using laser scans.
Here is the breakdown of their adventure:
1. The Tools: Two Different "Flashlights"
The team used two different laser scanners (LiDAR) to take pictures of a forest near Helsinki, Finland. Think of these scanners as high-tech flashlights that bounce light off trees to create a 3D map.
- The "Old" Flashlight (Optech Titan): This was like a standard flashlight. It was fast and covered a wide area, but the image was a bit "grainy" (low resolution). It gave about 35 dots (points) for every square meter of forest.
- The "Super" Flashlight (HeliALS): This was a custom-built, high-tech flashlight. It flew lower and used three different colors of light (like a camera with Red, Green, and Blue filters, but with lasers). It created a super-sharp, "4K" image with over 1,000 dots per square meter.
2. The Contestants: The Brains
They invited 13 teams of scientists to build computer models to identify 9 different tree species (like Pine, Spruce, Birch, Aspen, etc.). The contestants fell into three camps:
- The "Hand-Crafters" (Machine Learning): These are like experienced detectives who look at a list of clues (e.g., "Is the tree tall? Is it pointy?") and make a decision based on rules they were taught.
- The "2D Artists" (Image-based Deep Learning): These models took the 3D tree, flattened it into 2D pictures from different angles (like taking photos of a statue from the front, side, and top), and fed them into a standard image-recognition AI (like the one that recognizes cats in your phone).
- The "3D Visionaries" (Point-based Deep Learning): These models looked at the raw 3D cloud of dots directly. They didn't flatten the tree; they understood the tree's shape in 3D space, like a sculptor looking at a block of marble.
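To make the "Hand-Crafter" camp concrete, here is a minimal sketch of that approach, assuming each tree arrives as an (N, 3) NumPy array of x/y/z laser points. The specific features (height, crown width, "pointiness") are illustrative clues of the kind such detectives use, not the paper's exact recipe, and the data here is random toy data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def handcrafted_features(points):
    """Turn one tree's 3D point cloud into a fixed-length list of clues."""
    z = points[:, 2]
    height = z.max() - z.min()                           # how tall is the tree?
    crown_width = np.ptp(points[:, :2], axis=0).mean()   # average horizontal spread
    pointiness = height / (crown_width + 1e-6)           # tall-and-narrow vs short-and-round
    z_spread = z.std()                                   # where the foliage sits vertically
    return np.array([height, crown_width, pointiness, z_spread])

# Toy data: 200 random "trees", each a cloud of 500 points, with 9 species labels.
rng = np.random.default_rng(0)
trees = [rng.normal(size=(500, 3)) * rng.uniform(1, 5, size=3) for _ in range(200)]
labels = rng.integers(0, 9, size=200)

# The detective: a Random Forest classifier trained on the hand-crafted clues.
X = np.stack([handcrafted_features(t) for t in trees])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:5]))
```

The deep-learning camps differ only in what replaces `handcrafted_features`: the "2D Artists" render the cloud into images for a standard image network, while the "3D Visionaries" feed the raw points to a network that learns its own clues.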
3. The Results: Who Won?
On the "Grainy" Photos (Low Density):
The Hand-Crafters (Machine Learning) won. When the data was sparse and blurry, the simple, rule-based detectives were actually better. They didn't get confused by the lack of detail. The "3D Visionaries" struggled a bit because they needed more data to learn the rules.
On the "4K" Photos (High Density):
The 3D Visionaries (Deep Learning) crushed the competition. When the data was rich and detailed, the AI that could "see" in 3D was unbeatable.
- The Champion: a point-based deep model called the Point Transformer. It achieved an accuracy of 87.9%.
- The Runner-up: the 2D Artists (image-based deep learning) got about 84.3%.
- Third Place: the Hand-Crafters (Random Forest) got about 83.2%.
The Secret Weapon: Color
The study found that the three laser colors (multispectral data) were like giving the AI special goggles that let it see wavelengths invisible to our eyes.
- Without the color info, the AI had to guess the species from shape alone, like identifying a fruit in the dark by touch.
- With color info, the AI could see that different trees reflect light differently, just like how a red apple looks different from a green one. This boosted accuracy significantly, especially for rare trees.
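The "color" idea can be sketched in a few lines, assuming each laser point now carries three per-channel reflectance intensities alongside its x/y/z coordinates (columns 3 to 5 here). Averaging the reflectance per channel is one illustrative spectral clue, not the paper's exact feature set.

```python
import numpy as np

def spectral_features(points_xyzi):
    """Average reflectance of the tree in each of the three laser channels."""
    intensities = points_xyzi[:, 3:6]   # three channel intensities per point
    return intensities.mean(axis=0)     # one number per channel

# Toy tree: 500 points, each with x/y/z plus three channel intensities.
rng = np.random.default_rng(1)
tree = np.hstack([rng.normal(size=(500, 3)),           # x, y, z
                  rng.uniform(0, 1, size=(500, 3))])   # channel reflectances

print(spectral_features(tree))
```

Appending these three numbers to the geometric clue list is all it takes to let the classifier exploit the fact that, say, Birch leaves and Spruce needles reflect the channels differently.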
4. The "Learning Curve" Analogy
One of the most interesting findings was about how much data the AI needed to learn.
- The Hand-Crafter: Imagine a student who learns a few rules and is good immediately. But if you give them 1,000 more textbooks, they don't get much better. They hit a "ceiling."
- The Deep Learning AI: Imagine a student who knows nothing at first. But if you give them 100 books, they get okay. If you give them 1,000 books, they get great. If you give them 10,000 books, they become a genius.
- The Finding: Deep learning models improve much faster as you feed them more data. The researchers calculated that to reach a near-perfect score (90% accuracy), the Deep Learning AI would need about 14,000 trees to study, while the Hand-Crafter would need millions to reach the same level.
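The extrapolation behind numbers like that can be sketched as a learning-curve fit: model the error as a power law, err(n) = a·n^(-b), fit it to a few (training size, accuracy) measurements, then invert it to ask how many trees reach 90% accuracy. The sample points below are made up; the paper's "14,000 trees" figure comes from its own fit, not this one.

```python
import numpy as np

# Hypothetical (training size, accuracy) observations for a deep model.
n = np.array([500, 1000, 2000, 4000])
acc = np.array([0.70, 0.76, 0.81, 0.85])

# err(n) = a * n**(-b) becomes a straight line in log-log space:
# log err = log a - b * log n, so fit a degree-1 polynomial.
log_a, neg_b = np.polyfit(np.log(n), np.log(1 - acc), 1)[::-1]
a, b = np.exp(log_a), -neg_b

# Invert the fit: at what n does the error drop to 10% (90% accuracy)?
n_target = (a / 0.10) ** (1 / b)
print(f"~{n_target:.0f} trees needed for 90% accuracy (under this fit)")
```

A Hand-Crafter's curve has a much smaller exponent b, so inverting it yields a far larger n for the same target, which is the "ceiling" the analogy describes.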
5. Why Does This Matter?
Why do we care if a computer can tell an Aspen from a Birch?
- Biodiversity: Some trees (like Aspen) are "superheroes" for nature. They host many insects and birds. If we don't know where they are, we can't protect them.
- Climate Change: Different trees store carbon differently. To fight climate change, we need to know exactly what we have.
- City Planning: Cities need to know where their trees are to manage shade, air quality, and safety.
The Takeaway
This paper is a victory for Deep Learning, but with a catch: you need good data.
If you have a cheap, low-resolution scan, a simple computer program works fine. But if you want to build a "digital twin" of a forest to manage it perfectly, you need high-resolution, multi-colored laser scans and a powerful 3D AI to interpret them.
The researchers also built a crowdsourcing app (like a game where people walk in the forest and tag trees on their phones) to gather the massive amount of "ground truth" data needed to train these super-AIs. It's a bit like training a dog: you can't just tell it what a "tree" is; you have to show it thousands of examples until it figures it out. This study proved that with enough examples and the right "flashlight," computers can finally learn to see the forest for the trees.