Progressive Per-Branch Depth Optimization for DEFOM-Stereo and SAM3 Joint Analysis in UAV Forestry Applications

This paper presents a progressive per-branch depth optimization pipeline that integrates DEFOM-Stereo disparity estimation, SAM3 segmentation, and a multi-stage filtering scheme to transform noisy stereo imagery into geometrically coherent 3D point clouds, reducing per-branch depth standard deviation by 82% for autonomous UAV-based tree pruning in forestry applications.

Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

Published 2026-02-25

Imagine you are a tiny robot drone trying to fly through a dense forest and cut specific tree branches without hurting the tree or crashing. To do this safely, the drone needs to know exactly where every single branch is in 3D space, down to the centimeter.

The problem? Trees are messy. Leaves overlap, branches tangle, and the sky looks very different from the wood. When the drone takes a picture with two eyes (stereo vision) to guess depth, the result is often a "noisy" mess—like a photo full of static and blurry spots.

This paper describes a six-step "cleaning factory" that turns a blurry, messy 3D map of a tree into a crystal-clear, precise blueprint of individual branches. The authors call this a "Progressive Pipeline," meaning they didn't try to fix everything at once. Instead, they built a basic version, saw what broke, fixed it, and then fixed the new problems that appeared.

Here is how their "factory" works, using simple analogies:

The Ingredients

  1. The Eyes (DEFOM-Stereo): A super-smart AI camera that looks at two photos and guesses how far away everything is. It's great at the big picture but makes small, noisy mistakes.
  2. The Tracer (SAM3): Another AI that draws outlines around objects. It tries to draw a line around a specific branch, but sometimes its line is a little too loose, letting in background sky or neighboring leaves.
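The stereo "eyes" recover depth from disparity, the horizontal shift of a pixel between the left and right views. A minimal sketch of the standard pinhole-stereo relation depth = f·B/d (the function name and the numbers are illustrative, not taken from the paper):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard pinhole-stereo relation: depth = focal * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity means the point is at infinity
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: a 120-pixel shift with a 1200-pixel focal length
# and a 10 cm baseline puts the branch 1 metre away.
depth_m = disparity_to_depth(disparity_px=120.0, focal_px=1200.0, baseline_m=0.1)
print(depth_m)  # -> 1.0
```

The same relation explains why the raw maps are noisy: a one-pixel disparity error shifts the recovered depth, and the error grows with distance.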

The Six-Step Cleaning Process

Version 1: The "Naive" Attempt

  • The Analogy: You ask a child to trace a branch on a map and then color it in. The child draws the line, but it's a bit wobbly, and they accidentally color in some of the sky behind the branch.
  • The Result: The 3D map is full of "ghost" points from the sky and the tree looks like a fuzzy cloud.

Version 2: The "Shrink Wrap" (Morphological Erosion)

  • The Fix: To stop the sky from getting in, the team shrinks the traced outline inward, like pulling a rubber band tight around the branch.
  • The Problem: This works great for thick tree trunks, but for thin twigs, the rubber band pulls the whole thing apart. The thin branches disappear!
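A minimal pure-Python sketch of plain binary erosion, showing exactly the failure described above: a pixel survives only if its whole neighbourhood is inside the mask, so thin structures vanish (helper names are illustrative, not the paper's implementation):

```python
def erode(mask, r=1):
    """Binary erosion with a (2r+1) x (2r+1) square structuring element:
    a pixel survives only if every neighbour within radius r is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
    return out

# A 3-pixel-wide "trunk" survives one erosion; a 1-pixel-wide "twig" vanishes.
trunk = [[1, 1, 1]] * 3
twig  = [[0, 1, 0]] * 3
print(sum(map(sum, erode(trunk))))  # -> 1 (the centre pixel remains)
print(sum(map(sum, erode(twig))))   # -> 0 (the thin branch disappears)
```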

Version 3: The "Skeleton Saver" (Skeleton-Preserving Erosion)

  • The Fix: Instead of just shrinking the outline, they find the "spine" or skeleton of the branch first (like finding the backbone of a fish). They keep that spine safe and only trim the fat around it.
  • The Result: The thick branches get trimmed of the sky, but the thin twigs stay connected and intact.
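A sketch of the skeleton-preserving idea: erode as usual, then re-add the centreline so thin structures cannot disappear. The helper names are illustrative; in this tiny demo the skeleton is hand-specified, whereas a real pipeline would compute it with a thinning algorithm (e.g. Zhang-Suen):

```python
def preserve_skeleton(eroded, skeleton):
    """Skeleton-preserving erosion: take the eroded mask but re-add
    the branch centreline so thin structures cannot vanish."""
    h, w = len(eroded), len(eroded[0])
    return [[int(eroded[y][x] or skeleton[y][x]) for x in range(w)]
            for y in range(h)]

# A 1-pixel twig: plain erosion left nothing, but the hand-specified
# centreline is re-added, so the twig stays connected.
eroded   = [[0, 0, 0]] * 3   # plain erosion deleted the twig entirely
skeleton = [[0, 1, 0]] * 3   # the twig's "spine"
result = preserve_skeleton(eroded, skeleton)
print(sum(map(sum, result)))  # -> 3 (the twig survives)
```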

Version 4: The "Color Detective" (Color Validation)

  • The Problem: Even with a perfect outline, the AI might have grabbed a green leaf that is stuck to the branch. The outline is right, but the color is wrong.
  • The Fix: They set up a "color police." They take a sample of the branch's true color (from the very center of the branch) and check every pixel inside the outline. If a pixel is too different in color (like a leaf or a shadow), they kick it out. They also make sure two branches don't claim the same pixel.
  • The Result: Now, the mask contains only the branch, no extra leaves or sky.
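The "color police" can be sketched as a distance test against a reference sample taken from the branch centre (function name, tolerance, and RGB values are illustrative assumptions, not the paper's parameters):

```python
def color_validate(pixels, seed_rgb, tol=40.0):
    """Keep only pixels whose colour is close (Euclidean RGB distance)
    to a reference sample taken from the branch centre."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return [p for p in pixels if dist(p, seed_rgb) <= tol]

bark = (110, 85, 60)            # colour sampled at the branch centre (illustrative)
candidates = [(112, 88, 62),    # bark-coloured pixel
              (40, 160, 50),    # green leaf
              (135, 190, 235)]  # patch of sky
print(color_validate(candidates, bark))  # -> [(112, 88, 62)] (only bark survives)
```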

Version 5: The "Statistical Sweeper" (IQR & Z-Score)

  • The Problem: The outline is perfect, but the depth numbers inside are still jittery. Some points say the branch is 1 meter away, and a neighbor says 1.5 meters. It's noisy.
  • The Fix: They use two standard statistical rules, the interquartile range (IQR) and the z-score: measure the typical spread of the depth values and throw out any point that falls far outside it.
  • The Problem: This is a bit blunt. It smooths out the noise, but it also blurs the sharp edges of the branch, making it look like a soft, fuzzy tube.
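The IQR rule can be sketched in a few lines with Python's standard library (function name and the 1.5 multiplier follow the textbook convention; the paper's exact thresholds may differ):

```python
import statistics

def iqr_filter(depths, k=1.5):
    """Classic IQR rule: drop depths outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(depths, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [d for d in depths if lo <= d <= hi]

# Five points around 1 m plus one "ghost" at 1.5 m; the ghost is dropped,
# but the survivors are still jittery -- hence the blurring next step.
depths = [1.00, 1.02, 0.99, 1.01, 1.00, 1.50]
print(iqr_filter(depths))
```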

Version 6: The "Master Sculptor" (The Final Version)

  • The Fix: This is the big upgrade. They replace the blunt math with a smarter, multi-stage process:
    1. MAD (Median Absolute Deviation): A tougher math rule that ignores extreme outliers better than the old method.
    2. Neighbor Voting: If one point looks weird compared to its neighbors, it gets voted out.
    3. Guided Filtering: This is the magic trick. They look at the color photo again. If the color photo shows a sharp edge (where the branch ends and the sky begins), the depth filter respects that edge and doesn't blur it. It's like using a ruler that only smooths flat surfaces but leaves sharp corners alone.
    4. Adaptive Smoothing: They adjust the "smoothness" based on the branch. Thick trunks get smoothed more; thin twigs get smoothed less so they don't vanish.
  • The Result: The final 3D point cloud is incredibly sharp. The branch looks solid, the edges are crisp, and the noise is gone.
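The first two stages above can be sketched on a 1-D toy example (helper names and thresholds are illustrative assumptions; guided filtering and adaptive smoothing are omitted here for brevity):

```python
import statistics

def mad_filter(depths, k=3.0):
    """Stage 1 sketch - Median Absolute Deviation: robust, because the
    median is barely moved even by extreme outliers."""
    med = statistics.median(depths)
    mad = statistics.median(abs(d - med) for d in depths)
    if mad == 0:
        return list(depths)
    # 1.4826 scales the MAD to match a normal distribution's std-dev.
    return [d for d in depths if abs(d - med) <= k * 1.4826 * mad]

def neighbor_vote(depths, tol=0.05):
    """Stage 2 sketch - a point survives only if an adjacent point
    (in this 1-D toy, the left or right neighbour) agrees within tol."""
    keep = []
    for i, d in enumerate(depths):
        nbrs = depths[max(0, i - 1):i] + depths[i + 1:i + 2]
        if any(abs(d - n) <= tol for n in nbrs):
            keep.append(d)
    return keep

depths = [1.00, 1.01, 0.99, 2.40, 1.02, 1.00]  # metres; 2.40 is noise
print(neighbor_vote(mad_filter(depths)))
```

The MAD test removes the gross outlier, and the neighbour vote then confirms that every survivor agrees with at least one of its neighbours.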

Why Does This Matter?

The authors tested this on Radiata Pine trees in New Zealand.

  • Before: The 3D data was so noisy (standard deviation of 440mm) that a robot couldn't tell where to cut.
  • After: The data became incredibly precise (standard deviation of 31.5mm), an 82% improvement.

The Bottom Line:
This paper isn't just about making pretty pictures. It's about giving a robot the "eyes" and "brain" it needs to safely prune trees without a human holding a chainsaw. By breaking the problem down into six small, logical steps—fixing the outline, checking the color, and then polishing the depth—they turned a messy forest into a clean, digital blueprint ready for autonomous robots.

They even released all their code and data for free, so other scientists can use this "cleaning factory" to help robots work in forests everywhere.
