SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants

This paper introduces SG-DOR, a relational framework that uses a direction-aware graph neural network to infer scene graphs encoding physical attachments and direction-conditioned occlusion for robotic harvesting of pepper plants in dense canopies.

Rohit Menon, Niklas Mueller-Goldingen, Sicong Pan, Gokul Krishna Chenchani, Maren Bennewitz

Published 2026-03-09

Imagine you are a robot chef trying to pick a ripe pepper from a bush. The problem? The bush is a tangled mess. The pepper you want is hiding behind a wall of leaves, and you can't see the stem holding it. If you just reach in blindly, you might snap the stem, crush the fruit, or get your arm stuck in a thicket of leaves.

To solve this, you don't just need to see the pepper; you need to understand the relationships between the pepper, the leaves, and the stem. You need to know: "Which specific leaf is blocking my view? If I push that one aside, will the pepper be free?"

This paper introduces SG-DOR, a smart AI system designed to be that "brain" for agricultural robots. Here is how it works, explained simply:

1. The Problem: The "Blind Reach"

In a dense pepper plant, fruits are often hidden. Current robots are like people trying to find a specific book in a messy room by just looking at the top shelf. They can see the fruit, but they don't know what is hiding it or which way to move to see it better. They lack a mental map of "who is blocking whom."

2. The Solution: A "Social Network" for Plants

The authors created a system called SG-DOR (Scene Graphs with Direction-Conditioned Occlusion Reasoning). Think of this as building a social network profile for every single part of the plant.

  • The Nodes (The People): Every leaf, stem, and pepper is a "person" in this network.
  • The Edges (The Relationships): The system draws lines connecting them. It learns that "Leaf A is attached to Stem B" and "Leaf C is standing in front of Pepper D."
  • The Secret Sauce (Direction): This is the magic part. The system doesn't just ask, "Is the pepper hidden?" It asks, "Is the pepper hidden if I look from the top? What about if I look from the side?"
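The "social network" above can be sketched as a tiny data structure. This is a minimal illustration with hand-picked part names and occlusion scores, not the paper's actual representation; SG-DOR learns these relations with a graph neural network, whereas here they are hard-coded:

```python
# Nodes: every plant part gets an id and a type.
nodes = {
    "stem_B": "stem",
    "leaf_A": "leaf",
    "leaf_C": "leaf",
    "pepper_D": "fruit",
}

# Attachment edges: physical "is attached to" relations.
attached_to = [("leaf_A", "stem_B"), ("pepper_D", "stem_B")]

# Direction-conditioned occlusion edges:
# (occluder, target, viewing direction) -> occlusion score in [0, 1].
occludes = {
    ("leaf_C", "pepper_D", "front"): 0.8,
    ("leaf_C", "pepper_D", "top"): 0.1,
}

# The same leaf can hide the fruit from one direction but not another.
print(occludes[("leaf_C", "pepper_D", "front")])  # high from the front
print(occludes[("leaf_C", "pepper_D", "top")])    # low from the top
```

The key design point is that occlusion is not a property of a part, it is a property of a (part, target, direction) triple, which is exactly what lets the robot pick its viewpoint.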

3. How It Thinks: The "Crowded Room" Analogy

Imagine you are in a crowded room trying to talk to a friend (the pepper).

  • Old Robots: They see your friend but don't know who is standing in the way. They might try to push through the whole crowd.
  • SG-DOR: It acts like a super-observant host. It looks at the crowd and says:
    • "Okay, if you approach from the North, Leaf 1 is the main blocker. If you push Leaf 1, you're good."
    • "But if you approach from the East, Leaf 2 is the one blocking you."
    • It even ranks them: "Leaf 1 is the biggest problem, Leaf 2 is a minor problem."
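The "observant host" behavior amounts to a per-direction ranking query. Here is a toy sketch with made-up scores; in SG-DOR the scores come from the trained network, and the leaf and direction names below are purely illustrative:

```python
# (occluder, viewing direction) -> fraction of the target pepper it hides.
occlusion = {
    ("leaf_1", "north"): 0.7,
    ("leaf_2", "north"): 0.2,
    ("leaf_1", "east"):  0.1,
    ("leaf_2", "east"):  0.6,
}

def ranked_blockers(direction):
    """Return occluders for one approach direction, biggest blocker first."""
    scores = [(leaf, s) for (leaf, d), s in occlusion.items() if d == direction]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(ranked_blockers("north"))  # leaf_1 is the main blocker from the north
print(ranked_blockers("east"))   # leaf_2 takes over from the east
```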

4. How It Learned: The "Video Game" Training

You can't easily teach a robot this in a real greenhouse because it's too messy and hard to see the "truth" (you can't see the hidden parts of the plant).

  • The Simulation: The researchers built a massive, perfect video game world of pepper plants. In this game, they knew exactly where every leaf was and exactly how much it blocked the pepper from every angle.
  • The Training: They fed this game data to the AI. The AI played millions of rounds, learning to predict: "If I see this shape, and I'm looking from this angle, this specific leaf is the one hiding the fruit."
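Why does simulation make training possible? Because with full knowledge of the geometry, the "truth" is simply measured, not guessed. The paper works with full 3D plant models; the following is only a 1D stand-in with hypothetical numbers, showing how an exact occlusion label can be computed from known projected extents:

```python
def occluded_fraction(fruit, leaf):
    """Fruit and leaf are (left, right) intervals as projected into the
    camera image; return the fraction of the fruit the leaf covers."""
    overlap = min(fruit[1], leaf[1]) - max(fruit[0], leaf[0])
    width = fruit[1] - fruit[0]
    return max(0.0, overlap) / width

# Projected extents change with the viewing direction, and so does the label.
label_front = occluded_fraction((0.0, 1.0), (0.4, 1.2))  # leaf covers 60%
label_top = occluded_fraction((0.0, 1.0), (1.5, 2.0))    # leaf misses entirely
print(label_front, label_top)
```

Labels like these, computed from every viewing angle, are what the network is trained to predict from images alone.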

5. The Result: A "To-Do List" for Robots

When the robot looks at a real pepper plant, SG-DOR doesn't just give a picture. It gives a strategic plan:

  1. Identify: "Here is the target pepper."
  2. Analyze: "From your current angle, these three leaves are blocking it."
  3. Rank: "Leaf #1 is the biggest blocker. Leaf #2 is next. Leaf #3 is barely in the way."
  4. Act: "Robot, please gently push Leaf #1 aside first. Then you can grab the pepper."
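The four steps above can be sketched as a tiny planning routine. The push/grasp action names and the 0.1 threshold are assumptions for illustration; the real system would hand such a plan to a manipulation planner:

```python
def make_plan(target, blockers, threshold=0.1):
    """Push blockers above the threshold, biggest first, then grasp."""
    ranked = sorted(blockers.items(), key=lambda kv: kv[1], reverse=True)
    plan = [f"push {leaf}" for leaf, score in ranked if score > threshold]
    plan.append(f"grasp {target}")
    return plan

blockers = {"leaf_1": 0.7, "leaf_2": 0.3, "leaf_3": 0.05}  # occlusion scores
print(make_plan("pepper_D", blockers))
# leaf_3 barely blocks, so it is skipped:
# ['push leaf_1', 'push leaf_2', 'grasp pepper_D']
```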

Why This Matters

This isn't just about picking peppers; it's about precision. Instead of a robot blindly hacking away at a plant (which damages the crop), it acts like a skilled gardener who knows exactly which branch to move to reveal the fruit.

The paper proves that by teaching robots to understand direction and relationships (not just shapes), we can make them much better at harvesting crops in messy, real-world environments. It turns a chaotic bush into a structured map that a robot can actually navigate.