WildOS: Open-Vocabulary Object Search in the Wild

WildOS is a unified system for robust, long-range autonomous navigation in unstructured environments. It integrates a sparse geometric navigation graph with a foundation-model-based vision module (ExploRFM) and a particle-filter localization method, enabling efficient, open-vocabulary object search through semantically informed and geometrically safe exploration.

Hardik Shah, Erica Tevere, Deegan Atha, Marcel Kaufmann, Shehryar Khattak, Manthan Patel, Marco Hutter, Jonas Frey, Patrick Spieler

Published 2026-02-24

Imagine you are sending a robot on a mission to find a specific object, like a "red fire hydrant" or a "blue house," in a massive, wild forest or a messy city street. You don't have a map, and the robot can only see a few meters ahead with its "eyes" (sensors). Everything beyond that is a foggy mystery.

This is the problem WildOS solves. It's a new system that teaches robots how to explore the wild world not just by feeling their way around obstacles, but by thinking about what they see, much like a human would.

Here is how it works, broken down into simple concepts and analogies:

1. The Problem: The "Tunnel Vision" Robot

Most robots today are like people wearing thick foggy glasses. They can see the ground right in front of them perfectly (geometric sensing), but once they look past a few meters, they are blind.

  • The Old Way: If a robot sees a fence blocking the direct path to a goal, it just turns around and tries to go around the fence blindly. It doesn't know if there's a nice open gate 50 meters away that it can't see yet. It's "myopic" (short-sighted).
  • The Vision Problem: Some robots try to use cameras to see far away, but they have no memory. They might see a path, take a step, forget they saw it, and then walk in circles, going back and forth over the same ground.

2. The Solution: WildOS (The "Smart Explorer")

WildOS combines two superpowers: Geometric Memory (a map of where it's been) and Visual Reasoning (a brain that understands images).

Think of WildOS as a hiker with two tools:

  1. A Sketchbook (The Navigation Graph): Instead of drawing a detailed, heavy map of every tree and rock (which takes too much memory), the robot draws a simple connect-the-dots map. It marks safe spots and the edges of what it knows. This is light, fast, and remembers where it has already been so it doesn't get lost.
  2. A Crystal Ball (ExploRFM): This is the robot's "brain" based on advanced AI. It looks at the camera image and predicts three things far beyond what the robot can physically touch:
    • Is it safe to walk there? (e.g., "That looks like water, not grass.")
    • Is there a path ahead? (e.g., "I see a gap between those trees.")
    • Is that the object I'm looking for? (e.g., "That blurry shape in the distance looks like a house.")
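To make the "sketchbook + crystal ball" pairing concrete, here is a minimal sketch of what such a sparse navigation graph might look like: lightweight nodes that store a position, a frontier flag, and the three hypothetical ExploRFM scores described above. The class and field names are illustrative assumptions, not the paper's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class GraphNode:
    """One 'dot' in the connect-the-dots map (hypothetical structure)."""
    position: tuple                 # (x, y) in the world frame
    is_frontier: bool = False       # edge of known space, still unexplored
    traversability: float = 0.0     # "is it safe to walk there?" (0..1)
    path_likelihood: float = 0.0    # "is there a path ahead?" (0..1)
    target_similarity: float = 0.0  # "does it look like the goal?" (0..1)
    neighbors: list = field(default_factory=list)  # indices of linked nodes

class NavigationGraph:
    """Sparse connect-the-dots map: safe spots plus frontier candidates."""

    def __init__(self):
        self.nodes = []

    def add_node(self, node, connect_to=None):
        idx = len(self.nodes)
        self.nodes.append(node)
        if connect_to is not None:  # link back to the node we came from
            self.nodes[connect_to].neighbors.append(idx)
            node.neighbors.append(connect_to)
        return idx

    def frontiers(self):
        """Candidates the robot can still expand toward."""
        return [n for n in self.nodes if n.is_frontier]
```

Because the graph stores only a handful of scored waypoints rather than a dense map of every tree and rock, it stays small in memory while still remembering everywhere the robot has been.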

3. How They Work Together: The "Scored Map"

The magic happens when the robot combines its sketchbook with its crystal ball.

  • The Scenario: The robot is at a fork in the road. One path goes straight toward the goal but hits a wall. The other path curves away but looks like it leads through a beautiful, open meadow.
  • The Decision:
    • A dumb robot would just go straight because the goal is in that direction.
    • A WildOS robot looks at the "meadow" path. Its AI says, "That path is safe, it's open, and it looks promising." It gives that path a high score.
    • It then updates its sketchbook, marking that path as the best way to go, even though it's not the straight line to the goal.
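As a rough sketch of that decision, a frontier's score can blend progress toward the goal with the vision module's safety, openness, and semantic scores. The weights and score names below are hypothetical placeholders, not the paper's actual objective; the point is only that a promising-looking detour can outscore the blocked straight-line path.

```python
import math

def score_frontier(frontier_pos, goal_pos, safety, openness, semantic_sim,
                   w_goal=1.0, w_safe=1.0, w_open=1.0, w_sem=2.0):
    """Blend goal progress with visual scores (hypothetical weighting)."""
    # Goal progress: closer frontiers to the goal earn a larger term.
    dist = math.hypot(goal_pos[0] - frontier_pos[0],
                      goal_pos[1] - frontier_pos[1])
    goal_term = 1.0 / (1.0 + dist)
    return (w_goal * goal_term + w_safe * safety
            + w_open * openness + w_sem * semantic_sim)

# The fork-in-the-road scenario: (position, safety, openness, semantic_sim)
frontiers = {
    "straight_to_wall": ((10, 0), 0.2, 0.1, 0.0),  # close, but blocked
    "open_meadow":      ((8, 6),  0.9, 0.8, 0.4),  # curves away, promising
}
goal = (12, 0)
best = max(frontiers,
           key=lambda k: score_frontier(frontiers[k][0], goal,
                                        *frontiers[k][1:]))
# → "open_meadow": the safe, open detour beats the direct dead end
```

The straight path wins on distance alone, but its low safety and openness scores drag it down, so the meadow frontier gets written into the sketchbook as the best next step.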

4. Finding the "Invisible" Target

What if the target (like a "NASA sign") is 200 meters away, far beyond the robot's sensors?

  • The Trick: The robot takes a picture, sees the sign, and then uses a technique called Triangulation. Imagine holding your thumb up and closing one eye, then the other; your thumb seems to jump. The robot does this with its cameras from different spots.
  • Even though it can't measure the exact distance with lasers (because it's too far), it uses geometry to estimate: "Okay, based on where I saw it from here and there, the sign is probably over there." It keeps a cloud of guesses about the target's location (the particle filter) and starts walking toward the most likely one.
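The geometry behind the thumb trick can be written down directly. This is a simplified 2D bearing-only triangulation, assuming two viewpoints and a ray direction toward the target from each; the real system fuses many noisy sightings, but the core idea of intersecting sight lines is the same.

```python
import numpy as np

def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two sight lines to estimate a distant target's position.

    p1, p2:             robot positions (x, y) at the two viewpoints
    bearing1, bearing2: direction to the target at each viewpoint (radians)
    """
    d1 = np.array([np.cos(bearing1), np.sin(bearing1)])
    d2 = np.array([np.cos(bearing2), np.sin(bearing2)])
    # Solve p1 + t1*d1 == p2 + t2*d2 for the ray lengths t1, t2.
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.array(p2, float) - np.array(p1, float))
    return np.array(p1, float) + t[0] * d1

# Seen at 45° from the origin, then at 135° after moving 10 m to the right:
estimate = triangulate((0, 0), np.pi / 4, (10, 0), 3 * np.pi / 4)
# → approximately (5, 5): the sight lines cross 5 m ahead and 5 m left
```

If the two bearings are nearly parallel (the robot barely moved sideways), the system `A` becomes ill-conditioned and the estimate blows up, which is why accumulating sightings from well-separated viewpoints matters.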

5. Real-World Results

The researchers tested this on a real robot (a Boston Dynamics Spot dog) in messy off-road areas and cities.

  • The Test: They asked the robot to find things like a "garbage can," a "golf cart," or a "NASA logo."
  • The Result: WildOS was much faster and smarter than robots that only used maps or only used cameras.
    • When it hit a dead end, it remembered it had been there and turned back to try a different path (unlike the camera-only robot, which kept walking in circles).
    • It found shortcuts through gaps in fences that other robots missed because they were too focused on the straight line to the goal.

The Big Picture

WildOS is like giving a robot a human-like intuition. It doesn't just react to what's touching its feet; it looks ahead, understands the scene, remembers where it's been, and makes smart guesses about where to go next. It's a giant step toward robots that can truly explore the wild world on their own, finding lost items or inspecting dangerous areas without needing a human to hold their hand.
