Go-Browse: Training Web Agents with Structured Exploration

Imagine you are trying to teach a robot how to use the internet. You want it to be able to book flights, buy groceries, or check its bank account just by looking at a screen and clicking buttons.

The problem is, most robots (AI agents) are terrible at this. They get lost easily. If you tell a robot to "buy a specific shirt on a website," it might wander into the wrong section, click the wrong button, or get stuck on a page it doesn't understand. It's like giving a tourist a map to a city they've never visited, but the map is missing all the street names, and the tourist has no idea how to ask for directions.

This paper introduces a new method called Go-Browse to fix this. Here is how it works, explained simply:

The Problem: The "Lost Tourist"

Current AI agents are like tourists who try to learn a city by wandering around aimlessly. They might stumble upon a cool shop by accident, but they rarely learn the best way to get there. If they get lost, they just start over from the beginning, wasting time and energy.

The Solution: The "Smart Tour Guide" (Go-Browse)

Instead of letting the robot wander blindly, the authors created a system that acts like a super-organized tour guide. This guide doesn't just wander; it builds a mental map of the city as it goes.

Here is the step-by-step process, using a Library as an analogy:

1. The Outer Loop: Mapping the Shelves

Imagine a library with thousands of books. A normal robot might walk in, pick a random book, read it, and then walk out. Next time, it picks another random book. It never learns where the "Cooking" section is relative to the "History" section.

Go-Browse is different. It starts at the front desk (the homepage). It says, "Okay, I'm here. What other sections can I reach from here?" It finds the "Cooking" aisle, then the "History" aisle, and so on. It builds a map of the library, keeping track of every shelf it has found but hasn't fully explored yet.

2. The Inner Loop: The "Reset and Explore" Trick

This is the secret sauce. In the past, if a robot got lost trying to find a specific book deep in the library, it would have to walk all the way from the front door again to try a different path. That's exhausting and inefficient.

Go-Browse uses a "Time Travel" trick.

Once the robot discovers a new section (like the "Cooking" aisle), it saves that spot.
If it needs to learn how to find a specific recipe inside that aisle, it doesn't start from the front door. It teleports (resets) directly to the "Cooking" aisle.
Now, it only has to focus on the small task of finding the recipe, not the hard task of navigating the whole building.

This allows the robot to practice specific skills (like "clicking the 'Buy' button") without getting tired from the long walk to get there.

3. The "Feasibility Check": The Safety Net

Before the robot tries to learn a new task, a "Safety Officer" (a very smart AI) checks: "Is this actually possible?"

If the robot tries to learn how to "Buy a unicorn," the Safety Officer says, "Nope, unicorns aren't real here. Don't waste time."
If the task is real (like "Buy a toaster"), the Safety Officer lets the robot try. If it succeeds, that success is saved as a lesson. If it fails, that attempt is not used as a lesson.

The Result: A Super-Learner

The authors used this method to collect roughly 10,000 successful examples of robots doing web tasks. They then taught a standard AI model (a 7-billion-parameter model) using these examples.

The outcome was impressive:

The trained robot became significantly better at navigating websites than before.
It actually beat GPT-4o Mini (a proprietary AI model from OpenAI) on these specific tasks.
It was much better than other robots that tried to learn by just wandering around without a map.

Why This Matters

Think of it like learning to drive.

Old Way: You get in a car and drive around a city randomly, hoping to learn how to park. You crash a lot, and you never learn the specific rules of the parking lot.
Go-Browse Way: You first learn the layout of the city (the map). Then, you practice parking in a specific spot by starting right in front of that spot, over and over, until you master it. Once you master that spot, you move to the next one.

By breaking the big, scary problem of "surfing the web" into small, manageable chunks and remembering where you've been, Go-Browse teaches AI agents to be much more confident, efficient, and successful digital helpers.

Here is a detailed technical summary of the paper "GO-BROWSE: TRAINING WEB AGENTS WITH STRUCTURED EXPLORATION".

1. Problem Statement

Digital agents, particularly those based on Large Language Models (LLMs), struggle with web browsing tasks due to a lack of prior understanding of specific web environments. While frontier models (e.g., GPT-4o) achieve high success rates in general domains, their performance on GUI-based web agent benchmarks like WebArena is significantly lower (e.g., GPT-4o-mini scores ~19% vs. human ~78%).

Existing methods for collecting training data face two main limitations:

Human-generated data: High quality but prohibitively expensive and time-consuming to scale.
Unsupervised automatic methods:
- Interaction-first: Agents explore without specific goals, leading to redundant exploration and shallow coverage.
- Instruction-first: Agents propose tasks based on static observations, often hallucinating infeasible tasks or failing to explore deep parts of the website because they lack context from dynamic exploration.

The core challenge is efficiently collecting diverse, realistic, and high-quality web agent trajectories at scale without human intervention, while ensuring the agent learns both navigation (finding the right page) and local task solving (performing actions on that page).

2. Methodology: GO-BROWSE

The authors propose GO-BROWSE, a fully unsupervised method that frames data collection as a graph search problem. It systematically explores web environments to build a map of discovered URLs and collects diverse task trajectories.

Core Architecture

GO-BROWSE operates using an Outer Loop (Global Coverage) and an Inner Loop (Local Exploration):

Graph Construction: The system maintains a graph $G=(V, E)$ where nodes $V$ are unique URLs and edges $E$ are trajectories connecting them. It tracks an Exploration Frontier of discovered but not fully explored pages.
Outer Loop (Global Coverage):
- Selects a webpage $v$ from the Frontier.
- Reset Strategy: Instead of always starting from the homepage, the agent resets to the selected page $v$. This decouples the difficult problem of navigation (finding the page) from task execution (solving the task on the page).
- This strategy is inspired by Go-Explore (Ecoffet et al.), which uses "reset-then-explore" to solve hard-exploration problems in games.
Inner Loop (Local Exploration): For the selected page $v$, the system executes the following modules:
- NavExplorer: An agent tasked with finding neighboring pages. It interacts with the current page to discover new URLs and proposes navigational tasks to reach them. This expands the Frontier.
- PageExplorer: An agent tasked with proposing local tasks feasible on the current page (e.g., "filter products by price," "edit profile").
- FeasibilityChecker: Filters proposed tasks. It uses a strong LLM (e.g., Claude-3.7-Sonnet) to attempt solving the task and a Vision-Language Model (VLM) as a judge to verify success. Only feasible tasks are kept.
- Solvers: Samples trajectories for feasible tasks using two strategies:
  - Prefixed Sampling: The agent starts from the current page $v$ (solving the task locally). This is easier and allows weaker models to generate high-quality data.
  - Unprefixed Sampling: The agent starts from the root (homepage) and must navigate to $v$ to solve the task. This trains long-horizon navigation skills.
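
The two loops above can be pictured as a frontier-driven graph search. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `explore` function, the `discover_neighbors` callback (standing in for NavExplorer), the first-in-first-out choice of the next page, and the toy `SITE` map are all hypothetical and introduced only for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def explore(
    root: str,
    discover_neighbors: Callable[[str], List[str]],
    max_pages: int,
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Frontier-driven exploration over URLs (hypothetical sketch).

    `discover_neighbors` stands in for NavExplorer: given a page, it
    returns the URLs reachable from it. Returns the explored pages and
    the discovered (source, target) edges.
    """
    frontier = deque([root])       # discovered but not yet explored
    discovered: Set[str] = {root}  # every URL ever placed on the frontier
    explored: Set[str] = set()
    edges: List[Tuple[str, str]] = []

    while frontier and len(explored) < max_pages:
        # Outer loop: pick a page and "reset" to it (FIFO is an assumption).
        page = frontier.popleft()
        explored.add(page)
        # Inner loop: local exploration from that page expands the frontier.
        for url in discover_neighbors(page):
            edges.append((page, url))
            if url not in discovered:
                discovered.add(url)
                frontier.append(url)
    return explored, edges


# Toy site map used only to exercise the sketch.
SITE: Dict[str, List[str]] = {
    "/": ["/cart", "/products"],
    "/products": ["/products/1", "/"],
    "/cart": [],
    "/products/1": ["/cart"],
}

explored, edges = explore("/", lambda p: SITE.get(p, []), max_pages=3)
```

In a real system the inner loop would also run PageExplorer, FeasibilityChecker, and the Solvers on each selected page; they are omitted here to keep the graph bookkeeping visible.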

Key Innovation

The method decouples navigation from task solving. By resetting to previously discovered "promising" pages, GO-BROWSE ensures that deep, hard-to-reach pages are thoroughly explored once found, rather than being abandoned if the agent fails to navigate to them from the root in a single episode. This creates a bootstrapping effect where weaker models can generate high-quality data for specific pages, which is then used to train stronger navigation capabilities.
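
One way to picture the decoupling is that each feasible task yields two kinds of episodes that differ only in where the agent starts. This is a minimal sketch: `Episode`, `make_episodes`, `ROOT_URL`, and the example URL are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

ROOT_URL = "/"  # hypothetical homepage of the site being explored


@dataclass
class Episode:
    """Hypothetical description of one rollout to sample."""
    start_url: str
    goal: str
    prefixed: bool  # True if the agent is reset directly onto the page


def make_episodes(page_url: str, tasks: List[str]) -> List[Episode]:
    """Create one prefixed and one unprefixed episode per feasible task."""
    episodes: List[Episode] = []
    for task in tasks:
        # Prefixed: start on the page itself, so only the local task remains.
        episodes.append(Episode(page_url, task, prefixed=True))
        # Unprefixed: start at the root, so the agent must also navigate.
        episodes.append(Episode(ROOT_URL, task, prefixed=False))
    return episodes


episodes = make_episodes("/admin/orders", ["filter orders by date"])
```

The prefixed episode isolates the local skill, while the unprefixed one requires the full path from the homepage to the page.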

3. Key Contributions

GO-BROWSE Algorithm: A novel, unsupervised framework for web data collection that treats exploration as a graph traversal problem, enabling information reuse across episodes.
GO-BROWSE-WA Dataset: A large-scale dataset collected on the WebArena benchmark containing:
- ~10,000 successful task-solving trajectories.
- ~17,000 unsuccessful trajectories.
- ~40,000 interaction steps across 100 distinct URLs (20 per domain across 5 domains: Shopping Admin, Shopping, Reddit, GitLab, Map).
State-of-the-Art Performance: Demonstrated that fine-tuning a 7B parameter model on this dataset outperforms previous sub-10B baselines and the closed-source GPT-4o-mini on WebArena.
Analysis of Exploration: Showed that structured exploration leads to more diverse task distributions and deeper website coverage compared to independent exploration methods (like NNetNav).

4. Experimental Results

The authors fine-tuned Qwen-2.5-7B-Instruct on the GO-BROWSE-WA dataset (using only successful trajectories).
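
Restricting fine-tuning to successful trajectories amounts to a filter over the collected rollouts. The sketch below assumes a hypothetical `Trajectory` record with a boolean `success` verdict; the field names and example data are illustrative and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical record for one collected rollout."""
    task: str
    actions: List[str]
    success: bool  # verdict from the automatic judge


def select_successful(trajectories: List[Trajectory]) -> List[Trajectory]:
    """Keep only rollouts judged successful; failures are set aside."""
    return [t for t in trajectories if t.success]


collected = [
    Trajectory("filter products by price", ["click(filter)", "select(price)"], True),
    Trajectory("edit profile", ["click(settings)"], False),
]
training_set = select_successful(collected)
num_steps = sum(len(t.actions) for t in training_set)
```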

WebArena Benchmark Performance:
- GO-BROWSE-7B: Achieved a 21.7% overall success rate.
- Comparison:
  - Outperformed GPT-4o-mini (19.3%) by 2.4 percentage points.
  - Outperformed the previous SOTA for sub-10B models (NNetNav-7B at 18.8%) by 2.9 percentage points.
  - Outperformed the base Qwen-2.5-7B-Instruct (8.3%) by 13.4 percentage points.
- Domain Specifics: The model showed significant gains in Shopping Admin (+11% over NNetNav) and Reddit (+7% over NNetNav).
Out-of-Distribution (OOD) Generalization:
- Evaluated on Online-Mind2Web (OM2W).
- GO-BROWSE-7B maintained a lead over NNetNav-7B, particularly on "In-Domain-Adjacent" websites (similar to WebArena), where it approached GPT-4o-mini performance.
Efficiency: The FeasibilityChecker reduced the number of trajectory rollouts by ~13% while maintaining data quality. The total cost to collect the dataset was approximately $975.
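
The margins quoted in the WebArena comparison are absolute differences between success rates, not relative improvements. The short check below recomputes them from the reported scores; the `scores` dictionary and the `margin` helper are illustrative names only.

```python
# Overall WebArena success rates (in percent) as reported above.
scores = {
    "GO-BROWSE-7B": 21.7,
    "GPT-4o-mini": 19.3,
    "NNetNav-7B": 18.8,
    "Qwen-2.5-7B-Instruct": 8.3,
}


def margin(model: str, baseline: str) -> float:
    """Absolute difference in success rate, in percentage points."""
    return round(scores[model] - scores[baseline], 1)


margins = {
    baseline: margin("GO-BROWSE-7B", baseline)
    for baseline in scores
    if baseline != "GO-BROWSE-7B"
}
```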

5. Significance and Impact

Bridging the Gap: GO-BROWSE demonstrates that structured, unsupervised exploration can generate data sufficient to train small, open-weight models (7B) to outperform proprietary models such as GPT-4o-mini on complex web tasks.
Efficiency in Data Collection: By reusing information (resetting to known pages) and decoupling navigation from execution, the method solves the "exploration efficiency" problem that plagues previous unsupervised approaches.
Scalability: The approach is fully automated and scalable to any website, offering a viable path to generating massive, high-quality datasets for digital agents without human labeling.
Future Directions: The authors suggest that incorporating failure signals (the 17k unsuccessful trajectories) and scaling model sizes could further improve performance, though current results already set a new benchmark for efficient web agents.

In conclusion, GO-BROWSE represents a significant step forward in autonomous agent research by showing that structured exploration is a more effective strategy for data collection than random or purely instruction-based approaches, enabling smaller models to achieve state-of-the-art web navigation capabilities.