CMU Researchers Introduce Go-Browse: A Graph-Based Framework for Scalable Web Agent Training


Jun 24, 2025 - 10:00


Why Web Agents Struggle with Dynamic Web Interfaces

Digital agents designed for web environments aim to automate tasks such as navigating pages, clicking buttons, or submitting forms. These agents operate by interpreting browser data and simulating user interactions to complete specified tasks. Success in this domain requires an accurate understanding of dynamic content and the ability to provide adaptable responses, as web interfaces vary widely and continually evolve. While pretrained language models have shown prowess in other areas, their performance in GUI-based web tasks remains limited, primarily due to the complexities and variability of web pages.

Challenges of Data Collection for Web Agents at Scale

One significant challenge arises from the agents’ limited understanding of the environments in which they are expected to work. Pretrained models often falter when interacting with unfamiliar or complex interfaces. Unlike static datasets, real-world web environments demand continuous decision-making in response to layout differences and shifting user flows. This makes it difficult for digital agents to reliably accomplish tasks such as finding a specific product or completing an online form. Human-curated data could offer guidance, but collecting this data is labor-intensive and cannot scale to meet the diversity of real-world web scenarios.

Review of Past Approaches: Interaction-First vs. Instruction-First Methods

Researchers have tried several methods of collecting training data for these agents. One approach, called interaction-first, lets an agent explore websites based on broad instructions and later labels its activities using another model. While this may lead to deeper exploration, it often results in redundant behavior across sessions, limiting data diversity. Another method, instruction-first, generates specific tasks for an agent to perform based on the content of a single web page. Although more focused, these tasks are frequently anchored to only the visible content and may not be feasible, especially when they refer to page elements the task-generating model has hallucinated.

Introducing Go-Browse: Structured Graph-Based Web Exploration

Researchers from Carnegie Mellon University have introduced Go-Browse to tackle these limitations through a structured exploration strategy. Rather than relying on generic exploration or static task prompts, Go-Browse treats data collection as a graph traversal problem. It iteratively builds a graph of visited URLs, using this structure to explore both previously discovered and new pages. This allows the agent to reset to known pages and branch out, reducing redundancy while boosting data variety. Each exploration phase proposes and verifies tasks on a selected page, ensuring only feasible tasks generate training data.
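
The graph-traversal framing can be illustrated with a small sketch. This is not the Go-Browse implementation: the toy site, the `explore` function, and the breadth-first frontier are all assumptions made for illustration, and a real system would discover links by having an agent interact with live pages rather than by reading a dictionary.

```python
from collections import deque

# Hypothetical toy "website": each URL maps to the URLs it links to.
TOY_SITE = {
    "/": ["/products", "/account"],
    "/products": ["/products/1", "/"],
    "/products/1": ["/cart"],
    "/account": ["/"],
    "/cart": [],
}

def explore(start_url, site, max_pages=100):
    """Iteratively build a graph of visited URLs from a frontier of known pages."""
    graph = {}                     # url -> outgoing links discovered on that page
    frontier = deque([start_url])  # pages discovered but not yet explored
    while frontier and len(graph) < max_pages:
        url = frontier.popleft()   # "reset" to a known page and explore it
        if url in graph:
            continue               # already explored; avoids redundant visits
        links = site.get(url, [])
        graph[url] = list(links)
        for link in links:
            if link not in graph:
                frontier.append(link)  # branch out to newly discovered pages
    return graph

graph = explore("/", TOY_SITE)
for url, links in graph.items():
    print(url, "->", links)
```

Because every explored URL is recorded in `graph`, each page is expanded only once, which is the property the article credits with reducing redundancy.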

How Go-Browse Works: Modular Architecture for Exploration and Validation

Go-Browse operates through multiple modules. The NavExplorer module focuses on proposing navigational tasks that connect to new pages. As a web agent, it interacts dynamically with each page to identify links leading to unexplored URLs. Simultaneously, PageExplorer proposes local tasks for the current page. The FeasibilityChecker module tests these tasks using strong pretrained agents and vision-language models to determine if the proposed actions can be completed successfully. Tasks that pass this step are labeled as feasible and added to the dataset. The Solvers module then samples additional task completions, both from prefixed starting points and from initial states, using lower-cost models to maximize data generation while conserving resources.
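
A rough sketch of how these modules might fit together for a single page is shown below. Only the module names come from the article; every function signature, the `Task` and `Dataset` types, and the stand-in agents are hypothetical, and the real modules are driven by language-model agents rather than simple rules.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    url: str
    goal: str
    kind: str  # "navigation" or "local"

@dataclass
class Dataset:
    trajectories: list = field(default_factory=list)

def nav_explorer(url, links):
    """Propose navigational tasks that lead to other pages (hypothetical)."""
    return [Task(url, f"Navigate to {link}", "navigation") for link in links]

def page_explorer(url, elements):
    """Propose local tasks for the current page (hypothetical)."""
    return [Task(url, f"Use the {el}", "local") for el in elements]

def feasibility_checker(task, strong_agent):
    """Return a trajectory if the strong agent completes the task, else None."""
    return strong_agent(task)

def solvers(task, cheap_agent, n_samples):
    """Sample extra attempts at a feasible task with a lower-cost agent."""
    return [cheap_agent(task, i) for i in range(n_samples)]

def process_page(url, links, elements, strong_agent, cheap_agent, dataset, n_samples=2):
    proposed = nav_explorer(url, links) + page_explorer(url, elements)
    feasible = []
    for task in proposed:
        trajectory = feasibility_checker(task, strong_agent)
        if trajectory is None:
            continue  # infeasible tasks generate no training data
        feasible.append(task)
        dataset.trajectories.append(trajectory)
        dataset.trajectories.extend(solvers(task, cheap_agent, n_samples))
    return feasible

# Stand-in agents: the strong agent fails on a hallucinated "ghost" element.
def strong_agent(task):
    if "ghost" in task.goal:
        return None
    return {"task": task.goal, "model": "strong", "success": True}

def cheap_agent(task, sample_idx):
    return {"task": task.goal, "model": "cheap", "success": sample_idx % 2 == 0}

dataset = Dataset()
feasible = process_page("/products", ["/cart"], ["search box", "ghost button"],
                        strong_agent, cheap_agent, dataset)
print(len(feasible), "feasible tasks,", len(dataset.trajectories), "trajectories")
```

In this sketch the expensive agent runs once per proposed task, while the cheaper agent supplies most of the trajectories, mirroring the cost trade-off described above.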

WebArena Evaluation: Go-Browse Surpasses Previous Baselines

The research team evaluated Go-Browse on WebArena, a benchmark known to be difficult for GUI-based agents. They collected a dataset comprising approximately 10,000 successful task trajectories and 17,000 unsuccessful ones across 100 unique URLs. Fine-tuning the Qwen-2.5-7B-Instruct model on this dataset produced a task success rate of 21.7%. This exceeded GPT-4o-mini by 2.4 percentage points and outperformed the prior best sub-10B parameter model, NNetNav, by 2.9 percentage points. Against a baseline human success rate of 78%, the result leaves considerable room for improvement but still represents a significant advance.
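
The reported figures can be put in context with some quick arithmetic. The sketch below assumes the 2.4 and 2.9 margins are absolute percentage-point differences; the implied baseline scores are derived from that assumption and are not stated in the article, and all variable names are illustrative.

```python
# Figures reported in the article (success rates in percent).
go_browse = 21.7
margin_over_gpt4o_mini = 2.4   # assumed to be percentage points
margin_over_nnetnav = 2.9      # assumed to be percentage points
human = 78.0

# Baseline scores implied by the margins (derived, not reported directly).
implied_gpt4o_mini = round(go_browse - margin_over_gpt4o_mini, 1)
implied_nnetnav = round(go_browse - margin_over_nnetnav, 1)
gap_to_human = round(human - go_browse, 1)

# Share of successful trajectories in the collected dataset (approximate counts).
successes, failures = 10_000, 17_000
success_share = successes / (successes + failures)

print(f"Implied GPT-4o-mini score: {implied_gpt4o_mini}%")
print(f"Implied NNetNav score:     {implied_nnetnav}%")
print(f"Gap to human baseline:     {gap_to_human} points")
print(f"Successful share of data:  {success_share:.1%}")
```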

Why Structured Exploration Boosts Web Agent Intelligence

The research identifies a key issue: digital agents struggle to understand complex web environments. The proposed method, Go-Browse, addresses this with a structured yet flexible strategy that combines navigation, task planning, and trajectory validation. By treating exploration as a graph traversal task and using modular verification and sampling, the approach delivers scalable and diverse training data. These contributions yield a measurable performance gain, demonstrating the promise of structured exploration for training more capable web agents.

TL;DR:

The paper introduces Go-Browse, a structured exploration framework developed by Carnegie Mellon researchers to improve the training of web-based digital agents. Unlike prior methods, Go-Browse frames exploration as a graph traversal task, enabling scalable and diverse data collection by systematically navigating and interacting with websites. Using modular components like NavExplorer and FeasibilityChecker, it generates high-quality, feasible task trajectories. When evaluated on the WebArena benchmark, Go-Browse-trained models outperformed previous sub-10B models and even surpassed GPT-4o-mini, demonstrating the effectiveness of structured data collection in building robust web agents.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

The post CMU Researchers Introduce Go-Browse: A Graph-Based Framework for Scalable Web Agent Training appeared first on MarkTechPost.