Guided Search Explorer

Explore real reward-guided search trajectories from WebArena-Lite. See how WebArbiter's knockout tournament uses principle-guided reasoning to select the best candidate action at each step.

Want the full version? Deploy locally via Docker

How to Use This Explorer
Navigation: Use domain tabs to switch between WebArena websites. Click step dots or use arrow keys to navigate steps. Each step shows the browser screenshot, candidate actions sampled by the policy model, and WebArbiter's principle-guided evaluation.
Tournament: Click any candidate to see how WebArbiter's knockout tournament compared it against the alternatives. The candidate highlighted in green is the tournament winner, which is the action that was executed.
Screenshot: Click the browser screenshot to view it full-size. Colored bounding boxes highlight the DOM elements targeted by each candidate action.
Tree: The search tree minimap shows the full trajectory structure with the current position highlighted.
Quick reference: Switch domains with tabs and browse trajectories with pills. Use the arrow keys or step dots to navigate. Click a candidate to see its reasoning. Hover over a candidate to see its bounding box.

How Reward-Guided Search Works

1. Sample

The policy model (e.g., GPT-4o-mini) generates 5 candidate actions for the current browser state.

2. Tournament

WebArbiter conducts pairwise comparisons using principle-guided reasoning, eliminating weaker candidates in a knockout tournament.

3. Execute

The tournament winner is executed in the browser, and the process repeats at the next step.
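
The three steps above can be sketched as a short loop. This is a minimal illustration, not WebArbiter's actual implementation: the function names (knockout_tournament, guided_search), the callback parameters (sample, prefer, execute, is_done), and the max_steps cap are all hypothetical. The bracket layout is also an assumption, since the page does not say how candidates are paired; here neighbours are paired and an odd candidate out receives a bye.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def knockout_tournament(
    candidates: Sequence[Action],
    prefer: Callable[[Action, Action], Action],
) -> Action:
    """Single-elimination tournament: the winner of each pairwise
    comparison advances until one candidate remains.

    `prefer(a, b)` stands in for the reward model's pairwise judgment
    and must return either `a` or `b`.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        # Pair up neighbours (assumed pairing scheme).
        for i in range(0, len(pool) - 1, 2):
            next_round.append(prefer(pool[i], pool[i + 1]))
        # With an odd count, the last candidate advances on a bye.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def guided_search(
    state: State,
    sample: Callable[[State, int], Sequence[Action]],
    prefer: Callable[[State, Action, Action], Action],
    execute: Callable[[State, Action], State],
    is_done: Callable[[State], bool],
    n_candidates: int = 5,
    max_steps: int = 30,  # hypothetical safety cap, not from the source
) -> Tuple[State, List[Action]]:
    """Repeat sample -> tournament -> execute until the task is done."""
    trajectory: List[Action] = []
    for _ in range(max_steps):
        if is_done(state):
            break
        # 1. Sample: the policy proposes candidate actions.
        candidates = sample(state, n_candidates)
        # 2. Tournament: pairwise comparisons eliminate weaker candidates.
        winner = knockout_tournament(
            candidates, lambda a, b: prefer(state, a, b)
        )
        # 3. Execute: apply the winner and move to the next step.
        state = execute(state, winner)
        trajectory.append(winner)
    return state, trajectory
```

Under this layout, a knockout over n candidates needs n - 1 pairwise comparisons, so the five candidates per step mentioned above would cost four comparisons.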