Guided Search Explorer

Explore real reward-guided search trajectories from WebArena-Lite. See how WebArbiter's knockout tournament uses principle-guided reasoning to select the action to execute at each step.

Navigation: Use domain tabs to switch between WebArena websites. Click step dots or use arrow keys to navigate steps. Each step shows the browser screenshot, candidate actions sampled by the policy model, and WebArbiter's principle-guided evaluation.
Tournament: Click any candidate to see how WebArbiter's knockout tournament compared it against the alternatives. The candidate highlighted in green is the tournament winner, which is the action that was executed.
Screenshot: Click the browser screenshot to view it full-size. Colored bounding boxes highlight the DOM elements targeted by each candidate action.
Tree: The search tree minimap shows the full trajectory structure with the current position highlighted.

How Reward-Guided Search Works

1. Sample

The policy model (e.g., GPT-4o-mini) samples five candidate actions for the current browser state.

2. Tournament

WebArbiter conducts pairwise comparisons using principle-guided reasoning, eliminating weaker candidates in a knockout tournament.
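
The elimination step can be sketched as a single-elimination bracket. This is only an illustration under stated assumptions, not WebArbiter's actual code. The function name `knockout`, the neighbour-pairing order, the bye rule for an odd candidate, and the toy `prefer` judge (which compares made-up scores in place of principle-guided reasoning) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knockout(candidates: Sequence[T], prefer: Callable[[T, T], T]) -> Tuple[T, int]:
    """Run a single-elimination tournament.

    `prefer(a, b)` returns whichever of the two candidates the judge favours.
    Returns the winner and the number of pairwise comparisons made.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    remaining: List[T] = list(candidates)
    comparisons = 0
    while len(remaining) > 1:
        next_round: List[T] = []
        # Pair off neighbours (assumed bracket order); the loser is eliminated.
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(prefer(remaining[i], remaining[i + 1]))
            comparisons += 1
        # With an odd count, the unpaired last candidate advances on a bye.
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining[0], comparisons


# Hypothetical candidate actions with made-up scores; a real judge would
# compare the two actions directly rather than look up a number.
toy_scores: Dict[str, float] = {
    "click [12]": 0.4,
    "type [7] 'laptop'": 0.9,
    "scroll down": 0.2,
    "click [31]": 0.6,
    "go_back": 0.1,
}


def prefer(a: str, b: str) -> str:
    return a if toy_scores[a] >= toy_scores[b] else b


winner, comparisons = knockout(list(toy_scores), prefer)
```

In any single-elimination bracket each comparison eliminates exactly one candidate, so five candidates need four pairwise comparisons to produce a winner.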

3. Execute

The tournament winner is executed in the browser, and the process repeats at the next step.
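
Put together, the three steps form a loop. The sketch below is a minimal illustration, not the project's implementation. The names `guided_search`, `sample`, `select`, `execute`, and `is_done`, as well as the step limit and the stopping condition, are assumptions introduced here. A toy integer "environment" stands in for the browser, and a simple distance-based selector stands in for the tournament.

```python
from typing import Callable, List, Tuple


def guided_search(
    state: int,
    sample: Callable[[int, int], List[int]],
    select: Callable[[int, List[int]], int],
    execute: Callable[[int, int], int],
    is_done: Callable[[int], bool],
    num_candidates: int = 5,
    max_steps: int = 10,  # assumed safety limit, not taken from the source
) -> Tuple[int, List[int]]:
    """Repeat sample -> select -> execute until done or out of steps."""
    trajectory: List[int] = []
    for _ in range(max_steps):
        if is_done(state):
            break
        candidates = sample(state, num_candidates)  # 1. Sample
        action = select(state, candidates)          # 2. Tournament (stand-in)
        state = execute(state, action)              # 3. Execute
        trajectory.append(action)
    return state, trajectory


# Toy setup: the state is a counter, actions are deltas, the goal is TARGET.
TARGET = 5


def toy_sample(state: int, k: int) -> List[int]:
    return [-1, 0, 1, 2, 0][:k]


def toy_select(state: int, candidates: List[int]) -> int:
    # Stand-in for the tournament: pick the delta that lands closest to TARGET.
    return min(candidates, key=lambda a: abs(TARGET - (state + a)))


def toy_execute(state: int, action: int) -> int:
    return state + action


final_state, trajectory = guided_search(
    0, toy_sample, toy_select, toy_execute, lambda s: s == TARGET
)
```

Passing the selector in as a function keeps the loop independent of how the winner is chosen, so a pairwise knockout judge could replace `toy_select` without changing the loop.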