Explore real reward-guided search trajectories from WebArena-Lite. See how WebArbiter's knockout tournament uses principle-guided reasoning to select the strongest candidate action at each step.
Want the full version? Deploy locally via Docker
The policy model (e.g., GPT-4o-mini) generates 5 candidate actions for the current browser state.
WebArbiter compares the candidates in pairs using principle-guided reasoning, eliminating the weaker candidate of each pair in a knockout tournament until one remains.
The tournament winner is executed in the browser, and the process repeats at the next step.
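
The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not WebArbiter's actual code: the function `knockout_tournament`, the `prefer` callback, the toy scores, and the action strings are all hypothetical. In the real system the pairwise judgment would come from the reward model's principle-guided reasoning. The bracket layout (adjacent pairing, with a bye for the odd candidate out) is also an assumption, since the page does not specify it.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def knockout_tournament(candidates: List[T], prefer: Callable[[T, T], T]) -> T:
    """Return the one candidate that survives repeated pairwise elimination."""
    if not candidates:
        raise ValueError("need at least one candidate")
    survivors = list(candidates)
    while len(survivors) > 1:
        next_round = []
        # Pair up adjacent survivors; the preferred one of each pair advances.
        for i in range(0, len(survivors) - 1, 2):
            next_round.append(prefer(survivors[i], survivors[i + 1]))
        # With an odd count, the last survivor advances unopposed (a bye).
        if len(survivors) % 2 == 1:
            next_round.append(survivors[-1])
        survivors = next_round
    return survivors[0]


# Toy stand-in for the pairwise judge: made-up scores, NOT real model output.
toy_scores = {
    "click [12]": 0.4,
    "type [7] laptop": 0.9,
    "scroll down": 0.2,
    "click [3]": 0.6,
    "go_back": 0.1,
}
comparisons = 0


def toy_prefer(a: str, b: str) -> str:
    """Hypothetical judge: keep whichever action has the higher toy score."""
    global comparisons
    comparisons += 1
    return a if toy_scores[a] >= toy_scores[b] else b


winner = knockout_tournament(list(toy_scores), toy_prefer)
print(winner, comparisons)
```

With five candidates, a knockout needs four pairwise comparisons, since each comparison eliminates exactly one candidate. A full round-robin over the same five would need ten.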