About leaderboard

Open leaderboard for browser agents

s

Published by

README.md

AI Browser Agent Leaderboards

Open reference for evaluating AI browser agents, computer-use systems, and coding agents across the public benchmarks teams actually compare on.

Steel.dev — Open-source Browser API for AI Agents & Apps

Live site: leaderboard.steel.dev — full rankings, methodology notes, and per-result detail.

Maintained by Steel, the open-source browser API for AI agents.

Top results by benchmark

The tables below show the current top entries on each tracked benchmark. Each section links to the full leaderboard with sources, methodology, and additional context.

Rank	System	Organization	Score
1	Alumnium (new)	Alumnium	98.5%
2	Surfer 2	H Company	97.1%
3	Magnitude	Magnitude	93.9%
4	Surfer-H + Holo1	H Company	92.2%
5	Browserable	Browserable	90.4%

Rank	System	Organization	Score
1	GPT-5.5 Pro	OpenAI	90.1%
2	GPT-5.4 Pro	OpenAI	89.3%
3	MiroThinker-H1	MiroMind	88.2%
4	Claude Mythos Preview	Anthropic	86.9%
5	Kimi K2.6	Moonshot AI	86.3%

Rank	System	Organization	Score
1	WebTactix (DeepSeek v3.2)	WebTactix	74.3%
2	OpAgent	CodeFuse AI	71.6%
3	ColorBrowserAgent	MadeAgents	71.2%
4	Claude Code + GBOX MCP	GBOX AI	68.0%
5	DeepSky Agent	DeepSky	66.9%

Rank	System	Organization	Score
1	Claude Mythos	Anthropic	93.9%
2	Claude Opus 4.8 (new)	Anthropic	88.6%
3	Claude Opus 4.7	Anthropic	87.6%
4	Claude Opus 4.5	Anthropic	80.9%
5	Claude Opus 4.6	Anthropic	80.8%

Rank	System	Organization	Score
1	Claude Opus 4.8 (new)	Anthropic	83.4%
2	Mythos Preview (new)	Anthropic	79.6%
3	OSAgent	TheAGI Company	76.26%
4	GPT-5.4 (new)	OpenAI	75.0%
5	Claude Opus 4.6	Anthropic	72.7%

Rank	System	Organization	Score
1	OPS-Agentic-Search (new)	Alibaba Cloud	92.36%
1	openJiuwen-deepagent (new)	Suzhou AI Lab / Shuqian Tech	92.36%
3	openJiuwen-deepagent (GPT5/Gemini)	openJiuwen	91.69%
4	Lemon Agent	Lenovo CTO Org	91.36%
5	JoinAI V2.2	JoinAI-CMCC	90.7%

Rank	System	Organization	Score
1	Claude Sonnet 4.6	Anthropic	33.3%
2	GLM-5 (new)	Z.ai	24.2%
3	Gemini 3 Flash	Google	19.0%
4	Claude Haiku 4.5	Anthropic	18.3%
5	GPT-5.4	OpenAI	6.5%

Rank	System	Organization	Score
1	Browser Use Cloud (bu-max) (new)	Browser-Use	97.0%
2	GPT-5.4 Native Computer Use	OpenAI	93.0%
3	ABP + Claude Opus 4.6	theredsix	90.53%
4	TinyFish	TinyFish AI	90.0%
5	UI-TARS-2	ByteDance / VLM-Research	88.2%

Rank	System	Organization	Score
1	Step-3.5-Flash	StepFun	88.2%
2	GLM-4.7	Z.ai	87.4%
3	MiMo-V2-Flash	Xiaomi	80.3%
4	GLM-4.7-Flash	Z.ai	79.5%
5	MiniMax M2	MiniMax	77.2%

Rank	System	Organization	Score
1	AgentRL w/ Qwen2.5-32B-Instruct	Tsinghua University	70.4%
2	AgentRL w/ Qwen2.5-14B-Instruct	Tsinghua University	67.7%
3	AgentRL w/ GLM-4-9B-0414	Tsinghua University	65.0%
4	AgentRL w/ Qwen2.5-7B-Instruct	Tsinghua University	62.0%
5	AgentRL w/ Qwen2.5-3B-Instruct	Tsinghua University	60.0%

How to read these tables

Within-benchmark only. Scores measure different things on different benchmarks; don't compare a WebVoyager number to a SWE-bench Verified number.
Source-linked. Each score links to the original report — paper, blog post, or official leaderboard. Treat anything self-reported with the appropriate dose of skepticism.
Scope matters. Agent pages reflect full system setups (model + tools + policy). Model pages emphasize base model capability under a stated harness. Mixed pages combine both — read the per-row notes on the site before drawing conclusions.

Contributing

We welcome new entries, corrections, methodology notes, and new benchmark pages. See CONTRIBUTING.md for the evidence standard, JSON schema expectations, ranking rules, and new-leaderboard checklist.

At minimum, every submitted score needs a public source URL that directly supports the benchmark, system name, score, and setup notes. If you update leaderboard data, run npm run update-readme before opening a pull request.

License

MIT

leaderboard

About leaderboard

Platforms

Languages

Links

README.md

AI Browser Agent Leaderboards

Top results by benchmark

WebVoyager

BrowseComp

WebArena

SWE-bench Verified

OSWorld

GAIA

ClawBench

Online-Mind2Web

τ-bench

AgentBench

How to read these tables

Contributing

License