About BrowserKing

Open-source browser agent Chrome extension — use any LLM (GPT-4o, Claude, Gemini, Grok, DeepSeek, Llama, and more) to control your browser

m

Published by

mushisushi28

Visit View Profile

README.md

View on GitHub

BrowserKing

Open-source browser agent Chrome extension powered by any LLM provider

Use GPT-4o, Claude, Gemini, Grok, DeepSeek, Mistral, Llama, and 15+ other models as your browser agent — all from one extension.

What is BrowserKing?

BrowserKing is a Chrome extension that gives any LLM the ability to see and control your browser. It works like a human assistant that can:

Take screenshots and understand what's on screen
Click, type, scroll and navigate web pages
Read page content and extract information
Open tabs and work across multiple pages
Record and replay workflows for repetitive tasks

Unlike other browser agents locked to a single provider, BrowserKing lets you bring your own API key from any OpenAI-compatible provider — or use the Anthropic API directly.

Supported Providers

Provider	Models	Vision
OpenAI	GPT-4o, GPT-4.1, o3, o4-mini	Yes
Anthropic	Claude 4 Sonnet, Claude 4 Opus	Yes
Google	Gemini 2.5 Pro/Flash	Yes
xAI	Grok 3, Grok 3 Mini	Yes
DeepSeek	DeepSeek-V3, DeepSeek-R1	No
Mistral	Mistral Large, Pixtral Large	Pixtral only
Groq	Llama 4 Scout/Maverick, Llama 3.3	Scout/Maverick
Cerebras	Llama 3.3 70B	No
Perplexity	Sonar Pro, Sonar Huge	No
Z.AI	GLM-4.6V, GLM-4.5V	Yes
Custom	Any OpenAI-compatible endpoint	Configurable

Each provider gets its own theme color throughout the UI — the sidebar, send button, glow effects, and page border all match your active provider.

Run Local Models

BrowserKing works with any OpenAI-compatible API, which means you can run it with local models too. Point it at Ollama, LM Studio, LocalAI, or any other local inference server that exposes an OpenAI-compatible endpoint.

Setup:

Start your local server (e.g., ollama serve or launch LM Studio)
In BrowserKing settings, select the Custom provider
Set the base URL to your local endpoint (e.g., http://localhost:11434/v1 for Ollama)
Enter any string as the API key (local servers usually don't require one, but the field can't be empty)
Select or type your model name

Best models for local browser agents:

llama-4-scout or llama-4-maverick (vision + tool use)
qwen2.5-vl (strong vision)
Any model with vision and function calling support

Note: Local model support is experimental. Performance depends heavily on your hardware and the model's ability to handle tool calls and vision inputs. Cloud providers with dedicated function calling support will generally give the most reliable results. If you run into issues, please open an issue — we'd love to hear what works and what doesn't.

Features

Multi-provider support — Switch between providers and models mid-conversation from the dropdown
Automatic vision detection — Knows which models support images and which don't, with safe fallback
Provider-themed UI — Every color in the extension adapts to your active provider
Page glow border — Pulsing colored border around the page while the agent is working
No account required — No sign-up, no subscription. Just add your API key and go
Workflow recording — Record browser actions and replay them later
GIF export — Export recordings as GIFs
Tool use — Models that support function calling get collapsible tool-use blocks in the chat

Installation

Download or clone this repo
Open chrome://extensions in Chrome
Enable Developer mode (toggle in the top right)
Click Load unpacked and select this folder
Click the BrowserKing icon in your toolbar to open the side panel
Go to Settings and add your API key for at least one provider

Quick Start

Open any web page
Open BrowserKing from the side panel
Select your provider and model from the dropdown at the top
Type a task like "Find the cheapest flight from Calgary to Tokyo next month"
Watch the agent work — it takes screenshots, clicks, scrolls, and reports back

Configuration

Open the Provider Settings page from the extension options to:

Enable/disable providers
Enter API keys
Set custom base URLs (for self-hosted or proxy endpoints)
Choose default models
Fetch available models from provider APIs

Architecture

BrowserKing works by intercepting the stock extension's Anthropic API calls and translating them to OpenAI-compatible chat/completions requests. This means:

The full browser automation toolkit (screenshots, clicks, navigation) works with any provider
Tool calls are translated between Anthropic and OpenAI formats in real-time
SSE streaming is translated on the fly
Vision payloads are automatically downgraded for text-only models

Key files:

File	Purpose
`provider-registry.js`	Provider definitions, model lists, vision detection
`api-adapter.js`	API translation layer (Anthropic ↔ OpenAI)
`ui-branding.js`	Dynamic theme colors in the side panel
`brand-overlay.js`	Page glow border and stop button theming
`sidepanel-provider-menu.js`	Provider/model dropdown UI
`provider-settings.js`	Settings page for API keys and configuration

Roadmap

[ ] Persistent page glow — Fix the colored glow border so it reliably pulses for all providers during agent activity
[ ] Conversation history — Save and resume past agent sessions
[ ] Multi-tab workflows — Coordinate actions across multiple tabs in a single task
[ ] Prompt templates — Pre-built task templates for common workflows (form filling, data extraction, price comparison)
[ ] Better local model support — Improve tool call translation for models with non-standard function calling formats
[ ] Export to Playwright/Puppeteer — Convert recorded workflows to automation scripts
[ ] Provider cost tracking — Track token usage and estimated cost per conversation
[ ] Firefox & Edge support — Port the extension to other Chromium and non-Chromium browsers
[ ] Mobile support — Bring browser agent capabilities to mobile browsers via Kiwi Browser or Firefox Android extensions

Have a feature idea? Open an issue or submit a PR.

Acknowledgments

BrowserKing is built on top of Anthropic's Claude for Chrome extension. We extended it with multi-provider support, local model compatibility, and a provider-themed UI.

License

MIT

Contributing

PRs welcome. If you add a new provider, add it to the PROVIDERS object in provider-registry.js with:

id, label, color
baseUrl and model list
Vision support flags per model

Then add vision detection rules in inferVisionSupport() in the same file.

BrowserKing

About BrowserKing

Platforms

Languages

Links

README.md

BrowserKing

What is BrowserKing?

Supported Providers

Run Local Models

Features

Installation

Quick Start

Configuration

Architecture

Roadmap

Acknowledgments

License

Contributing