Agent Browser Workspace
Local deep research + browser use for any agent.
Run everything on your machine with a real Chrome profile (CDP) — no Docker, no remote browsers, no platform lock-in.
This repo is a small, production-ready toolkit for agent-driven browsing and reproducible deep research: control Chrome, extract content/forms, save pages to Markdown (with images), and automate Google Search — all as CLI scripts and Node.js APIs.
“Local-first” here means: your browsing session lives in an isolated Chrome profile on your computer.
CLI-first. For agent tasks, prefer running the existing tools via CLI (
node scripts/...,node utils/...). Avoid writing new JavaScript scripts; only do it as a last resort when the CLI tools cannot perform a required step.
Why it competes
- Local-first browser automation: real Google Chrome + isolated profile (
AgentProfile) + CDP. - Extensible by design: add/maintain site-specific selectors & controls via
scripts/sites/*.json(no hardcoding in prompts or code). - Reproducible deep research workflow: stable SERP snapshots (
links.json), downloaded sources in Markdown, and a disciplined research loop. - Cost-efficient deep research: strong DeepResearch Bench results with a much cheaper model tier.
Features
- Browser control (low level):
utils/browserUse.jslaunchCDP(recommended),launch(persistent),connectCDP- navigation, clicks, form fill, scrolling (incl. infinite scroll), screenshots
- download images/files, evaluate JS, basic network tools
- PDF auto-handling: extract text even when Chrome PDF extensions intercept navigation
- HTML → structured data (no browser):
utils/getDataFromText.js- extract navigation blocks, main content blocks (with Markdown), and forms (classified)
- High-level scripts (CLI + API):
scripts/getContent.js— save current/target page to Markdown + download images + rewrite linksscripts/getForms.js— detect forms and generate ready-to-use CSS selectors for fieldsscripts/getAll.js— content + forms in one call (single HTML snapshot)scripts/googleSearch.js— step-by-step Google Search: search → collect links → open → extract → paginate
- Site profiles (
scripts/sites/*.json)- central place to maintain selectors and “controls” (actionable UI elements)
- scripts return a
sitefield so an agent can reuse selectors without guessing
DeepResearch Bench results (cost-efficient)
We use DeepResearch Bench (DRB) as the main external benchmark for deep research agents (official site, repo, leaderboard).
As measured on DRB (RACE overall), the reference points are:
| System | DRB overall |
|---|---|
| Gemini-2.5-Pro Deep Research | 48.88 |
| OpenAI Deep Research | 46.98 |
| Agent Browser Workspace (this repo) | 44.37 (submission pending review) |
| Perplexity Deep Research | 42.25 |
Important context:
- Our 44.37 result was produced using Claude Haiku 4.5 (significantly cheaper than frontier “deep research” stacks) and a Cursor-based agent. Results were submitted to the leaderboard and are currently under review.
- Claude Haiku 4.5 is designed for low-latency, cost-effective agentic workloads (see Anthropic announcement).
Install (local)
Requirements
- Node.js 18+ (20+ recommended)
- Google Chrome (desktop)
Setup
npm install
npx playwright install chrome
Start/stop background Chrome (recommended)
npm run chrome
Stop:
npm run chrome:stop
For full setup details (profiles, Windows shortcut, verification), see INSTALLATION.md.
Quickstart (CLI)
1) Save a page to Markdown (with images)
node scripts/getContent.js --url https://example.com --dir ./output --name page.md
2) Extract content + forms in one pass
node scripts/getAll.js --url https://example.com --dir ./output --name page.md --forms-output ./output/forms.json
3) Deep research loop via Google Search
# Save a stable SERP snapshot (links.json)
node scripts/googleSearch.js "playwright tutorial" --links --dir ./archive/my-research
# Open a result and save content
node scripts/googleSearch.js "playwright tutorial" --open 0 --dir ./archive/my-research --name article.md
links.json is intentionally stable: you can resume research later without re-running Google Search.
Usage (advanced: Node.js API)
The Node.js API exists primarily for authoring/extending tools. For normal tasks, prefer the CLI quickstart above.
- Low-level browser control: see
utils/browserUse.md - Google Search + content extraction: see
scripts/googleSearch.md
Extending: add a site profile
Site profiles live in scripts/sites/*.json. A profile can define:
scraping.selectors— named selectors used by scriptscontrols.items— “UI controls” (selectorKey + human-readable actions) that scripts expose back assite.controls
Minimal example:
{
"id": "my-site",
"name": "My Site",
"hosts": ["example.com", "www.example.com"],
"scraping": {
"selectors": {
"searchInput": "input[name=\"q\"]"
}
},
"controls": {
"items": [
{
"name": "Search input",
"selectorKey": "searchInput",
"description": "Main search field",
"actions": ["fill", "press:Enter"]
}
]
}
}
When the page host matches a profile, scripts (getContent, getForms, getAll) return:
site: { id, name, host, controls[] } — so your agent can act on the page without guessing selectors.
Important: single-threaded browser access
Chrome/profile is a shared resource. Use browser tasks sequentially, from a single agent/process at a time.
Parallelize only non-browser work (e.g., parsing raw HTML via getDataFromText).
Docs
- AGENT_BROWSER.md — tools overview, usage rules, escalation strategy for JS-heavy pages, PDF handling
- INSTALLATION.md — setup, Chrome profile, QoL shortcuts
- RESEARCH.md — deep research methodology (artifacts, phases, quality checklist)
scripts/— high-level scripts documentation:utils/— low-level utilities documentation:
Contributing
PRs are welcome — especially:
- new/updated site profiles in
scripts/sites/ - extractor improvements for tricky pages (SPA, lazy-load, paywalls/overlays)
- better heuristics for form classification and content block selection
License
Apache-2.0. See LICENSE.