# Jido Browser

[![Hex.pm](https://img.shields.io/hexpm/v/jido_browser.svg)](https://hex.pm/packages/jido_browser)
[![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/jido_browser/)
[![CI](https://github.com/agentjido/jido_browser/actions/workflows/ci.yml/badge.svg)](https://github.com/agentjido/jido_browser/actions/workflows/ci.yml)
[![License](https://img.shields.io/hexpm/l/jido_browser.svg)](https://github.com/agentjido/jido_browser/blob/main/LICENSE)
[![Website](https://img.shields.io/badge/website-jido.run-0f172a.svg)](https://jido.run)
[![Ecosystem](https://img.shields.io/badge/ecosystem-jido.run-0ea5e9.svg)](https://jido.run/ecosystem)
[![Discord](https://img.shields.io/badge/discord-join-5865F2.svg?logo=discord&logoColor=white)](https://jido.run/discord)

Browser automation for Jido AI agents.

## Overview

`Jido.Browser` is organized around three simple lanes and one optional acceleration layer:

- `web_fetch/2` for stateless HTTP-first retrieval
- `fetch_rich/2` for agent-friendly retrieval with optional browser fallback
- `start_session/1` and `end_session/1` for browser-backed workflows
- `Jido.Browser.Pool` plus `start_session(pool: ...)` as an optional acceleration layer

`agent-browser` remains the default adapter. `Web` also supports warm pools when you want browser-backed sessions with lower cold-start overhead. `Vibium` remains available without warm-pool support. `Lightpanda` is available as an optional limited adapter for lightweight DOM and JavaScript automation, with warm-pool support for prestarted CDP sessions.

The Hex package and OTP app remain `jido_browser`, while the public Elixir namespace is `Jido.Browser.*`.

## Installation

Add the dependency:

```elixir
def deps do
  [
    {:jido_browser, "~> 2.0"}
  ]
end
```

Install the default browser backend:

```bash
mix jido_browser.install
```

That installs the pinned `agent-browser` binary for the current platform and runs `agent-browser install` to provision the browser runtime.

### Recommended Alias Setup

```elixir
defp aliases do
  [
    setup: ["deps.get", "jido_browser.install --if-missing"],
    test: ["jido_browser.install --if-missing", "test"]
  ]
end
```

### Installing Specific Backends

```bash
mix jido_browser.install agent_browser
mix jido_browser.install vibium
mix jido_browser.install web
mix jido_browser.install lightpanda
```

Lightpanda support uses optional dependencies. Add them to applications that select `Jido.Browser.Adapters.Lightpanda`:

```elixir
def deps do
  [
    {:jido_browser, "~> 2.0"},
    {:light_cdp, "~> 0.2.1"},
    {:lightpanda_ex, "~> 0.1.0"}
  ]
end
```

## Quick Start

```elixir
{:ok, session} = Jido.Browser.start_session()
{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, snapshot} = Jido.Browser.snapshot(session)
snapshot["snapshot"] || snapshot[:snapshot]

{:ok, session, _} = Jido.Browser.click(session, "@e1")
{:ok, _session, %{content: markdown}} = Jido.Browser.extract_content(session, format: :markdown)
:ok = Jido.Browser.end_session(session)
```

Selectors remain supported, but ref-based interaction is the preferred 2.0 flow:

1. `snapshot`
2. act on `@eN` refs
3. re-snapshot

### Stateless Web Fetch

```elixir
{:ok, result} =
  Jido.Browser.web_fetch(
    "https://example.com/docs",
    format: :markdown,
    allowed_domains: ["example.com"],
    focus_terms: ["API", "authentication"],
    citations: true
  )

result.content
result.passages
result.metadata # present when extraction returns document metadata
```

`web_fetch/2` keeps HTML handling native for selector extraction and markdown conversion, and uses `extractous_ex` for fetched binary documents such as PDFs, Word, Excel, PowerPoint, OpenDocument, EPUB, and common email formats. Binary document responses may also include `result.metadata` when extraction returns document metadata.

`Req` is the default HTTP backend.
`jido_browser` also includes a vendored BrowseyHttp-backed backend when you want a browser-imitating HTTP path for pages that do not require JavaScript execution. Select it globally or per request:

```elixir
config :jido_browser, :web_fetch,
  backend: Jido.Browser.WebFetch.Backends.Browsey,
  browsey: [
    browser: :chrome,
    timeout: 30_000
  ]

{:ok, result} =
  Jido.Browser.web_fetch(
    "https://example.com/docs",
    format: :markdown,
    backend: :browsey,
    browsey: [browser: :safari]
  )
```

BrowseyHttp still does not execute JavaScript. Sites that require a rendered browser should use a browser session instead. Egress also matters: datacenter IP ranges, CI traffic, or too many requests from one IP can still trigger challenges even with browser-like HTTP fingerprints. `web_fetch/2` passes backend-specific `:req` and `:browsey` keyword options from config and runtime opts so applications can supply transport settings without coupling `jido_browser` to a proxy provider.

BrowseyHttp is vendored from [s3cur3/browsey_http](https://github.com/s3cur3/browsey_http) under its MIT license because it is not currently published on Hex. The vendored copy keeps `jido_browser` Hex-publishable; if BrowseyHttp is released on Hex, this project should replace the vendored copy with the upstream Hex dependency.

### Agent-Friendly Rich Fetch

Use `fetch_rich/2` when an agent needs one retrieval tool that starts with cheap HTTP/document extraction and can fall back to a browser only when explicitly allowed:

```elixir
{:ok, result} =
  Jido.Browser.fetch_rich(
    "https://example.com/protected-docs",
    http_backends: [:req, :browsey],
    browser_fallback: true,
    pool: :default,
    citations: true
  )

result.retrieval_path # :web_fetch, :browsey, or :browser
result.blocked?
result.content
```

`fetch_rich/2` returns the same core result shape as `web_fetch/2` and adds `retrieval_path`, `fallback_reason`, and `blocked?`. `web_fetch/2` remains stateless and never uses pools.

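
A caller will often want to branch on the extra fields that `fetch_rich/2` adds. The following is a minimal sketch, not part of `jido_browser`: the module `MyApp.FetchReport` and its `summarize/1` function are hypothetical, and the code assumes that `content` is a binary and that the result can be pattern-matched by key (which holds for both maps and structs). The possible values of `fallback_reason` are not documented here, so it is only passed through `inspect/1`.

```elixir
defmodule MyApp.FetchReport do
  @moduledoc """
  Hypothetical helper (not part of jido_browser) that turns a
  `fetch_rich/2`-style result into a short log line.
  """

  # Blocked results are reported first, whichever path produced them.
  def summarize(%{blocked?: true} = result) do
    "blocked (reason: #{inspect(Map.get(result, :fallback_reason))})"
  end

  # The browser fallback was used; flagged separately because it is the costly path.
  def summarize(%{retrieval_path: :browser, content: content}) do
    "fetched via browser fallback, #{byte_size(content)} bytes"
  end

  # Any HTTP-only path (:web_fetch or :browsey, per the comment in the example above).
  def summarize(%{retrieval_path: path, content: content}) do
    "fetched via #{path}, #{byte_size(content)} bytes"
  end
end
```

Passing the `result` from the `fetch_rich/2` call above to `MyApp.FetchReport.summarize/1` would yield one line suitable for logging or for an agent's tool output.
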
### State Persistence

```elixir
state_path = Path.expand("tmp/browser-state.json")
File.mkdir_p!(Path.dirname(state_path))

{:ok, session} = Jido.Browser.start_session()
{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, _} = Jido.Browser.save_state(session, state_path)
:ok = Jido.Browser.end_session(session)

{:ok, restored} = Jido.Browser.start_session()
{:ok, restored, _} = Jido.Browser.load_state(restored, state_path)
```

### Tab Workflow

```elixir
{:ok, session} = Jido.Browser.start_session()
{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
{:ok, session, _} = Jido.Browser.new_tab(session, "https://example.org")
{:ok, session, tabs} = Jido.Browser.list_tabs(session)
{:ok, session, _} = Jido.Browser.switch_tab(session, 1)
{:ok, session, _} = Jido.Browser.close_tab(session, 1)
```

### Warm Session Pools

Warm pools are explicit and optional. They speed up browser-backed workflows, while `web_fetch/2` stays stateless and never uses pools.

For OTP applications, prefer adding a named pool to your supervision tree:

```elixir
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Jido.Browser.Pool, name: :default, size: 2, headless: true, startup_timeout: 60_000}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

Then check out pooled sessions by name:

```elixir
{:ok, session} =
  Jido.Browser.start_session(
    pool: :default,
    checkout_timeout: 5_000
  )

{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
:ok = Jido.Browser.end_session(session)
```

Use `start_pool/1` for scripts, tests, or ad hoc startup:

```elixir
{:ok, _pool} =
  Jido.Browser.start_pool(
    name: :default,
    size: 2,
    headless: true
  )

{:ok, session} =
  Jido.Browser.start_session(
    pool: :default,
    checkout_timeout: 5_000
  )

{:ok, session, _} = Jido.Browser.navigate(session, "https://example.com")
:ok = Jido.Browser.end_session(session)
```

Warm pools are currently supported by `Jido.Browser.Adapters.AgentBrowser`, `Jido.Browser.Adapters.Lightpanda`, and `Jido.Browser.Adapters.Web`.

- AgentBrowser pools keep full warm daemon-backed sessions ready for checkout.
- Lightpanda pools keep prestarted Lightpanda/CDP sessions ready for checkout.
- Web pools keep reserved warmed profiles ready for checkout.
- `lifecycle: :ephemeral` is the default: `end_session/1` recycles the checked-out worker and warms a replacement in the background.
- `lifecycle: :persistent` returns healthy workers to the pool after normal `end_session/1`; owner crashes, failed health checks, `max_uses`, and `max_age_ms` still recycle workers.

Inspect a pool with:

```elixir
{:ok, status} = Jido.Browser.pool_status(:default)
status.ready
status.leased
status.lifecycle
```

For the `Web` adapter, pooled sessions are still browser sessions, not HTTP fetches. Use `web_fetch/2` when you want the simplest request/response API without browser state.

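
Because every pooled checkout above is paired with `end_session/1`, callers may want a wrapper that always hands the session back, even when the work in between raises. The following is a minimal sketch under stated assumptions, not part of `jido_browser`: `MyApp.PooledSession` and `with_session/3` are hypothetical names, the `browser` argument exists only so a test double can stand in for `Jido.Browser`, and the session handle obtained at checkout is assumed to remain valid for `end_session/1` (as in the Quick Start, which ends a session using an earlier binding).

```elixir
defmodule MyApp.PooledSession do
  @moduledoc "Hypothetical wrapper; not part of jido_browser."

  @default_opts [pool: :default, checkout_timeout: 5_000]

  # Checks out a session, runs `fun` with it, and always ends the session
  # afterwards, even if `fun` raises. `browser` defaults to Jido.Browser.
  def with_session(fun, opts \\ @default_opts, browser \\ Jido.Browser)
      when is_function(fun, 1) do
    case browser.start_session(opts) do
      {:ok, session} ->
        try do
          {:ok, fun.(session)}
        after
          browser.end_session(session)
        end

      # The error shape is an assumption: anything other than {:ok, session}
      # is handed back to the caller unchanged.
      other ->
        other
    end
  end
end
```

A call such as `MyApp.PooledSession.with_session(fn session -> Jido.Browser.navigate(session, "https://example.com") end)` would then return `{:ok, result_of_fun}` on a successful checkout and release the worker in either case.
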
Persistent pools can preserve browser profile continuity, cookies, storage, and session history for application-managed workflows. They do not guarantee access through bot filters; egress, traffic rate, target-site policy, and user-provided state remain application concerns.

### Plugin Setup

```elixir
defmodule MyBrowsingAgent do
  use Jido.Agent,
    name: "browser_agent",
    plugins: [
      {Jido.Browser.Plugin,
       [
         adapter: Jido.Browser.Adapters.AgentBrowser,
         pool: :default,
         checkout_timeout: 5_000,
         headless: true,
         timeout: 30_000
       ]}
    ]
end
```

## Configuration

```elixir
config :jido_browser,
  adapter: Jido.Browser.Adapters.AgentBrowser

config :jido_browser, :agent_browser,
  binary_path: "/usr/local/bin/agent-browser",
  headed: false
```

Other adapters can still be configured explicitly:

```elixir
config :jido_browser, :vibium,
  binary_path: "/path/to/vibium"

config :jido_browser, :web,
  binary_path: "/usr/local/bin/web",
  profile: "default"

config :jido_browser, :lightpanda,
  binary_path: "/usr/local/bin/lightpanda",
  disable_telemetry: true
```

Optional web fetch settings:

```elixir
config :jido_browser, :web_fetch,
  backend: Jido.Browser.WebFetch.Backends.Req,
  cache_ttl_ms: 300_000,
  req: [
    connect_options: [
      timeout: 10_000
    ]
  ],
  extractous: [
    pdf: [extract_annotation_text: true],
    office: [include_headers_and_footers: true]
  ]
```

Configured `req`, `browsey`, and `extractous` options are merged with any per-call options passed to `Jido.Browser.web_fetch/2`.

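
The merge of configured and per-call options can be pictured with plain keyword lists. This is only an illustration under stated assumptions: it assumes a shallow `Keyword.merge/2` in which per-call values win, which may not match the library's actual merge strategy, and `MyApp.FetchOpts` with its `effective/3` function is hypothetical. The `retry` key is used here as an example of a `Req` option.

```elixir
defmodule MyApp.FetchOpts do
  @moduledoc "Hypothetical illustration; not part of jido_browser."

  # Combines the configured options under one key (for example :req) with
  # the per-call options under the same key.
  # Assumption: a shallow merge in which per-call values take precedence.
  def effective(config, call_opts, key) do
    configured = Keyword.get(config, key, [])
    per_call = Keyword.get(call_opts, key, [])
    Keyword.merge(configured, per_call)
  end
end

# Stand-ins for the :web_fetch application config and for runtime opts.
config = [req: [connect_options: [timeout: 10_000], retry: false]]
call_opts = [req: [retry: :transient]]

# The per-call :retry replaces the configured one; :connect_options is kept.
MyApp.FetchOpts.effective(config, call_opts, :req)
```

Under this reading, a setting given at the call site overrides the same key from `config`, while keys that appear only in `config` carry through untouched.
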
## Backends

### AgentBrowser (Default)

- native snapshot support with refs
- supervised daemon per session
- optional warm session pools with explicit checkout
- direct JSON IPC from Elixir
- built-in state save/load and tab management support

### Lightpanda (Limited)

- optional adapter backed by `light_cdp`
- supports session lifecycle, navigation, click, type, PNG screenshots, content extraction, and JavaScript evaluation
- supports warm pools for prestarted Lightpanda/CDP sessions
- uses `lightpanda_ex` for pinned Lightpanda binary installation
- disables Lightpanda telemetry by default with `LIGHTPANDA_DISABLE_TELEMETRY=true`
- does not provide AgentBrowser-native refs, state persistence, tab management, or console capture

### Vibium (Legacy)

- retained for transitional compatibility
- feature-frozen in 2.0

### Web (Legacy)

- retained for transitional compatibility
- feature-frozen in 2.0

## Public API

Core operations:

- `start_pool/1`
- `stop_pool/1`
- `start_session/1`
- `end_session/1`
- `navigate/3`
- `click/3`
- `type/4`
- `screenshot/2`
- `extract_content/2`
- `web_fetch/2`
- `evaluate/3`

Agent-browser-native operations:

- `snapshot/2`
- `wait_for_selector/3`
- `wait_for_navigation/2`
- `query/3`
- `get_text/3`
- `get_attribute/4`
- `is_visible/3`
- `save_state/3`
- `load_state/3`
- `list_tabs/2`
- `new_tab/3`
- `switch_tab/3`
- `close_tab/3`
- `console/2`
- `errors/2`

## Available Actions

### Session

- `StartSession`
- `EndSession`
- `GetStatus`
- `SaveState`
- `LoadState`

### Navigation

- `Navigate`
- `Back`
- `Forward`
- `Reload`
- `GetUrl`
- `GetTitle`

### Interaction

- `Click`
- `Type`
- `Hover`
- `Focus`
- `Scroll`
- `SelectOption`

### Waiting and Queries

- `Wait`
- `WaitForSelector`
- `WaitForNavigation`
- `Query`
- `GetText`
- `GetAttribute`
- `IsVisible`

### Content and Diagnostics

- `Snapshot`
- `Screenshot`
- `ExtractContent`
- `Console`
- `Errors`

### Tabs

- `ListTabs`
- `NewTab`
- `SwitchTab`
- `CloseTab`

### Advanced and Composite

- `Evaluate`
- `ReadPage`
- `SnapshotUrl`
- `SearchWeb`
- `WebFetch`

## Using With Jido Agents

```elixir
defmodule MyBrowsingAgent do
  use Jido.Agent,
    name: "web_browser",
    description: "An agent that can browse the web",
    plugins: [{Jido.Browser.Plugin, [headless: true]}]
end
```

`Jido.Browser.Plugin` now exposes 38 browser actions, including snapshot/refs workflows, browser state actions, diagnostics, tab management, and stateless web fetch.

## License

Apache-2.0 - See [LICENSE](LICENSE) for details.
