Home
Softono
a

anxuanzi

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
1

Software by anxuanzi

bua
Open Source

bua

<h1 align="center">BUA - Browser Use Agent for Go</h1> <p align="center"> <strong>πŸ€– Make websites accessible for AI agents. Automate the web with natural language.</strong> </p> <p align="center"> <a href="#installation">Installation</a> β€’ <a href="#quick-start">Quick Start</a> β€’ <a href="#features">Features</a> β€’ <a href="#configuration">Configuration</a> β€’ <a href="#examples">Examples</a> β€’ <a href="#architecture">Architecture</a> </p> <p align="center"> <img src="https://img.shields.io/badge/Go-1.25+-00ADD8?style=flat&logo=go" alt="Go Version"/> <img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License"/> <img src="https://img.shields.io/badge/LLM-Gemini-4285F4?style=flat&logo=google" alt="Gemini"/> </p> --- ## Why BUA? Traditional browser automation is **fragile**. CSS selectors break. XPaths change. Every website update means rewriting scripts. **BUA changes the game.** Instead of writing brittle selectors, describe what you want in plain English: ```go result, _ := agent.Run(ctx, "Go to Amazon and find the best-rated wireless headphones under $100") ``` The AI agent **sees** the page, **understands** your intent, and **adapts** to any layout. No selectors. No maintenance. Just results. --- ## ✨ What Makes BUA Special | Feature | Traditional Automation | BUA | |--------------------------|-------------------------|----------------------| | **Selector Maintenance** | Constant updates needed | Zero maintenance | | **Dynamic Content** | Complex waits & retries | AI understands state | | **Multi-step Workflows** | Hundreds of lines | One sentence | | **Layout Changes** | Scripts break | Adapts automatically | | **New Sites** | Write new selectors | Works immediately | ### 🎯 Perfect For - **Web Scraping** - Extract data from any site without writing parsers - **Form Automation** - Fill applications, registrations, checkout flows - **E2E Testing** - Test user journeys with natural language - **Data Entry** - Automate repetitive web-based tasks - **Research** - Gather information across multiple sources - **Monitoring** - Track prices, inventory, content changes --- ## πŸš€ Installation ```bash go get github.com/anxuanzi/bua ``` ### Prerequisites - **Go 1.25+** - **Chrome/Chromium** installed on your system - **Gemini API Key** from [Google AI Studio](https://aistudio.google.com/apikey) --- ## ⚑ Quick Start ```go package main import ( "context" "fmt" "log" "os" "time" "github.com/anxuanzi/bua" ) func main() { // Create agent with your Gemini API key agent, err := bua.New(bua.Config{ APIKey: os.Getenv("GEMINI_API_KEY"), Headless: false, // Watch the magic happen Debug: true, }) if err != nil { log.Fatal(err) } defer agent.Close() ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) defer cancel() // Start the browser if err := agent.Start(ctx); err != nil { log.Fatal(err) } // Run a task with natural language result, err := agent.Run(ctx, "Go to Hacker News and find the top 3 stories about AI") if err != nil { log.Fatal(err) } fmt.Printf("βœ… Success: %v\n", result.Success) fmt.Printf("πŸ“Š Steps taken: %d\n", len(result.Steps)) fmt.Printf("⏱️ Duration: %v\n", result.Duration) } ``` **That's it.** The agent navigates to Hacker News, scans the stories, identifies AI-related content, and returns the results. --- ## πŸ› οΈ Features ### 🧠 Intelligent Navigation BUA doesn't just click buttonsβ€”it **understands** web pages: ```go // The agent figures out HOW to accomplish the task agent.Run(ctx, "Find flights from NYC to London for next weekend, sort by price") // Multi-step workflows handled automatically agent.Run(ctx, "Log into my account, go to settings, and change my timezone to PST") ``` ### πŸ‘οΈ Vision-Enabled Screenshots are analyzed by the LLM for visual understanding: ```go cfg := bua.Config{ Preset: bua.PresetQuality, // High-res screenshots // Vision is enabled by default } ``` ### πŸ₯· Stealth Mode Built-in anti-detection measures help avoid bot blocking: - Navigator property spoofing - WebGL fingerprint masking - Plugin emulation - Human-like mouse movements - Random action delays ### πŸ“Έ Screenshot Annotations Visual debugging with element indices overlaid on screenshots: ```go cfg := bua.Config{ ShowAnnotations: true, // See what the AI sees ScreenshotDir: "./debug", } ``` <p align="center"> <img src="https://raw.githubusercontent.com/anxuanzi/bua/main/assets/annotated-screenshot.png" alt="Annotated Screenshot" width="600"/> </p> ### πŸŽ›οΈ Flexible Presets Optimize for speed, cost, or quality: | Preset | Tokens | Screenshot | Best For | |-------------------|--------|------------------|---------------------------| | `PresetFast` | 8K | None (text-only) | Simple tasks, lowest cost | | `PresetEfficient` | 16K | 800px @ 60% | Balanced cost/capability | | `PresetBalanced` | 32K | 1280px @ 75% | **Default** - most tasks | | `PresetQuality` | 64K | 1920px @ 85% | Complex visual tasks | | `PresetMax` | 128K | 2560px @ 95% | Maximum accuracy | ### πŸ” Sensitive Data Protection Automatic redaction of sensitive information in logs: ```go // API keys, passwords, SSNs, credit cards are automatically masked // <secret type="api_key">[REDACTED]</secret> ``` ### πŸ—‚οΈ Tab Management Handle complex multi-tab workflows: ```go // Open comparison shopping tabs tab1, _ := agent.NewTab(ctx, "https://amazon.com") tab2, _ := agent.NewTab(ctx, "https://ebay.com") agent.SwitchTab(tab1) agent.Run(ctx, "Search for 'mechanical keyboard'") agent.SwitchTab(tab2) agent.Run(ctx, "Search for 'mechanical keyboard' and compare prices") ``` ### πŸ’Ύ Session Persistence Save and restore browser sessions: ```go cfg := bua.Config{ ProfileName: "my-shopping-session", ProfileDir: "~/.bua/profiles", // Cookies, localStorage, login state preserved } ``` --- ## βš™οΈ Configuration ### Full Configuration Options ```go cfg := bua.Config{ // Required APIKey: "your-gemini-api-key", // LLM Settings Model: "gemini-2.5-flash", // or "gemini-2.0-flash", etc. // Browser Settings Headless: false, // true for background operation ProfileName: "persistent", // empty = temporary profile ProfileDir: "~/.bua/profiles", Viewport: &bua.Viewport{Width: 1920, Height: 1080}, // Agent Behavior MaxSteps: 100, // Max actions before giving up Preset: bua.PresetBalanced, // Screenshot Settings ScreenshotDir: "./screenshots", ScreenshotMaxWidth: 1280, ScreenshotQuality: 75, TextOnly: false, // true disables screenshots ShowAnnotations: false, // true shows element indices // Visual Feedback ShowHighlight: true, HighlightDurationMs: 300, // Debugging Debug: true, } ``` ### Environment Variables ```bash export GEMINI_API_KEY="your-api-key-here" ``` --- ## πŸ“– Examples ### Example 1: Web Research ```go result, _ := agent.Run(ctx, ` Go to Wikipedia and find information about the Go programming language. Extract the release date, original author, and main features. `) fmt.Println(result.Data) // Extracted information ``` ### Example 2: E-commerce Automation ```go result, _ := agent.Run(ctx, ` Go to Amazon, search for "USB-C hub", filter by 4+ stars, and find the cheapest option with Prime shipping. `) for _, step := range result.Steps { fmt.Printf("[%d] %s: %s\n", step.Number, step.Action, step.NextGoal) } ``` ### Example 3: Form Filling ```go result, _ := agent.Run(ctx, ` Go to the contact form at example.com/contact. Fill in: - Name: John Doe - Email: [email protected] - Message: I'm interested in your services Then submit the form. `) ``` ### Example 4: Multi-Page Workflow ```go // Navigate first agent.Navigate(ctx, "https://github.com/login") // Then automate result, _ := agent.Run(ctx, ` Log in with username 'myuser' and password from the password field. After logging in, go to my repositories and find the most starred one. `) ``` --- ## πŸ—οΈ Architecture ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Your Application β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ BUA Public API β”‚ β”‚ bua.New() β†’ Start() β†’ Run() β†’ Close() β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ Agent Layer β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ BrowserAgentβ”‚ β”‚ ADK Toolkit β”‚ β”‚ Message Builderβ”‚ β”‚ β”‚ β”‚ (LLM Loop) β”‚ β”‚ (20+ Tools) β”‚ β”‚ (History) β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ Browser Layer β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ Browser β”‚ β”‚ Page β”‚ β”‚ Stealth β”‚ β”‚ β”‚ β”‚ (Lifecycle) β”‚ β”‚ (Actions) β”‚ β”‚ (Evasion) β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ Support Layer β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ DOM β”‚ β”‚ Screenshot β”‚ β”‚ β”‚ β”‚ (Extraction)β”‚ β”‚ (Annotation) β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ go-rod (Chrome DevTools Protocol) β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ Chrome / Chromium β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### Agent Loop ``` Task: "Search for Go tutorials" β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Get Page State │◄─────────────────────┐ β”‚ (DOM + Screenshot) β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β–Ό β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ LLM Reasoning β”‚ β”‚ β”‚ (Gemini + Tools) β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β–Ό β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ Execute Action β”‚ β”‚ β”‚ (click, type, etc) β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β–Ό β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Task Done? │───No───►│ Update State β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚Yes β”‚ β–Ό β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ Return Resultβ”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` --- ## πŸ”§ Available Tools The agent has access to 20+ browser automation tools: | Category | Tools | |-----------------|--------------------------------------------------------------------------| | **Navigation** | `navigate`, `go_back`, `go_forward`, `reload` | | **Interaction** | `click`, `type_text`, `clear_and_type`, `hover`, `double_click`, `focus` | | **Scrolling** | `scroll`, `scroll_to_element` | | **Keyboard** | `send_keys` (Enter, Tab, Escape, etc.) | | **Observation** | `get_page_state`, `screenshot`, `extract_content` | | **JavaScript** | `evaluate_js` | | **Tabs** | `new_tab`, `switch_tab`, `close_tab`, `list_tabs` | | **Completion** | `done` | --- ## πŸ“Š Comparison with Browser-Use (Python) BUA is inspired by the popular [browser-use](https://github.com/browser-use/browser-use) Python library. Here's how they compare: | Aspect | Browser-Use (Python) | BUA (Go) | |--------------------|--------------------------------|--------------------------------------| | **Language** | Python 3.11+ | Go 1.25+ | | **LLM Support** | OpenAI, Claude, Gemini, Ollama | Gemini (via ADK), other models soon. | | **Browser Engine** | Playwright | go-rod (CDP direct) | | **Performance** | Good | Excellent (compiled, no runtime) | | **Deployment** | Python environment | Single binary | | **Memory** | Higher (Python + Node.js) | Lower (native Go) | | **Concurrency** | asyncio | Native goroutines | | **Anti-Detection** | βœ… | βœ… | | **Vision Support** | βœ… | βœ… | | **Custom Tools** | βœ… | βœ… | ### Why Choose BUA? - **πŸš€ Performance**: Go's compiled nature means faster startup and lower memory - **πŸ“¦ Simple Deployment**: Single binary, no Python/Node.js dependencies - **⚑ Concurrency**: Native goroutines for parallel operations - **πŸ”’ Type Safety**: Catch errors at compile time - **🏒 Enterprise Ready**: Common choice for production services --- ## 🀝 Contributing Contributions are welcome! Please feel free to submit a Pull Request. 1. Fork the repository 2. Create your feature branch (`git checkout -b feature/amazing-feature`) 3. Commit your changes (`git commit -m 'Add amazing feature'`) 4. Push to the branch (`git push origin feature/amazing-feature`) 5. Open a Pull Request --- ## πŸ“„ License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. --- ## πŸ™ Acknowledgments - [browser-use](https://github.com/browser-use/browser-use) - The original Python inspiration - [go-rod](https://github.com/go-rod/rod) - Excellent Go CDP implementation - [Google ADK](https://google.github.io/adk-docs/) - Agent Development Kit for Go - [Google Gemini](https://ai.google.dev/) - Powerful multimodal LLM --- <p align="center"> <strong>Built with ❀️ for the Go community</strong> </p> <p align="center"> <a href="https://github.com/anxuanzi/bua/issues">Report Bug</a> β€’ <a href="https://github.com/anxuanzi/bua/issues">Request Feature</a> </p>

AI Agents Browser Automation
86 Github Stars