firecrawl

Open Source

open-scouts

# Open Scouts

Create AI scouts that continuously search the web and notify you when they find what you're looking for.

![open-scouts_4](https://github.com/user-attachments/assets/a1ff82ef-97e4-469b-9712-99d0367755a7)

## About

Open Scouts is an AI-powered monitoring platform that lets you create "scouts": automated tasks that run on a schedule to continuously search for and track information. Whether you're looking for new restaurants near you, monitoring AI news, or tracking any other updates, scouts work 24/7 to find what you need and notify you when they discover it.

## Tech Stack

- **Next.js 15**
- **React 19**
- **TypeScript**
- **Tailwind CSS v4**
- **Supabase** (Database + Auth + Edge Functions)
- **pgvector** (Vector embeddings for semantic search)
- **Firecrawl SDK** (@mendable/firecrawl-js)
- **OpenAI API** (AI Agent + Embeddings)
- **Resend** (Email Notifications)

## Getting Started

### Prerequisites

- Node.js 18+
- bun (default), npm, or pnpm
- Supabase account ([supabase.com](https://supabase.com))
- OpenAI API key ([platform.openai.com](https://platform.openai.com))
- Firecrawl API key ([firecrawl.dev](https://firecrawl.dev))
- Resend API key ([resend.com](https://resend.com)), for email notifications
- Google Cloud Console account (optional, for Google OAuth)

### 1. Clone and Install

```bash
git clone https://github.com/firecrawl/open-scouts
cd open-scouts
bun install # or: npm install / pnpm install
```

### 2. Create Supabase Project

1. Go to [supabase.com](https://supabase.com/dashboard)
2. Create a new project
3. Wait for the project to finish provisioning

### 3. Enable Required Extensions

In your Supabase Dashboard:

1. Go to **Database → Extensions**
2. Search for and enable:
   - `pg_cron` (for scheduled jobs)
   - `pg_net` (for HTTP requests from the database)
   - `vector` (for AI-powered semantic search on execution summaries)
   - `supabase_vault` (for secure credential storage; usually enabled by default)

### 4. Set Up Environment Variables

Create a `.env` file in the root directory by copying the example file:

```bash
cp .env.example .env
```

Then fill in your actual values in the `.env` file.

**The `.env.example` file contains all required environment variables, with detailed instructions and direct links showing where to obtain each API key.**

### 5. Run Database Setup

First, link your Supabase project (required for syncing secrets):

```bash
bunx supabase login # Log in to the Supabase CLI (one-time)
bunx supabase link --project-ref <your-project-ref> # Find the ref in your Supabase Dashboard URL
```

Then run the setup script:

```bash
bun run setup:db # or: npm run setup:db / pnpm run setup:db
```

This will:

- Create all required tables (`scouts`, `scout_executions`, `scout_execution_steps`, etc.)
- Add user authentication support (`user_id` columns, Row Level Security)
- Enable real-time subscriptions
- Set up vector embeddings for AI-generated execution summaries
- Configure the **scalable dispatcher architecture** (pg_cron + pg_net + vault)
- Automatically store your Supabase URL and service role key in the vault
- Set up cron jobs for scout dispatching and cleanup
- **Sync Edge Function secrets** from your `.env` file (`OPENAI_API_KEY`, `FIRECRAWL_API_KEY`, `RESEND_API_KEY`)

**Note:** The setup script checks whether the required extensions (`vector`, `pg_cron`, `pg_net`) are enabled. If they are not, follow the on-screen instructions to enable them in the Supabase Dashboard, then run the script again.

### 6. Set Up Authentication

Open Scouts uses Supabase Auth for user authentication, supporting both email/password and Google OAuth.

#### Enable Email/Password Auth (Enabled by Default)

1. Go to Supabase Dashboard → **Authentication** → **Providers** → **Email**
2. Ensure "Enable Email Provider" is toggled on
3. Configure email templates as needed in **Authentication** → **Email Templates**

#### Enable Google OAuth (Optional but Recommended)

1. **Create Google OAuth Credentials:**
   - Go to [Google Cloud Console](https://console.cloud.google.com/)
   - Create a new project or select an existing one
   - Navigate to **APIs & Services** → **Credentials**
   - Click **Create Credentials** → **OAuth client ID**
   - Choose "Web application" as the Application type
   - Add authorized JavaScript origins:
     - `http://localhost:3000` (development)
     - `https://your-domain.com` (production)
   - Add authorized redirect URIs:
     - `https://<your-project-ref>.supabase.co/auth/v1/callback`
   - Copy the **Client ID** and **Client Secret**
2. **Configure in Supabase:**
   - Go to Supabase Dashboard → **Authentication** → **Providers** → **Google**
   - Toggle "Enable Google Provider"
   - Paste your Client ID and Client Secret
   - Save

### 7. Deploy Edge Functions

Deploy the scout execution agent and email functions to Supabase Cloud:

```bash
bunx supabase functions deploy scout-cron
bunx supabase functions deploy send-test-email
```

**Note:** Secrets (`OPENAI_API_KEY`, `FIRECRAWL_API_KEY`, `RESEND_API_KEY`) are automatically synced when you run `setup:db`. If you need to update them manually:

```bash
bunx supabase secrets set OPENAI_API_KEY=sk-proj-...
```

### 8. Set Up Resend (Email Notifications)

Email notifications are sent to your account email when scouts find results.

1. **Create a Resend account** at [resend.com](https://resend.com)
2. **Get your API key** from the Resend dashboard
3. **Add it to `.env`** and run `setup:db` again to sync, or set it manually:
   ```bash
   bunx supabase secrets set RESEND_API_KEY=re_...
   ```
4. **Verify a custom domain** at [resend.com/domains](https://resend.com/domains) to send to any email address

**Important - Free Tier Limitations:**

- Without a verified domain, Resend only sends to your Resend account email
- The free tier includes 3,000 emails/month (100/day limit)

**Testing Email Setup:**

1. Go to **Settings** in the app
2. Click **Send Test Email** to verify the configuration
3. Check your inbox for the test email

### 9. Firecrawl Configuration

Open Scouts uses [Firecrawl](https://firecrawl.dev) for web scraping and search.

> **📌 Important: No Environment Variable Configuration Required**
>
> **You do NOT need to configure a Firecrawl API key in your environment variables.** Each user can simply add their own Firecrawl API key directly on the **Settings** page within the app. This is the recommended approach for most users.

#### Custom API Key (Recommended)

The simplest way to use Open Scouts:

1. Sign up at [firecrawl.dev](https://firecrawl.dev)
2. Get your API key from the [dashboard](https://www.firecrawl.dev/app/api-keys)
3. Go to **Settings** in the Open Scouts app
4. Enter your Firecrawl API key in the **Firecrawl Integration** section
5. Click **Save**, and you're ready to go!

Each user manages their own API key, and usage is tracked to their individual Firecrawl account.

#### Server-Side API Key (Optional - For Self-Hosting)

If you're self-hosting and want to provide a shared API key for all users:

1. Sign up at [firecrawl.dev](https://firecrawl.dev)
2. Get your API key from the [dashboard](https://www.firecrawl.dev/app/api-keys)
3. Add it to your `.env` file:
   ```bash
   FIRECRAWL_API_KEY=fc-your-key-here
   ```
4. Set the edge function secret:
   ```bash
   npx supabase secrets set FIRECRAWL_API_KEY=fc-your-key-here
   ```

**Note:** Users who add their own custom API key in Settings will use their personal key instead of the server-side shared key.

#### Partner Integration (Enterprise - Closed Beta)

> **⚠️ The Partner Key feature is currently in closed beta and only available to enterprise customers.** This feature is not publicly available.
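The precedence rule in the note on server-side keys (a user's own key from Settings wins over the shared `FIRECRAWL_API_KEY`) amounts to a small lookup. The Python below is a minimal sketch under that assumption only; it is not Open Scouts' code, which runs as a Deno edge function. The helper name `resolve_firecrawl_key` and its `(source, key)` return shape are hypothetical.

```python
import os
from typing import Optional, Tuple


def resolve_firecrawl_key(
    user_key: Optional[str], server_key: Optional[str]
) -> Tuple[str, str]:
    """Return (source, key) for the Firecrawl key a scout run should use.

    Hypothetical helper: a key the user saved in Settings takes precedence;
    otherwise the server-side shared key (FIRECRAWL_API_KEY) is used.
    """
    for source, candidate in (("user", user_key), ("server", server_key)):
        # Treat None and blank strings as "not configured".
        if candidate and candidate.strip():
            return source, candidate.strip()
    raise LookupError(
        "No Firecrawl API key: add one in Settings or set FIRECRAWL_API_KEY"
    )


if __name__ == "__main__":
    try:
        source, _key = resolve_firecrawl_key(None, os.environ.get("FIRECRAWL_API_KEY"))
        print(f"Using the {source} Firecrawl key")  # never log the key itself
    except LookupError as err:
        print(err)
```

Returning the source alongside the key lets callers log which key was used without ever printing the secret.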
**How Partner Integration Works (when available):**

- When users sign up, a unique Firecrawl API key is automatically created for them
- Each user's usage is tracked separately
- Keys are stored securely in the `user_preferences` table
- If a user's key fails, the system automatically falls back to the shared partner key
- Users can view their connection status in **Settings → Firecrawl Integration**

### 10. Run the Development Server

```bash
bun run dev # or: npm run dev / pnpm run dev
```

Open [http://localhost:3000](http://localhost:3000) to see the app.

## How It Works

### User Authentication Flow

1. **Public Home Page**: Users can browse the landing page without signing in
2. **Create Scout**: When a user types a query and hits Enter, they're prompted to sign in
3. **Sign In/Sign Up**: Users can authenticate via email/password or Google OAuth
4. **Continue Flow**: After authentication, the scout creation continues automatically
5. **User Isolation**: Each user only sees and manages their own scouts

### Scout System

1. **Create a Scout**: Define what you want to monitor (e.g., "Scout for any recent Indian restaurants near me" or "Scout for any AI news")
2. **AI Agent Setup**: The system automatically configures search queries and strategies
3. **Set Frequency**: Choose how often to run (hourly, every 3 days, weekly)
4. **Configure Notifications**: Add your email in Settings to receive alerts when scouts find results
5. **Continuous Monitoring**: The dispatcher checks every minute and triggers due scouts individually
6. **AI Summaries**: Each successful execution generates a concise one-sentence summary with semantic embeddings
7. **Get Notified**: Receive email alerts when scouts find new results (if email is configured)
8. **View Results**: See all findings with AI-generated summaries in real time on the scout page

### Manual Execution

Click the **"Run Now"** button on any scout page to trigger execution immediately without waiting for the cron.
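Step 5 of the Scout System (the dispatcher checks every minute and triggers due scouts individually) reduces to a "which scouts are due?" selection. In Open Scouts that work is done by the SQL function `dispatch_due_scouts()`, run by pg_cron, which sends one pg_net HTTP POST per scout. The Python sketch below only mirrors the selection step under stated assumptions: the `Scout` fields (`last_run_at`, `is_active`), the frequency keys, and the name `select_due_scouts` are hypothetical, not the project's schema or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical mapping from the UI's frequency options to run intervals.
FREQUENCY_INTERVALS = {
    "hourly": timedelta(hours=1),
    "every_3_days": timedelta(days=3),
    "weekly": timedelta(weeks=1),
}


@dataclass
class Scout:
    id: str
    frequency: str                   # key into FREQUENCY_INTERVALS
    last_run_at: Optional[datetime]  # None if the scout has never run
    is_active: bool = True


def is_due(scout: Scout, now: datetime) -> bool:
    """A scout is due if it is active and its interval has elapsed."""
    if not scout.is_active:
        return False
    if scout.last_run_at is None:
        return True  # never run: dispatch on the next tick
    return now - scout.last_run_at >= FREQUENCY_INTERVALS[scout.frequency]


def select_due_scouts(scouts: List[Scout], now: datetime) -> List[str]:
    """One dispatcher tick: IDs of scouts that each get their own HTTP request."""
    return [s.id for s in scouts if is_due(s, now)]


if __name__ == "__main__":
    now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
    scouts = [
        Scout("a", "hourly", now - timedelta(minutes=30)),  # ran 30 min ago
        Scout("b", "hourly", now - timedelta(hours=2)),     # overdue
        Scout("c", "weekly", None),                         # never run
        Scout("d", "every_3_days", now - timedelta(days=4), is_active=False),
    ]
    print(select_due_scouts(scouts, now))  # ['b', 'c']
```

Keeping the selection cheap and fanning each due scout out to its own request is what allows every run to execute in an isolated edge function invocation, as the Scalable Dispatcher Architecture section describes.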
### Email Notifications

When scouts find results, you'll automatically receive email alerts at your account email:

- **Automatic**: Emails are sent only when scouts successfully find results
- **Rich Content**: Beautiful HTML emails with scout results and links
- **Test**: Use the "Send Test Email" button in Settings to verify setup

**Email Service**: Powered by Resend (free tier includes 3,000 emails/month)

**Note:** On Resend's free tier without a verified domain, emails can only be sent to your Resend account email. Verify a custom domain at [resend.com/domains](https://resend.com/domains) to send to any email.

### Architecture

- **Frontend**: Next.js app with real-time updates via Supabase Realtime
- **Database**: PostgreSQL (Supabase) with pg_cron for scheduling and pgvector for semantic search
- **Authentication**: Supabase Auth (Email/Password + Google OAuth)
- **AI Agent**: OpenAI GPT-4 with function calling (search & scrape tools)
- **AI Summaries**: Auto-generated one-sentence summaries with vector embeddings for each successful execution
- **Edge Function**: Deno-based serverless function that orchestrates agent execution
- **Web Scraping**: Firecrawl API for search and content extraction (supports per-user API keys via partner integration)

#### Scalable Dispatcher Architecture

Open Scouts uses a dispatcher pattern designed to scale to thousands of scouts:

```
Every minute: pg_cron → dispatch_due_scouts() → finds due scouts → pg_net HTTP POST
                             ↓
      ┌──────────────────────┼──────────────────────┐
      ↓                      ↓                      ↓
Edge Function          Edge Function          Edge Function
  (scout A)              (scout B)              (scout C)
 [isolated]             [isolated]             [isolated]
```

- **Dispatcher (SQL)**: Runs every minute via pg_cron, queries for due scouts, and fires individual HTTP requests
- **Isolated Execution**: Each scout runs in its own edge function invocation with full resources (256MB memory, 400s timeout)
- **Automatic Cleanup**: A separate cron job cleans up stuck executions every 5 minutes
- **Vault Integration**: Supabase credentials are securely stored in the vault and read by the dispatcher

## Security

- **Row Level Security (RLS)**: All database tables have RLS policies ensuring users can only access their own data
- **User Isolation**: Scouts, messages, and executions are all tied to authenticated users
- **Secure Auth Flow**: OAuth tokens and sessions are managed by Supabase Auth
- **Service Role**: Server-side operations (cron jobs, edge functions) use the service role for privileged access
- **API Key Storage**: Firecrawl API keys (when using partner integration) are stored server-side in `user_preferences` and never exposed to the client

## Build for Production

```bash
bun run build # or: npm run build / pnpm run build
bun start # or: npm start / pnpm start
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT

AI & Machine Learning Automation AI Agents

1.3K Github Stars

Open Source

<h3 align="center">
  <a name="readme-top"></a>
  <img src="https://raw.githubusercontent.com/firecrawl/firecrawl/main/img/firecrawl_logo.png" height="200" >
</h3>

<div align="center">
  <a href="https://github.com/firecrawl/firecrawl/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/firecrawl/firecrawl" alt="License">
  </a>
  <a href="https://pepy.tech/project/firecrawl-py">
    <img src="https://static.pepy.tech/badge/firecrawl-py" alt="Downloads">
  </a>
  <a href="https://GitHub.com/firecrawl/firecrawl/graphs/contributors">
    <img src="https://img.shields.io/github/contributors/firecrawl/firecrawl.svg" alt="GitHub Contributors">
  </a>
  <a href="https://firecrawl.dev">
    <img src="https://img.shields.io/badge/Visit-firecrawl.dev-orange" alt="Visit firecrawl.dev">
  </a>
</div>

<div>
  <p align="center">
    <a href="https://twitter.com/firecrawl">
      <img src="https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" />
    </a>
    <a href="https://www.linkedin.com/company/104100957">
      <img src="https://img.shields.io/badge/Follow%20on%20LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Follow on LinkedIn" />
    </a>
    <a href="https://discord.gg/firecrawl">
      <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" />
    </a>
  </p>
</div>

---

# **🔥 Firecrawl**

**The API to search, scrape, and interact with the web at scale. 🔥**

The web context API to find sources, extract content, and turn it into clean Markdown or structured data your agents can ship with. Open source and available as a [hosted service](https://firecrawl.dev/?ref=github).

_Pst. Hey, you, join our stargazers :)_

<a href="https://github.com/firecrawl/firecrawl">
  <img src="https://img.shields.io/github/stars/firecrawl/firecrawl.svg?style=social&label=Star&maxAge=2592000" alt="GitHub stars">
</a>

---

## Why Firecrawl?
- **Industry-leading reliability**: Covers 96% of the web, including JS-heavy pages: no proxy headaches, just clean data ([see benchmarks](https://www.firecrawl.dev/blog/the-worlds-best-web-data-api-v25))
- **Blazingly fast**: P95 latency of 3.4s across millions of pages, built for real-time agents and dynamic apps
- **LLM-ready output**: Clean markdown, structured JSON, screenshots, and more, so you spend fewer tokens and build better AI apps
- **We handle the hard stuff**: Rotating proxies, orchestration, rate limits, JS-blocked content, and more, with zero configuration
- **Agent ready**: Connect Firecrawl to any AI agent or MCP client with a single command
- **Media parsing**: Parse and extract content from web-hosted PDFs, DOCX, and more
- **Actions**: Click, scroll, write, wait, and press before extracting content
- **Open source**: Developed transparently and collaboratively; [join our community](https://github.com/firecrawl/firecrawl)

---

## Feature Overview

**Core Endpoints**

| Feature | Description |
|---------|-------------|
| [**Search**](#search) | Search the web and get full page content from results |
| [**Scrape**](#scrape) | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| [**Interact**](#interact) | Scrape a page, then interact with it using AI prompts or code |

**More**

| Feature | Description |
|---------|-------------|
| [**Agent**](#agent) | Automated data gathering: just describe what you need |
| [**Crawl**](#crawl) | Scrape all URLs of a website with a single request |
| [**Map**](#map) | Discover all URLs on a website instantly |
| [**Batch Scrape**](#batch-scrape) | Scrape thousands of URLs asynchronously |

---

## Quick Start

Sign up at [firecrawl.dev](https://firecrawl.dev) to get your API key. Try the [playground](https://firecrawl.dev/playground) to test it out.

### Search

Search the web and get full content from results.
```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
search_result = app.search("firecrawl", limit=5)
```

<details>
<summary><b>Node.js / cURL / CLI</b></summary>

**Node.js**

```javascript
import { Firecrawl } from 'firecrawl';

const app = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
const searchResult = await app.search("firecrawl", { limit: 5 });
```

**cURL**

```bash
curl -X POST 'https://api.firecrawl.dev/v2/search' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "firecrawl",
    "limit": 5
  }'
```

**CLI**

```bash
firecrawl search "firecrawl" --limit 5
```

</details>

Output:

```json
[
  {
    "url": "https://firecrawl.dev",
    "title": "Firecrawl",
    "markdown": "Turn websites into..."
  },
  {
    "url": "https://docs.firecrawl.dev",
    "title": "Firecrawl Docs",
    "markdown": "# Getting Started..."
  }
]
```

### Scrape

Get LLM-ready data from any website: markdown, JSON, screenshots, and more.

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.scrape('firecrawl.dev')
```

<details>
<summary><b>Node.js / cURL / CLI</b></summary>

**Node.js**

```javascript
import { Firecrawl } from 'firecrawl';

const app = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await app.scrape('firecrawl.dev');
```

**cURL**

```bash
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "firecrawl.dev"
  }'
```

**CLI**

```bash
firecrawl scrape https://firecrawl.dev
firecrawl https://firecrawl.dev --only-main-content
```

</details>

Output:

```
# Firecrawl

Firecrawl helps AI systems search, scrape, and interact with the web.

## Features

- Search: Find information across the web
- Scrape: Clean data from any page
- Interact: Click, navigate, and operate pages
- Agent: Autonomous data gathering
```

### Interact

Scrape a page, then interact with it using AI prompts or code.
```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

result = app.scrape("https://amazon.com")
scrape_id = result.metadata.scrape_id

app.interact(scrape_id, prompt="Search for 'mechanical keyboard'")
app.interact(scrape_id, prompt="Click the first result")
```

<details>
<summary><b>Node.js / cURL / CLI</b></summary>

**Node.js**

```javascript
import { Firecrawl } from 'firecrawl';

const app = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});

const result = await app.scrape("https://amazon.com");

await app.interact(result.metadata.scrapeId, {
  prompt: "Search for 'mechanical keyboard'"
});
await app.interact(result.metadata.scrapeId, {
  prompt: "Click the first result"
});
```

**cURL**

```bash
# 1. Scrape the page
curl -X POST 'https://api.firecrawl.dev/v2/scrape' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://amazon.com"}'

# 2. Interact with the page (use scrapeId from step 1)
curl -X POST 'https://api.firecrawl.dev/v2/scrape/SCRAPE_ID/interact' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Search for mechanical keyboard"}'
```

**CLI**

```bash
firecrawl scrape https://amazon.com
firecrawl interact exec --prompt "Search for 'mechanical keyboard'"
firecrawl interact exec --prompt "Click the first result"
```

</details>

Output:

```json
{
  "success": true,
  "output": "Keyboard available at $100",
  "liveViewUrl": "https://liveview.firecrawl.dev/..."
}
```

---

## Power Your Agent

Connect Firecrawl to any AI agent or MCP client in minutes.

### Skill

Give your agent easy access to real-time web data with one command.

```bash
npx -y firecrawl-cli@latest init --all --browser
```

Restart your agent after installing. Works with [Claude Code](https://claude.ai/code), [Antigravity](https://antigravity.google), [OpenCode](https://opencode.ai), and more.

### MCP

Connect any MCP-compatible client to the web in seconds.
```json
{
  "mcpServers": {
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "fc-YOUR_API_KEY"
      }
    }
  }
}
```

### Agent Onboarding

Are you an AI agent? Fetch this skill to sign up your user, get an API key, and start building with Firecrawl.

```bash
curl -s https://firecrawl.dev/agent-onboarding/SKILL.md
```

See the [Skill + CLI documentation](https://docs.firecrawl.dev/sdks/cli) for all available commands. For MCP, see [firecrawl-mcp-server](https://github.com/firecrawl/firecrawl-mcp-server).

---

## More Endpoints

### Agent

**The easiest way to get data from the web.** Describe what you need, and our AI agent searches, navigates, and retrieves it. No URLs required.

Agent is the evolution of our `/extract` endpoint: faster, more reliable, and doesn't require you to know the URLs upfront.

```bash
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Find the pricing plans for Notion"
  }'
```

Response:

```json
{
  "success": true,
  "data": {
    "result": "Notion offers the following pricing plans:\n\n1. Free - $0/month...\n2. Plus - $10/seat/month...\n3. Business - $18/seat/month...",
    "sources": ["https://www.notion.so/pricing"]
  }
}
```

#### Agent with Structured Output

Use a schema to get structured data:

```python
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema
)

print(result.data)
```

```json
{
  "founders": [
    {"name": "Eric Ciarla", "role": "Co-founder"},
    {"name": "Nicolas Camara", "role": "Co-founder"},
    {"name": "Caleb Peffer", "role": "Co-founder"}
  ]
}
```

#### Agent with URLs (Optional)

Focus the agent on specific pages:

```python
result = app.agent(
    urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
    prompt="Compare the features and pricing information"
)
```

#### Model Selection

Choose between two models based on your needs:

| Model | Cost | Best For |
|-------|------|----------|
| `spark-1-mini` (default) | 60% cheaper | Most tasks |
| `spark-1-pro` | Standard | Complex research, critical data gathering |

```python
result = app.agent(
    prompt="Compare enterprise features across Firecrawl, Apify, and ScrapingBee",
    model="spark-1-pro"
)
```

**When to use Pro:**

- Comparing data across multiple websites
- Extracting from sites with complex navigation or auth
- Research tasks where the agent needs to explore multiple paths
- Critical data where accuracy is paramount

Learn more about Spark models in our [Agent documentation](https://docs.firecrawl.dev/features/agent).

### Crawl

Crawl an entire website and get content from all pages.
```bash
curl -X POST 'https://api.firecrawl.dev/v2/crawl' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "limit": 100,
    "scrapeOptions": {
      "formats": ["markdown"]
    }
  }'
```

Returns a job ID:

```json
{
  "success": true,
  "id": "123-456-789",
  "url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```

#### Check Crawl Status

```bash
curl -X GET 'https://api.firecrawl.dev/v2/crawl/123-456-789' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'
```

```json
{
  "status": "completed",
  "total": 50,
  "completed": 50,
  "creditsUsed": 50,
  "data": [
    {
      "markdown": "# Page Title\n\nContent...",
      "metadata": {"title": "Page Title", "sourceURL": "https://..."}
    }
  ]
}
```

**Note:** The [SDKs](#sdks) handle polling automatically for a better developer experience.

### Map

Discover all URLs on a website instantly.

```bash
curl -X POST 'https://api.firecrawl.dev/v2/map' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://firecrawl.dev"}'
```

Response:

```json
{
  "success": true,
  "links": [
    {"url": "https://firecrawl.dev", "title": "Firecrawl", "description": "Turn websites into LLM-ready data"},
    {"url": "https://firecrawl.dev/pricing", "title": "Pricing", "description": "Firecrawl pricing plans"},
    {"url": "https://firecrawl.dev/blog", "title": "Blog", "description": "Firecrawl blog"}
  ]
}
```

#### Map with Search

Find specific URLs within a site:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.map("https://firecrawl.dev", search="pricing")
# Returns URLs ordered by relevance to "pricing"
```

### Batch Scrape

Scrape multiple URLs at once:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

job = app.batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
    "https://firecrawl.dev/pricing"
], formats=["markdown"])

for doc in job.data:
    print(doc.metadata.source_url)
```

---

## SDKs

Our SDKs provide a convenient way to use all Firecrawl features and automatically handle polling for async operations.

### Python

Install the SDK:

```bash
pip install firecrawl-py
```

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape a single URL
doc = app.scrape("https://firecrawl.dev", formats=["markdown"])
print(doc.markdown)

# Use the Agent for autonomous data gathering
result = app.agent(prompt="Find the founders of Stripe")
print(result.data)

# Crawl a website (automatically waits for completion)
docs = app.crawl("https://docs.firecrawl.dev", limit=50)
for doc in docs.data:
    print(doc.metadata.source_url, doc.markdown[:100])

# Search the web
results = app.search("best AI data tools 2024", limit=10)
print(results)
```

### Node.js

Install the SDK:

```bash
npm install firecrawl
```

```javascript
import { Firecrawl } from 'firecrawl';

const app = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });

// Scrape a single URL
const doc = await app.scrape('https://firecrawl.dev', { formats: ['markdown'] });
console.log(doc.markdown);

// Use the Agent for autonomous data gathering
const result = await app.agent({ prompt: 'Find the founders of Stripe' });
console.log(result.data);

// Crawl a website (automatically waits for completion)
const docs = await app.crawl('https://docs.firecrawl.dev', { limit: 50 });
docs.data.forEach(doc => {
  console.log(doc.metadata.sourceURL, doc.markdown.substring(0, 100));
});

// Search the web
const results = await app.search('best AI data tools 2024', { limit: 10 });
results.data.web.forEach(result => {
  console.log(`${result.title}: ${result.url}`);
});
```

### Java

Add the dependency ([Gradle/Maven](https://docs.firecrawl.dev/sdks/java#installation)):

```groovy
repositories {
    mavenCentral()
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.firecrawl:firecrawl-java-sdk:2.0'
}
```

```java
import dev.firecrawl.client.FirecrawlClient;
import dev.firecrawl.model.*;

FirecrawlClient client = new FirecrawlClient(
    System.getenv("FIRECRAWL_API_KEY"),
    null,
    null
);

// Scrape a single URL
ScrapeParams scrapeParams = new ScrapeParams();
scrapeParams.setFormats(new String[]{"markdown"});
FirecrawlDocument doc = client.scrapeURL("https://firecrawl.dev", scrapeParams);
System.out.println(doc.getMarkdown());

// Use the Agent for autonomous data gathering
AgentParams agentParams = new AgentParams("Find the founders of Stripe");
AgentResponse start = client.createAgent(agentParams);
AgentStatusResponse result = client.getAgentStatus(start.getId());
System.out.println(result.getData());

// Crawl a website (polls until completion)
CrawlParams crawlParams = new CrawlParams();
crawlParams.setLimit(50);
CrawlStatusResponse job = client.crawlURL("https://docs.firecrawl.dev", crawlParams, null, 10);
for (FirecrawlDocument page : job.getData()) {
    System.out.println(page.getMetadata().get("sourceURL"));
}

// Search the web
SearchParams searchParams = new SearchParams("best AI data tools 2024");
searchParams.setLimit(10);
SearchResponse results = client.search(searchParams);
for (SearchResult r : results.getResults()) {
    System.out.println(r.getTitle() + ": " + r.getUrl());
}
```

### Elixir

Add the dependency:

```elixir
def deps do
  [
    {:firecrawl, "~> 1.0"}
  ]
end
```

```elixir
# Scrape a URL
{:ok, response} = Firecrawl.scrape_and_extract_from_url(
  url: "https://firecrawl.dev",
  formats: ["markdown"]
)

# Crawl a website
{:ok, response} = Firecrawl.crawl_urls(
  url: "https://docs.firecrawl.dev",
  limit: 50
)

# Search the web
{:ok, response} = Firecrawl.search_and_scrape(
  query: "best AI data tools 2024",
  limit: 10
)

# Map URLs
{:ok, response} = Firecrawl.map_urls(url: "https://example.com")
```

### Rust

Add the dependency:

```toml
[dependencies]
firecrawl = "2"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

```rust
use firecrawl::{Client, ScrapeOptions, Format, CrawlOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("fc-YOUR_API_KEY")?;

    // Scrape a URL
    let document = client.scrape("https://firecrawl.dev", None).await?;
    println!("{:?}", document.markdown);

    // Crawl a website
    let options = CrawlOptions {
        limit: Some(50),
        ..Default::default()
    };
    let result = client.crawl("https://docs.firecrawl.dev", options).await?;
    println!("Crawled {} pages", result.data.len());

    // Search the web
    let response = client.search("best web scraping tools 2024", None).await?;
    println!("{:?}", response.data);

    Ok(())
}
```

### Community SDKs

- [Go SDK](https://github.com/firecrawl/firecrawl/tree/main/apps/go-sdk)

---

## Integrations

**Agents & AI Tools**

- [Firecrawl Skill](https://docs.firecrawl.dev/sdks/cli)
- [Firecrawl CLI Skills](https://github.com/firecrawl/cli#agent-skills)
- [Firecrawl Workflows](https://github.com/firecrawl/firecrawl-workflows)
- [Firecrawl MCP](https://github.com/mendableai/firecrawl-mcp-server)

**Platforms**

- [Lovable](https://docs.lovable.dev/integrations/firecrawl)
- [Zapier](https://zapier.com/apps/firecrawl/integrations)
- [n8n](https://n8n.io/integrations/firecrawl/)

[View all integrations →](https://www.firecrawl.dev/integrations)

**Missing your favorite tool?** [Open an issue](https://github.com/mendableai/firecrawl/issues) and let us know!

---

## Resources

- [Documentation](https://docs.firecrawl.dev)
- [API Reference](https://docs.firecrawl.dev/api-reference/introduction)
- [Playground](https://firecrawl.dev/playground)
- [Changelog](https://firecrawl.dev/changelog)

---

## Open Source vs Cloud

Firecrawl is open source under the AGPL-3.0 license. The cloud version at [firecrawl.dev](https://firecrawl.dev) includes additional features:

![Open Source vs Cloud](https://raw.githubusercontent.com/firecrawl/firecrawl/main/img/open-source-cloud.png)

To run locally, see the [Contributing Guide](https://github.com/firecrawl/firecrawl/blob/main/CONTRIBUTING.md).
To self-host, see the [Self-Hosting Guide](https://docs.firecrawl.dev/contributing/self-host).

---

## Contributing

We love contributions! Please read our [Contributing Guide](https://github.com/firecrawl/firecrawl/blob/main/CONTRIBUTING.md) before submitting a pull request.

### Contributors

<a href="https://github.com/firecrawl/firecrawl/graphs/contributors">
  <img alt="contributors" src="https://contrib.rocks/image?repo=firecrawl/firecrawl"/>
</a>

---

## License

This project is primarily licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). The SDKs and some UI components are licensed under the MIT License. See the LICENSE files in specific directories for details.

---

**It is the sole responsibility of end users to respect websites' policies when scraping.** Users are advised to adhere to applicable privacy policies and terms of use. By default, Firecrawl respects robots.txt directives. By using Firecrawl, you agree to comply with these conditions.

<p align="right" style="font-size: 14px; color: #555; margin-top: 20px;">
  <a href="#readme-top" style="text-decoration: none; color: #007bff; font-weight: bold;">
    ↑ Back to Top ↑
  </a>
</p>

AI & Machine Learning Browser Automation

130.5K GitHub Stars

Open Source

firesearch

# Firesearch - AI-Powered Deep Research Tool

<div align="center">
  <img src="https://media4.giphy.com/media/v1.Y2lkPTc5MGI3NjExd2F2YWo4amdieGVnOXR3aGM5ZnBlcDZvbnRjNW1vNmtpeWNhc3VtbSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/Jw7Q08ll8Vh0BoApI8/giphy.gif" alt="Firesearch Demo" width="100%" />
</div>

Comprehensive web research powered by [Firecrawl](https://www.firecrawl.dev/) and [LangGraph](https://www.langchain.com/langgraph)

## Technologies

- **Firecrawl**: Multi-source web content extraction
- **OpenAI GPT-4o**: Search planning and follow-up generation
- **Next.js 15**: Modern React framework with App Router

[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffiresearch&env=FIRECRAWL_API_KEY,OPENAI_API_KEY&envDescription=API%20keys%20required%20for%20Firesearch&envLink=https%3A%2F%2Fgithub.com%2Fmendableai%2Ffiresearch%23required-api-keys)

## Setup

### Required API Keys

| Service | Purpose | Get Key |
|---------|---------|---------|
| Firecrawl | Web scraping and content extraction | [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) |
| OpenAI | Search planning and summarization | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |

### Quick Start

1. Clone this repository
2. Create a `.env.local` file with your API keys:
   ```
   FIRECRAWL_API_KEY=your_firecrawl_key
   OPENAI_API_KEY=your_openai_key
   ```
3. Install dependencies: `npm install` or `yarn install`
4. Run the development server: `npm run dev` or `yarn dev`

## How It Works

### Architecture Overview

```mermaid
flowchart TB
    Query["'Compare Samsung Galaxy S25<br/>and iPhone 16'"]:::query
    Query --> Break
    Break["🔍 Break into Sub-Questions"]:::primary

    subgraph SubQ["🌐 Search Queries"]
        S1["iPhone 16 Pro specs features"]:::search
        S2["Samsung Galaxy S25 Ultra specs"]:::search
        S3["iPhone 16 vs Galaxy S25 comparison"]:::search
    end
    Break --> SubQ

    subgraph FC["🔥 Firecrawl API Calls"]
        FC1["Firecrawl /search API<br/>Query 1"]:::firecrawl
        FC2["Firecrawl /search API<br/>Query 2"]:::firecrawl
        FC3["Firecrawl /search API<br/>Query 3"]:::firecrawl
    end
    S1 --> FC1
    S2 --> FC2
    S3 --> FC3

    subgraph Sources["📄 Sources Found"]
        R1["Apple.com ✓<br/>The Verge ✓<br/>CNET ✓"]:::source
        R2["GSMArena ✓<br/>TechRadar ✓<br/>Samsung.com ✓"]:::source
        R3["AndroidAuth ✓<br/>TomsGuide ✓"]:::source
    end
    FC1 --> R1
    FC2 --> R2
    FC3 --> R3

    subgraph Valid["✅ Answer Validation"]
        V1["iPhone 16 specs ✓ (0.95)"]:::good
        V2["S25 specs ✓ (0.9)"]:::good
        V3["S25 price ❌ (0.3)"]:::bad
    end
    Sources --> Valid
    Valid --> Retry
    Retry{"Need info:<br/>S25 pricing?"}:::check

    subgraph Strat["🧠 Alternative Strategy"]
        Original["Original: 'Galaxy S25 price'<br/>❌ No specific pricing found"]:::bad
        NewTerms["Try: 'Galaxy S25 MSRP cost'<br/>'Samsung S25 pricing leak'<br/>'S25 vs S24 price comparison'"]:::strategy
    end
    Retry -->|Yes| Strat

    subgraph Retry2["🔄 Retry Searches"]
        Alt1["Galaxy S25 MSRP retail"]:::search
        Alt2["Samsung S25 pricing leak"]:::search
        Alt3["S25 vs S24 price comparison"]:::search
    end
    Strat --> Retry2

    subgraph FC2G["🔥 Retry API Calls"]
        FC4["Firecrawl /search API<br/>Alt Query 1"]:::firecrawl
        FC5["Firecrawl /search API<br/>Alt Query 2"]:::firecrawl
        FC6["Firecrawl /search API<br/>Alt Query 3"]:::firecrawl
    end
    Alt1 --> FC4
    Alt2 --> FC5
    Alt3 --> FC6

    Results2["SamMobile ✓ ($899 leak)<br/>9to5Google ✓ ($100 more)<br/>PhoneArena ✓ ($899)"]:::source
    FC4 --> Results2
    FC5 --> Results2
    FC6 --> Results2

    Final["All answers found ✓<br/>S25 price: $899"]:::good
    Results2 --> Final

    Synthesis["LLM synthesizes response"]:::synthesis
    Final --> Synthesis

    FollowUp["Generate follow-up questions"]:::primary
    Synthesis --> FollowUp

    Citations["List citations [1-10]"]:::primary
    FollowUp --> Citations

    Answer["Complete response delivered"]:::answer
    Citations --> Answer

    %% No path - skip retry and go straight to synthesis
    Retry -->|No| Synthesis

    classDef query fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
    classDef subq fill:#ffd4b3,stroke:#ff6b1a,stroke-width:1px,color:#333
    classDef search fill:#ff8c42,stroke:#ff6b1a,stroke-width:2px,color:#fff
    classDef source fill:#3a4a5c,stroke:#2c3a47,stroke-width:2px,color:#fff
    classDef check fill:#ffeb3b,stroke:#fbc02d,stroke-width:2px,color:#333
    classDef good fill:#4caf50,stroke:#388e3c,stroke-width:2px,color:#fff
    classDef bad fill:#f44336,stroke:#d32f2f,stroke-width:2px,color:#fff
    classDef strategy fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
    classDef synthesis fill:#ff8c42,stroke:#ff6b1a,stroke-width:3px,color:#fff
    classDef answer fill:#3a4a5c,stroke:#2c3a47,stroke-width:3px,color:#fff
    classDef firecrawl fill:#ff6b1a,stroke:#ff4500,stroke-width:3px,color:#fff
    classDef label fill:none,stroke:none,color:#666,font-weight:bold
```

### Process Flow

1. **Break Down** - Complex queries split into focused sub-questions
2. **Search** - Multiple searches via Firecrawl API for comprehensive coverage
3. **Extract** - Markdown content extracted from web sources
4. **Validate** - Check if sources actually answer the questions (0.7+ confidence)
5. **Retry** - Alternative search terms for unanswered questions (max 2 attempts)
6. **Synthesize** - GPT-4o combines findings into cited answer

### Key Features

- **Smart Search** - Breaks complex queries into multiple focused searches
- **Answer Validation** - Verifies sources contain actual answers (0.7+ confidence)
- **Auto-Retry** - Alternative search terms for unanswered questions
- **Real-time Progress** - Live updates as searches complete
- **Full Citations** - Every fact linked to its source
- **Context Memory** - Follow-up questions maintain conversation context

### Configuration

Customize search behavior by modifying [`lib/config.ts`](lib/config.ts):

```typescript
export const SEARCH_CONFIG = {
  // Search Settings
  MAX_SEARCH_QUERIES: 12,        // Maximum number of search queries to generate
  MAX_SOURCES_PER_SEARCH: 4,     // Maximum sources to return per search query
  MAX_SOURCES_TO_SCRAPE: 3,      // Maximum sources to scrape for additional content

  // Content Processing
  MIN_CONTENT_LENGTH: 100,       // Minimum content length to consider valid
  SUMMARY_CHAR_LIMIT: 100,       // Character limit for source summaries

  // Retry Logic
  MAX_RETRIES: 2,                // Maximum retry attempts for failed operations
  MAX_SEARCH_ATTEMPTS: 2,        // Maximum attempts to find answers via search
  MIN_ANSWER_CONFIDENCE: 0.7,    // Minimum confidence (0-1) that a question was answered

  // Timeouts
  SCRAPE_TIMEOUT: 15000,         // Timeout for scraping operations (ms)
} as const;
```

### Firecrawl API Integration

Firesearch leverages Firecrawl's powerful `/search` endpoint:

#### `/search` - Web Search with Content

- **Purpose**: Finds relevant URLs AND extracts markdown content in one call
- **Usage**: Each decomposed query is sent to find 6-8 relevant sources with content
- **Response**: Returns URLs with titles, snippets, AND full markdown content
- **Key Feature**: The `scrapeOptions` parameter enables content extraction during search
- **Example**:
  ```
  POST /search
  {
    "query": "iPhone 16 specs pricing",
    "limit": 8,
    "scrapeOptions": {
      "formats": ["markdown"]
    }
  }
  ```

### Search Strategies

When initial results are insufficient, the system automatically tries:

- **Broaden Keywords**: Removes specific terms for wider results
- **Narrow Focus**: Adds specific terms to target missing aspects
- **Synonyms**: Uses alternative terms and phrases
- **Rephrase**: Completely reformulates the query
- **Decompose**: Breaks complex queries into sub-questions
- **Academic**: Adds scholarly terms for research-oriented results
- **Practical**: Focuses on tutorials and how-to guides

## Example Queries

- "Who are the founders of Firecrawl?"
- "When did NVIDIA release the RTX 4080 Super?"
- "Compare the latest iPhone, Samsung Galaxy, and Google Pixel flagship features"

## License

MIT License
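The validate-and-retry step from the Process Flow (a 0.7 confidence threshold and a small number of search attempts) can be sketched in TypeScript as below. This is a minimal sketch, not Firesearch's implementation: `answerWithRetry` and the injected `search`, `score`, and `rephrase` hooks are hypothetical stand-ins for the Firecrawl `/search` call, the LLM answer check, and the alternative-query strategies, and it assumes `MAX_SEARCH_ATTEMPTS` counts the initial search.

```typescript
// Thresholds taken from SEARCH_CONFIG in lib/config.ts.
const MIN_ANSWER_CONFIDENCE = 0.7;
const MAX_SEARCH_ATTEMPTS = 2; // assumption: includes the first search

interface Source {
  url: string;
  markdown: string;
}

// Hypothetical hooks: a Firecrawl /search call, an LLM answer check that
// returns a confidence in [0, 1], and an alternative-query generator.
type SearchFn = (query: string) => Promise<Source[]>;
type ScoreFn = (question: string, sources: Source[]) => Promise<number>;
type RephraseFn = (question: string, attempt: number) => string;

interface AnswerResult {
  confidence: number;
  answered: boolean;
  attempts: number;
  sources: Source[];
}

async function answerWithRetry(
  question: string,
  search: SearchFn,
  score: ScoreFn,
  rephrase: RephraseFn,
): Promise<AnswerResult> {
  let sources: Source[] = [];
  let confidence = 0;
  let attempts = 0;
  while (attempts < MAX_SEARCH_ATTEMPTS) {
    // The first attempt searches the question itself; retries use new terms.
    const query = attempts === 0 ? question : rephrase(question, attempts);
    attempts += 1;
    sources = sources.concat(await search(query));
    confidence = await score(question, sources);
    if (confidence >= MIN_ANSWER_CONFIDENCE) break; // answered: stop retrying
  }
  return {
    confidence,
    answered: confidence >= MIN_ANSWER_CONFIDENCE,
    attempts,
    sources,
  };
}
```

Accumulating sources across attempts, rather than replacing them, is a choice made for this sketch: a retry then only has to fill the missing piece, as in the S25 pricing example in the diagram.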

AI Agents

484 GitHub Stars

Open Source

rag-arena

# RAG Arena

RAG Arena is an open-source Next.js project made by mendable.ai that uses LangChain to provide a RAG chatbot experience in which each query receives multiple responses. Users vote on these responses, which are then unblurred to reveal the retriever behind each one; the chatbots are distinguished by their retrieval (RAG) method. The project uses Supabase for database operations and features a real-time leaderboard that displays data from the database.

## Installation

Ensure you have `pnpm` installed on your system. If not, install it via:

```bash
npm install -g pnpm
```

Clone the project repository:

```bash
git clone https://github.com/mendableai/rag-arena
```

Navigate to the project directory and install the dependencies:

```bash
cd rag-arena
pnpm i
```

Configure your environment variables:

```
# probably in: https://platform.openai.com/api-keys
OPENAI_API_KEY=

# probably in: https://supabase.com/dashboard/ project>project settings>api
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_PRIVATE_KEY=

# probably in: https://console.upstash.com/redis/
UPSTASH_REDIS_REST_URL=
UPSTASH_REDIS_REST_TOKEN=

PRODUCTION=false
PYTHON_MICRO_SERVER=
```

Start the Next.js development server:

```bash
pnpm dev
```

## Running the Python server

```bash
cd python_service
```

```bash
poetry install
```

(If you don't have Poetry, install it with `pip install poetry`.)

### Activating the Neo4j Graph Store

The Graph RAG retriever needs a built graph store. Alternatively, let the server run the `create_neo4j_graph_store` function (located in `/python_service/retrievers/neo4j_retriever.py`) automatically by uncommenting these lines:

```python
# if not os.path.exists(storage_dir) or not os.listdir(storage_dir):
#     create_neo4j_graph_store()
```

This can take a while, depending on the data in `data/chunks`. When it finishes, your local `neo/storage` directory will contain the persisted graph store data.
### Loading the index for the first time

The Graph RAG retriever needs the index loaded and cached. The `load_index()` function in `python_service/app.py` does this for you, so the very first run may take a while as it creates the cached `.pkl` file in `python_service/index/cache`.

### Run the Flask server locally in debug mode

```bash
poetry run flask run --debug
```

With both servers running, open http://localhost:3000 in your browser to see the result.

# Architecture Overview

### Ingestion System

- **Path:** `app/api/ingest/route.ts`
- **Description:** Ingests articles into a vector database so they can be retrieved for future queries.
- **Implementation Details:**
  - **Splitter:** Uses LangChain's `RecursiveCharacterTextSplitter` to split text.
  - **Embeddings:** Uses `OpenAIEmbeddings` to generate document embeddings.
  - **Storage:** Uses `SupabaseVectorStore` to store the processed documents in Supabase.

### Dynamic Retriever

- **Path:** `app/api/retrievers/dynamic-retriever/route.ts`
- **Description:** Selects and runs a different retriever depending on user input to fetch relevant documents.
- **Key Features:**
  - **Rate Limiting:** Limits request rates to manage load and ensure fair usage.
  - **Document RAG:** Retrieves the documents most relevant to the user's query.
  - **OpenAI Integration:** Calls OpenAI's API for chat completions, with `SupabaseVectorStore` used for document matching.

### Voting System

- **Path:** `actions/voting-system.ts`
- **Purpose:** Handles voting on the retrieved answers; the results feed the retriever leaderboard.
- **Functionalities:**
  - **Vote Tracking:** Updates the number of votes, times tested, and the Elo rating for each retriever.
  - **Elo Adjustment:** Scales each Elo change by how often a retriever has been tested relative to the average, so retrievers tested less often than average move more per vote.
```typescript
// Calculation used for the Elo adjustment.
function calculateEloAdjustment(timesTested: number, averageTimesTested: number): number {
  // No test history yet: use the base adjustment of 10.
  if (averageTimesTested === 0) return 10;
  const adjustmentFactor = timesTested / averageTimesTested;
  // Equals 10 * averageTimesTested / timesTested, so retrievers tested less
  // often than average get a larger adjustment.
  // Note: returns Infinity when timesTested is 0 but the average is not.
  return (1 / adjustmentFactor) * 10;
}
```

### Database Schema

- **Table Name:** Leaderboard
- **Contents:** One row per retriever, with `id`, `retriever`, `elo`, `votes`, `times_tested`, `full_name`, `description`, and `link`.

## RAG Functions Overview

Reference: https://js.langchain.com/docs/modules/data_connection/retrievers/

This section outlines the retrieval functions defined in `app/api/retrievers/dynamic-retriever/tools/functions.ts`. Each one implements a different retrieval strategy, and these strategies are what the arena compares.

### Vector Store

- **When to Use:** Ideal for beginners seeking a quick and straightforward solution.
- **Description:** Creates an embedding for each piece of text and retrieves the pieces closest to the query, which makes it the most accessible starting point for document RAG.

### Parent Document

- **When to Use:** Best for documents split into smaller, distinct chunks that are indexed separately but should be retrieved as a whole.
- **Description:** Indexes multiple chunks per document, finds the chunks most similar to the query in embedding space, and returns the entire parent document rather than the individual chunks.

### Multi Vector

- **When to Use:** Suitable when you can extract information that is more useful for indexing than the document's text itself.
- **Description:** Creates multiple vectors per document; each vector can represent a text summary, a hypothetical question, or another form of distilled information.
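The Parent Document strategy can be illustrated with a small in-memory sketch: each chunk carries its parent's ID, chunks are ranked by cosine similarity to the query embedding, and whole parent documents are returned without duplicates. This is not RAG Arena's code (the project uses LangChain retrievers); `Chunk`, `retrieveParents`, and the toy two-dimensional embeddings are hypothetical.

```typescript
// A small indexed chunk that remembers which parent document it came from.
interface Chunk {
  parentId: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity, then return up to k whole parent documents,
// ordered by their best-matching chunk.
function retrieveParents(
  query: number[],
  chunks: Chunk[],
  parents: Map<string, string>,
  k: number,
): string[] {
  if (k <= 0) return [];
  const ranked = chunks
    .map((c) => ({ parentId: c.parentId, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score);
  const seen = new Set<string>();
  const result: string[] = [];
  for (const { parentId } of ranked) {
    if (seen.has(parentId)) continue; // a parent is returned once
    seen.add(parentId);
    const doc = parents.get(parentId);
    if (doc !== undefined) result.push(doc);
    if (result.length === k) break;
  }
  return result;
}
```

Matching on small chunks keeps similarity scores precise, while returning the parent gives the LLM the surrounding context the chunk alone would lack.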
### Contextual Compression

- **When to Use:** Useful when retrieved documents contain a lot of irrelevant information that distracts from answering the query.
- **Description:** Adds a post-processing step to another retriever that extracts only the most pertinent information from the retrieved documents, using either embeddings or an LLM.

### Time Weighted

- **When to Use:** Ideal for documents with timestamps, when you want to retrieve recent documents based on both semantic similarity and recency.
- **Description:** Scores documents by combining semantic similarity with document timestamps, so that more recent documents are prioritized.

### Multi-Query Retriever

- **When to Use:** Best for complex queries that need several distinct pieces of information for a comprehensive response.
- **Description:** Generates multiple queries from a single input and fetches documents for each generated query, so that the combined results cover every topic needed to answer the original query.

## Contributing

Contributions are welcome! Please follow the standard fork & pull request workflow. Adhere to the coding styles and patterns present in the project, and write tests for new features or bug fixes.

## License

RAG Arena is open source and released under the MIT License. See the LICENSE file for more information.
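For the Time Weighted strategy, LangChain documents the time-weighted retriever's score as `semantic_similarity + (1 - decay_rate) ^ hours_passed`, with hours counted since the document was last accessed. The sketch below applies that formula to documents whose similarity has already been computed; `TimedDoc` and `rankByTimeWeightedScore` are hypothetical names, and this is not RAG Arena's implementation.

```typescript
interface TimedDoc {
  id: string;
  similarity: number; // precomputed semantic similarity to the query
  lastAccessed: Date;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// similarity + (1 - decayRate) ^ hoursPassed, where hoursPassed counts the
// hours since the document was last accessed.
function timeWeightedScore(doc: TimedDoc, now: Date, decayRate: number): number {
  const hoursPassed = (now.getTime() - doc.lastAccessed.getTime()) / MS_PER_HOUR;
  return doc.similarity + Math.pow(1 - decayRate, hoursPassed);
}

// Returns document IDs, highest combined score first.
function rankByTimeWeightedScore(docs: TimedDoc[], now: Date, decayRate: number): string[] {
  return docs
    .map((doc) => ({ id: doc.id, score: timeWeightedScore(doc, now, decayRate) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.id);
}
```

With a decay rate of 0 the recency term is always 1 and ranking falls back to pure similarity; larger decay rates shrink the bonus for documents that have not been accessed recently.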

Knowledge Bases & RAG Testing & QA

220 GitHub Stars

Open Source

firecrawl-app-examples

# Firecrawl App Examples Repository

This repository contains example applications developed using Firecrawl. These examples demonstrate various implementations and use cases for Firecrawl.

## Getting Started

To explore these examples:

1. Clone this repository to your local machine.
2. Navigate to the specific example directory you're interested in.
3. Follow the README instructions within each project directory for setup and running the application.

Knowledge Bases & RAG Browser Automation Code Editors & IDEs

750 GitHub Stars

Open Source

firecrawl-mcp-server

<div align="center">
  <a name="readme-top"></a>
  <img src="https://raw.githubusercontent.com/firecrawl/firecrawl-mcp-server/main/img/fire.png" height="140" >
</div>

# Firecrawl MCP Server

A Model Context Protocol (MCP) server that brings [Firecrawl](https://github.com/firecrawl/firecrawl) to MCP-compatible AI agents: search, scrape, and interact with the live web for clean, agent-ready context.

> Big thanks to [@vrknetha](https://github.com/vrknetha), [@knacklabs](https://www.knacklabs.ai) for the initial implementation!

## Features

- Search the web and get full page content
- Scrape any URL into clean, structured data
- Interact with pages: click, navigate, and operate
- Deep research with autonomous agent
- Automatic retries and rate limiting
- Cloud and self-hosted support
- SSE support

> Play around with [our MCP Server on MCP.so's playground](https://mcp.so/playground?server=firecrawl-mcp-server) or on [Klavis AI](https://www.klavis.ai/mcp-servers).

## Installation

### Running with npx

```bash
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```

### Manual Installation

```bash
npm install -g firecrawl-mcp
```

### Running on Cursor

Configuring Cursor 🖥️

Note: Requires Cursor version 0.45.6+

For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers: [Cursor MCP Server Configuration Guide](https://docs.cursor.com/context/model-context-protocol#configuring-mcp-servers)

To configure Firecrawl MCP in Cursor **v0.48.6**:

1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add new global MCP server"
4. Enter the following code:
   ```json
   {
     "mcpServers": {
       "firecrawl-mcp": {
         "command": "npx",
         "args": ["-y", "firecrawl-mcp"],
         "env": {
           "FIRECRAWL_API_KEY": "YOUR-API-KEY"
         }
       }
     }
   }
   ```

To configure Firecrawl MCP in Cursor **v0.45.6**:

1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add New MCP Server"
4. Enter the following:
   - Name: "firecrawl-mcp" (or your preferred name)
   - Type: "command"
   - Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`

> If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`

Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from https://www.firecrawl.dev/app/api-keys

After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Firecrawl MCP when appropriate, but you can explicitly request it by describing your web scraping needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.

### Running on Windsurf

Add this to your `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}
```

### Running with Streamable HTTP Local Mode

To run the server using Streamable HTTP locally instead of the default stdio transport:

```bash
env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```

Use the URL: http://localhost:3000/mcp

### Installing via Smithery (Legacy)

To install Firecrawl for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mendableai/mcp-server-firecrawl):

```bash
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
```

### Running on VS Code

For one-click installation, click one of the install buttons below:
[![Install with NPX in VS Code](https://img.shields.io/badge/VS_Code-NPM-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D) [![Install with NPX in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-NPM-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D&quality=insiders)

For manual installation, add the following JSON block to your User Settings (JSON) file in VS Code. You can do this by pressing `Ctrl + Shift + P` and typing `Preferences: Open User Settings (JSON)`.

```json
{
  "mcp": {
    "inputs": [
      {
        "type": "promptString",
        "id": "apiKey",
        "description": "Firecrawl API Key",
        "password": true
      }
    ],
    "servers": {
      "firecrawl": {
        "command": "npx",
        "args": ["-y", "firecrawl-mcp"],
        "env": {
          "FIRECRAWL_API_KEY": "${input:apiKey}"
        }
      }
    }
  }
}
```

Optionally, you can add it to a file called `.vscode/mcp.json` in your workspace.
This will allow you to share the configuration with others:

```json
{
  "inputs": [
    {
      "type": "promptString",
      "id": "apiKey",
      "description": "Firecrawl API Key",
      "password": true
    }
  ],
  "servers": {
    "firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "${input:apiKey}"
      }
    }
  }
}
```

## Configuration

### Environment Variables

#### Required for Cloud API

- `FIRECRAWL_API_KEY`: Your Firecrawl API key
  - Required when using cloud API (default)
  - Optional when using self-hosted instance with `FIRECRAWL_API_URL`
- `FIRECRAWL_API_URL` (Optional): Custom API endpoint for self-hosted instances
  - Example: `https://firecrawl.your-domain.com`
  - If not provided, the cloud API will be used (requires API key)

#### MCP OAuth (Bearer access tokens)

Hosted Firecrawl can issue OAuth **access tokens** (`fco_…`) via the authorization server on [firecrawl.dev](https://firecrawl.dev). This MCP server forwards whichever credential it resolves to the Firecrawl API as `Authorization: Bearer …`.

- **HTTP stream transports** (`CLOUD_SERVICE=true`, `HTTP_STREAMABLE_SERVER=true`, or `SSE_LOCAL=true`): Clients should send `Authorization: Bearer <fco_access_token>` on MCP requests. An OAuth bearer token takes precedence over `x-firecrawl-api-key` / `x-api-key` when both are present.
- **stdio:** Use `FIRECRAWL_OAUTH_TOKEN` for a static access token, or keep using `FIRECRAWL_API_KEY` for an API key.

Use **access** tokens (`fco_…`) only. Refresh tokens (`fcr_…`) must be exchanged at the token endpoint, not passed to the scrape/search API.
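The header precedence described for HTTP stream transports can be sketched as a small resolver. This is a hypothetical helper, not the server's source: it assumes lower-cased header names (as Node.js delivers them), checks `x-firecrawl-api-key` before `x-api-key` (the README does not state an order between the two), and rejects `fcr_` refresh tokens up front instead of forwarding them.

```typescript
// Incoming MCP request headers, keyed by lower-case header name.
type IncomingHeaders = Record<string, string | undefined>;

interface ResolvedCredential {
  kind: "oauth" | "apiKey";
  value: string;
}

// Hypothetical helper: a bearer token wins over x-firecrawl-api-key / x-api-key.
function resolveCredential(headers: IncomingHeaders): ResolvedCredential | null {
  const auth = headers["authorization"];
  if (auth !== undefined && auth.toLowerCase().startsWith("bearer ")) {
    const token = auth.slice("bearer ".length).trim();
    if (token.startsWith("fcr_")) {
      // Refresh tokens must be exchanged at the token endpoint first.
      throw new Error("Send an access token (fco_...), not a refresh token (fcr_...).");
    }
    if (token.length > 0) return { kind: "oauth", value: token };
  }
  const apiKey = headers["x-firecrawl-api-key"] ?? headers["x-api-key"];
  return apiKey ? { kind: "apiKey", value: apiKey } : null;
}

// Whichever credential was resolved is forwarded upstream as a bearer header.
function upstreamAuthHeader(credential: ResolvedCredential): { Authorization: string } {
  return { Authorization: `Bearer ${credential.value}` };
}
```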
#### Optional Configuration

##### Retry Configuration

- `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3)
- `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before first retry (default: 1000)
- `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000)
- `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)

##### Credit Usage Monitoring

- `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000)
- `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100)

### Configuration Examples

For cloud API usage with custom retry and credit monitoring:

```bash
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key

# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5        # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000    # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000       # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3      # More aggressive backoff

# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000    # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500    # Critical at 500 credits
```

For self-hosted instance:

```bash
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com

# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key  # If your instance requires auth

# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500     # Start with faster retries
```

### Usage with Claude Desktop

Add this to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
        "FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
        "FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
        "FIRECRAWL_RETRY_MAX_DELAY": "30000",
        "FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
        "FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
        "FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
      }
    }
  }
}
```

### System Configuration

The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:

```typescript
const CONFIG = {
  retry: {
    maxAttempts: 3, // Number of retry attempts for rate-limited requests
    initialDelay: 1000, // Initial delay before first retry (in milliseconds)
    maxDelay: 10000, // Maximum delay between retries (in milliseconds)
    backoffFactor: 2, // Multiplier for exponential backoff
  },
  credit: {
    warningThreshold: 1000, // Warn when credit usage reaches this level
    criticalThreshold: 100, // Critical alert when credit usage reaches this level
  },
};
```

These configurations control:

1. **Retry Behavior**
   - Automatically retries failed requests due to rate limits
   - Uses exponential backoff to avoid overwhelming the API
   - Example: With default settings, retries will be attempted at:
     - 1st retry: 1 second delay
     - 2nd retry: 2 seconds delay
     - 3rd retry: 4 seconds delay (capped at maxDelay)
2. **Credit Usage Monitoring**
   - Tracks API credit consumption for cloud API usage
   - Provides warnings at specified thresholds
   - Helps prevent unexpected service interruption
   - Example: With default settings:
     - Warning at 1000 credits remaining
     - Critical alert at 100 credits remaining

### Rate Limiting and Batch Processing

The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:

- Automatic rate limit handling with exponential backoff
- Efficient parallel processing for batch operations
- Smart request queuing and throttling
- Automatic retries for transient errors

## How to Choose a Tool

Use this guide to select the right tool for your task:

- **If you know the exact URL(s) you want:**
  - For one: use **scrape** (with JSON format for structured data)
  - For many: use **batch_scrape**
- **If you need to discover URLs on a site:** use **map**
- **If you want to search the web for info:** use **search**
- **If you need complex research across multiple unknown sources:** use **agent**
- **If you want to analyze a whole site or section:** use **crawl** (with limits!)
- **If you need interactive browser automation** (click, type, navigate): use **scrape** + **interact**

### Quick Reference Table

| Tool         | Best for                                       | Returns                        |
| ------------ | ---------------------------------------------- | ------------------------------ |
| scrape       | Single page content                            | JSON (preferred) or markdown   |
| interact     | Interact with a scraped page                   | Execution result               |
| batch_scrape | Multiple known URLs                            | JSON (preferred) or markdown[] |
| map          | Discovering URLs on a site                     | URL[]                          |
| crawl        | Multi-page extraction (with limits)            | markdown/html[]                |
| search       | Web search for info                            | results[]                      |
| agent        | Complex multi-source research                  | JSON (structured data)         |

### Format Selection Guide

When using `scrape` or `batch_scrape`, choose the right format:

- **JSON format (recommended for most cases):** Use when you need specific data from a page.
  Define a schema based on what you need to extract. This keeps responses small and avoids context window overflow.
- **Markdown format (use sparingly):** Only when you genuinely need the full page content, such as reading an entire article for summarization or analyzing page structure.

## Available Tools

### 1. Scrape Tool (`firecrawl_scrape`)

Scrape content from a single URL with advanced options.

**Best for:**

- Single page content extraction, when you know exactly which page contains the information.

**Not recommended for:**

- Extracting content from multiple pages (use batch_scrape for known URLs, or map + batch_scrape to discover URLs first, or crawl for full page content)
- When you're unsure which page contains the information (use search)

**Common mistakes:**

- Using scrape for a list of URLs (use batch_scrape instead).
- Using markdown format by default (use JSON format to extract only what you need).

**Choosing the right format:**

- **JSON format (preferred):** For most use cases, use JSON format with a schema to extract only the specific data needed. This keeps responses focused and prevents context window overflow.
- **Markdown format:** Only when the task genuinely requires full page content (e.g., summarizing an entire article, analyzing page structure).

**Prompt Example:**

> "Get the product details from https://example.com/product."

**Usage Example (JSON format - preferred):**

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/product",
    "formats": [
      {
        "type": "json",
        "prompt": "Extract the product information",
        "schema": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "price": { "type": "number" },
            "description": { "type": "string" }
          },
          "required": ["name", "price"]
        }
      }
    ]
  }
}
```

**Usage Example (markdown format - when full content needed):**

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com/article",
    "formats": ["markdown"],
    "onlyMainContent": true
  }
}
```

**Usage Example (branding format - extract brand identity):**

```json
{
  "name": "firecrawl_scrape",
  "arguments": {
    "url": "https://example.com",
    "formats": ["branding"]
  }
}
```

**Branding format:** Extracts comprehensive brand identity (colors, fonts, typography, spacing, logo, UI components) for design analysis or style replication.

**Privacy:** Set `redactPII: true` to return content with personally identifiable information redacted.

**Returns:**

- JSON structured data, markdown, branding profile, or other formats as specified.

### 2. Batch Scrape Tool (`firecrawl_batch_scrape`)

Scrape multiple URLs efficiently with built-in rate limiting and parallel processing.

**Best for:**

- Retrieving content from multiple pages, when you know exactly which pages to scrape.

**Not recommended for:**

- Discovering URLs (use map first if you don't know the URLs)
- Scraping a single page (use scrape)

**Common mistakes:**

- Using batch_scrape with too many URLs at once (may hit rate limits or token overflow)

**Prompt Example:**

> "Get the content of these three blog posts: [url1, url2, url3]."
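Because sending too many URLs in one `firecrawl_batch_scrape` call is listed as a common mistake, a client may prefer to split a long list into several calls. A minimal sketch follows: `buildBatchCalls` is hypothetical, the argument shape mirrors the batch scrape usage example in this section, and the default chunk size of 10 is an arbitrary assumption rather than a documented limit.

```typescript
// Tool-call payload in the same shape as the firecrawl_batch_scrape usage example.
interface BatchScrapeCall {
  name: "firecrawl_batch_scrape";
  arguments: {
    urls: string[];
    options: { formats: string[]; onlyMainContent: boolean };
  };
}

// Split a URL list into fixed-size groups, one tool call per group.
function buildBatchCalls(urls: string[], chunkSize = 10): BatchScrapeCall[] {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  const unique = Array.from(new Set(urls)); // skip duplicate URLs
  const calls: BatchScrapeCall[] = [];
  for (let i = 0; i < unique.length; i += chunkSize) {
    calls.push({
      name: "firecrawl_batch_scrape",
      arguments: {
        urls: unique.slice(i, i + chunkSize),
        options: { formats: ["markdown"], onlyMainContent: true },
      },
    });
  }
  return calls;
}
```

Each call queues its own batch operation, so each returned batch ID needs its own `firecrawl_check_batch_status` check.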

**Usage Example:**

```json
{
  "name": "firecrawl_batch_scrape",
  "arguments": {
    "urls": ["https://example1.com", "https://example2.com"],
    "options": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }
}
```

**Returns:**

- Response includes operation ID for status checking:

  ```json
  {
    "content": [
      {
        "type": "text",
        "text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
      }
    ],
    "isError": false
  }
  ```

### 3. Check Batch Status (`firecrawl_check_batch_status`)

Check the status of a batch operation.

```json
{
  "name": "firecrawl_check_batch_status",
  "arguments": {
    "id": "batch_1"
  }
}
```

### 4. Map Tool (`firecrawl_map`)

Map a website to discover all indexed URLs on the site.

**Best for:**

- Discovering URLs on a website before deciding what to scrape
- Finding specific sections of a website

**Not recommended for:**

- When you already know which specific URL you need (use scrape or batch_scrape)
- When you need the content of the pages (use scrape after mapping)

**Common mistakes:**

- Using crawl to discover URLs instead of map

**Prompt Example:**

> "List all URLs on example.com."

**Usage Example:**

```json
{
  "name": "firecrawl_map",
  "arguments": {
    "url": "https://example.com"
  }
}
```

**Returns:**

- Array of URLs found on the site

### 5. Search Tool (`firecrawl_search`)

Search the web and optionally extract content from search results.

**Best for:**

- Finding specific information across multiple websites, when you don't know which website has the information.
- When you need the most relevant content for a query **Not recommended for:** - When you already know which website to scrape (use scrape) - When you need comprehensive coverage of a single website (use map or crawl) **Common mistakes:** - Using crawl or map for open-ended questions (use search instead) **Usage Example:** ```json { "name": "firecrawl_search", "arguments": { "query": "latest AI research papers 2023", "limit": 5, "lang": "en", "country": "us", "scrapeOptions": { "formats": ["markdown"], "onlyMainContent": true, "redactPII": true } } } ``` **Returns:** - Array of search results (with optional scraped content), plus an `id` field. Pass that `id` to `firecrawl_search_feedback` after you've used the results to refund 1 credit (search costs 2) and improve search quality. **Prompt Example:** > "Find the latest research papers on AI published in 2023." ### 5b. Search Feedback Tool (`firecrawl_search_feedback`) Sends structured feedback on a previous `firecrawl_search` result. The first feedback per search id refunds 1 credit and improves Firecrawl's search quality. Idempotent per search id. **Call this after every search you actually use** (or that didn't help). Bad/partial feedback with `missingContent` is just as valuable as good feedback. **Opt out:** set `FIRECRAWL_NO_SEARCH_FEEDBACK=1` (or `FIRECRAWL_DISABLE_SEARCH_FEEDBACK=1`) in the environment when starting the MCP server. The `firecrawl_search_feedback` tool will not be registered, so agents can't call it. Team admins can also disable feedback server-side; in that case the tool is registered but always returns `feedbackErrorCode: "TEAM_OPTED_OUT"`. **Most important field:** `missingContent`. It's an array of specific pieces of content the agent expected to find but did not. One entry per missing topic — these aggregate across teams and tell us what to index next. 
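
Because each missing topic should be its own `missingContent` entry, a small helper can turn a mapping of topics to optional descriptions into a feedback call. This Python sketch assumes tool calls are assembled as plain dicts; `build_search_feedback` is a hypothetical helper, and it deliberately does not validate `rating`, since the accepted values beyond `"good"` are not listed here.

```python
from typing import Any, Optional


def build_search_feedback(
    search_id: str,
    rating: str,
    missing: dict[str, Optional[str]],
    valuable: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Build a firecrawl_search_feedback tool call (hypothetical helper).

    `missing` maps each missing topic to an optional description, yielding
    one missingContent entry per topic. `valuable` maps URL -> reason.
    """
    arguments: dict[str, Any] = {"searchId": search_id, "rating": rating}
    if valuable:
        arguments["valuableSources"] = [
            {"url": url, "reason": reason} for url, reason in valuable.items()
        ]
    missing_content = []
    for topic, description in missing.items():
        entry = {"topic": topic}
        if description:  # descriptions are optional, so omit empty ones
            entry["description"] = description
        missing_content.append(entry)
    if missing_content:
        arguments["missingContent"] = missing_content
    return {"name": "firecrawl_search_feedback", "arguments": arguments}


call = build_search_feedback(
    "0193f6c5-1234-7890-abcd-1234567890ab",
    "good",
    {
        "Pricing for the search endpoint": "No pricing tier table for /search specifically.",
        "Per-team rate limits": None,
    },
)
```

The resulting `missingContent` list matches the two-entry shape of the feedback usage example in this section.
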

**Daily refund cap (per team, per UTC day, default 100 credits).** Once a team's `creditsRefundedToday` reaches `dailyRefundCap`, further submissions still record feedback but no longer refund credits. The response sets `dailyCapReached: true`. Agents should stop calling this tool for the rest of the UTC day when they see that flag.

**Usage Example:**

```json
{
  "name": "firecrawl_search_feedback",
  "arguments": {
    "searchId": "0193f6c5-1234-7890-abcd-1234567890ab",
    "rating": "good",
    "valuableSources": [
      {
        "url": "https://docs.firecrawl.dev/features/search",
        "reason": "Most up-to-date description of /search."
      }
    ],
    "missingContent": [
      {
        "topic": "Pricing for the search endpoint",
        "description": "No pricing tier table for /search specifically."
      },
      { "topic": "Per-team rate limits" }
    ],
    "querySuggestions": "Boost docs.firecrawl.dev for queries that mention 'firecrawl'"
  }
}
```

**Returns:**

- `{ success, feedbackId, creditsRefunded, alreadySubmitted? }` JSON, plus `dailyCapReached: true` once the daily refund cap has been hit.

### 6. Crawl Tool (`firecrawl_crawl`)

Starts an asynchronous crawl job on a website and extracts content from all pages.

**Best for:**

- Extracting content from multiple related pages, when you need comprehensive coverage.

**Not recommended for:**

- Extracting content from a single page (use scrape)
- When token limits are a concern (use map + batch_scrape)
- When you need fast results (crawling can be slow)

**Warning:** Crawl responses can be very large and may exceed token limits. Limit the crawl depth and number of pages, or use map + batch_scrape for better control.

**Common mistakes:**

- Setting limit or maxDepth too high (causes token overflow)
- Using crawl for a single page (use scrape instead)

**Prompt Example:**

> "Get all blog posts from the first two levels of example.com/blog."
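
One way to avoid the token-overflow mistake is to clamp `maxDepth` and `limit` before the call is sent. The Python sketch below assumes tool calls are built as plain dicts; `build_crawl_call` and the ceilings of 3 levels and 100 pages are hypothetical values chosen for illustration, not server-enforced limits.

```python
from typing import Any

# Hypothetical ceilings for illustration; tune them to your token budget.
MAX_DEPTH_CAP = 3
PAGE_LIMIT_CAP = 100


def build_crawl_call(url: str, max_depth: int = 2, limit: int = 50) -> dict[str, Any]:
    """Build a firecrawl_crawl tool call with maxDepth and limit clamped."""
    if max_depth < 1 or limit < 1:
        raise ValueError("max_depth and limit must be at least 1")
    return {
        "name": "firecrawl_crawl",
        "arguments": {
            "url": url,
            "maxDepth": min(max_depth, MAX_DEPTH_CAP),
            "limit": min(limit, PAGE_LIMIT_CAP),
            "allowExternalLinks": False,
            "deduplicateSimilarURLs": True,
        },
    }


# An over-eager request is reduced to maxDepth=3, limit=100.
call = build_crawl_call("https://example.com/blog/*", max_depth=10, limit=5000)
```

If a clamped crawl still returns too much, the map + batch_scrape route recommended above gives finer control over which pages are fetched.
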

**Usage Example:**

```json
{
  "name": "firecrawl_crawl",
  "arguments": {
    "url": "https://example.com/blog/*",
    "maxDepth": 2,
    "limit": 100,
    "allowExternalLinks": false,
    "deduplicateSimilarURLs": true
  }
}
```

**Returns:**

- Response includes operation ID for status checking:

```json
{
  "content": [
    {
      "type": "text",
      "text": "Started crawl for: https://example.com/* with job ID: 550e8400-e29b-41d4-a716-446655440000. Use firecrawl_check_crawl_status to check progress."
    }
  ],
  "isError": false
}
```

### 7. Check Crawl Status (`firecrawl_check_crawl_status`)

Check the status of a crawl job.

```json
{
  "name": "firecrawl_check_crawl_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

**Returns:**

- The current status of the crawl job.

### 8. Extract Tool (`firecrawl_extract`)

Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.

**Best for:**

- Extracting specific structured data like prices, names, details.

**Not recommended for:**

- When you need the full content of a page (use scrape)
- When you're not looking for specific structured data

**Arguments:**

- `urls`: Array of URLs to extract information from
- `prompt`: Custom prompt for the LLM extraction
- `systemPrompt`: System prompt to guide the LLM
- `schema`: JSON schema for structured data extraction
- `allowExternalLinks`: Allow extraction from external links
- `enableWebSearch`: Enable web search for additional context
- `includeSubdomains`: Include subdomains in extraction

When using a self-hosted instance, the extraction will use your configured LLM. With the cloud API, it uses Firecrawl's managed LLM service.

**Prompt Example:**

> "Extract the product name, price, and description from these product pages."
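
Since the extract tool returns data shaped by your `schema`, a client may want to confirm that required fields are present before using them. The Python sketch below is a deliberately small check that understands only `required` and the primitive `type` values used in the example schema; it is not a full JSON Schema validator (a library such as `jsonschema` covers the rest). It accepts the result either as an object or as a JSON string, on the assumption that some clients receive the `text` field serialized.

```python
import json
from typing import Any

# JSON Schema primitive types used in the example schema, mapped to Python types.
_TYPE_MAP: dict[str, Any] = {"string": str, "number": (int, float), "boolean": bool}


def check_extracted(schema: dict[str, Any], payload: Any) -> list[str]:
    """Return a list of problems with an extracted object; empty means it passed."""
    # Accept either a parsed object or a JSON string.
    data = json.loads(payload) if isinstance(payload, str) else payload
    if not isinstance(data, dict):
        return ["payload is not a JSON object"]
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        expected = _TYPE_MAP.get(type_name) if isinstance(type_name, str) else None
        if key not in data or expected is None:
            continue  # field absent (required ones reported above) or type not checked here
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        if not isinstance(value, expected) or (type_name == "number" and isinstance(value, bool)):
            problems.append(f"{key} should be {type_name}")
    return problems
```

Run against the product schema from the usage example, a result of `{"price": "99.99"}` yields two problems: the missing `name` and a `price` that is a string rather than a number.
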

**Usage Example:**

```json
{
  "name": "firecrawl_extract",
  "arguments": {
    "urls": ["https://example.com/page1", "https://example.com/page2"],
    "prompt": "Extract product information including name, price, and description",
    "systemPrompt": "You are a helpful assistant that extracts product information",
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "description": { "type": "string" }
      },
      "required": ["name", "price"]
    },
    "allowExternalLinks": false,
    "enableWebSearch": false,
    "includeSubdomains": false
  }
}
```

**Returns:**

- Extracted structured data as defined by your schema

```json
{
  "content": [
    {
      "type": "text",
      "text": {
        "name": "Example Product",
        "price": 99.99,
        "description": "This is an example product description"
      }
    }
  ],
  "isError": false
}
```

### 9. Agent Tool (`firecrawl_agent`)

Autonomous web research agent. This is a separate AI agent layer that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query.

**How it works:** The agent performs web searches, follows links, reads pages, and gathers data autonomously. This runs **asynchronously**: it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when it is complete and retrieve results.

**Async workflow:**

1. Call `firecrawl_agent` with your prompt/schema → returns job ID
2. Do other work while the agent researches (can take minutes for complex queries)
3. Poll `firecrawl_agent_status` with the job ID to check progress
4. When status is "completed", the response includes the extracted data

**Best for:**

- Complex research tasks where you don't know the exact URLs
- Multi-source data gathering
- Finding information scattered across the web
- Tasks where you can do other work while waiting for results

**Not recommended for:**

- Simple single-page scraping where you know the URL (use scrape with JSON format - faster and cheaper)

**Arguments:**

- `prompt`: Natural language description of the data you want (required, max 10,000 characters)
- `urls`: Optional array of URLs to focus the agent on specific pages
- `schema`: Optional JSON schema for structured output

**Prompt Example:**

> "Find the founders of Firecrawl and their backgrounds"

**Usage Example (start agent, then poll for results):**

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
    "schema": {
      "type": "object",
      "properties": {
        "startups": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "funding": { "type": "string" },
              "founded": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```

Then poll with `firecrawl_agent_status` using the returned job ID.

**Usage Example (with URLs - agent focuses on specific pages):**

```json
{
  "name": "firecrawl_agent",
  "arguments": {
    "urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
    "prompt": "Compare the features and pricing information from these pages"
  }
}
```

**Returns:**

- Job ID for status checking. Use `firecrawl_agent_status` to poll for results.

### 10. Check Agent Status (`firecrawl_agent_status`)

Check the status of an agent job and retrieve results when complete. Use this to poll for results after starting an agent.

**Polling pattern:** Agent research can take minutes for complex queries. Poll this endpoint periodically (e.g., every 10-30 seconds) until status is "completed" or "failed".
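
The polling pattern can be written as a loop that stops on `"completed"` or `"failed"`. In this Python sketch, `call_tool` is a hypothetical stand-in for your MCP client's tool invocation, assumed to return the parsed status response as a dict with a `status` key; the 15-second interval (inside the suggested 10-30 second range) and the 15-minute timeout are illustrative defaults, not documented values.

```python
import time
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> parsed status response dict.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_agent(
    call_tool: CallTool,
    job_id: str,
    interval_s: float = 15.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll firecrawl_agent_status until the job completes, fails, or times out."""
    waited = 0.0
    while True:
        response = call_tool("firecrawl_agent_status", {"id": job_id})
        status = response.get("status")
        if status == "completed":
            return response
        if status == "failed":
            raise RuntimeError(f"agent job {job_id} failed: {response}")
        if waited >= timeout_s:
            raise TimeoutError(f"agent job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Example with a fake client that reports "completed" on the third poll.
_responses = iter([{"status": "processing"}, {"status": "processing"}, {"status": "completed", "data": {}}])
result = wait_for_agent(lambda name, args: next(_responses), "job-1", sleep=lambda s: None)
print(result["status"])  # completed
```

Passing `sleep` in as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` applies.
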

```json
{
  "name": "firecrawl_agent_status",
  "arguments": {
    "id": "550e8400-e29b-41d4-a716-446655440000"
  }
}
```

**Possible statuses:**

- `processing`: Agent is still researching - check back later
- `completed`: Research finished - response includes the extracted data
- `failed`: An error occurred

### 11. Monitor Tools (`firecrawl_monitor_*`)

Create and manage recurring page monitors. Monitors run scheduled scrapes or crawls, diff each result against the last retained snapshot, and can notify by webhook or email.

**Best for:**

- Watching one page or a few pages over time
- Alerting on meaningful changes using a plain-English goal
- Tracking check history and page-level diffs

**Recommended create pattern:** Use `page` or `pages` plus `goal`. The MCP server builds the monitor request with a 30-minute schedule, and the API enables meaningful-change judging automatically whenever `goal` is set. Page webhooks expose `isMeaningful` and `judgment` on `monitor.page` events.

Write goals as concise 2-3 sentence monitor instructions. Say what should trigger an alert, preserve any scope the user gave, and include intent-specific exclusions only when they are obvious from the request. Generic noise such as whitespace, formatting-only changes, request IDs, tracking params, generic metadata, and unrelated page chrome is already handled by the judge, so do not repeat it in every goal. If the user is vague, keep the goal broad; if they ask for broad monitoring or "any change", preserve that. If the user says they do not care about something, include that explicitly.

```json
{
  "name": "firecrawl_monitor_create",
  "arguments": {
    "page": "https://example.com/pricing",
    "goal": "Alert when pricing, packaging, or launch messaging changes."
  }
}
```

**Multiple pages with webhooks:**

```json
{
  "name": "firecrawl_monitor_create",
  "arguments": {
    "pages": ["https://example.com/pricing", "https://example.com/changelog"],
    "goal": "Alert when pricing, packaging, or launch messaging changes.",
    "webhookUrl": "https://example.com/webhooks/firecrawl"
  }
}
```

**Advanced create requests:** Pass `body` when you need crawl targets, JSON change tracking, custom retention, or explicit `judgeEnabled` control.

```json
{
  "name": "firecrawl_monitor_create",
  "arguments": {
    "body": {
      "name": "Docs monitor",
      "schedule": { "text": "hourly", "timezone": "UTC" },
      "goal": "Alert when docs pages add, remove, or materially change API behavior.",
      "targets": [{ "type": "crawl", "url": "https://example.com/docs" }]
    }
  }
}
```

**Other monitor tools:**

- `firecrawl_monitor_list`: list monitors.
- `firecrawl_monitor_get`: get one monitor.
- `firecrawl_monitor_update`: update fields including `goal`, `judgeEnabled`, `webhook`, and `notification`.
- `firecrawl_monitor_run`: trigger a check now.
- `firecrawl_monitor_checks`: list checks, optionally filtered by status.
- `firecrawl_monitor_check`: get page-level results, including `diff`, `snapshot`, `judgment.meaningful`, and `judgment.meaningfulChanges`.

## Logging System

The server includes comprehensive logging:

- Operation status and progress
- Performance metrics
- Credit usage monitoring
- Rate limit tracking
- Error conditions

Example log messages:

```
[INFO] Firecrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Batch operation queued with ID: batch_1
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
```

## Error Handling

The server provides robust error handling:

- Automatic retries for transient errors
- Rate limit handling with backoff
- Detailed error messages
- Credit usage warnings
- Network resilience

Example error response:

```json
{
  "content": [
    {
      "type": "text",
      "text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
    }
  ],
  "isError": true
}
```

## Development

```bash
# Install dependencies
npm install

# Build
npm run build

# Run tests
npm test
```

### Contributing

1. Fork the repository
2. Create your feature branch
3. Run tests: `npm test`
4. Submit a pull request

### Thanks to contributors

Thanks to [@vrknetha](https://github.com/vrknetha), [@cawstudios](https://caw.tech) for the initial implementation!

Thanks to MCP.so and Klavis AI for hosting and [@gstarwd](https://github.com/gstarwd), [@xiangkaiz](https://github.com/xiangkaiz) and [@zihaolin96](https://github.com/zihaolin96) for integrating our server.

## License

MIT License - see LICENSE file for details
