FreeCrawl SEO Tool
Open-source desktop SEO crawler: a free, cross-platform alternative to Screaming Frog
A high-performance website crawler built for serious SEO audits. Targets 1M+ URLs on a single machine, with a dense Screaming Frog-style UI, 150+ SEO issue checks, and zero native dependencies.
What FreeCrawl Does
FreeCrawl is a desktop SEO crawler that scales to 1M+ URLs on a single machine at 80-150 URLs/s, using undici with keep-alive connections. Optional JavaScript rendering through headless Chromium (Playwright) captures the post-JS DOM, full-page / above-fold / mobile screenshots, LCP candidate elements, and a per-URL mobile usability audit. It runs 150+ on-page SEO checks across 30 top-level tabs: exact and near-duplicate clustering (SimHash + LSH) with a dedicated Cluster view, full hreflang validation (reciprocity, self-reference, inconsistent lang), OpenGraph / Twitter Card / JSON-LD / Web App Manifest parsing, AMP smoke validation, structured-data validation (duplicate @id, malformed @type, missing required props), WCAG accessibility checks (<main> landmark, skip-link, ARIA roles, heading order), security headers, an SSL/TLS chain audit, an active/passive mixed-content split, readability scores (Flesch, Flesch-Kincaid, Gunning Fog), and a 10-rule custom CSS + regex extraction engine with a live Preview dialog and JSON import / export for sharing rule sets.
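To make the near-duplicate idea concrete, here is a minimal SimHash sketch. It is not FreeCrawl's implementation: the 32-bit fingerprint width, the FNV-1a hash, the 3-word shingles, and every function name are assumptions chosen for brevity, and the LSH step (which avoids comparing every pair of pages) is left out.

```typescript
// Illustrative only: a 32-bit SimHash over word shingles (all names are hypothetical).

/** FNV-1a style 32-bit hash over the string's UTF-16 code units. */
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

/** Lowercase the text and return overlapping word shingles. */
function shingles(text: string, size = 3): string[] {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length < size) return [words.join(" ")];
  const out: string[] = [];
  for (let i = 0; i + size <= words.length; i++) {
    out.push(words.slice(i, i + size).join(" "));
  }
  return out;
}

/** Each shingle votes +1 / -1 per bit; the sign of each tally sets that bit. */
function simhash32(text: string): number {
  const tally = new Array<number>(32).fill(0);
  for (const feature of shingles(text)) {
    const h = fnv1a32(feature);
    for (let bit = 0; bit < 32; bit++) {
      tally[bit] += ((h >>> bit) & 1) ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (tally[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

/** Number of differing bits between two fingerprints. */
function hammingDistance(a: number, b: number): number {
  let x = (a ^ b) >>> 0;
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}
```

Two pages would be treated as near-duplicates when the Hamming distance between their fingerprints falls under a chosen threshold; the threshold itself is a tuning decision this sketch does not make.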
The dense dark UI renders virtualized 1M-row tables that live-stream rows during a crawl, with List / Tree / Cluster view toggles, column pin (sticky-left) + drag-to-reorder + show/hide, advanced AND/OR filters across 24 fields × 12 operators, per-tab quick-filter dropdowns, a 16-tab Details panel per URL (URL Details, Outline, Inlinks, Outlinks, Images, Resources, Extracted Data, SERP Snippet, HTTP Headers, Cookies, Structured Data, View Source, View Rendered, Screenshot, Duplicates, Analytics), an English + Turkish UI including every Settings panel, in-app scheduled crawls (hourly / daily / weekly / custom), a live memory monitor, robots.txt syntax validator, URL rewriting with live preview, project-vs-project compare diff, a Cytoscape link graph in its own native window, and a .seoproject file association for OS double-click.
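As a rough illustration of how nested AND/OR filters over row fields can be evaluated, consider the sketch below. The field names, the five operators, and the `Group` / `Condition` shapes are hypothetical; FreeCrawl's own filter model (24 fields, 12 operators) is not documented here.

```typescript
// Hypothetical filter model; not FreeCrawl's actual schema.
type Row = Record<string, string | number | null>;
type Operator = "equals" | "contains" | "greaterThan" | "lessThan" | "isEmpty";

interface Condition {
  field: string;
  op: Operator;
  value?: string | number;
}

interface Group {
  mode: "AND" | "OR";
  items: Array<Condition | Group>;
}

/** Evaluate a single field/operator/value condition against one row. */
function matchCondition(row: Row, c: Condition): boolean {
  const cell = row[c.field] ?? null;
  switch (c.op) {
    case "isEmpty":
      return cell === null || cell === "";
    case "equals":
      return cell === c.value;
    case "contains":
      return (
        typeof cell === "string" &&
        typeof c.value === "string" &&
        cell.toLowerCase().includes(c.value.toLowerCase())
      );
    case "greaterThan":
      return typeof cell === "number" && typeof c.value === "number" && cell > c.value;
    case "lessThan":
      return typeof cell === "number" && typeof c.value === "number" && cell < c.value;
  }
}

/** Recursively evaluate a group: AND needs every item, OR needs at least one. */
function matchGroup(row: Row, group: Group): boolean {
  const test = (item: Condition | Group): boolean =>
    "mode" in item ? matchGroup(row, item) : matchCondition(row, item);
  return group.mode === "AND" ? group.items.every(test) : group.items.some(test);
}
```

A filter such as "status code above 399 OR (title is empty AND depth below 3)" becomes an OR group whose second item is a nested AND group.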
Integrations cover Google Search Console (clicks, impressions, CTR, position + URL Inspection coverage), Google Analytics 4 (sessions, users, bounce, engagement), PageSpeed Insights Lighthouse audits, custom AI prompts via OpenAI / Anthropic Claude / local Ollama with {url}/{title}/{description}/{h1}/{body} variables, and SEO authority providers (Ahrefs / Majestic / Moz / Semrush) behind a single dropdown. Each integration card ships with an in-app step-by-step Setup Guide modal in English and Turkish that walks you through OAuth client creation, test-user setup, API enablement, and the most common errors.

Exports go through a unified Export Crawl Data dialog (Excel .xlsx / CSV UTF-8 / JSON / XML with hierarchical category selection and nested folder output) plus a standalone HTML audit report, a sitemap generator (standard / image / hreflang / sharded / gzipped), and direct streaming to Google Sheets and BigQuery. Project files can be saved as password-protected encrypted snapshots (.seoproject.enc, AES-256-GCM + PBKDF2).

The MCP server exposes 76 tools that let Claude Code or any MCP client drive crawls live, query every UI surface, trigger exports, fetch GSC/GA4 data on demand, and modify settings; every action a human user can take in the desktop app is callable from an agent.

Crawl-completion webhooks, OS notifications, 22 built-in reports (histograms, top/bottom URLs, link positions, top words, cross-source orphans), a per-URL Duplicates view, and a custom-CSS theme override round out the suite. Everything runs fully local: no telemetry, no cloud, MIT-style license.
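For readers curious what a password-protected AES-256-GCM + PBKDF2 snapshot can look like, here is a minimal sketch using Node's built-in crypto module. The byte layout (salt, IV, auth tag, ciphertext), the iteration count, the SHA-256 digest, and the function names are assumptions made for illustration; the actual .seoproject.enc format may differ.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

// Assumed parameters, not taken from FreeCrawl's source.
const ITERATIONS = 100000;
const SALT_BYTES = 16;
const IV_BYTES = 12; // the usual nonce size for GCM
const TAG_BYTES = 16;

/** Encrypt to the assumed layout: salt | iv | authTag | ciphertext. */
function encryptSnapshot(plain: Buffer, password: string): Buffer {
  const salt = randomBytes(SALT_BYTES);
  const iv = randomBytes(IV_BYTES);
  const key = pbkdf2Sync(password, salt, ITERATIONS, 32, "sha256");
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), body]);
}

/** Decrypt; throws if the password is wrong or the data was tampered with. */
function decryptSnapshot(blob: Buffer, password: string): Buffer {
  const salt = blob.subarray(0, SALT_BYTES);
  const iv = blob.subarray(SALT_BYTES, SALT_BYTES + IV_BYTES);
  const tag = blob.subarray(SALT_BYTES + IV_BYTES, SALT_BYTES + IV_BYTES + TAG_BYTES);
  const body = blob.subarray(SALT_BYTES + IV_BYTES + TAG_BYTES);
  const key = pbkdf2Sync(password, salt, ITERATIONS, 32, "sha256");
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}
```

GCM authenticates as well as encrypts, which is why a wrong password surfaces as a thrown error rather than as silently corrupted output.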
Tech Stack
| Layer | Choice |
|---|---|
| Runtime | Node.js 22 LTS+ (ESM-first) |
| Language | TypeScript 5.7+ strict |
| Desktop shell | Electron 41 |
| Build | electron-vite 5 / Vite 7 |
| UI | React 19 + Tailwind 3.4 + Zustand 5 |
| Tables | @tanstack/react-table + @tanstack/react-virtual |
| HTTP | undici 8 |
| HTML parse | cheerio (htmlparser2 fast path) |
| Queue | p-queue 8 |
| robots | robots-parser 3 |
| Storage | node:sqlite + WAL, zero native deps |
| Distribution | electron-builder 26 |
Quick Start
[!TIP] End users: download the prebuilt installer from the Releases page; no setup required.
Windows: easiest path is the .bat launcher
Double-click FreeCrawl-SEO-Tool-Start.bat at the repo root. It verifies Node.js, runs npm install on first launch, then starts the app with npm run dev.
Don't want to install? Grab the portable .exe from the Releases page; it runs without installation.
Or manually:
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
macOS: Apple Silicon + Intel
Easiest path is the FreeCrawl-SEO-Tool-Start.sh launcher at the repo root, with the same one-click flow as the Windows .bat (verifies Node, prompts to install on first run, then starts the app).
chmod +x FreeCrawl-SEO-Tool-Start.sh
./FreeCrawl-SEO-Tool-Start.sh
Or manually:
# 1. Install prerequisites (skip any you already have)
brew install node@22 git
xcode-select --install # Command Line Tools, required once
# 2. Clone and run
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
If macOS Gatekeeper blocks an unsigned local build ("App is damaged"):
xattr -cr "/Applications/FreeCrawl SEO.app"
Linux: Debian / Ubuntu / Fedora / Arch
Easiest path is the FreeCrawl-SEO-Tool-Start.sh launcher at the repo root (same as macOS).
Prebuilt installers are available in three formats: .AppImage (universal), .deb (Debian / Ubuntu), and .rpm (Fedora / RHEL).
# 1. Install Node.js 22 LTS (Debian / Ubuntu via NodeSource)
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs git
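# RHEL (module streams) : sudo dnf module install nodejs:22 && sudo dnf install git
# Fedora : sudo dnf install nodejs git (check that node --version reports v22 or newer)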
# Arch : sudo pacman -S nodejs npm git
# 2. Clone and run
git clone https://github.com/kemalai/FreeCrawl-SEO-Tool.git
cd FreeCrawl-SEO-Tool
npm install
npm run dev
Some headless / minimal distros also need GTK/X11 runtime libs for Electron:
# libasound2t64 is the package name on Ubuntu 24.04+ / Debian 13+; older releases use libasound2
sudo apt install -y libgtk-3-0 libnss3 libasound2t64
CLI (headless crawl)
npm run build:cli
node apps/cli/dist/index.js https://example.com --depth 2 --max 500 --out out.csv
node apps/cli/dist/index.js --list urls.txt --out out.json # list mode + JSON
CI / CD recipes: ready-to-use GitHub Actions and GitLab CI examples that crawl your site on a schedule, fail the build when the broken-URL count exceeds a threshold, and upload the crawl as an artifact.
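The threshold check in such a recipe can be as small as the sketch below. It assumes the CLI's JSON output is an array of objects with `url` and `statusCode` fields and that a missing status counts as broken; both are guesses for illustration, so adjust the shape to whatever out.json actually contains.

```typescript
import { readFileSync } from "node:fs";

// Assumed row shape; the real JSON export may use different field names.
interface CrawlRow {
  url: string;
  statusCode: number | null;
}

/** Count rows with no response or a 4xx / 5xx status. */
function countBrokenUrls(rows: CrawlRow[]): number {
  return rows.filter((row) => row.statusCode === null || row.statusCode >= 400).length;
}

/** Read a crawl export and return a process exit code: 1 when over the threshold. */
function checkCrawl(jsonPath: string, maxBroken: number): number {
  const rows = JSON.parse(readFileSync(jsonPath, "utf8")) as CrawlRow[];
  const broken = countBrokenUrls(rows);
  console.log(`${broken} broken URL(s), threshold ${maxBroken}`);
  return broken > maxBroken ? 1 : 0;
}

// In a CI step, after the CLI has written out.json:
//   process.exit(checkCrawl("out.json", 10));
```

Returning the exit code instead of calling process.exit inside the function keeps the check easy to unit test.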
Production build (per-platform installers)
npm run build # all packages + desktop + CLI
npm --workspace apps/desktop run build:win # Windows installer (NSIS) + portable .exe
npm --workspace apps/desktop run build:mac # macOS DMG (arm64 + x64)
npm --workspace apps/desktop run build:linux # AppImage / .deb / .rpm
MCP server: query AND drive crawls from Claude / any MCP client
FreeCrawl ships an MCP (Model Context Protocol) server that exposes the active .seoproject to AI agents over stdio. Two capabilities in one server:
- Read-only data access to the SQLite project. It runs alongside the desktop app without contention (WAL allows concurrent readers).
- Live crawl control. When the desktop app is open, an agent can start / pause / resume / stop crawls and poll progress in real time. This goes through a localhost-only HTTP bridge (127.0.0.1, ephemeral random port, 32-byte Bearer token auth, discovery file written to <userData>/mcp-bridge.json on app launch).
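The Bearer-token part of a bridge like this can be sketched with Node's crypto module. This is a generic pattern rather than FreeCrawl's code: the hex encoding and the function names are assumptions, and the discovery-file format is deliberately not shown because it is not documented here.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

/** Generate a 32-byte random token, hex-encoded (the encoding is an assumption). */
function createBridgeToken(): string {
  return randomBytes(32).toString("hex");
}

/** Check an Authorization header against the expected token in constant time. */
function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
  if (authorizationHeader === undefined || !authorizationHeader.startsWith("Bearer ")) {
    return false;
  }
  const presented = Buffer.from(authorizationHeader.slice("Bearer ".length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return presented.length === expected.length && timingSafeEqual(presented, expected);
}
```

A constant-time comparison avoids leaking how many leading characters of a guessed token were correct.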
49 tools are listed below:
| Group | Tools |
|---|---|
| Top-level data | get_summary, get_overview_counts, top_issues, query_urls, get_url_detail |
| Per-URL detail sub-tabs | get_url_source (raw + rendered + screenshot paths), get_url_inlinks, get_url_outlinks, get_url_images, get_url_headers, get_url_duplicates, get_url_analytics, get_url_cert |
| Integration rows | query_gsc, query_ga4, query_pagespeed, query_ai, query_seo |
| Specialised queries | query_images, query_broken_links, list_duplicate_clusters |
| Reports | report_status_code_histogram, report_indexability_distribution, report_content_kind_distribution, report_depth_histogram, report_response_time_histogram, report_inlinks_histogram, report_word_count_histogram, report_url_length_histogram, report_top_urls_by, report_top_anchor_texts, report_external_domain_health, report_image_weight_per_page, report_link_position_breakdown, report_pages_per_directory, report_word_count_per_directory, report_sitemap_orphans, report_analytics_coverage, report_server_header_breakdown |
| Project management | list_projects, set_project, current_project |
| Crawl control (desktop must be open) | start_crawl, stop_crawl, pause_crawl, resume_crawl, clear_crawl, get_crawl_progress, get_desktop_project |
start_crawl accepts a startUrl plus optional whitelisted overrides (scope, maxDepth, maxUrls, maxConcurrency, maxRps, crawlDelayMs, requestTimeoutMs, respectRobotsTxt, followRedirects, crawlExternal, userAgent, include/excludePatterns). Anything you don't override keeps the desktop user's saved value. Crawls launched via MCP go through the same code path as the UI's Start button, so progress shows up in the desktop app live as the agent drives it. Call clear_crawl first when you want a fresh BFS after a completed crawl with the same seed URL (otherwise the crawler treats the re-start as a resume and exits because every URL is already in the DB).
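The "whitelisted overrides on top of saved settings" behaviour described above could be modelled roughly as follows. The settings interface is a trimmed-down, hypothetical subset, the example values (such as the "subdomain" scope) are invented, and the type check is one possible validation strategy rather than what FreeCrawl does.

```typescript
// Hypothetical subset of the crawl settings; real field types may differ.
interface CrawlSettings {
  scope: string;
  maxDepth: number;
  maxUrls: number;
  maxConcurrency: number;
  respectRobotsTxt: boolean;
  userAgent: string;
}

const ALLOWED_OVERRIDES = [
  "scope",
  "maxDepth",
  "maxUrls",
  "maxConcurrency",
  "respectRobotsTxt",
  "userAgent",
] as const;

/** Merge agent-supplied overrides over saved settings, ignoring unknown or mistyped keys. */
function applyOverrides(saved: CrawlSettings, requested: Record<string, unknown>): CrawlSettings {
  const merged: Record<string, unknown> = { ...saved };
  for (const key of ALLOWED_OVERRIDES) {
    const value = requested[key];
    // Accept an override only when it has the same primitive type as the saved value.
    if (value !== undefined && typeof value === typeof saved[key]) {
      merged[key] = value;
    }
  }
  return merged as unknown as CrawlSettings;
}
```

Iterating over the allow-list, rather than over the request, means keys outside the list can never reach the crawler.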
1. Build it once:
npm run build:mcp
This produces apps/mcp-server/dist/index.js.
2. Register it with your MCP client.
Claude Desktop
Edit your Claude Desktop config:
| Platform | Path |
|---|---|
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |
{
"mcpServers": {
"freecrawl": {
"command": "node",
"args": ["/absolute/path/to/FreeCrawl-SEO-Tool/apps/mcp-server/dist/index.js"]
}
}
}
Restart Claude Desktop. The freecrawl server appears under the tools icon.
Claude Code (CLI)
claude mcp add freecrawl -- node /absolute/path/to/FreeCrawl-SEO-Tool/apps/mcp-server/dist/index.js
Other MCP clients
Run the binary directly with stdio transport:
node apps/mcp-server/dist/index.js
The server speaks newline-delimited JSON-RPC 2.0, so any MCP-compatible client can connect to it.
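If you are wiring up a client by hand, the framing looks roughly like the sketch below: one JSON object per line on stdin and stdout. The `tools/call` method with `name` / `arguments` params comes from the MCP specification; the helper names are made up, and passing an empty arguments object to get_summary is an assumption. A real session also begins with an `initialize` handshake, which an MCP SDK normally handles for you.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

/** Build an MCP tools/call request for a named tool. */
function toolCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

/** Serialize one message as a single line; JSON.stringify escapes embedded newlines. */
function encodeMessage(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

/** Buffers incoming chunks and yields each complete line as a parsed message. */
class LineDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    return lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line) as unknown);
  }
}
```

The decoder matters because a pipe can deliver half a message, or several messages, in a single chunk.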
3. Try it. Ask your agent things like:
- "Crawl https://example.com with maxDepth 3 and watch the progress."
- "Show the 10 URLs with the longest response time in my last crawl."
- "What are the top 5 issue categories with the most affected pages?"
- "List every URL with a missing meta description."
- "Pause the running crawl, then resume it once I've checked the first 1000 URLs."
Pointing at a non-default project:
By default the server reads <userData>/projects/default.seoproject (the same file the desktop app uses). Override with the FREECRAWL_PROJECT env var, or call the set_project tool mid-session:
{
"mcpServers": {
"freecrawl": {
"command": "node",
"args": ["/path/to/apps/mcp-server/dist/index.js"],
"env": { "FREECRAWL_PROJECT": "/path/to/audit.seoproject" }
}
}
}
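The lookup order described above (environment variable first, default project otherwise) can be pictured as a small helper. This is a sketch of the documented behaviour and not the server's code; the function name and the choice to ignore blank values are assumptions.

```typescript
import { join } from "node:path";

/** Resolve the project file: FREECRAWL_PROJECT wins, else <userData>/projects/default.seoproject. */
function resolveProjectPath(env: Record<string, string | undefined>, userDataDir: string): string {
  const override = env["FREECRAWL_PROJECT"]?.trim();
  return override ? override : join(userDataDir, "projects", "default.seoproject");
}

// Typical call site (userDataDir would come from the platform's app-data location):
//   const projectPath = resolveProjectPath(process.env, userDataDir);
```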
Prerequisites
For developers / source builds
| Component | Minimum | Where |
|---|---|---|
| Node.js | 22 LTS (24 also OK) | nodejs.org |
| npm | 10+ (ships with Node) | bundled |
| Git | any recent | git-scm.com |
Why no Python / MSBuild / node-gyp? FreeCrawl uses Node 22's built-in node:sqlite instead of better-sqlite3. There are zero native dependencies, so npm install never invokes a C++ compiler.
Verify your setup:
node --version # v22.x.x or v24.x.x
npm --version # 10+
Runtime requirements (any platform)
- Outbound HTTPS access to the sites you crawl. Behind a corporate proxy? Set HTTPS_PROXY=http://your-proxy:port before launch; undici's ProxyAgent routes through it automatically.
- TLS root certificates. Node ships with the Mozilla CA bundle. If your antivirus or company proxy performs HTTPS inspection (Kaspersky, ESET, Zscaler, BlueCoat, ...), set NODE_EXTRA_CA_CERTS=C:\path\to\corp-ca-bundle.crt; otherwise crawls fail with UNABLE_TO_GET_ISSUER_CERT_LOCALLY.
Disk + memory budget
| Resource | Size |
|---|---|
| node_modules after npm install | ~600 MB |
| Production Electron build | ~150 MB |
| Peak RAM, 100K-URL crawl | ~100 MB |
| 1M-URL crawl | comfortably under 1 GB |
Project Structure
FreeCrawl-SEO-Tool/
├── FreeCrawl-SEO-Tool-Start.bat # Windows one-click launcher
├── FreeCrawl-SEO-Tool-Start.sh # macOS / Linux one-click launcher
├── CHANGELOG.md # versioned release notes
├── apps/
│   ├── desktop/ # Electron app (main + preload + renderer)
│   ├── cli/ # headless Node CLI
│   └── mcp-server/ # MCP server for AI agents
└── packages/
    ├── shared-types/ # IPC + domain types
    ├── db/ # ProjectDb (node:sqlite) + migrations
    └── core/ # crawler engine (UI-agnostic)
Dependency graph
graph LR
A[shared-types] --> B[db]
B --> C[core]
C --> D[desktop]
C --> E[cli]
B --> F[mcp-server]
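Reading each arrow as "is a dependency of" (an assumption about the graph's direction), a valid build order can be derived with a depth-first topological sort. The sketch below is illustrative and unrelated to how the repo's build scripts actually order workspaces.

```typescript
// Each package maps to the packages it depends on, transcribed from the graph above.
const dependsOn: Record<string, string[]> = {
  "shared-types": [],
  db: ["shared-types"],
  core: ["db"],
  desktop: ["core"],
  cli: ["core"],
  "mcp-server": ["db"],
};

/** Depth-first topological sort: every package appears after all of its dependencies. */
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") throw new Error(`Dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };

  Object.keys(graph).forEach((name) => visit(name));
  return order;
}
```

Note that mcp-server depends only on db, so under this reading it can build without core.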
Status
[!NOTE] Active development. Working today: all 30 analysis tabs (Internal, External, Response Codes, URL, Page Titles, Meta Description, H1, H2, Content, Images, Canonicals, Directives, Redirects, Pagination, Hreflang, AMP, Structured Data, Meta Refresh, Custom Extraction, Custom Search, Security, Duplicates, Links, Broken Links, SERP, PageSpeed, Search Console, GA4, AI, SEO Authority), a standalone Visualization window, advanced search, per-tab quick-filter dropdown + List/Tree view toggle, 150+ issue categories, sitemap export variants, Export Crawl Data dialog (XLSX / CSV-UTF-8 / JSON / XML with hierarchical tree picker + nested folder output), list mode, custom extraction, near-duplicate + exact-duplicate detection, hreflang validation, project compare, Cytoscape visualization, auth, proxy, webhook, MCP server with crawl control + live progress, Google PageSpeed / Search Console / URL Inspection / GA4 integrations, AI per-URL prompts (OpenAI / Anthropic / Ollama), SEO Authority providers (Ahrefs / Majestic / Moz / Semrush), Google Sheets + BigQuery direct export, encrypted project snapshots (AES-256-GCM), cross-source orphan detection, JavaScript rendering with Playwright (post-JS DOM, screenshot capture, LCP candidate, Mobile Usability audit), memory-limit auto-pause watchdog, OS notifications, robots.txt syntax validator, URL rewriting + preview, status-code diagnosis banner, live memory monitor, in-app scheduled crawl, multi-language UI (EN + TR) with full Settings coverage, .seoproject file association, in-app logs, and diagnostic popups. Cross-platform installers (Windows .exe + portable, macOS .dmg, Linux .AppImage / .deb / .rpm) are available; release builds ship Playwright Chromium offline so JS rendering works on first launch. Live-streaming UX with first row in ~1 s.

Upcoming (V2): Log file analyzer, Plugin system, Light theme, Multi-window, Code-signing + auto-update.
Contributing & Support
| Found a bug? | Open an issue |
| Have a feature idea? | Start a discussion |
| Want the prebuilt app? | Download a release |
| Project website | freecrawl.net |
License
MIT. See LICENSE.
Built with ❤ for SEO professionals who want a fast, free, open alternative to Screaming Frog.