[!NOTE] AI-Aided Development (AIAD)
This project openly uses AI-assisted development (e.g. Claude Code) to accelerate workflows, improve code quality, and gain more development momentum. All AI-generated code is reviewed and approved by humans. This is not a vibe-coding project, but a deliberate effort to build a useful product while exploring the boundaries, benefits, and trade-offs of AI-aided development.
What is Scrape Dojo?
Scrape Dojo is a self-hosted web scraping & browser automation platform. Instead of writing Puppeteer code for every site, you define workflows declaratively in JSON/JSONC, like Infrastructure-as-Code, but for scraping.
Key capabilities:
- 25+ built-in actions: navigate, click, type, extract, loop, download, screenshot, and more
- Handlebars + JSONata: dynamic templates and powerful data transformations
- Cron scheduling: automate scrapes with cron, webhooks, or startup triggers
- Encrypted secrets: AES-256-CBC at-rest encryption for credentials
- Real-time monitoring: SSE-powered live execution tracking in the Angular UI
- Auth (optional): JWT, OIDC/SSO, MFA/TOTP, API keys
- Multi-DB: SQLite (default), MySQL, PostgreSQL
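To make the "AES-256-CBC at-rest encryption" bullet concrete, here is a minimal Node.js sketch of that cipher mode using the built-in `crypto` module. It is not Scrape Dojo's actual implementation: the function names and the `ivHex:cipherHex` storage format are assumptions made for illustration. The only detail taken from this README is that the key is 32 random bytes encoded as hex.

```javascript
const crypto = require('crypto');

// Assumption: the key is a 64-character hex string (32 raw bytes),
// as produced by randomBytes(32).toString('hex').
function parseKey(hexKey) {
  const key = Buffer.from(hexKey, 'hex');
  if (key.length !== 32) {
    throw new Error('Expected a 32-byte key encoded as 64 hex characters');
  }
  return key;
}

// Encrypts a UTF-8 string with AES-256-CBC. A fresh random 16-byte IV is
// generated per call and stored as "ivHex:cipherHex" (hypothetical format).
function encryptSecret(plaintext, hexKey) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-256-cbc', parseKey(hexKey), iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return `${iv.toString('hex')}:${encrypted.toString('hex')}`;
}

// Reverses encryptSecret: splits off the IV, then decrypts the remainder.
function decryptSecret(stored, hexKey) {
  const [ivHex, dataHex] = stored.split(':');
  const decipher = crypto.createDecipheriv(
    'aes-256-cbc',
    parseKey(hexKey),
    Buffer.from(ivHex, 'hex'),
  );
  const decrypted = Buffer.concat([
    decipher.update(Buffer.from(dataHex, 'hex')),
    decipher.final(),
  ]);
  return decrypted.toString('utf8');
}
```

Because decryption needs the exact same key, a scheme like this cannot recover stored values once the key is gone.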
[!IMPORTANT] Scrape Dojo automates real browser interactions. Please respect website terms of service and applicable legal frameworks.
Full documentation: scrape-dojo.com
Quick Start (Docker)
# 1. Generate encryption key
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
# 2. Create docker-compose.yml
cat <<'EOF' > docker-compose.yml
services:
  scrape-dojo:
    image: ghcr.io/disane87/scrape-dojo:latest
    ports:
      - '8080:80'
    environment:
      - SCRAPE_DOJO_ENCRYPTION_KEY=your_generated_key_here
      - SCRAPE_DOJO_AUTH_JWT_SECRET=your_random_jwt_secret_here
      - SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=your_random_refresh_secret_here
      - DB_TYPE=sqlite
      # - SCRAPE_DOJO_PROXY_URL=http://proxy:8080 # Optional: route scrapes through a proxy
    volumes:
      - ./data:/home/pptruser/app/data
      - ./downloads:/home/pptruser/app/downloads
      - ./logs:/home/pptruser/app/logs
      - ./config:/home/pptruser/app/config
      - ./browser-data:/home/pptruser/app/browser-data
    restart: unless-stopped
EOF
# 3. Start
docker compose up -d
Open http://localhost:8080. The UI and API are served on the same port.
[!WARNING] The SCRAPE_DOJO_ENCRYPTION_KEY encrypts all secrets. Store it safely: if it is lost, existing secrets are unrecoverable.
For local development, environment variables, auth setup, and more: see the Quickstart Guide.
Your First Scrape
Create config/sites/my-first-scrape.jsonc:
{
  "$schema": "../scrapes.schema.json",
  "scrapes": [
    {
      "id": "my-first-scrape",
      "metadata": {
        "description": "Read a page title",
        "triggers": [{ "type": "manual" }],
      },
      "steps": [
        {
          "name": "Main",
          "actions": [
            {
              "name": "open",
              "action": "navigate",
              "params": { "url": "https://example.com" },
            },
            {
              "name": "title",
              "action": "extract",
              "params": { "selector": "h1" },
            },
            {
              "name": "log",
              "action": "logger",
              "params": { "message": "Title: {{previousData.title}}" },
            },
          ],
        },
      ],
    },
  ],
}
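In the config above, the `log` action reads the output of the earlier `title` action through the placeholder `{{previousData.title}}`. The toy resolver below illustrates that kind of dotted-path substitution. It is a simplified stand-in, not Handlebars itself, and the shape of the context object (results keyed by action name under `previousData`) is inferred from the example rather than confirmed.

```javascript
// Looks up a dotted path such as "previousData.title" in a context object.
// Returns undefined as soon as any segment is missing.
function lookup(context, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    context,
  );
}

// Replaces every {{path}} placeholder with the value found at that path.
// Missing values render as an empty string.
function render(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_, path) => {
    const value = lookup(context, path);
    return value == null ? '' : String(value);
  });
}

// Hypothetical context: the "title" action extracted the <h1> text.
const context = { previousData: { title: 'Example Domain' } };
console.log(render('Title: {{previousData.title}}', context));
// prints "Title: Example Domain"
```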
The scrape auto-appears in the UI (hot reload). Click Run or use the API:
curl http://localhost:8080/api/scrape/my-first-scrape
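The same request can be made from a script. This is a rough Node.js equivalent of the curl call, assuming Node 18+ (for the global `fetch`) and an instance with auth disabled. The function names are hypothetical, and since the response format is not described here, the body is returned as raw text rather than parsed.

```javascript
// Builds the run URL used by the curl command; the scrape id is URL-encoded.
function buildRunUrl(baseUrl, scrapeId) {
  return new URL(`/api/scrape/${encodeURIComponent(scrapeId)}`, baseUrl).toString();
}

// Triggers the scrape and returns the HTTP status plus the raw response body.
async function runScrape(baseUrl, scrapeId) {
  const response = await fetch(buildRunUrl(baseUrl, scrapeId));
  return { status: response.status, body: await response.text() };
}

// Usage (requires a running instance):
// runScrape('http://localhost:8080', 'my-first-scrape').then(console.log);
```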
Documentation
Everything else lives in the docs:
| Topic | Link |
|---|---|
| Quickstart (Docker & Source) | Getting Started |
| Config format & metadata | Configuration |
| All 22 actions with examples | Actions Reference |
| Templates & JSONata | Templates |
| Scheduling & triggers | Scheduling |
| Secrets & variables | Secrets & Variables |
| Environment variables | Env Reference |
| Architecture & API | Developer Guide |
| Auth (JWT/OIDC/MFA) | Authentication |
| Full examples | Examples |
Development
git clone https://github.com/disane87/scrape-dojo.git && cd scrape-dojo
pnpm install
cp .env.example .env # Set SCRAPE_DOJO_ENCRYPTION_KEY
pnpm start # API (3000) + UI (4200)
pnpm test # All tests
| Command | What it does |
|---|---|
| pnpm start | API + UI dev servers |
| pnpm test | All tests |
| pnpm test:api | API tests only |
| pnpm test:ui | UI tests only |
| pnpm lint | Lint all projects |
| pnpm build | Build all apps |
Commits follow Conventional Commits (feat:, fix:, docs:, etc.).
Contributing
- Issues & bugs: GitHub Issues
- Feature requests: New Issue
- Pull requests: Fork → branch → commit → PR
License
MIT. Use it however you like.
Contributors
Made with ❤️ by Marco Franke
Documentation · Issues · Discussions