Find Me a Job - AI-Powered Job Scraper & Matcher

An automated job scraping and AI matching pipeline that runs on a schedule, scrapes jobs from LinkedIn and RemoteOK, prevents fetching the same job twice, scores each one against your CV using an LLM, generates a cover letter for good matches, stores matched jobs in a local SQLite database, and serves them through a Streamlit dashboard with analytics, filtering, and job management. A Cloudflare Quick Tunnel exposes the dashboard publicly, and Telegram notifications send you the access URL on startup plus a summary after each run. Everything runs locally in Docker.

n8n Main Workflow

LinkedIn Sub-Workflow	RemoteOK Sub-Workflow

Features
Getting Started
Configuration
AI Scoring Logic
Choosing an LLM Provider
Database Schema
Python API Reference
Estimated Token Usage Per Job
Dashboard
Docker Services
Download Size
License

Features

Dual source scraping - LinkedIn (with filters) and RemoteOK, each as a modular sub-workflow
Multiple LinkedIn searches - define multiple search queries (different keywords, locations, filters) in a single config file; all are executed in one run
Deduplication - jobs already seen or pending are skipped automatically across runs
AI scoring - scores each job 0–100 based on your CV, required skills, and years of experience; small experience gaps (1–2 years) are penalized lightly, 3+ years below means score 0
Cover letter generation - only generated for jobs scoring above FILTERING_SCORE (default 60), saving tokens
Streamlit dashboard at localhost:8501 with analytics (stat cards, charts, daily application history) and a filterable, sortable jobs table with inline job cards and status management
Auto email application - when a job listing includes an email address, the workflow sends a personalized application email with your CV attached and marks the job as email_sent
Cloudflare Quick Tunnel - auto-creates a public trycloudflare.com URL for the dashboard, no account needed
Telegram notifications - sends the dashboard URL on startup and a summary after each workflow run
LLM-powered keyword extraction - extracts job titles and skills from your CV to filter RemoteOK results; cached and only re-extracted when the CV changes
Flexible LLM provider - any OpenAI-compatible API (Groq, Google AI Studio, OpenRouter, local models, etc.)
Persistent storage - SQLite with Alembic migrations applied on startup; old records purged automatically

Getting Started

Prerequisites

Docker and Docker Compose
An LLM API key from any OpenAI-compatible provider (e.g., Groq, Google AI Studio, OpenRouter) - see Choosing an LLM Provider
A Telegram Bot - optional, for run notifications
Your CV as a .docx file

1. Clone the repository

git clone https://github.com/yourusername/find-me-job.git
cd find-me-job

2. Set up your environment file

cp .env.example .env

Open .env and fill in your values. See Environment Variables for details.

3. Add your CV

cp /path/to/your-cv.docx cv.docx

4. Configure your LinkedIn searches

Edit params/linkedin_searches.txt with your desired search parameters. See LinkedIn Search Config for the full reference.

5. Start the containers

docker compose up -d

Workflows are automatically imported into n8n on first start.

6. Configure n8n credentials

Open n8n at http://localhost:5678. The LLM config is read from your .env. For Telegram notifications, add a Telegram credential with {{ $env.TELEGRAM_BOT_TOKEN }}.

7. Open the dashboard

Open http://localhost:8501 to browse results, track applications, and view analytics. A public trycloudflare.com URL is also created automatically and sent to your Telegram.

Configuration

Environment Variables (`.env`)

# ── n8n ─────────────────────────────────────────────
N8N_HOST=localhost
N8N_PORT=5678
N8N_PROTOCOL=http
WEBHOOK_URL=http://localhost:5678
DB_TYPE=sqlite
DB_SQLITE_DATABASE=/data/db/n8n.db
N8N_BLOCK_ENV_ACCESS_IN_NODE=false
N8N_IMPORT_WORKFLOWS_FROM=/workflows
GENERIC_TIMEZONE=Africa/Cairo

# Days before old job records are purged (default: 60)
# Cleanup runs on startup and daily at midnight
DELETE_OLD_JOBS_DAYS=60

# ── LLM ──────────────────────────────────────────────
# API key for your chosen LLM provider
LLM_API_KEY=your_api_key_here
# Must be an OpenAI-compatible chat completions endpoint
LLM_URL=https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
# Model name supported by your chosen provider
LLM_MODEL=gemini-2.5-flash
# Minimum score (0–100) for a job to be saved to filtered_jobs (default: 60)
FILTERING_SCORE=60

# ── Telegram (optional) ──────────────────────────────
# Your personal Telegram user ID (get from @get_id_bot)
TELEGRAM_ID=123456789
# Bot token from @BotFather
TELEGRAM_BOT_TOKEN=xxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# ── Email (optional) ─────────────────────────────────
# Set AUTO_EMAIL to any non-empty value to enable auto email applications
AUTO_EMAIL=
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
[email protected]
SMTP_APP_PASSWORD=your_app_password
SENDER_NAME=Your Name

LinkedIn Search Config

Edit params/linkedin_searches.txt. The file supports multiple searches in a single config - the workflow loops over all entries in the searches array:

{
  "searches": [
    {
      "Keyword": "Software Engineer",
      "Location": "Cairo, Egypt",
      "Experience Level": "Entry level, Associate",
      "Remote": "Remote, Hybrid, On-Site",
      "Job Type": "Full-time",
      "Last Posted": "r604800",
      "Easy Apply": ""
    },
    {
      "Keyword": "Software Engineer",
      "Location": "Germany",
      "Experience Level": "Entry level, Associate",
      "Remote": "Remote, Hybrid, On-Site",
      "Job Type": "Full-time",
      "Last Posted": "r604800",
      "Easy Apply": "true"
    }
  ]
}

Add as many search objects to the searches array as you need - each one runs as a separate LinkedIn query within the same workflow execution.

Field reference:

Field	Example Values	Notes
`Keyword`	`"Python Developer"`	Job title or skill - single value
`Location`	`"Cairo, Egypt"`	City or country - single value
`Experience Level`	`"Entry level, Associate"`	Comma-separated, multiple allowed
`Remote`	`"Remote, Hybrid"`	Comma-separated, multiple allowed
`Job Type`	`"Full-time, Contract"`	Comma-separated, multiple allowed
`Last Posted`	`"r86400"`	`r86400`=24h, `r604800`=1 week, `r2592000`=1 month
`Easy Apply`	`"true"` or `""`	Any non-empty string enables it

LLM Keywords Config

Edit params/llm_keywords_extract.txt - a prompt template sent to the LLM along with your CV text. The LLM extracts:

titles - 3–5 realistic job titles based on your experience level
skills - 10–20 technical skills from your CV

These keywords filter RemoteOK results so only matching jobs enter the pipeline. Results are cached and only re-extracted when your CV changes.

AI Scoring Logic

Each job is scored individually by the LLM using the following logic.

Input to the model:

Your full CV text (extracted from cv.docx)
The full job description
Today's date (injected dynamically for calculating years of experience)

Scoring rules:

Factor	Effect on Score
Required skills present in CV	High positive
Required skills missing from CV	Negative
Nice-to-have skills present	Small bonus
Experience meets or exceeds requirement	No penalty
Experience 1–2 years below requirement	Slight penalty
Experience 3+ years below requirement	Score = 0, stop immediately

Output format:

{"score": 78, "coverLetter": "..."}

The cover letter is a 2-paragraph professional body - no name, address, or signature - so it works as a clean template you can customize before sending. Jobs scoring below FILTERING_SCORE (default 60) get an empty cover letter to save tokens.

Choosing an LLM Provider

The workflow works with any OpenAI-compatible API. Configure your provider by setting three environment variables in your .env:

Variable	Description	Example
`LLM_API_KEY`	Your API key	`gsk_xxxx`, `AIzaSy...`, `sk-...`
`LLM_URL`	Chat completions endpoint	See examples below
`LLM_MODEL`	Model identifier	See examples below

Provider examples:

Provider	`LLM_URL`	`LLM_MODEL`	Free Tier
Groq	`https://api.groq.com/openai/v1/chat/completions`	`llama-3.3-70b-versatile`	Yes
Google AI Studio	`https://generativelanguage.googleapis.com/v1beta/openai/chat/completions`	`gemini-2.5-flash`	Yes
OpenRouter	`https://openrouter.ai/api/v1/chat/completions`	`meta-llama/llama-3.3-70b`	Some models
OpenAI	`https://api.openai.com/v1/chat/completions`	`gpt-4o`	No
Anthropic (via proxy)	Any OpenAI-compatible proxy URL	`claude-sonnet-4-20250514`	No
Local (Ollama)	`http://host.docker.internal:11434/v1/chat/completions`	`llama3`	N/A

For the best scoring and cover letter quality, consider using Claude Sonnet or GPT-4o on the paid tier. The difference in cover letter coherence and scoring nuance is significant compared to free-tier models.

Database Schema

-- Jobs fully processed in previous runs (long-term deduplication)
CREATE TABLE seen_jobs (
  id       TEXT PRIMARY KEY,    -- "linkedin_4384934676" or "remoteok_1130786"
  seen_at  DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Jobs discovered this run, waiting to be scored by the LLM
CREATE TABLE pending_jobs (
  id          TEXT PRIMARY KEY,
  title       TEXT,
  company     TEXT,
  location    TEXT,
  applylink   TEXT,
  description TEXT,
  website     TEXT,             -- "linkedin" or "remoteok"
  easy_apply  BOOLEAN DEFAULT FALSE,
  created_at  DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Jobs scored by the LLM, displayed in the local dashboard
CREATE TABLE filtered_jobs (
  id           TEXT PRIMARY KEY,
  title        TEXT,
  company      TEXT,
  location     TEXT,
  applylink    TEXT,
  description  TEXT,
  website      TEXT,
  score        INTEGER,           -- 0–100 AI match score
  application_document TEXT,     -- generated cover letter / application text (nullable)
  easy_apply   BOOLEAN DEFAULT FALSE,
  ai_status    TEXT,              -- "fit" or "not_fit"
  user_status  TEXT DEFAULT 'new', -- "new", "applied", "wont_apply", or "email_sent"
  created_at   DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at   DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- CV hash and extracted keyword cache
CREATE TABLE cv_keywords (
  id         INTEGER PRIMARY KEY,
  cv_hash    TEXT NOT NULL,
  keywords   TEXT NOT NULL,
  updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Schema is managed by Alembic migrations, applied automatically on each container startup. Records older than DELETE_OLD_JOBS_DAYS (default 60) days are automatically purged on startup and daily at midnight.

Viewing the database: The file lives at ./data/db/jobs.db on your host. Open it directly in DBeaver - select SQLite, browse to the file, and connect. No server or credentials needed.

Dashboard

The project includes a Streamlit dashboard at http://localhost:8501 for browsing and managing your matched jobs.

Analytics Tab	Jobs Tab

Analytics - stat cards, AI/user status donut charts, daily applications bar chart (last 7 days)
Jobs - filterable, sortable table with inline job cards; filter by AI status, user status, easy apply, min score, company, website; search by title or company
Actions - mark as applied / won't apply / email sent, reset to new, or delete
Defaults - ai_status=fit, user_status=new, min_score from FILTERING_SCORE

Python API Reference

The sidecar API runs on port 8001. From n8n use http://python-api:8001. From your host use http://localhost:8001.

All endpoints are prefixed with /api. On startup, the API automatically runs Alembic migrations and purges old records. Old job cleanup also runs daily at midnight via a background scheduler.

Jobs (/api/jobs):

Method	Endpoint	Params / Body	Description
`GET`	`/api/jobs/exists`	`?jobid=linkedin_123`	Returns `{"exists": true/false}`
`POST`	`/api/jobs/pending`	JSON body	Insert a new job into pending_jobs
`GET`	`/api/jobs/pending`	-	List all pending jobs
`POST`	`/api/jobs/filtered`	JSON body	Move job from pending → filtered_jobs with score and cover letter
`GET`	`/api/jobs/filtered`	`?ai_status=fit&user_status=new&min_score=60&search=...&company=...&website=...&sort_by=updated_at&sort_order=desc&page=1&page_size=20`	Paginated, filterable, sortable job list
`GET`	`/api/jobs/filtered/options`	-	Distinct company and website values for filter dropdowns
`GET`	`/api/jobs/filtered/{jobid}`	-	Get a single filtered job by ID
`PATCH`	`/api/jobs/filtered/{jobid}/status`	`{"user_status": "applied"}`	Update user tracking status (`new` / `applied` / `wont_apply` / `email_sent`)
`DELETE`	`/api/jobs/filtered/{jobid}`	-	Delete a job from filtered_jobs
`GET`	`/api/jobs/stats`	-	Aggregate counts (total, fit, not_fit, new, applied, wont_apply, email_sent, avg_score)
`GET`	`/api/jobs/stats/daily-applied`	`?days=7`	Daily application counts for the last N days

Email (/api/email):

Method	Endpoint	Params / Body	Description
`POST`	`/api/email/send`	JSON body	Send an application email with CV attached via SMTP

CV (/api/cv):

Method	Endpoint	Params / Body	Description
`GET`	`/api/cv`	-	Extract and return text from cv.docx
`GET`	`/api/cv/check/{cv_hash}`	-	Check if a CV hash exists in keyword cache
`GET`	`/api/cv/keywords`	-	Get cached keywords and CV hash
`POST`	`/api/cv/keywords`	`{"cv_hash": "...", "keywords": "..."}`	Save/update keyword cache

Params (/api/params):

Method	Endpoint	Description
`GET`	`/api/params/{name}`	Read and return `params/{name}.txt`

`POST /api/jobs/pending`

{
  "id": "linkedin_xxxxxxxx",
  "title": "Software Engineer",
  "company": "X Corp",
  "location": "Cairo, Egypt",
  "applylink": "https://linkedin.com/jobs/view/xxxxxxxx",
  "description": "We are looking for a software engineer...",
  "website": "linkedin",
  "easy_apply": false
}

`POST /api/jobs/filtered`

Same fields as pending, plus score, application_document, and ai_status:

{
  "id": "linkedin_xxxxxxxx",
  "title": "Software Engineer",
  "company": "X Corp",
  "location": "Cairo, Egypt",
  "applylink": "https://linkedin.com/jobs/view/xxxxxxxx",
  "description": "We are looking for a software engineer...",
  "website": "linkedin",
  "score": 82,
  "application_document": "I am excited to apply for...",
  "easy_apply": false,
  "ai_status": "fit"
}

`POST /api/email/send`

{
  "recipient": "[email protected]",
  "subject": "Application for Software Engineer",
  "body": "Dear Hiring Manager,\n\nI am excited to apply for..."
}

The email is sent via SMTP using the credentials from your .env (SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_APP_PASSWORD, SENDER_NAME). Your cv.docx is automatically attached. Set AUTO_EMAIL to any non-empty value in .env to enable the n8n workflow to call this endpoint automatically when a job listing provides an email address.

Estimated Token Usage Per Job

Job scoring & cover letter (every job)

One LLM call per scraped job — scores the job against your CV and generates a cover letter for fits.

Component	Tokens (approx)
System prompt	~300
CV text	~500–800
Job description	~500–1,000
Output (score + cover letter)	~400–600
Total	~1,700–2,700

Email eligibility check (fit jobs with `AUTO_EMAIL` enabled)

A second LLM call runs only on jobs that scored ≥ FILTERING_SCORE and whose description contains an email hint. It receives the job description and the cover letter from step 1, determines whether the job actually requires applying via email, and if so extracts the recipient address and generates a professional application email body.

Component	Tokens (approx)
System prompt	~300
Job description	~500–1,000
Cover letter (from scoring step)	~200–400
Job title + company + sender name	~20–30
Output (JSON with email body or false)	~200–400
Total	~1,200–2,200

Summary

Scenario	LLM Calls	Tokens per job (approx)
Scoring only (`AUTO_EMAIL` off)	1	~1,700–2,700
Scoring + email check (`AUTO_EMAIL` on, job has email hint)	2	~2,900–4,900

Note: The email LLM call only runs on fit jobs whose description matches an email pattern — typically a small fraction of total scraped jobs. Most jobs still consume only the scoring tokens.

Docker Services

Service	Image	Port	Purpose
`n8n`	Custom (built from `n8n/Dockerfile` based on `n8nio/n8n:2.11.4`)	`5678`	Workflow automation engine with auto-import
`find-me-job-python-api`	Custom (built from `python-api/Dockerfile` based on `python:3.12-slim`)	`8001`	FastAPI sidecar (SQLModel ORM, Alembic migrations) for DB, CV, and params
`find-me-job-dashboard`	Custom (built from `dashboard/Dockerfile` based on `python:3.12-slim`)	`8501`	Streamlit dashboard for analytics and job management
`find-me-job-tunnel`	`cloudflare/cloudflared:2026.3.0`	`20241`	Cloudflare Quick Tunnel - exposes the dashboard via a public `trycloudflare.com` URL

The n8n service uses a custom Docker image that automatically imports workflows from the workflows/ directory on first start. Subsequent starts skip the import to preserve any manual changes made within n8n.

Useful commands

docker compose up -d              # Start all services
docker compose up -d --build      # Rebuild images after code changes
docker compose logs -f python-api # Tail API logs
docker compose down               # Stop everything
docker compose down -v            # Stop and wipe all data (database + n8n state)

# Force re-import of workflows on next start
docker exec n8n rm /home/node/.n8n/.imported && docker restart n8n

Download Size

Estimated download size on first docker compose up -d:

Component	Download Size
n8n Docker image (`n8nio/n8n:2.11.4`)	~300 MB
Python base image (`python:3.12-slim`) (shared by API + dashboard)	~50 MB
Dashboard pip dependencies (Streamlit, Plotly)	~50 MB
Cloudflared image (`cloudflare/cloudflared:2026.3.0`)	~30 MB
Total download	~430 MB

The SQLite database and n8n internal data (in data/) grow over time but typically stay under a few MB.

License

MIT License - see LICENSE for details.

Find-Me-Job

About Find-Me-Job

Platforms

Languages

Links

README.md

Find Me a Job - AI-Powered Job Scraper & Matcher

Table of Contents

Features

Getting Started

Prerequisites

1. Clone the repository

2. Set up your environment file

3. Add your CV

4. Configure your LinkedIn searches

5. Start the containers

6. Configure n8n credentials

7. Open the dashboard

Configuration

Environment Variables (`.env`)

LinkedIn Search Config

LLM Keywords Config

AI Scoring Logic

Choosing an LLM Provider

Database Schema

Dashboard

Python API Reference

`POST /api/jobs/pending`

`POST /api/jobs/filtered`

`POST /api/email/send`

Estimated Token Usage Per Job

Job scoring & cover letter (every job)

Email eligibility check (fit jobs with `AUTO_EMAIL` enabled)

Summary

Docker Services

Useful commands

Download Size

License

Find-Me-Job

About Find-Me-Job

Platforms

Languages

Links

README.md

Find Me a Job - AI-Powered Job Scraper & Matcher

Table of Contents

Features

Getting Started

Prerequisites

1. Clone the repository

2. Set up your environment file

3. Add your CV

4. Configure your LinkedIn searches

5. Start the containers

6. Configure n8n credentials

7. Open the dashboard

Configuration

Environment Variables (.env)

LinkedIn Search Config

LLM Keywords Config

AI Scoring Logic

Choosing an LLM Provider

Database Schema

Dashboard

Python API Reference

POST /api/jobs/pending

POST /api/jobs/filtered

POST /api/email/send

Estimated Token Usage Per Job

Job scoring & cover letter (every job)

Email eligibility check (fit jobs with AUTO_EMAIL enabled)

Summary

Docker Services

Useful commands

Download Size

License

Environment Variables (`.env`)

`POST /api/jobs/pending`

`POST /api/jobs/filtered`

`POST /api/email/send`

Email eligibility check (fit jobs with `AUTO_EMAIL` enabled)