About rdrr

Convert any URL to clean markdown for AI agents.

f

Published by

fkonovalov

Visit View Profile

README.md

View on GitHub

rdrr

Convert any URL to clean markdown for AI agents.

npx rdrr https://react.dev/learn

Features

Fast: no headless browser, lightweight
Smart: 20+ site-specific extractors (Wiki, Reddit, X, MDN, Claude, Substack ...)
LLM-ready: strips ads, navigation, footers; keeps code blocks, tables, math
Versatile: webpages, GitHub issues/PRs, PDFs, X profiles, YouTube transcripts

Install

npm install rdrr

Quick start

CLI

# Webpage
rdrr https://react.dev/learn

# YouTube transcript
rdrr https://www.youtube.com/watch?v=dQw4w9WgXcQ

# GitHub issue with comments
rdrr https://github.com/mozilla/readability/issues/1

# X timeline
rdrr https://x.com/discotune -n 10

# Single X post (direct API, bypasses login walls)
rdrr https://x.com/discotune/status/2045444995768078376

# Save to file
rdrr https://example.com -o article.md

# Copy to clipboard
rdrr https://example.com --clip

# Fit a 2k-token budget for LLM context
rdrr https://some.article.example/long-read --budget 2000

# LLM-friendly XML with quality score
rdrr https://react.dev/learn --format xml --quality

# List recent fetches
rdrr history --limit 10

Library

import { parse } from "rdrr"

const result = await parse("https://en.wikipedia.org/wiki/TypeScript")

result.title       // "TypeScript"
result.content     // clean markdown
result.wordCount   // 2847
result.siteName    // "Wikipedia"

CLI flags

Flag	Description
`-o, --output <file>`	Save to file instead of stdout
`-c, --clip`	Copy output to the system clipboard (suppresses stdout)
`-j, --json`	Full JSON with metadata (alias for `--format json`)
`--format <fmt>`	Output format: `md` (default), `json`, `jsonl`, or `xml`
`-p, --property <name>`	Extract a single field (`title`, `content`, ...)
`-l, --language <code>`	Preferred language (BCP 47)
`-n, --limit <n>`	Max items for aggregate URLs (default: `10`)
`--order <order>`	`newest` (default) or `oldest`
`--budget <tokens>`	Truncate body at a paragraph boundary to fit a token budget
`--quality`	Attach a readability score (0-100) + signals to JSON output
`--check`	Probe if URL is readable (exit 0/1)
`--llms`	Append site's `/llms.txt`
`--no-history`	Skip logging this call to history
`--debug`	Pipeline diagnostics to stderr

Subcommands

rdrr history [--limit 20] [--search react] [--since 2026-04-01] [--json]
rdrr last [--json]

History lives at $XDG_STATE_HOME/rdrr/history.jsonl (falls back to ~/.local/state/rdrr/history.jsonl), auto-rotates at 1000 entries, and strips basic-auth credentials before writing. Disable globally with RDRR_NO_HISTORY=1.

API

`parse(url, options?)`

import { parse } from "rdrr"

const result = await parse(url, {
  language: "en",
  includeLlmsTxt: true,
})

Returns a ParseResult with type, title, author, content, description, domain, siteName, published, wordCount, readTime, and more. The result is narrowed by type: "webpage", "youtube", "github", "pdf", "x-profile", or "x-status".

`parseHtml(html, options?)`

Run the extraction engine on raw HTML: useful for saved pages or pipelines where you already have the bytes.

import { parseHtml } from "rdrr"

const result = await parseHtml(html, {
  url: "https://example.com/article",
})

`isProbablyReaderable(input)`

Lightweight pre-check: will this URL yield a meaningful article? Useful for routing in AI agents.

import { isProbablyReaderable } from "rdrr"

await isProbablyReaderable("https://example.com") // true | false

Also available as direct imports: parseWeb, parseYouTube, parseGitHub, parsePdf, detectUrlType, extractVideoId, normalizeUrl.

Supported sources

Type	What it handles
Webpages	Any HTML page with 20+ site-specific extractors
YouTube	Transcripts with chapters, speakers, timestamps
GitHub	Issues, PRs (with comments), raw files
PDFs	Any public `.pdf` (requires optional `unpdf`)
X/Twitter	Single posts and full profile timelines
llms.txt	Appended on demand via `--llms` or `includeLlmsTxt`

Community

Discussion, questions, site-extractor requests: GitHub Discussions
Bugs: GitHub Issues
Security: see SECURITY.md

Contributing

Contributions welcome! See CONTRIBUTING.md.

Want to add a site extractor? Check out src/extract/sites/: each one is a self-contained file.

License

MIT

rdrr

About rdrr

Platforms

Languages

Links

README.md

rdrr

Features

Install

Quick start

CLI

Library

CLI flags

Subcommands

API

`parse(url, options?)`

`parseHtml(html, options?)`

`isProbablyReaderable(input)`

Supported sources

Community

Contributing

License