LLM-Wiki
A local, LLM-maintained personal knowledge base. Drop documents in, watch an LLM compile them into a living, interlinked Obsidian wiki you can search and query.
Hello, I'm Nihar Shrotri, an AI Consultant currently pursuing a PhD in Artificial Intelligence and Machine Learning.
Let's connect on LinkedIn for a Chat: https://www.linkedin.com/in/niharshrotri/
Built on the pattern Andrej Karpathy described in his LLM Wiki gist: instead of retrieving from raw documents at query time (classic RAG), an LLM incrementally compiles your sources into a structured, cross-linked markdown wiki that sits between you and the raw documents. The wiki is a persistent, compounding artifact — the cross-references are already there, the contradictions have already been flagged, the synthesis already reflects everything you've read.
You never write the wiki yourself. The LLM does all the grunt work: summarizing, cross-referencing, filing, bookkeeping. You bring the sources and ask the questions.
Runs 100% locally on Apple Silicon or anywhere Ollama works. No API keys, no cloud, no data leaving your machine.
What it does
# Drop files in (PDFs, markdown, HTML, DOCX, text)
wiki add ~/Documents/papers --recursive
# Watch Qwen3 read them and build an interlinked wiki
wiki ingest
# Ask questions — it searches the compiled wiki and cites its sources
wiki query "what's the main argument about X?"
# Health-check the knowledge base
wiki lint --fix
# Browse the whole thing in Obsidian (graph view, backlinks, everything)
open wiki/
Every ingest produces a cluster of sources/, entities/, and concepts/ pages with YAML frontmatter and [[wikilinks]] between them. Every query pulls the top-ranked pages via hybrid BM25 + vector + LLM-rerank search, then synthesizes a cited answer. Every lint run catches broken links, orphan pages, malformed frontmatter, and (optionally, using the LLM) contradictions between pages.
Features
Core capabilities
- Incremental ingest: drop a file, run wiki ingest, get 8–15 cross-linked wiki pages
- Structured extraction: Qwen3 identifies entities (people, orgs, models), concepts, and key takeaways per source
- Smart merging: re-ingesting related sources updates existing entity/concept pages instead of overwriting them, preserving provenance
- Hybrid search: BM25 full-text + vector embeddings + LLM reranking (all local, via QMD)
- 3-way query scope: Wiki (thematic answers from LLM-compiled pages), Raw (exact lookups in original documents), or Hybrid (both)
- Intent classification: casual messages ("hi", "thanks") skip retrieval and get a quick reply, saving ~30 seconds per chitchat turn
- Cited synthesis: queries return markdown answers with [[wikilinks]] pointing to the pages that support each claim
- Write-back: save good answers as new synthesis/ pages with --save-as, so your explorations compound in the knowledge base
- Wiki linting: automated health checks for broken links, orphans, malformed frontmatter, noise in sources, and (with --deep) LLM-powered contradiction detection between pages
- Auto-fix: most stylistic issues resolve with one command
- Auto-reindex: the search index refreshes automatically after ingest and lint, so new pages are queryable immediately
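The intent-classification idea above can be illustrated with a small rule-based gate. This is only a sketch under stated assumptions: the phrase list and the names `CHITCHAT`, `classify_intent`, and `needs_retrieval` are hypothetical, and the project's actual classifier may work differently (for example, by asking the LLM).

```python
import re

# Hypothetical small-talk phrases; the real project may use a different list or an LLM call.
CHITCHAT = {"hi", "hello", "hey", "thanks", "thank you", "ok", "okay", "bye"}


def classify_intent(message: str) -> str:
    """Return "chitchat" for small talk, otherwise "question"."""
    # Lowercase and drop punctuation so "Thanks!" matches "thanks".
    normalized = re.sub(r"[^\w\s]", "", message.lower())
    # Collapse runs of whitespace and trim the ends.
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return "chitchat" if normalized in CHITCHAT else "question"


def needs_retrieval(message: str) -> bool:
    """Only real questions should pay the cost of search and synthesis."""
    return classify_intent(message) == "question"
```

A caller would check `needs_retrieval(message)` first and send a short canned or lightweight reply when it returns `False`.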
Web UI
Running wiki serve starts a full web interface at http://127.0.0.1:8000:
- Dashboard — project stats and recent activity
- Sources — list, inspect, delete, or re-ingest sources with one click
- Ingest — drag-and-drop upload, live progress log, persistent jobs that survive tab close and server restart
- Jobs — history of all ingest runs with live progress bars and error details
- Query — chat-style interface with streaming synthesis, scope toggle, save-as-synthesis button
- Lint — interactive lint report with one-click auto-fix
- Graph — D3 force-directed visualization of the full wiki, color-coded by page type
Supported input formats
.pdf · .md · .html · .docx · .txt
Obsidian integration
The wiki/ folder is a ready-made Obsidian vault with:
- Color-coded graph view (sources, entities, concepts, synthesis each get their own color)
- YAML frontmatter compatible with the Dataview plugin
- All cross-references as native [[wikilinks]] so backlinks, outgoing-links, and graph traversal all work
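To make the page format concrete, here is a minimal sketch of rendering a page with YAML frontmatter and a wikilink. The field names `title` and `type` and the helpers `wikilink` and `render_page` are assumptions for illustration; only the `sources:` key and the `[[folder/slug]]` link style appear in this README, and the real pages may carry more fields.

```python
import json


def wikilink(folder: str, slug: str) -> str:
    """Build an Obsidian-style link such as [[entities/qwen3]]."""
    return f"[[{folder}/{slug}]]"


def render_page(title: str, page_type: str, sources: list[str], body: str) -> str:
    """Render a markdown page with a YAML frontmatter header."""
    frontmatter = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML
        # and stays safe when the title contains a colon.
        f"title: {json.dumps(title)}",
        f"type: {page_type}",
        "sources:",
        *[f"  - {source}" for source in sources],
        "---",
    ]
    return "\n".join(frontmatter) + f"\n\n{body}\n"


if __name__ == "__main__":
    print(render_page(
        "Qwen3",
        "entity",
        ["sources/quick-notes-on-qwen"],
        f"Developed by {wikilink('entities', 'alibaba-cloud')}.",
    ))
```

Because the links are plain `[[...]]` text and the header is plain YAML, Obsidian and Dataview can read such a file without any plugin-specific syntax.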
Architecture
Three layers, per Karpathy:
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│     raw/      │ → │   LLM Agent   │ → │     wiki/     │
│   Your docs   │   │  (Qwen3-14B)  │   │   Markdown,   │
│  (immutable)  │   │               │   │  auto-linked  │
└───────────────┘   └───────────────┘   └───────────────┘
                            │                   │
                            ▼                   ▼
                    ┌───────────────┐   ┌───────────────┐
                    │    schema/    │   │   Obsidian    │
                    │   AGENTS.md   │   │  graph view   │
                    │  (the rules)  │   │   + editing   │
                    └───────────────┘   └───────────────┘
- raw/: your source documents. Immutable. The agent reads but never modifies them.
- wiki/: LLM-maintained markdown. One folder per page type (sources/, entities/, concepts/, synthesis/) plus auto-generated index.md and log.md. Open this in Obsidian.
- schema/AGENTS.md: the conventions file. Tells the LLM how to format pages, when to merge vs create, how to cite, how to handle contradictions. Edit as your preferences evolve.
- .wiki/: internal state (SQLite ingest history, QMD search index, config). Git-ignored.
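The layout described above can be sketched as a small scaffolding function. This is an illustration only, not the code behind wiki init: the function name `scaffold` is hypothetical, the files are created empty, and the real command presumably also writes config and Obsidian settings.

```python
import tempfile
from pathlib import Path

# Page-type folders named in this README.
PAGE_TYPES = ("sources", "entities", "concepts", "synthesis")


def scaffold(root: Path) -> list[Path]:
    """Create the directory skeleton and return every created path, relative to root."""
    directories = [root / "raw", root / "schema", root / ".wiki"]
    directories += [root / "wiki" / page_type for page_type in PAGE_TYPES]
    for directory in directories:
        directory.mkdir(parents=True, exist_ok=True)
    # Empty placeholders for the conventions file and the auto-generated pages.
    for file in (root / "schema" / "AGENTS.md",
                 root / "wiki" / "index.md",
                 root / "wiki" / "log.md"):
        file.touch(exist_ok=True)
    return sorted(path.relative_to(root) for path in root.rglob("*"))


if __name__ == "__main__":
    # Use a throwaway directory so the demo never touches a real project.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path)
```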
The ingest pipeline
Each source goes through three LLM passes:
- Extraction (thinking mode on): Qwen3 reads the source and returns structured JSON: summary, key takeaways, named entities, concepts, tags.
- Page drafting (streaming, thinking mode off): one call per entity/concept. Draft a new page from scratch, or merge new information into an existing page (preserving prior content, updating dates, appending to sources: frontmatter).
- Source summary: write the sources/<slug>.md page listing every wiki page touched by this source for provenance.
After the three passes: index.md is rebuilt, log.md is appended, and QMD's search index is updated automatically.
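The three passes can be sketched as a small orchestration function with the LLM calls injected as plain callables. Everything here is a hypothetical illustration: `Page`, `ingest`, the `extract` and `draft` callables, and the in-memory `wiki` dict stand in for the real Ollama calls and markdown files, and index/log rebuilding is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    body: str
    sources: list[str] = field(default_factory=list)


# Hypothetical stand-ins for the two kinds of LLM call.
Extract = Callable[[str], dict]                     # source text -> structured facts
Draft = Callable[[str, Optional[str], str], str]    # (page key, old body or None, source text) -> new body


def ingest(source_slug: str, text: str, extract: Extract, draft: Draft,
           wiki: dict[str, Page]) -> dict[str, list[str]]:
    report: dict[str, list[str]] = {"created": [], "updated": []}

    # Pass 1: extraction returns a summary plus entity and concept slugs.
    facts = extract(text)

    # Pass 2: one drafting call per entity/concept, creating or merging.
    touched: list[str] = []
    for folder in ("entities", "concepts"):
        for slug in facts.get(folder, []):
            key = f"{folder}/{slug}"
            existing = wiki.get(key)
            body = draft(key, existing.body if existing else None, text)
            if existing is None:
                wiki[key] = Page(body, [source_slug])
                report["created"].append(key)
            else:
                existing.body = body
                if source_slug not in existing.sources:  # keep provenance, no duplicates
                    existing.sources.append(source_slug)
                report["updated"].append(key)
            touched.append(key)

    # Pass 3: the source page lists every page this source touched.
    source_key = f"sources/{source_slug}"
    links = "\n".join(f"- [[{key}]]" for key in touched)
    status = "updated" if source_key in wiki else "created"
    wiki[source_key] = Page(f"{facts.get('summary', '')}\n\n{links}", [source_slug])
    report[status].append(source_key)
    return report
```

Passing the LLM calls in as arguments keeps the merge-versus-create logic testable without a running model.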
The query pipeline
- Hybrid search via QMD: BM25 full-text + vector similarity + LLM reranker, all local
- Top-K page hydration: load full content of the top 5–8 hits
- Synthesis: Qwen3 writes a cited markdown answer using [[wikilinks]] to reference the pages
- (Optional) save-back: --save-as files the answer as a new synthesis/ page
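To show how a BM25 ranking and a vector ranking can be combined before the top hits are loaded, here is a sketch using reciprocal rank fusion (RRF). RRF is a swapped-in, well-known technique chosen for illustration; this README does not say how QMD actually fuses or reranks results, and the names `reciprocal_rank_fusion` and `top_k` are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a page's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            scores[page] = scores.get(page, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def top_k(fused: list[tuple[str, float]], limit: int = 5) -> list[str]:
    """Keep only the page names of the best `limit` hits for hydration."""
    return [page for page, _score in fused[:limit]]


if __name__ == "__main__":
    bm25_hits = ["concepts/multi-head-attention", "concepts/self-attention-mechanism",
                 "entities/attention-is-all-you-need"]
    vector_hits = ["concepts/self-attention-mechanism", "entities/attention-is-all-you-need",
                   "concepts/multi-head-attention"]
    print(top_k(reciprocal_rank_fusion([bm25_hits, vector_hits]), limit=2))
```

Rank-based fusion avoids having to normalize BM25 scores and cosine similarities onto a common scale, which is why it is a common default for hybrid search.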
Stack
| Layer | Component | Why |
|---|---|---|
| LLM | Ollama + Qwen3-14B Q4_K_M | Strong reasoning, 40K context, thinking mode, 9.3GB on disk |
| Search | QMD (BM25 + vector + rerank) | All local, SQLite-backed, handles the heavy lifting |
| Embeddings | EmbeddingGemma-300M (via QMD) | Small footprint, high quality |
| Reranker | Qwen3-Reranker-0.6B (via QMD) | Fast cross-encoder rerank |
| CLI | Typer + Rich | Great UX, colored output, progress bars |
| Parsers | pypdf, python-docx, beautifulsoup4, lxml | Cover the main document formats |
| Vault | Obsidian | Best-in-class graph view and backlink UX — you don't have to build it |
No cloud services. No API keys. No data leaves your machine.
Requirements
- Python 3.11+
- Node.js 18+ (for QMD)
- Ollama with the qwen3:14b model pulled (~9.3GB)
- QMD (npm install -g @tobilu/qmd)
- Homebrew SQLite on macOS (brew install sqlite)
- ~15GB free disk space for models and embeddings
- ~12GB RAM recommended (16GB+ for comfort)
- Obsidian (optional but strongly recommended for browsing)
Tested on macOS (Apple Silicon, M3 Pro 18GB). Should work on Linux; Windows untested.
Installation
# Clone
git clone https://github.com/YOUR-USERNAME/llm-wiki.git
cd llm-wiki
# Create a virtual environment (uv is faster than pip, either works)
uv venv
source .venv/bin/activate
uv pip install -e .
# Pull the LLM (one-time, ~9.3GB)
ollama pull qwen3:14b
# Install QMD (the search backend)
npm install -g @tobilu/qmd
# Verify
wiki version
wiki --help
Quick start
# 1. Create a wiki in a folder of your choosing
mkdir my-wiki && cd my-wiki
wiki init
# 2. Drop some source documents in raw/, or use:
wiki add ~/Documents/papers --recursive
# 3. Run ingest (interactive by default — shows you entities/concepts
# before filing, with a y/n prompt per source)
wiki ingest
# First query triggers QMD to download its embedding + reranker models
# (~2GB, one-time). Subsequent queries are fast.
# 4. Ask questions
wiki query "what are the main themes across these documents?"
# 5. Save a good answer as a synthesis page
wiki query "compare X vs Y" --save-as x-vs-y-comparison
# 6. Health-check and auto-fix
wiki lint --fix
# 7. Browse the vault in Obsidian
open wiki/ # then "Open folder as vault"
Commands
| Command | Purpose |
|---|---|
| wiki init [path] | Scaffold a new wiki project |
| wiki add <file-or-folder> [-r] | Copy sources into raw/ and register for ingest |
| wiki sources list | List all tracked sources with status |
| wiki sources show <id> | Show metadata + text preview for one source |
| wiki sources rm <id> | Remove a source from tracking |
| wiki ingest [source_id] | Run the 3-pass LLM ingest pipeline |
| wiki query "<question>" [--scope wiki\|raw\|hybrid] [--save-as <slug>] | Search + synthesize a cited answer |
| wiki reindex | Force rebuild of the QMD search index |
| wiki lint [--deep] [--fix] | Health-check the wiki |
| wiki status | Show project stats, paths, config, backend health |
| wiki serve [--port N] | Launch the web UI at http://127.0.0.1:8000 |
Run wiki <command> --help for full options on any command. See USAGE.md for a full walkthrough.
Example output
A real ingest against notes.txt (28 words about Qwen3):
Source #1 raw/notes.txt
parsing…
extracting entities and concepts (thinking mode)…
Title: Quick Notes on Qwen
Slug: quick-notes-on-qwen
Summary:
Qwen is a family of large language models developed by Alibaba Cloud.
The latest version, Qwen3, introduces a thinking mode designed to enhance
performance on complex reasoning tasks.
Entities (3):
+ alibaba-cloud (organization) Alibaba Cloud
+ qwen (product) Qwen
+ qwen3 (product) Qwen3
Concepts (2):
+ large-language-models Large Language Models
+ thinking-mode-for-complex-reasoning Thinking Mode for Complex Reasoning
File these? Will create/update ~6 wiki pages. [Y/n]: Y
created entity alibaba-cloud
created entity qwen
created entity qwen3
created concept large-language-models
created concept thinking-mode-for-complex-reasoning
created source quick-notes-on-qwen
✓ Ingested Quick Notes on Qwen — 6 created, 0 updated
That's 6 cross-linked pages from a 28-word input, each with YAML frontmatter, [[wikilinks]] between them, and provenance back to the source. Open Obsidian's graph view and you'll see the cluster light up.
A real query against 11 ingested pages:
> wiki query "how does multi-head attention differ from self-attention?"
searching wiki (BM25 + vector + rerank)…
found 8 relevant page(s):
1. 0.93 concepts/multi-head-attention.md Multi-Head Attention
2. 0.55 concepts/self-attention-mechanism.md Self-Attention Mechanism
3. 0.40 entities/attention-is-all-you-need.md Attention is All You Need
...
synthesizing answer…
Multi-head attention and self-attention are related but distinct mechanisms:
1. **Scope and Parallelism**
- Self-attention is a single mechanism where each position in the input
computes attention weights based on all other positions
[[concepts/self-attention-mechanism]].
- Multi-head attention extends this by using multiple parallel attention
heads, allowing the model to focus on diverse patterns simultaneously
[[concepts/multi-head-attention]].
2. **Information Capture**
- Self-attention focuses on a single representation.
- Multi-head aggregates information from multiple heads, each capturing
different aspects (syntactic vs semantic, etc.)
[[concepts/multi-head-attention]].
[... etc.]
Every claim is cited. Every citation points to a page that actually exists.
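The guarantee that every citation resolves to a real page can be checked mechanically. The sketch below extracts wikilink targets from an answer and reports any that have no matching page; the names `cited_pages`, `existing_pages`, and `missing_citations` are hypothetical, and the assumption that a link `[[folder/slug]]` maps to the file `wiki/folder/slug.md` is inferred from the examples in this README.

```python
import re
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def cited_pages(answer: str) -> list[str]:
    """Return the distinct link targets in an answer, in order of first appearance."""
    seen: list[str] = []
    for match in WIKILINK.finditer(answer):
        target = match.group(1).strip()
        if target not in seen:
            seen.append(target)
    return seen


def existing_pages(wiki_root: Path) -> set[str]:
    """Collect page keys such as "concepts/multi-head-attention" from the vault on disk."""
    return {path.relative_to(wiki_root).with_suffix("").as_posix()
            for path in wiki_root.rglob("*.md")}


def missing_citations(answer: str, pages: set[str]) -> list[str]:
    """List cited targets that do not correspond to any known page."""
    return [target for target in cited_pages(answer) if target not in pages]
```

A synthesis step could call `missing_citations(answer, existing_pages(Path("wiki")))` and reject or repair the answer whenever the list is non-empty.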
Lint example
> wiki lint
╭─────────── Lint Report ────────────╮
│ Health score: 57/100 │
│ Pages checked: 12 │
│ │
│ 2 errors · 21 warnings · 0 infos │
╰────────────────────────────────────╯
──── Errors (2) ────
synthesis/transformers-and-llms.md
✗ broken_wikilink: Broken wikilink: [[entities/introduction-to-transformers]]
→ Either create the page or remove the link.
──── Warnings (21) ────
entities/qwen.md
! malformed_wikilink: 'sources/quick-notes-on-qwen.md' should be
'sources/quick-notes-on-qwen'
✓ auto-fixable
[... 20 more warnings ...]
> wiki lint --fix
✓ auto-fixed: 11
> wiki lint
╭─────────── Lint Report ───────────╮
│ Health score: 100/100 │
│ 0 errors · 0 warnings · 0 infos │
╰───────────────────────────────────╯
✓ No issues found. Your wiki is in good shape!
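Two of the checks shown in the report above can be sketched in a few lines: the auto-fixable malformed_wikilink case (a link target that wrongly ends in .md) and orphan detection. The function names and the definition of an orphan as a page with no inbound links from other pages are assumptions for illustration, and the health score is not modeled because this README does not state its formula.

```python
import re

# Captures the link target and any trailing "#heading" or "|alias" part separately.
LINK = re.compile(r"\[\[([^\]|#]+)([^\]]*)\]\]")


def fix_malformed_wikilinks(text: str) -> tuple[str, int]:
    """Strip a trailing ".md" from link targets; return the new text and the fix count."""
    fixes = 0

    def repair(match: re.Match) -> str:
        nonlocal fixes
        target, rest = match.group(1).strip(), match.group(2)
        if target.endswith(".md"):
            target = target[:-3]
            fixes += 1
        return f"[[{target}{rest}]]"

    return LINK.sub(repair, text), fixes


def find_orphans(pages: dict[str, str]) -> list[str]:
    """Return page keys that no other page links to (self-links do not count)."""
    linked: set[str] = set()
    for key, body in pages.items():
        for match in LINK.finditer(body):
            target = match.group(1).strip().removesuffix(".md")
            if target != key:
                linked.add(target)
    return sorted(key for key in pages if key not in linked)
```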
Project status
Current version: v0.8.1 — production-ready for personal use.
| Stage | Scope | Status |
|---|---|---|
| 1 | Scaffolding, CLI, Obsidian vault config | ✅ Done |
| 2 | Parsers (PDF, MD, HTML, DOCX, TXT), dedupe, wiki add | ✅ Done |
| 3 | LLM ingest pipeline (3 passes, streaming, merge-path) | ✅ Done |
| 4 | QMD search + wiki query with citation + save-back | ✅ Done |
| 5 | Lint checks + auto-fix + deep contradiction detection | ✅ Done |
| 6 | FastAPI + HTMX web UI (7 pages: Dashboard, Sources, Ingest, Jobs, Query, Lint, Graph) | ✅ Done |
| 7 (v0.7.0) | Source CRUD, intent classification, 3-way scope toggle | ✅ Done |
| 8 (v0.8.0) | Persistent ingest jobs (survive tab close, server restart) | ✅ Done |
| 8.1 | Auto-reindex after ingest and lint | ✅ Done |
Possible future work
- Hugging Face Spaces deployment (smaller model, API-compatible)
- Dashboard showing live active-job count
- Static HTML export for sharing the wiki
- Multi-user / team features
- Mobile-friendly web UI
- Fine-tuned query expansion model
- Confidence scoring per extracted claim
- OCR support for scanned PDFs
- EPUB support
Credits
- Andrej Karpathy: for the LLM-Wiki pattern described in his LLM Wiki gist. This project is a direct implementation of the idea.
- QMD by Tobi Lütke — the hybrid search backend that does all the heavy lifting for query-time retrieval.
- Qwen3 by Alibaba Cloud — the local LLM doing the reading, writing, and synthesis.
- Ollama — the runtime that makes local LLM inference painless on Apple Silicon.
- Obsidian — saved me from writing my own graph view.
License
MIT — see LICENSE.
"The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass." — Karpathy