About llm-wiki-template

An AI-managed personal knowledge base using the Karpathy LLM Wiki Pattern: an Obsidian vault with automated workflows for ingestion, compilation, Q&A, and quality auditing.

🧠 LLM Wiki Template

An AI-managed personal knowledge base using the Karpathy LLM Wiki Pattern, where LLMs write and maintain a structured Obsidian wiki from your raw research data.

Obsidian License: MIT

🌐 Language: English | Tiếng Việt

📺 Video Tutorial

Bộ Nhớ AI Kiểu Karpathy: 3 Bước Xây Wiki Cho Agent [Miễn Phí] ("Karpathy-Style AI Memory: 3 Steps to Build a Wiki for Your Agent [Free]"): a detailed guide from setup to real-world use.


What Is This?

This is a ready-to-use template for building an AI-powered personal knowledge base, inspired by Andrej Karpathy's approach to using LLMs for knowledge management:

"Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest." (Andrej Karpathy)

Instead of relying on complex RAG pipelines or vector databases, this system uses a simpler approach:

  1. You dump raw sources (articles, tweets, papers, videos) into raw/
  2. The LLM compiles structured wiki articles in wiki/
  3. You ask questions and get answers grounded in your personal knowledge base
  4. Knowledge compounds: each cycle makes the wiki richer

The result is a 100% inspectable, file-based knowledge system where you can see exactly what your AI "knows."
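
To make the "simpler approach" concrete, here is a minimal sketch of answering a question by reading compiled Markdown articles directly, with no vector store involved. It is an illustration only: in the template the agent itself reads the wiki, and the `rank_articles` function, the stopword list, and the sample article paths below are hypothetical.

```python
import re

# Hypothetical stopword list; a real setup would likely use a larger one.
STOPWORDS = {"a", "the", "of", "how", "does", "from", "is", "in", "to"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_articles(question: str, articles: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the article names whose text shares the most terms with the question."""
    terms = set(tokenize(question)) - STOPWORDS
    scores = {}
    for name, text in articles.items():
        score = sum(1 for word in tokenize(text) if word in terms)
        if score:
            scores[name] = score
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]

# Sample wiki content (invented for this example).
wiki = {
    "concepts/llm-wiki.md": "The LLM wiki pattern compiles sources into wiki articles once.",
    "concepts/rag.md": "RAG retrieves raw chunks at query time.",
    "people/karpathy.md": "Andrej Karpathy described the pattern.",
}

print(rank_articles("How does the wiki pattern differ from RAG?", wiki))
# ['concepts/llm-wiki.md', 'concepts/rag.md']
```

Because every candidate is a plain file, the names returned here double as references a reader can open and inspect.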

Key Features

  • 📂 File-based architecture: Markdown files, no databases, no vendor lock-in
  • 🔍 100% inspectable: Every piece of knowledge is a readable .md file
  • 🔄 10 automated workflows: /ingest, /compile, /ask, /cleanup, /breakdown, /autoresearch, /save, /overview, /startup, /wrapup
  • 🔬 Autonomous research: Agent searches the web, evaluates sources, and ingests automatically
  • ⚖️ Contradiction detection: Flags conflicting claims instead of silently overwriting
  • 💾 Chat-to-Wiki pipeline: Save knowledge from conversations directly to wiki
  • 🔌 Integrated MCP Server: Standard MCP API (Model Context Protocol) with FTS5 search for AI agents (Secure 127.0.0.1 bind)
  • 🕸️ Knowledge Graph: Automated analysis and visualization of knowledge links (God nodes, Orphans)
  • 📊 Self-maintaining indexes: Master index, glossary, backlinks, executive overview, operations log
  • 🛡️ Quality gates: Article size guardrails, anti-cramming/thinning rules, re-read checks
  • 🧹 Wiki health checks: Automated tone, structure, link, and contradiction auditing
  • 📈 Compound knowledge loop: Each cycle produces better knowledge, which produces better outputs
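
The "God nodes" and "Orphans" mentioned in the Knowledge Graph feature can be pictured with a small sketch. It assumes an orphan is an article with no inbound links and a god node is one whose inbound-link count reaches a threshold; the `analyze_links` function, the threshold of 3, and the sample data are hypothetical and are not taken from wiki/_build_graph.py.

```python
from collections import Counter

def analyze_links(outbound: dict[str, set[str]], god_threshold: int = 3):
    """Return (orphans, god_nodes) for a mapping of article -> articles it links to."""
    inbound = Counter()
    for source, targets in outbound.items():
        for target in targets:
            # Ignore self-links and links to articles that do not exist.
            if target != source and target in outbound:
                inbound[target] += 1
    orphans = sorted(name for name in outbound if inbound[name] == 0)
    god_nodes = sorted(name for name in outbound if inbound[name] >= god_threshold)
    return orphans, god_nodes

# Invented link structure for illustration.
links = {
    "rag": {"llm"},
    "agents": {"llm", "mcp"},
    "mcp": {"llm"},
    "llm": set(),
    "notes": set(),
}

orphans, god_nodes = analyze_links(links)
print(orphans)    # ['agents', 'notes', 'rag']
print(god_nodes)  # ['llm']
```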

Quick Start

1. Use This Template

Click "Use this template" → "Create a new repository" on GitHub.

Or clone manually:

git clone https://github.com/YOUR_USERNAME/llm-wiki-template.git my-second-brain

2. Open in Obsidian

  1. Download Obsidian (free)
  2. Open as vault: File → Open vault → Open folder as vault
  3. Select the cloned directory
  4. Install recommended plugins when prompted: Dataview, Marp Slides

3. Connect Your AI Agent

This template works with any LLM-powered coding agent that can read and write files. Tested with:

  • Gemini CLI (recommended)
  • Claude Code / Claude Desktop with filesystem access
  • Cursor / Windsurf with workspace access
  • Any agent that can read/write Markdown files

The agent reads AGENTS.md as its operating manual, so no additional configuration is needed.

4. Start Building Your Knowledge Base

# Step 1: Ingest a source
/ingest https://example.com/interesting-article

# Step 2: Compile into wiki
/compile

# Step 3: Ask questions
/ask What are the key concepts from my sources?

# Step 4: Audit wiki quality
/cleanup

# Step 5: Find knowledge gaps
/breakdown

# Step 6: Auto-research a topic
/autoresearch Large Language Models

# Step 7: Save chat insights to wiki
/save

Architecture

┌──────────────────────────────────────────────────┐
│                 YOUR RESEARCH                    │
│  Articles, Tweets, Papers, Videos, Repos, etc.   │
└──────────────────────┬───────────────────────────┘
                       │ /ingest
                       ▼
┌──────────────────────────────────────────────────┐
│                  raw/                            │
│  Source documents: NEVER modified, only added    │
│  articles/ papers/ repos/ tweets/ videos/ misc/  │
└──────────┬───────────────────────┬───────────────┘
           │ /compile              │ /autoresearch
           ▼                       ▼
┌──────────────────────────────────────────────────┐
│                  wiki/                           │
│  Compiled knowledge: AI-maintained wiki          │
│  concepts/ tools/ people/ comparisons/           │
│  + _index.md, _glossary.md, overview.md          │
│  ⚖️ Contradiction Check before every update      │
└──────┬─────────┬──────────┬──────────────────────┘
       │ /ask    │ /cleanup │ /save
       ▼         ▼          ▼
┌──────────┐ ┌──────────┐ ┌──────────────────────┐
│ answers  │ │ quality  │ │ chat → raw → wiki    │
│ + refs   │ │ fixes    │ │ knowledge extraction │
└──────────┘ └──────────┘ └──────────────────────┘

Workflows

| Command | What It Does |
| --- | --- |
| /ingest | Imports raw sources (URLs, files, PDFs) into raw/ with proper frontmatter |
| /compile | Reads raw sources and creates/updates structured wiki articles (with contradiction detection) |
| /ask | Answers questions using wiki knowledge, with optional file-back to wiki |
| /cleanup | Audits wiki quality (tone, structure, links, size, contradiction backlog) and auto-fixes issues |
| /breakdown | Scans wiki for missing entities and proposes new articles |
| /autoresearch | 🆕 Autonomous research: searches the web, evaluates sources, ingests, and synthesizes reports |
| /save | 🆕 Chat-to-Wiki: extracts knowledge from conversations and saves it directly to the wiki |
| /startup | 🆕 Project Brain Startup: AI recalls context from previous sessions |
| /wrapup | 🆕 Project Wrapup: AI saves a session archive and updates the rolling context |

Each workflow is defined in .agents/workflows/ and can be customized.

AutoResearch: Autonomous Knowledge Discovery

The /autoresearch workflow turns your wiki into an active researcher:

/autoresearch [topic]

How it works:

  1. Gap Analysis: Scans the existing wiki to identify what's missing
  2. 3-Round Research Loop: Broad search → Gap fill → Verify
  3. Auto-Ingest: Downloads and processes sources automatically
  4. Synthesis Report: Generates an executive summary at outputs/reports/
  5. Human Review: You approve before anything enters the wiki

Configure search constraints in raw/_research_program.md.
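
The three-round loop and the human-review gate described above could be sketched roughly as follows. This is not the workflow's actual implementation (the workflow is a set of agent instructions in .agents/workflows/autoresearch.md): `run_research`, `fake_search`, and the example URLs are hypothetical, and a real run would call a web search tool instead of the canned stand-in.

```python
from typing import Callable

# Round names follow the workflow description; everything else is illustrative.
ROUNDS = ("broad search", "gap fill", "verify")

def run_research(topic: str, known_sources: set[str],
                 search: Callable[[str, str], list[str]]) -> dict:
    """Collect new sources over three rounds and hold them for human review."""
    found: list[str] = []
    for round_name in ROUNDS:
        for url in search(topic, round_name):
            # Skip sources the vault already has or that an earlier round found.
            if url not in known_sources and url not in found:
                found.append(url)
    # Nothing is marked approved here: a person signs off before wiki changes.
    return {"topic": topic, "candidates": found, "approved": False}

def fake_search(topic: str, round_name: str) -> list[str]:
    """Stand-in for a real web search; returns canned URLs per round."""
    canned = {
        "broad search": ["https://example.com/a", "https://example.com/b"],
        "gap fill": ["https://example.com/b", "https://example.com/c"],
        "verify": ["https://example.com/a"],
    }
    return canned[round_name]

report = run_research("Large Language Models", {"https://example.com/a"}, fake_search)
print(report["candidates"])  # ['https://example.com/b', 'https://example.com/c']
```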

Contradiction Detection

When compiling new sources, the system automatically checks for conflicting claims:

  • ✅ Temporal updates (v1.0 → v2.0): Updated normally
  • ✅ New information: Integrated normally
  • ⚠️ Actual contradictions: Preserved with a [!warning] callout and tagged needs-review

The wiki never silently overwrites conflicting information. Human review is always required.
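
The three outcomes can be illustrated with a rule-based sketch. In the template the LLM makes this judgement while compiling, so the code below is only an approximation under stated assumptions: the `Claim` structure, the rule that a differing value with a different version counts as a temporal update, and the callout wording are all hypothetical. The `> [!warning]` line uses Obsidian's callout syntax.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    subject: str   # what the claim is about, e.g. "context window"
    value: str     # the claimed fact
    version: str   # version or date the claim applies to

def classify(existing: dict[str, Claim], new: Claim) -> str:
    """Decide how a new claim relates to what the wiki already says."""
    old = existing.get(new.subject)
    if old is None:
        return "new"
    if old.value == new.value:
        return "unchanged"
    if old.version != new.version:
        return "temporal-update"   # e.g. v1.0 -> v2.0: update normally
    return "contradiction"         # same version, different facts

def warning_callout(old: Claim, new: Claim) -> str:
    """Render both claims side by side instead of overwriting the old one."""
    return (
        "> [!warning] Contradiction (needs-review)\n"
        f"> Existing: {old.value} ({old.version})\n"
        f"> New source: {new.value} ({new.version})"
    )

wiki_claims = {"context window": Claim("context window", "8k tokens", "v1.0")}
conflicting = Claim("context window", "16k tokens", "v1.0")

print(classify(wiki_claims, Claim("context window", "32k tokens", "v2.0")))  # temporal-update
print(classify(wiki_claims, conflicting))                                    # contradiction
print(warning_callout(wiki_claims["context window"], conflicting))
```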

Quality System

The template enforces several quality mechanisms:

  • Re-read before update: The AI must read the full article before editing (non-negotiable)
  • Contradiction check: New claims are compared against the existing wiki before writing
  • Article size guardrails: 15–120 lines; too short = stub, too long = split
  • Anti-cramming: Sub-topics with ≥3 paragraphs get their own article
  • Anti-thinning: No article creation unless ≥3 meaningful sentences can be written
  • Encyclopedia tone: Neutral, attribution-based writing, no editorial voice
  • Absorption log: Tracks which raw sources have been compiled (no duplicates)
  • Operations log: Chronological record of every action taken on the vault
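
Two of these rules are mechanical enough to sketch in code: the size guardrail and the absorption log. The 15 and 120 line limits come from the list above; the function names are hypothetical, and the absorption log is modelled as a plain set of paths because the real layout of wiki/_absorb_log.json is not described here.

```python
def size_verdict(article_text: str, min_lines: int = 15, max_lines: int = 120) -> str:
    """Classify an article by line count: 'stub', 'ok', or 'split'."""
    line_count = len(article_text.splitlines())
    if line_count < min_lines:
        return "stub"    # too short to stand alone
    if line_count > max_lines:
        return "split"   # too long, break into several articles
    return "ok"

def pending_sources(raw_paths: list[str], absorbed: set[str]) -> list[str]:
    """Return raw sources that have not been compiled yet, preserving order."""
    return [path for path in raw_paths if path not in absorbed]

print(size_verdict("line\n" * 10))    # stub
print(size_verdict("line\n" * 60))    # ok
print(size_verdict("line\n" * 121))   # split
print(pending_sources(
    ["raw/articles/a.md", "raw/papers/b.md"],
    absorbed={"raw/articles/a.md"},
))                                    # ['raw/papers/b.md']
```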

File Structure

llm-wiki-template/
├── AGENTS.md                ← Agent operating manual (the brain)
├── README.md                ← This file
├── update.py                ← 🆕 One-command updater script
├── sync-brain.ps1           ← 🆕 Auto-sync to GitHub script
├── .gitignore
│
├── .agents/workflows/       ← 10 automated workflows
│   ├── ask.md
│   ├── autoresearch.md      ← 🆕 Autonomous research
│   ├── breakdown.md
│   ├── cleanup.md           ← Updated: contradiction backlog scanning
│   ├── compile.md           ← Updated: contradiction detection (Step 4.5)
│   ├── ingest.md
│   ├── save.md              ← 🆕 Chat-to-Wiki pipeline
│   ├── startup.md           ← 🆕 Session startup
│   └── wrapup.md            ← 🆕 Session wrapup
│
├── integrations/mcp/        ← 🆕 MCP Server Integration
│   ├── README.md            ← MCP Setup Guide
│   └── config-sample.json   ← Sample config for Claude/Cursor
│
├── scripts/                 ← 🆕 Agent-Native Tooling
│   ├── brain.py             ← Central CLI Router (Search, Index, Health, MCP)
│   ├── brain_mcp.py         ← FastMCP Server (Secure 127.0.0.1 bind)
│   ├── brain_db.py          ← SQLite Database abstraction
│   ├── build_search_index.py← FTS5 Indexing
│   └── ...                  ← Other tools (audit, resolve_orphans, etc.)
│
├── .obsidian/               ← Obsidian config (pre-configured)
│
├── sessions/                ← Session logs (AI Memory)
│   ├── current-context.md   ← Rolling context (auto-updated)
│   ├── .hot-buffer.md       ← Mid-session decisions buffer
│   └── session-summary-*.md ← Archive of each session
│
├── raw/                     ← Your source documents
│   ├── _ingest.py           ← Batch ingest script (Python)
│   ├── _research_program.md ← 🆕 AutoResearch configuration
│   ├── articles/
│   ├── papers/
│   ├── repos/
│   ├── tweets/
│   ├── videos/
│   └── misc/
│
├── wiki/                    ← AI-maintained wiki
│   ├── overview.md          ← 🆕 Executive summary for cross-project access
│   ├── _index.md            ← Master catalog
│   ├── _glossary.md         ← Term definitions
│   ├── _absorb_log.json     ← Compilation tracker
│   ├── _backlinks.json      ← Reverse link index
│   ├── _build_backlinks.py  ← Backlinks builder script
│   ├── _build_graph.py      ← 🆕 Knowledge Graph analysis script
│   ├── _dashboard.md        ← Dataview dashboard
│   ├── _ops_log.md          ← Operations log
│   ├── concepts/
│   ├── tools/
│   ├── people/
│   └── comparisons/
│
└── outputs/                 ← Generated content
    ├── reports/             ← AutoResearch synthesis reports
    ├── slides/
    ├── charts/
    └── summaries/
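
As an illustration of what a "reverse link index" such as wiki/_backlinks.json might hold, the sketch below inverts Obsidian-style [[wikilinks]] so that each article maps to the articles that link to it. It is a guess at the idea, not the contents of wiki/_build_backlinks.py: the `build_backlinks` function, the regular expression, and the output shape are assumptions, and articles are passed in as a dictionary rather than read from disk to keep the example self-contained.

```python
import json
import re
from collections import defaultdict

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def build_backlinks(articles: dict[str, str]) -> dict[str, list[str]]:
    """Map each link target to the sorted list of articles that link to it."""
    backlinks = defaultdict(set)
    for name, text in articles.items():
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target != name:          # an article linking to itself is not a backlink
                backlinks[target].add(name)
    return {target: sorted(sources) for target, sources in sorted(backlinks.items())}

# Invented article bodies for illustration.
articles = {
    "rag": "See [[llm]] and [[mcp#Setup]].",
    "agents": "Built on [[llm|large language models]].",
    "llm": "No links here.",
}

print(json.dumps(build_backlinks(articles)))
# {"llm": ["agents", "rag"], "mcp": ["rag"]}
```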

Customization

Change the Language

The template uses English by default. To switch:

  1. Edit AGENTS.md → update the Writing Tone section
  2. Update wiki meta files (_index.md, _glossary.md) headers
  3. The AI will follow your language preference from AGENTS.md

Add Entity Types

Edit AGENTS.md → Entity-Type Templates section to add new categories beyond concepts/tools/people/comparisons.

Modify Quality Rules

All quality rules are in AGENTS.md. Adjust thresholds (article size, quote density, etc.) to match your preferences.

Configure AutoResearch

Edit raw/_research_program.md to customize:

  • Search scope and constraints
  • Confidence scoring thresholds
  • Source exclusion lists
  • Domain-specific notes and priorities
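
To show how a confidence threshold and an exclusion list might interact, here is a small filter. How the template actually encodes these settings in raw/_research_program.md is not shown in this README, so the `accept_sources` function, the 0.7 default, and the sample domains are all hypothetical.

```python
from urllib.parse import urlparse

def accept_sources(candidates: list[tuple[str, float]],
                   excluded_domains: set[str],
                   min_confidence: float = 0.7) -> list[str]:
    """Keep URLs whose domain is allowed and whose confidence meets the threshold."""
    accepted = []
    for url, confidence in candidates:
        # Normalise the host so "www.example.com" and "example.com" compare equal.
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain in excluded_domains or confidence < min_confidence:
            continue
        accepted.append(url)
    return accepted

candidates = [
    ("https://www.example.com/post", 0.9),
    ("https://spam.example.net/x", 0.95),   # excluded domain
    ("https://example.org/paper", 0.4),     # below threshold
]

print(accept_sources(candidates, excluded_domains={"spam.example.net"}))
# ['https://www.example.com/post']
```

`str.removeprefix` requires Python 3.9 or later.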

Add Obsidian Plugins

The template includes configs for Dataview (tables/queries) and Marp Slides (presentations). Add more plugins through Obsidian's community plugin browser.

Batch Ingest Script

For bulk importing, use the included Python script:

# Single file
python raw/_ingest.py path/to/article.md

# PDF (requires PyMuPDF: pip install PyMuPDF)
python raw/_ingest.py paper.pdf

# Entire folder
python raw/_ingest.py ~/Downloads/research-notes/

# Preview without creating files
python raw/_ingest.py big-folder/ --dry-run
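
The behaviour of --dry-run can be pictured with a simplified sketch: compute the target paths, and write files only when the flag is off. This is not the code of raw/_ingest.py. The `render` and `ingest` functions and the frontmatter fields (title, source, ingested) are hypothetical, and the sketch handles Markdown files only, not PDFs or nested folders.

```python
import json
from datetime import date
from pathlib import Path

def render(source: Path, ingested: date) -> str:
    """Prefix the source text with a YAML frontmatter block (hypothetical fields)."""
    meta = {"title": source.stem, "source": source.name, "ingested": ingested.isoformat()}
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    header = "\n".join(f"{key}: {json.dumps(value)}" for key, value in meta.items())
    return f"---\n{header}\n---\n\n{source.read_text(encoding='utf-8')}"

def ingest(sources: list[Path], dest_dir: Path, dry_run: bool = False) -> list[Path]:
    """Copy Markdown sources into dest_dir; with dry_run, only report the targets."""
    planned = [dest_dir / source.name for source in sources]
    if not dry_run:
        dest_dir.mkdir(parents=True, exist_ok=True)
        for source, target in zip(sources, planned):
            target.write_text(render(source, date.today()), encoding="utf-8")
    return planned
```

Calling `ingest([Path("notes.md")], Path("raw/misc"), dry_run=True)` returns the planned destination without creating anything on disk, which mirrors the preview behaviour described above.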

Updating Your Vault

Already using the template? Update to the latest workflows with a single command:

Option A: brain CLI (Recommended)

Install once:

pip install git+https://github.com/KHOAAI-HILL/llm-wiki-template.git

Then use anywhere inside your vault:

brain update              # Update (asks for confirmation)
brain update --dry-run    # Preview changes without writing
brain update --force      # Update without asking
brain status              # Check vault health
brain version             # Show version

Option B: Standalone script (No install)

# Download the script
curl -o update.py https://raw.githubusercontent.com/KHOAAI-HILL/llm-wiki-template/master/update.py

# Run it
python update.py

Both methods touch only system files (workflows, AGENTS.md). Your personal data (raw/, wiki/, sessions/, outputs/) is never modified.
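
The guarantee above amounts to a path filter. A minimal sketch of such a filter follows; the protected directory names come from this README, while the `is_updatable` function is hypothetical and is not taken from update.py or the brain CLI.

```python
from pathlib import PurePosixPath

# Personal-data directories listed in this README as never modified by updates.
PROTECTED_DIRS = {"raw", "wiki", "sessions", "outputs"}

def is_updatable(relative_path: str) -> bool:
    """Return True if an updater may overwrite this vault-relative path."""
    parts = PurePosixPath(relative_path).parts
    return bool(parts) and parts[0] not in PROTECTED_DIRS

for path in ["AGENTS.md", ".agents/workflows/ask.md", "wiki/_index.md", "raw/articles/x.md"]:
    print(path, is_updatable(path))
# AGENTS.md True
# .agents/workflows/ask.md True
# wiki/_index.md False
# raw/articles/x.md False
```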

Philosophy

This template is built on three principles:

  1. Files over databases: Markdown files are portable, inspectable, and version-controllable. No vector DB, no cloud dependencies.

  2. Compile once, query forever: Instead of retrieving raw chunks on every query (RAG), the AI pre-compiles clean wiki articles. Queries read refined knowledge, not raw data.

  3. Knowledge compounds: Each ingest-compile-ask cycle makes the wiki richer. Better wiki → better answers → better questions → richer wiki.

Credits

  • Andrej Karpathy: Originated the LLM Knowledge Base concept
  • Farzaa: wiki-gen-skill implementation that heavily influenced the quality gates
  • DataChaz: Community breakdown and analysis

License

MIT: Use freely, modify as you wish, share with others.