Trove AI — 拾遗
Read-later + AI knowledge base, built for the Chinese internet.
Why Trove AI?
You save 1000 articles. You re-read 5.
The problem isn't that you have too much — it's that your tools treat "save" and "read" as the same action. WeChat 收藏 buries them. 收藏 ≠ reading. That gap is the entire problem.
Pocket shut down in 2024. Omnivore shut down too. Their users' carefully curated libraries vanished overnight.
Trove AI is a self-hostable, AI-powered second brain that turns "save for later" back into "actually read & remember." Built first-class for the Chinese internet (WeChat 公众号 / 知乎 / 抖音 / 小红书 / B 站 / 头条 / 掘金 / CSDN), with WeChat Bot ingress, automatic knowledge graph, and one-way Obsidian sync as built-in features.
It's yours to host. It's yours to keep.
Highlights
📥 Multi-platform captureWeChat 公众号 · 视频号 (WeChat Channels) · 头条 · 抖音 · 小红书 · B 站 · Medium · CSDN · 掘金 — and any OpenGraph-aware URL. JS-rendered & no-parser pages (视频号, CSDN, Medium, …) are extracted via a trafilatura → headless-Chromium render → BeautifulSoup cascade for clean main-content. Ingestion via: browser bookmark, WeChat Bot, paste, upload (PDF/DOCX/EPUB/etc), one-sentence Spark generation. |
🧠 AI does the work, not youEvery article gets: AI-extracted title, 5-sentence summary, 3-5 key points, auto-tags, source-aware author extraction, 1024-dim vector embedding, mind-map auto-generation, video transcription. |
🔍 Semantic search + RAG Q&AAsk "What did I read about prompt engineering?" → get answer with citations to your library, not the public internet. |
🕸 Auto-grown knowledge graphEach new article finds its 3 closest siblings by semantic distance. Watch your knowledge connect itself. |
🛤 Learning pathsOne sentence → AI picks articles from your library, orders them, presents as a curriculum. |
💬 WeChat Bot ingressForward an article URL to your bot → it's in your library 5 seconds later, with summary, tags, and "related to your earlier reads" suggestions. |
📝 One-way Obsidian syncCompanion plugin writes a Markdown snapshot to your vault. Never overwrites your local edits. Your data survives any future Trove shutdown. |
🏢 Multi-tenant, production-gradeJWT auth · per-user data isolation · revocable sync tokens · service-token impersonation for bots · admin user management. |
📂 Real knowledge-base craftsmanshipFolder hierarchy · tag system · archive · favorite · recycle bin · weekly review reminder · related-articles recommender on every read view. |
🌐 All content typesWeb links · clipboard paste · PDF · Word · Excel · PPT · EPUB · CSV · plain notes · Spark (1-sentence → full article AI generation). |
📱 Full-device responsivePC · iPad · mobile all work natively. Touch-optimized reader, gesture-friendly knowledge graph, mobile-first layouts. Use Trove from any device, anywhere. |
🌗 Light / dark / system themeAuto-switching based on OS preference, or pin to your favorite mode. Eye-friendly serif reader font for long sessions. |
Screenshots
⚠️ Add screenshots in
docs/screenshots/— see open issues for placeholders.
| Dashboard | Reader | Knowledge graph | Settings |
|---|---|---|---|
| (placeholder) | (placeholder) | (placeholder) | (placeholder) |
Who is it for?
- Product managers and researchers drowning in saved-but-never-read articles
- Engineers and lifelong learners who want their weekly tabs to compound into knowledge
- Privacy-conscious users who don't want their reading habits living in some startup's database
- People burned by Pocket / Omnivore shutdowns wanting data sovereignty
- Content curators building structured personal knowledge bases
- Self-hosters who run their own infrastructure for fun and survival
- Cross-device readers who switch between phone (commute) → iPad (couch) → laptop (desk) and want all three to feel native
Compared to alternatives
| Trove AI | Omnivore | Readwise | Hoarder/Karakeep | Memos | ||
|---|---|---|---|---|---|---|
| Open source | ✅ AGPL-3.0 | ❌ | ✅ (was) | ❌ | ✅ MIT | ✅ MIT |
| Self-host | ✅ Docker | ❌ | ✅ (defunct) | ❌ | ✅ Docker | ✅ Docker |
| Chinese platforms | ✅ 6+ deep parsers | ❌ | Weak | ❌ | Weak | N/A |
| AI summary | ✅ Any provider | ❌ Basic | ✅ | ✅ | ✅ | ❌ |
| Knowledge graph | ✅ Auto | ❌ | ❌ | ❌ | ❌ | ❌ |
| Learning paths | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| WeChat Bot | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ |
| Obsidian sync | ✅ Plugin | ❌ | ✅ | ✅ Paid | ❌ | ❌ |
| Responsive (PC/pad/mobile) | ✅ All native | ✅ | Limited | ✅ | Limited | Limited |
| Multi-tenant | ✅ | N/A | ✅ | N/A | Limited | Limited |
| Status | ✅ Active | ⛔ Shut down 2024 | ⛔ Shut down 2024 | ✅ Paid | ✅ Active | ✅ Active |
Trove AI is the only option that combines: deep Chinese platform support · WeChat Bot · auto knowledge graph · Obsidian sync · self-host · full-device responsive UI. If any one of those is critical to you, the alternatives don't cover it.
Quick Start (5 minutes)
Prerequisites
- Docker ≥ 24.0 with Compose v2 (
docker compose ..., notdocker-compose) - ~ 4 GB RAM free
- ~ 5 GB disk
That's it. No Python or Node required on the host.
Steps
# 1. Clone
git clone https://github.com/weaiw/trove-ai.git
cd trove-ai
# 2. Configure secrets
cp .env.example .env
# Edit .env, at minimum:
# POSTGRES_PASSWORD=$(openssl rand -base64 24)
# SECRET_KEY=$(openssl rand -base64 48)
# 3. (Optional) Pre-fill LLM keys (or skip — configure via web UI later)
cp backend/app/config_store.example.json backend/app/config_store.json
# 4. Run
docker compose up -d
# 5. Open
open http://localhost
First-time setup creates an admin user. The credentials appear in backend logs:
docker compose logs backend | grep -i admin
Full self-host guide with troubleshooting: docs/SELF_HOST.md.
Cloud deployment
Any Docker-capable VM works. Battle-tested on:
- 腾讯云 Lighthouse / CVM (recommended for China users)
- AWS EC2 t3.medium
- DigitalOcean 4GB droplet
- Hetzner CX22
Bring your own reverse proxy (Caddy / Traefik / Nginx) for HTTPS, or use Cloudflare Tunnel.
Architecture
┌──────────────────────────────────────────────────────┐
│ Any device — PC · iPad · mobile · browser │
│ • Web app • WeChat Bot • Obsidian plugin │
└─────────────────────────┬────────────────────────────┘
│
┌───────▼───────┐
│ Nginx :80 │
└───┬────────┬──┘
│ │
┌───────────▼──┐ ┌──▼────────────┐
│ Frontend │ │ Backend │
│ (Next.js 14)│ │ (FastAPI) │
│ Responsive │ │ async │
└──────────────┘ └───┬────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
┌─────────▼──────────┐ ┌────────▼───────┐ ┌──────────▼────────┐
│ PostgreSQL 16 │ │ Redis 7 │ │ External APIs │
│ + pgvector │ │ (cache) │ │ LLM + embedding │
│ • articles │ │ │ │ • DeepSeek │
│ • embeddings 1024d│ │ │ │ • 讯飞 / OpenAI │
│ • knowledge_edges │ │ │ │ • SiliconFlow │
│ • users + tokens │ │ │ │ • any compatible │
└────────────────────┘ └────────────────┘ └───────────────────┘
Tech stack
| Layer | Choice | Why |
|---|---|---|
| Frontend | Next.js 14 + TypeScript + Tailwind | App router, server components, responsive-first |
| Backend | FastAPI + SQLAlchemy async + pydantic | Async-native, type-safe, auto OpenAPI docs |
| Database | PostgreSQL 16 + pgvector | One DB for both relational data and vector search |
| Cache | Redis 7 | Sessions, queues |
| Crawler | Playwright + curl_cffi + httpx | Defeats Chinese anti-bot (TLS fingerprint, XHR intercept, JS VM bypass) |
| LLM | Any OpenAI-compatible | DeepSeek, 讯飞星辰, OpenAI, SiliconFlow, MiniMax, 智谱, ... |
| Embedding | SiliconFlow bge-m3 (1024-dim) or local fastembed (384-dim) | Cloud quality with local fallback |
| Reverse proxy | Nginx | Single ingress, fast static serving |
Configuration
Everything user-facing is configurable via the web UI:
Settings page → AI 对话模型 / 嵌入模型 / 缓存
| What | Where |
|---|---|
| LLM provider + key + model | Settings → AI 对话模型 |
| Embedding provider + key + model | Settings → 嵌入模型 |
| Cache clearing / rebuilding | Settings → 系统缓存 |
| Obsidian sync token | Personal Settings → Obsidian 备份 |
| WeChat Bot binding | Personal Settings → WeChat |
| Review schedule | Personal Settings → 周期回顾 |
Environment variables
| Var | Required | What |
|---|---|---|
POSTGRES_PASSWORD |
✅ | DB password |
SECRET_KEY |
✅ | JWT signing secret (≥ 32 random chars) |
OPENAI_API_KEY |
❌ | Optional fallback if no UI config |
DEEPSEEK_API_KEY |
❌ | Optional fallback |
SILICONFLOW_API_KEY |
❌ | Optional fallback |
MINIMAX_API_KEY |
❌ | Optional fallback |
SERVICE_TOKENS |
❌ | tokenA:userA,tokenB:userB — for bots |
LINKMIND_PUBLIC_BASE |
❌ | Public URL for bot deep links |
See .env.example for the complete template with comments.
API endpoints (high-level)
| Endpoint | Purpose |
|---|---|
POST /api/auth/login |
User login → JWT |
POST /api/articles |
Add article by URL |
POST /api/articles/upload |
Upload file (PDF / Word / EPUB / etc) |
POST /api/articles/notes |
Write a note |
POST /api/articles/spark |
One-sentence → AI-generated article |
POST /api/assistant/ask |
RAG Q&A on your library |
GET /api/knowledge/graph |
Knowledge graph data |
POST /api/learning/paths/generate |
Generate learning path |
POST /api/sync/issue-token |
Mint long-lived Obsidian sync token |
GET /api/sync/articles |
Paginated articles for sync |
POST /api/sync/revoke-all-tokens |
Revoke all sync tokens |
Full OpenAPI spec at http://localhost/api/docs once running.
Obsidian Sync — companion plugin
Plugin repo: weaiw/trove-sync-obsidian (MIT)
One-shot snapshot to your local vault. Never overwrites your edits. Your data survives any future shutdown.
Setup:
- Web app → Personal Settings → Obsidian Backup → Generate Sync Token
- Download plugin from Releases
- Drop into
<your-vault>/.obsidian/plugins/trove-sync/ - In Obsidian → Community plugins → enable Trove AI Sync
- Paste token + server URL → click Sync Now
The plugin auto-detects already-synced articles via dual-OR (sync_state.json ∪ frontmatter scan), so it's safe to lose either side of the index.
Documentation
docs/SELF_HOST.md— Full self-host guide with troubleshootingCONTRIBUTING.md— How to contribute- API docs at
/api/docs(auto-generated from FastAPI)
Roadmap
v1.1 — current
- ✅ WeChat Channels (视频号) capture
- ✅ Smart generic extraction (trafilatura → headless render → BeautifulSoup)
- ✅ Article-scoped Q&A (📄 this-article / 📚 whole-library toggle)
v1.0
- ✅ Multi-platform capture (8+ sources)
- ✅ AI processing pipeline (summary / key-points / tags / embedding / mind-map)
- ✅ RAG Q&A + semantic search
- ✅ Auto knowledge graph + learning paths
- ✅ WeChat Bot ingress
- ✅ Obsidian sync plugin
- ✅ Multi-tenant + revocable sync tokens
- ✅ Self-host via Docker
- ✅ Responsive UI for PC / pad / mobile
v1.1
- 🔜 Browser extension (one-click clip from any tab)
- 🔜 Image local download (offline-safe backup)
- 🔜 Pocket / Omnivore import
- 🔜 Better article deduplication
- 🔜 PWA support for "add to home screen" on mobile
v1.2
- More LLM providers (Claude, Gemini, Doubao native)
- Per-user theme & language preferences
- Bulk re-process articles with new AI prompts
- Article version history
v2 — research
- Obsidian community marketplace submission
- Multi-vault Obsidian sync
- Notion / Logseq / Reflect export
- Audio podcast generation from saved articles
- Daily / weekly digest emails
FAQ
Will Trove AI work without paying for an LLM API?
Yes — embedding has a local CPU-only fallback (BAAI/bge-small-en-v1.5, 384-dim).
For LLM features (summary, tags, RAG), you need at least a free-tier API:
- DeepSeek — cheapest at ~$0.27 / 1M tokens
- 讯飞 / 智谱 — both offer free trial credits
- OpenAI / Claude / Gemini — pay-per-use
- MiniMax / SiliconFlow — generous free tiers
How much does it cost to run?
~ $5-10/month on a small VPS + LLM API usage. At < 1000 articles/month with DeepSeek, expect ~$2-5/month in LLM costs. For totally free, use local embedding + skip AI summary features.
Can I migrate from Pocket / Omnivore / Readwise?
Direct importer coming in v1.1. Workarounds:
- Pocket export → individual URL list → bulk paste via
/api/articles/batch - Omnivore → markdown export → use
/api/articles/upload
Is anything sent to third parties without my consent?
Only to the LLM provider you explicitly configure. The API key and base URL are entirely under your control. For air-gapped operation, use local embedding only and skip LLM-powered features. No analytics, no telemetry, no third-party JS in the frontend.
Does it work well on mobile?
Yes — built mobile-first with responsive layouts. Reader, library, search, knowledge graph all touch-optimized. v1.1 adds PWA so you can "add to home screen" on iOS / Android.
Do I need to know coding to deploy?
Basic Docker familiarity helps. docs/SELF_HOST.md walks through every step.
If you're stuck, open an issue and the community usually responds within a day.
How is data isolated between users?
Every row in articles, tags, folders, knowledge_edges, learning_paths, wechat_accounts is tagged with user_id.
All queries filter on current_user.id. JWT auth + per-user revocable sync tokens.
Cross-tenant leaks are mechanically prevented at the ORM layer.
What happens if I delete an article?
It goes to a per-user recycle bin (deleted_at column). Auto-purge after 30 days, restorable before that.
The Obsidian plugin never propagates deletes — your local file stays unless you manually delete it.
Can I run this commercially?
Yes, under AGPL-3.0: you can charge users for hosting, as long as you publish your modifications to those users. For closed-source SaaS deployment, contact the maintainer for a commercial license.
Contributing
See CONTRIBUTING.md.
Especially welcome:
- New platform parsers (parser_service.py)
- Translations (English, 日本語, others)
- UI polish and accessibility
- Bug reports with reproduction
- Comparison tests with other LLM providers
Acknowledgements
- Built with Hermes AI coding agent + DeepSeek as the LLM brain — over 2.7 billion tokens spent in vibe-coding, 0 lines of human-written code
- Backend: FastAPI · SQLAlchemy · pgvector · Playwright · curl_cffi
- Frontend: Next.js · Tailwind · lucide-react · react-flow
- Embedding: BAAI/bge-m3 · fastembed
- Inspired by Pocket, Omnivore, Readwise — and frustrated by the first two shutting down
License
Core: AGPL-3.0. Obsidian plugin: MIT.
For commercial closed-source SaaS deployment, contact the maintainer for a separate commercial license.
If Trove AI saves your knowledge from yet another startup shutdown, please drop a ⭐ — it costs you nothing and helps the project be discovered.