# YantrikDB — A Cognitive Memory Engine for Persistent AI Systems

> The memory engine for AI that actually knows you.

## Get Started in 60 Seconds

### For AI agents (MCP — works with Claude, Cursor, Windsurf, Copilot)

```bash
pip install yantrikdb-mcp
```

Add to your MCP client config:

```json
{
  "mcpServers": {
    "yantrikdb": {
      "command": "yantrikdb-mcp"
    }
  }
}
```

That's it. The agent auto-recalls context, auto-remembers decisions, and auto-detects contradictions — no prompting needed. See [yantrikdb-mcp](https://github.com/yantrikos/yantrikdb-mcp) for full docs.

### As a Python library

```bash
pip install yantrikdb
```

The engine ships a default embedder (`potion-base-2M`, ~7 MB, distilled from BGE-base-en-v1.5) — `record_text()` / `recall_text()` work out of the box. **No `sentence-transformers` install. No first-run model download. No ONNX runtime.** Just one `pip install`.

```python
import yantrikdb

# Default: bundled embedder, dim=64. Just works.
db = yantrikdb.YantrikDB.with_default("memory.db")

db.record("Alice is the engineering lead", importance=0.8, domain="people")
db.record("Project deadline is March 30", importance=0.9, domain="work")
db.record("User prefers dark mode", importance=0.6, domain="preference")

results = db.recall("who leads the team?", top_k=3)
# → [{"text": "Alice is the engineering lead", "score": 1.0}, ...]

db.relate("Alice", "Engineering", "leads")
db.get_edges("Alice")

db.think()  # consolidate, detect conflicts, mine patterns
db.close()
```

#### Want higher-quality embeddings?

Three opt-in upgrade paths, in increasing weight:

```python
# 1. Larger bundled variant — downloads on first call, caches under
#    your user data dir. Self-hosted from yantrikos/yantrikdb-models;
#    no HuggingFace dependency, no rate limits.
db = yantrikdb.YantrikDB("memory.db", embedding_dim=256)
db.set_embedder_named("potion-base-8M")  # ~28 MB, ~92% MiniLM
# or: db.set_embedder_named("potion-base-32M")  # ~121 MB, ~95% MiniLM

# 2. Bring your own embedder (sentence-transformers, fastembed, custom).
from sentence_transformers import SentenceTransformer
db = yantrikdb.YantrikDB("memory.db", embedding_dim=384)
db.set_embedder(SentenceTransformer("all-MiniLM-L6-v2"))

# 3. Slim build (no bundled embedder, must set_embedder yourself).
#    For deployments where the ~7 MB bundle is intolerable.
#    Rust: yantrikdb = { version = "0.7", default-features = false }
```

| Path | Quality vs MiniLM | Size on disk | Install network |
|---|---|---|---|
| Bundled default (`with_default`) | ~89% | ~7 MB (bundled) | none |
| `set_embedder_named("potion-base-8M")` | ~92% | ~28 MB (cached) | first call only |
| `set_embedder_named("potion-base-32M")` | ~95% | ~121 MB (cached) | first call only |
| `set_embedder(MiniLM)` | 100% (baseline) | ~80 MB | sentence-transformers' own download |

### As a Rust crate

```toml
[dependencies]
yantrikdb = "0.7"

# Want set_embedder_named() for runtime model upgrades?
# yantrikdb = { version = "0.7", features = ["embedder-download"] }

# Slim build (no bundled embedder, no network code path):
# yantrikdb = { version = "0.7", default-features = false }
```

## The Problem

Current AI memory is:

> Store everything → Embed → Retrieve top-k → Inject into context → Hope it helps.

That's not memory. That's a search engine with extra steps.

Real memory is hierarchical, compressed, contextual, self-updating, emotionally weighted, time-aware, and predictive. YantrikDB is built for that.

## Why Not Existing Solutions?

| Solution | What it does | What it lacks |
|----------|-------------|---------------|
| **Vector DBs** (Pinecone, Weaviate) | Nearest-neighbor lookup | No decay, no causality, no self-organization |
| **Knowledge Graphs** (Neo4j) | Structured relations | Poor for fuzzy memory, not adaptive |
| **Memory Frameworks** (LangChain, Mem0) | Retrieval wrappers | Not a memory architecture — just middleware |
| **File-based** (CLAUDE.md, memory files) | Dump everything into context | O(n) token cost, no relevance filtering |

### Benchmark: Selective Recall vs. File-Based Memory

| Memories | File-Based | YantrikDB | Token Savings | Precision |
|----------|-----------|-----------|---------------|-----------|
| 100 | 1,770 tokens | 69 tokens | **96%** | 66% |
| 500 | 9,807 tokens | 72 tokens | **99.3%** | 77% |
| 1,000 | 19,988 tokens | 72 tokens | **99.6%** | 84% |
| 5,000 | 101,739 tokens | 53 tokens | **99.9%** | 88% |

File-based cost grows linearly with the number of memories: at 5,000 memories it needs about 102K tokens per query, which overflows 32K and 100K context windows and takes roughly half of a 200K window. YantrikDB stays at roughly 50 to 70 tokens per query. Precision *improves* with more data — the opposite of context stuffing.
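
The Token Savings column can be re-derived from the two token columns. The short script below is only a sanity check on the table: the tuple layout and the helper name `token_savings` are illustrative and not part of the YantrikDB API.

```python
# (memories, file_based_tokens, yantrikdb_tokens), copied from the table above.
BENCHMARK = [
    (100, 1_770, 69),
    (500, 9_807, 72),
    (1_000, 19_988, 72),
    (5_000, 101_739, 53),
]


def token_savings(file_tokens: int, recall_tokens: int) -> float:
    """Fraction of prompt tokens avoided by selective recall."""
    return 1.0 - recall_tokens / file_tokens


for memories, file_tokens, recall_tokens in BENCHMARK:
    pct = 100 * token_savings(file_tokens, recall_tokens)
    print(f"{memories:>5} memories: {pct:.1f}% saved")
```

Because the recall cost is nearly constant while the file-based cost grows with the number of memories, the savings approach 100% as the store grows.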

## Architecture

### Design Principles

- **Embedded, not client-server** — single file, no server process (like SQLite)
- **Local-first, sync-native** — works offline, syncs when connected
- **Cognitive operations, not SQL** — `record()`, `recall()`, `relate()`, not `SELECT`
- **Living system, not passive store** — does work between conversations
- **Thread-safe** — `Send + Sync` with internal Mutex/RwLock, safe for concurrent access

### Five Indexes, One Engine

```
┌──────────────────────────────────────────────────────┐
│                   YantrikDB Engine                   │
│                                                      │
│  ┌──────────┬──────────┬──────────┬──────────┐       │
│  │  Vector  │  Graph   │ Temporal │  Decay   │       │
│  │  (HNSW)  │(Entities)│ (Events) │  (Heap)  │       │
│  └──────────┴──────────┴──────────┴──────────┘       │
│  ┌──────────┐                                        │
│  │ Key-Value│  WAL + Replication Log (CRDT)          │
│  └──────────┘                                        │
└──────────────────────────────────────────────────────┘
```

1. **Vector Index (HNSW)** — semantic similarity search across memories
2. **Graph Index** — entity relationships, profile aggregation, bridge detection
3. **Temporal Index** — time-aware queries ("what happened Tuesday", "upcoming deadlines")
4. **Decay Heap** — importance scores that degrade over time, like human memory
5. **Key-Value Store** — fast facts, session state, scoring weights

### Decoupled Write Path (v0.6.6+)

The vector index is structured as a **two-tier LSM**: a small mutable delta and an immutable HNSW cold tier swapped atomically via `ArcSwap`. Foreground writes only touch the delta (brief lock, O(1) push); HNSW work amortizes on a dedicated compactor thread. This is what eliminated the production wedge where sustained writes starved readers — see [CONCURRENCY.md](CONCURRENCY.md) and [docs/decoupled_write_path_rfc.md](docs/decoupled_write_path_rfc.md).
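
To make the two-tier idea concrete, here is a minimal Python sketch of the pattern, not the engine's Rust implementation. It assumes a single compactor, uses a brute-force dot-product scan where the engine uses HNSW, and relies on a plain attribute rebind to stand in for `ArcSwap`. The names `TwoTierIndex`, `append`, `compact`, and `search` are hypothetical.

```python
import threading


def dot(a, b):
    """Plain dot product; stands in for the real similarity metric."""
    return sum(x * y for x, y in zip(a, b))


class TwoTierIndex:
    """Toy two-tier index: small mutable delta + immutable cold snapshot."""

    def __init__(self, delta_max=256):
        self.delta_max = delta_max
        self._lock = threading.Lock()  # held only for short, bounded work
        self._delta = []               # [(item_id, vector)], guarded by _lock
        self._cold = ()                # immutable tuple, replaced wholesale

    def append(self, item_id, vector):
        """Foreground write: O(1) push into the delta."""
        with self._lock:
            self._delta.append((item_id, tuple(vector)))

    def needs_compaction(self):
        with self._lock:
            return len(self._delta) >= self.delta_max // 2

    def compact(self):
        """Background step (assumes exactly one compactor thread)."""
        with self._lock:
            sealed = list(self._delta)           # seal the current delta
        new_cold = self._cold + tuple(sealed)    # slow build, outside the lock
        self._cold = new_cold                    # publish the new cold tier first
        with self._lock:
            del self._delta[:len(sealed)]        # then drop the sealed prefix

    def search(self, query, top_k=3):
        # Read the delta before the cold tier so an entry being moved by
        # compact() is seen in at least one tier; duplicates are merged by id.
        with self._lock:
            delta = list(self._delta)            # bounded by delta_max
        cold = self._cold                        # reference read, no lock
        merged = {}
        for item_id, vec in delta + list(cold):
            merged.setdefault(item_id, vec)
        ranked = sorted(merged, key=lambda i: -dot(query, merged[i]))
        return ranked[:top_k]
```

The ordering matters: `compact()` publishes the new cold tier before trimming the delta, and `search()` reads the delta before the cold tier, so a concurrent reader can see an entry twice but never miss it.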

```mermaid
flowchart LR
    subgraph CLIENT["Caller"]
        C1["record / record_with_rid"]
        C2["recall / recall_with_seq"]
    end

    subgraph FG["Foreground — P1, brief locks only"]
        F1["assign_seq<br/>vec_seq.fetch_add<br/>(or fetch_max for cluster seq)"]
        F2["DeltaIndex.append<br/>brief RwLock&lt;Vec&gt; push"]
        F3["bump_visible_seq<br/>DashMap + AtomicU64<br/>(lock-free)"]
        F4["log_op → SQLite WAL"]
    end

    subgraph IDX["DeltaIndex (per engine)"]
        D1[("delta<br/>RwLock&lt;Vec&lt;DeltaEntry&gt;&gt;<br/>cap = delta_max (256)")]
        D2[("cold<br/>ArcSwap&lt;HnswIndex&gt;<br/>lock-free read")]
    end

    subgraph BG["Background — P3, dedicated threads"]
        B1["Compactor (1s tick)<br/>fires when delta past half-cap<br/>OR oldest entry &gt; max_dirty_age"]
        B2["Materializer pool<br/>N = cores / 2<br/>drains pending oplog ops"]
    end

    subgraph STORE["SQLite (WAL mode, single file)"]
        S1["memories"]
        S2["oplog"]
        S3["entity_edges, sessions, ..."]
    end

    C1 --> F1
    F1 --> F2
    F2 --> D1
    F1 --> F3
    F1 --> F4
    F4 --> S2
    C2 -.->|"optional<br/>wait_for_visible_seq"| F3
    C2 --> D1
    C2 --> D2
    B1 -->|"seal + clone + ArcSwap.store"| D1
    B1 --> D2
    B2 --> S2
    B2 --> S1
    B2 --> S3
```

**The structural invariant.** Foreground (P1) and background (P3) do not share a lock primitive that holds for non-O(1) work. The cold tier is read lock-free via `ArcSwap`; the delta's `RwLock` is held for the O(1) push only. This is what makes "no single background task can wedge reads, writes, or recovery" enforceable — see [CONCURRENCY.md](CONCURRENCY.md) Rules 2 and 3 for the names and failure modes if violated.

### Cluster Mode (RFC 010 + Phase 6 RYW)

For multi-node deployments, [yantrikdb-server](https://github.com/yantrikos/yantrikdb-server) wraps the engine with [openraft](https://github.com/datafuselabs/openraft) for leader-elected replication. The four cluster-mutation primitives take the openraft commit-log index as their `seq`, so all nodes agree on a single global monotonic sequence — read-your-writes works across the cluster, not just within a node.
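
The sequence handling behind read-your-writes can be sketched in a few lines of Python. This is an illustration under assumptions, not the engine's code: `SeqTracker` and its methods are hypothetical names, a `threading.Condition` replaces the lock-free atomics, and the choice to hand out `counter + 1` for local allocations is a guess about the `fetch_add` convention.

```python
import threading


class SeqTracker:
    """Toy model of vec_seq allocation and per-namespace visible_seq."""

    def __init__(self):
        self._cond = threading.Condition()
        self._vec_seq = 0    # highest sequence number assigned so far
        self._visible = {}   # namespace -> highest applied sequence

    def assign_seq(self, seq=None):
        with self._cond:
            if seq is None:
                # Single-node path: allocate locally (fetch_add analogue).
                self._vec_seq += 1
                return self._vec_seq
            # Cluster path: adopt the commit-log index and ratchet the
            # local counter up to at least that value (fetch_max analogue).
            self._vec_seq = max(self._vec_seq, seq)
            return seq

    def bump_visible(self, namespace, seq):
        """Mark everything up to seq as readable in this namespace."""
        with self._cond:
            if seq > self._visible.get(namespace, 0):
                self._visible[namespace] = seq
                self._cond.notify_all()

    def wait_for_visible(self, namespace, min_seq, timeout=1.0):
        """Block until the namespace has applied through min_seq, or time out."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._visible.get(namespace, 0) >= min_seq, timeout
            )


tracker = SeqTracker()
log_idx = tracker.assign_seq(10)          # follower applying commit-log index 10
tracker.bump_visible("default", log_idx)  # apply finished
print(tracker.wait_for_visible("default", min_seq=10))  # True, returns at once
```

A reader that passes the index of its own write as `min_seq` waits only until the local node has applied that entry, which is the read-your-writes guarantee on any node.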

```mermaid
flowchart LR
    L["Leader<br/>HTTP request"]
    LR["Leader engine<br/>record_with_rid(seq=Some(log_idx))"]
    OR["openraft<br/>commit log"]
    F1["Follower 1 applier<br/>record_with_rid(seq=Some(log_idx))"]
    F2["Follower 2 applier<br/>record_with_rid(seq=Some(log_idx))"]
    R["Reader on any node<br/>recall_with_seq(min_seq=log_idx)"]

    L --> LR
    LR --> OR
    OR -->|replicate + apply| F1
    OR -->|replicate + apply| F2
    F1 -.->|"visible_seq[ns] reaches log_idx"| R
    F2 -.->|"visible_seq[ns] reaches log_idx"| R
    LR -.->|"visible_seq[ns] reaches log_idx"| R
```

Each `record_with_rid` / `tombstone_with_rid` / `upsert_entity_edge_with_id` / `delete_entity_edge_with_id` accepts an optional `seq: Option<u64>`. Single-node callers pass `None` and the engine allocates; cluster appliers pass `Some(commit_log_index)` and the engine ratchets `vec_seq` up to at least that value via `fetch_max`. After apply, `visible_seq[namespace]` reaches the log index, so any subsequent `recall_with_seq(min_seq=N)` blocks just long enough for the local node to have applied through index N — and no longer.

### Memory Types (Tulving's Taxonomy)

| Type | What it stores | Example |
|------|---------------|---------|
| **Semantic** | Facts, knowledge | "User is a software engineer at Meta" |
| **Episodic** | Events with context | "Had a rough day at work on Feb 20" |
| **Procedural** | Strategies, what worked | "Deploy with blue-green, not rolling update" |

All memories carry **importance**, **valence** (emotional tone), **domain**, **source**, **certainty**, and **timestamps** — used in a multi-signal scoring function that goes far beyond cosine similarity.

## Key Capabilities

### Relevance-Conditioned Scoring

Not just vector similarity.
Every recall combines:

- **Semantic similarity** (HNSW) — what's topically related
- **Temporal decay** — recent memories score higher
- **Importance weighting** — critical decisions beat trivia
- **Graph proximity** — entity relationships boost connected memories
- **Retrieval feedback** — learns from past recall quality

Weights are tuned automatically from usage patterns.

### Conflict Detection & Resolution

When memories contradict, YantrikDB doesn't guess — it creates a conflict segment:

```
"works at Google" (recorded Jan 15)
  vs.
"works at Meta" (recorded Mar 1)
  → Conflict: identity_fact, priority: high, strategy: ask_user
```

Resolution is conversational: the AI asks naturally, not programmatically.

### Semantic Consolidation

After many conversations, memories pile up. `think()` runs:

1. **Consolidation** — merge similar memories, extract patterns
2. **Conflict scan** — find contradictions across the knowledge base
3. **Pattern mining** — cross-domain discovery ("work stress correlates with health entries")
4. **Trigger evaluation** — proactive insights worth surfacing

### Proactive Triggers

The engine generates triggers when it detects something worth reaching out about:

- Memory conflicts needing resolution
- Approaching deadlines (temporal awareness)
- Patterns detected across domains
- High-importance memories about to decay
- Goal tracking ("how's the marathon training?")

Every trigger is grounded in real memory data — not engagement farming.
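
As one illustration of a data-grounded trigger, the "high-importance memories about to decay" case from the list above can be sketched as follows. The README does not specify the decay function, so this assumes exponential decay with a 30-day half-life. The `Memory` dataclass, the thresholds, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float         # 0..1, as passed to record()
    days_since_access: float


def decayed_importance(m: Memory, half_life_days: float = 30.0) -> float:
    """Assumed model: importance halves every half_life_days without access."""
    return m.importance * 0.5 ** (m.days_since_access / half_life_days)


def decay_triggers(memories, min_importance=0.8, floor=0.4):
    """Flag memories that were recorded as important but have faded below floor."""
    return [
        m.text
        for m in memories
        if m.importance >= min_importance and decayed_importance(m) < floor
    ]


memories = [
    Memory("Project deadline is March 30", importance=0.9, days_since_access=60),
    Memory("Alice is the engineering lead", importance=0.8, days_since_access=7),
    Memory("User prefers dark mode", importance=0.6, days_since_access=60),
]
print(decay_triggers(memories))
```

Only the deadline memory is flagged here: it was recorded as important and has gone two half-lives without access, while the other two are either recently used or were never important enough to warrant a trigger.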

### Multi-Device Sync (CRDT)

Local-first with append-only replication log:

- **CRDT merging** — graph edges, memories, and metadata merge without conflicts
- **Vector indexes rebuild locally** — raw memories sync, each device rebuilds HNSW
- **Forget propagation** — tombstones ensure forgotten memories stay forgotten
- **Conflict detection** — contradictions across devices are flagged for resolution

### Sessions & Temporal Awareness

```python
sid = db.session_start("default", "claude-code")
db.record("decided to use PostgreSQL")  # auto-linked to session
db.record("Alice suggested Redis for caching")
db.session_end(sid)
# → computes: memory_count, avg_valence, topics, duration

db.stale(days=14)    # high-importance memories not accessed recently
db.upcoming(days=7)  # memories with approaching deadlines
```

## Full API

| Operation | Methods |
|-----------|---------|
| **Core** | `record`, `record_batch`, `recall`, `recall_with_response`, `recall_refine`, `forget`, `correct` |
| **Knowledge Graph** | `relate`, `get_edges`, `search_entities`, `entity_profile`, `relationship_depth`, `link_memory_entity` |
| **Cognition** | `think`, `get_patterns`, `scan_conflicts`, `resolve_conflict`, `derive_personality` |
| **Triggers** | `get_pending_triggers`, `acknowledge_trigger`, `deliver_trigger`, `act_on_trigger`, `dismiss_trigger` |
| **Sessions** | `session_start`, `session_end`, `session_history`, `active_session`, `session_abandon_stale` |
| **Temporal** | `stale`, `upcoming` |
| **Procedural** | `record_procedural`, `surface_procedural`, `reinforce_procedural` |
| **Lifecycle** | `archive`, `hydrate`, `decay`, `evict`, `list_memories`, `stats` |
| **Sync** | `extract_ops_since`, `apply_ops`, `get_peer_watermark`, `set_peer_watermark` |
| **Maintenance** | `rebuild_vec_index`, `rebuild_graph_index`, `learned_weights` |

## Technical Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Core language** | Rust | Memory safety, no GC, ideal for embedded engines |
| **Architecture** | Embedded (like SQLite) | No server overhead, sub-ms reads, single-tenant |
| **Bindings** | Python (PyO3), TypeScript | Agent/AI layer integration |
| **Storage** | Single file per user | Portable, backupable, no infrastructure |
| **Sync** | CRDTs + append-only log | Conflict-free for most operations, deterministic |
| **Thread safety** | Mutex/RwLock, Send+Sync | Safe concurrent access from multiple threads |
| **Query interface** | Cognitive operations API | Not SQL — designed for how agents think |

## Ecosystem

| Package | What | Install |
|---------|------|---------|
| [yantrikdb](https://crates.io/crates/yantrikdb) | Rust engine | `cargo add yantrikdb` |
| [yantrikdb](https://pypi.org/project/yantrikdb/) | Python bindings (PyO3) | `pip install yantrikdb` |
| [yantrikdb-mcp](https://pypi.org/project/yantrikdb-mcp/) | MCP server for AI agents | `pip install yantrikdb-mcp` |

## Roadmap

- [x] **V0** — Embedded engine, core memory model (record, recall, relate, consolidate, decay)
- [x] **V1** — Replication log, CRDT-based sync between devices
- [x] **V2** — Conflict resolution with human-in-the-loop
- [x] **V3** — Proactive cognition loop, pattern detection, trigger system
- [x] **V4** — Sessions, temporal awareness, cross-domain pattern mining, entity profiles
- [ ] **V5** — Multi-agent shared memory, federated learning across users

## Worked example: Wirecard (RFC 008 substrate — with honest limits)

For nearly a decade, Wirecard's filings and EY's audit attested to €1.9B in Philippine escrow accounts. In June 2020 both banks and the central bank formally denied the accounts existed.

When the `source_lineage` fields are hand-populated — EY as `[wirecard, ey]` to capture audit dependence on Wirecard-provided documents, BSP as `[bsp, bpi, bdo]` to capture restatement of the commercial banks — RFC 008's `⊕` discounts the dependent claims, and the contest operator's temporal split distinguishes present-tense contradictions from historical state changes. On this hand-populated data, the substrate produces useful annotations.

**Honest limits** (surfaced by Phase 2 empirical testing, Apr 2026):

- On naturalistic evidence where a real agent populates the fields, the substrate's gates don't reliably fire. Cases B and C of the Phase 2 eval need an extractor/canonicalizer (not yet built) to work; Case A exposed that `⊕` is mathematically incapable of flipping decisions at realistic N, regardless of coefficient tuning.
- **Current claim**: structured schema for evidence provenance/temporal/conflict annotation, useful for audit and inspection. The dependence-discount operator works on curated inputs but needs replacement before it can drive decisions.
- **Not a current claim**: "decision-improvement substrate for AGI-capable agents." That framing is withdrawn pending RFC 009.

See **[docs/showcase/wirecard.md](docs/showcase/wirecard.md)** for the full walkthrough including the Phase 2 negative result and the gold-state ablation that partitioned operator failure from extraction failure. Run the hand-populated demonstration directly:

```bash
cargo run --example showcase_wirecard
```

## Research & Publications

### 📄 Skill as Memory, Not Document (May 2026)

[Sarkar, P. (2026). *Skill as Memory, Not Document: A Database-Native Substrate for Agent Skill Catalogs*. Zenodo.](https://doi.org/10.5281/zenodo.20128887)

A measurement paper at 5K-skill scale: token cost vs filesystem catalogs (with the honest 1.49× ablation), retrieval latency (87.3 ms p50), and invalid-skill admission (0% YantrikDB vs 97% document-only baseline).
Reproducible scripts + raw CSVs at [yantrikdb-server/benchmarks/skill_recall/](https://github.com/yantrikos/yantrikdb-server/tree/main/benchmarks/skill_recall). Companion blog: [yantrikdb.com/papers/skill-substrate](https://yantrikdb.com/papers/skill-substrate/).

### Earlier work

- **U.S. Patent Application 19/573,392** (March 2026): "Cognitive Memory Database System with Relevance-Conditioned Scoring and Autonomous Knowledge Management"
- **Zenodo (software):** [YantrikDB: A Cognitive Memory Engine for Persistent AI Systems](https://doi.org/10.5281/zenodo.18793952)

## Author

**Pranab Sarkar** — [ORCID](https://orcid.org/0009-0009-8683-1481) · [LinkedIn](https://www.linkedin.com/in/pranab-sarkar-b0511160/)

## License

AGPL-3.0. See [LICENSE](LICENSE) for the full text.

The [MCP server](https://github.com/yantrikos/yantrikdb-mcp) is MIT-licensed — using the engine via the MCP server does not trigger AGPL obligations on your code.