wombatkv

Open source Apache-2.0 Rust

About wombatkv

Object-storage-native KV cache for LLM inference & RL. Cross-restart, cross-conversation, cross-engine via shared S3 bucket.

Platforms

Web Self-hosted

Languages

Rust

WombatKV: object-storage-native KV cache system for inference. Compute once, reuse everywhere.

WombatKV

Object-storage-native KV cache system for LLM inference.

Wombat has your Blocks.

Philosophy: object-storage-native

The database world has already worked out this shape: hold the durable truth in object storage, and let RAM / NVMe / mmap be the hot path. Compute is free to die, restart, move, and scale without dragging the durable state with it. The pattern is not "S3 is RAM"; it is "durable truth on the bottom, free-to-die compute on top."

Inference memory hasn't had this renaissance. WombatKV is that move for KV cache.

WombatKV makes KV cache a shared, addressable resource. Anything an engine prefills once (system prompts, RAG context, codebase context, shared documents, conversation history) lands in an S3 bucket as content-addressed blocks. Any other engine pointing at the same bucket inherits it: the next process, the next conversation, the next agent in a fan-out, the next teammate on a different laptop, a sibling ds4 reading from the same MinIO on the NAS.

The wins compound across dimensions:

  • Cross-restart: process dies and comes back, no re-prefill.
  • Cross-conversation: same shared doc, five conversations, conv 1 pays once, convs 2-5 inherit.
  • Cross-agent: five concurrent reviewers of the same PR, prefill paid once for all.
  • Cross-engine, cross-machine: two ds4 servers pointed at the same bucket pool prefix-cache between each other.

Content-addressing makes this transparent: the bucket key is a function of (model + tokenizer + dtype + LoRA + prefix chain), so any engine computing the same chain finds the same blocks. No engine-id, no session-id, no coordination.
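
A minimal sketch of how such a key could be derived, assuming a chained hash over fixed-size token blocks. The `Fingerprint` struct, `digest`, and `chain_keys` are hypothetical names for illustration, not the wombatkv API, and std's `DefaultHasher` stands in for BLAKE3 so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical identity fields, mirroring the inputs named above
/// (model + tokenizer + dtype + LoRA). Not the real wombatkv type.
#[derive(Hash, Clone, Debug)]
pub struct Fingerprint {
    pub model: String,
    pub tokenizer: String,
    pub dtype: String,
    pub lora: Option<String>,
}

/// Stand-in digest. DefaultHasher is NOT stable across Rust releases,
/// so a real key would use a fixed algorithm such as BLAKE3.
fn digest<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// One key per full block. Each key folds in the previous key, so it
/// commits to the fingerprint and to every token that came before it.
pub fn chain_keys(fp: &Fingerprint, tokens: &[u32], block_tokens: usize) -> Vec<u64> {
    assert!(block_tokens > 0, "block size must be non-zero");
    let mut parent = digest(fp);
    tokens
        .chunks_exact(block_tokens)
        .map(|block| {
            parent = digest(&(parent, block));
            parent
        })
        .collect()
}
```

Two engines that call `chain_keys` with the same fingerprint and the same tokens get the same keys without exchanging any state, which is the property the paragraph above relies on. Changing any fingerprint field changes every key.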

The 0.1.0-alpha release ships with the ds4 adapter (DeepSeek-V4-Flash), validated on an M3 Max with local MinIO and on Linux with the daemon-mode transport benches.

Architecture

              embedded mode                daemon mode
              (in-process)                 (sidecar)

         +----------------+               +----------------+
         | engine         |               | engine         |
         |  libwombatkv   |               +-------+--------+
         +-------+--------+                       | SHM, TCP, HTTP
                 |                       +--------v-------+
                 |                       | wombatkv-daemon|
                 |                       +--------+-------+
                 |                                |
         +-------v--------+-----------+-----------v-------+
         |              L0 in-process RAM cache           |
         +------------------------------------------------+
         |              L1 local SSD cache                |
         +------------------------------------------------+
         |  L2 object store: S3, MinIO, R2, GCS, Tigris,  |
         |  Azure.  Durable truth, content-addressed,     |
         |  CRC32C end-to-end                             |
         +------------------------------------------------+

         PUT once.  GET & REUSE many.  MIGRATE freely.  OUTLIVE everything.
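
The read path implied by the diagram can be sketched as a read-through cache: check L0, then L1, then L2, and promote on a hit so the next read is cheaper. This is an illustration under assumptions, not the wombatkv implementation: all three tiers are in-memory maps here, and `TieredCache`, `Hit`, and `restart` are hypothetical names.

```rust
use std::collections::HashMap;

/// In-memory stand-ins for the three tiers in the diagram.
#[derive(Default)]
pub struct TieredCache {
    l0_ram: HashMap<u64, Vec<u8>>,
    l1_ssd: HashMap<u64, Vec<u8>>,
    l2_object_store: HashMap<u64, Vec<u8>>,
}

/// Which tier served a read.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Hit {
    L0,
    L1,
    L2,
}

impl TieredCache {
    /// PUT once: write the durable copy and keep a warm copy in RAM.
    /// (Assumed write policy; the real one may differ.)
    pub fn put(&mut self, key: u64, block: Vec<u8>) {
        self.l2_object_store.insert(key, block.clone());
        self.l0_ram.insert(key, block);
    }

    /// GET: fastest tier first, promoting the block toward L0 on a hit.
    pub fn get(&mut self, key: u64) -> Option<(Hit, Vec<u8>)> {
        if let Some(block) = self.l0_ram.get(&key) {
            return Some((Hit::L0, block.clone()));
        }
        if let Some(block) = self.l1_ssd.get(&key).cloned() {
            self.l0_ram.insert(key, block.clone());
            return Some((Hit::L1, block));
        }
        let block = self.l2_object_store.get(&key).cloned()?;
        self.l1_ssd.insert(key, block.clone());
        self.l0_ram.insert(key, block.clone());
        Some((Hit::L2, block))
    }

    /// Simulates a process restart: RAM is lost, SSD optionally wiped,
    /// the object store survives.
    pub fn restart(&mut self, wipe_ssd: bool) {
        self.l0_ram.clear();
        if wipe_ssd {
            self.l1_ssd.clear();
        }
    }
}
```

After `restart(true)` the first `get` is served from L2 and later ones from L0, which is the "OUTLIVE everything" behaviour in miniature.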

How fast?

ds4 + WombatKV vs ds4 native, M3 Max with local Docker MinIO. All three charts below are from the single 2026-05-24 deployment-mode-matrix campaign (p50 TTFT, n=5 warmup-primed; canonical 5k-char Gutenberg prompt). No cross-campaign number-mixing in this section.

Cross-restart wiped, every WombatKV mode beats native cold-prefill

[Chart: Cross-restart wiped, native ds4 vs every WombatKV mode (canonical 5k-char prompt, n=5)]

The engine is restarted between turn-1 and turn-2 and kvdisk is wiped, so native ds4 must re-prefill from scratch. embedded_local reaches 89.7×; cross-host modes over WiFi LAN still come out ahead at 1.29-1.31×.

Even with ds4's kvdisk preserved, partial-prefix beats warm ds4

[Chart: Partial-prefix sweep, native ds4 with kvdisk preserved vs WombatKV embedded_local (shared 10k-char prefix, 6 cells)]

Shared 10000-char prefix, 6 cells (3 suffix sizes × 2 restart policies). embedded_local wins every cell: 2.45-4.45× even when native's kvdisk is preserved, 3.25-7.54× when wiped. This is not a "cold cache vs warm" comparison: WombatKV's partial-prefix lookup avoids the full re-prefill that ds4's kvdisk requires for a changed suffix.

Honest losses, where ds4 alone is still the right tool

[Chart: Scenarios where WombatKV loses to native ds4 (pi_review preserved, conversation_switch live)]

Same campaign, the honest limits. Round-robin conversation switching (0.10-0.13×) and pi_review on one machine with kvdisk preserved (0.65×) are workloads where ds4's own kvdisk + warm engine already handle the state. WombatKV's save+load is pure overhead with nothing to amortize.

All charts, three campaigns (including public-corpus ShareGPT + Gutenberg multi-round), full methodology, per-row breakdowns: BENCHMARKS.md + artifacts/.

Quickstart

Try it (5 minutes)

Run pi (an agent harness) with ds4 + WombatKV via the recipe in examples/pi_ds4_wombatkv/: pre-built ds4-server binary, 5 env vars, local MinIO via docker run. Read that example's README for the full step-by-step.

Build from source

# 1. Clone the ds4 fork (this branch carries the WombatKV C ABI hooks)
git clone -b release/0.1.0-alpha.pre1.0 https://github.com/Venkat2811/ds4

# 2. Clone wombatkv + build the C ABI cdylib (libwombatkv.{so,dylib})
git clone https://github.com/Venkat2811/wombatkv
cd wombatkv && cargo build --release -p wombatkv-cabi

# 3. Build ds4-server with WombatKV linked in
cd ../ds4 && make ds4-server WOMBATKV=1 WOMBATKV_DIR=../wombatkv

# 4. Start local MinIO (dev defaults: minioadmin/minioadmin on loopback)
docker run -d --name minio-wombatkv -p 9000:9000 -p 9001:9001 \
    -e MINIO_ROOT_USER=minioadmin -e MINIO_ROOT_PASSWORD=minioadmin \
    -v $HOME/.minio-wombatkv:/data \
    quay.io/minio/minio server /data --console-address ":9001"

# 5. Configure + run ds4-server
export DS4_WOMBATKV_ENABLE=1
export WMBT_KV_BUCKET=wombatkv-cache-myteam
export WMBT_KV_S3_ENDPOINT=http://127.0.0.1:9000
export WMBT_KV_S3_ACCESS_KEY=minioadmin
export WMBT_KV_S3_SECRET_KEY=minioadmin
./ds4-server --model your-model.gguf --port 8000

WombatKV auto-derives the model fingerprint from the model path. You must explicitly set the bucket + S3 credentials. Dev-default minioadmin/minioadmin is honored only on loopback endpoints; the daemon rejects them for any non-loopback target so it can't accidentally write to the wrong account / bucket. Full env reference: book/src/operations/env.md.
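
The loopback rule can be sketched as a small guard, assuming the check is "dev-default credentials are allowed only when the endpoint host is a loopback address". The function names are hypothetical, the endpoint parsing is deliberately simple (no userinfo handling), and treating the hostname `localhost` as loopback is an assumption of this sketch.

```rust
use std::net::IpAddr;

/// Extracts the host from an endpoint such as "http://127.0.0.1:9000".
fn endpoint_host(endpoint: &str) -> Option<&str> {
    // Drop an optional "scheme://" prefix, then any path component.
    let rest = endpoint.split_once("://").map_or(endpoint, |(_, r)| r);
    let authority = rest.split('/').next()?;
    let host = if let Some(stripped) = authority.strip_prefix('[') {
        // Bracketed IPv6 literal, e.g. "[::1]:9000".
        stripped.split(']').next()?
    } else {
        // "host:port" or bare "host".
        authority.rsplit_once(':').map_or(authority, |(h, _)| h)
    };
    if host.is_empty() { None } else { Some(host) }
}

/// True when the endpoint points at this machine.
pub fn is_loopback_endpoint(endpoint: &str) -> bool {
    match endpoint_host(endpoint) {
        Some("localhost") => true, // assumption: name resolves to loopback
        Some(host) => host.parse::<IpAddr>().map_or(false, |ip| ip.is_loopback()),
        None => false,
    }
}

/// Rejects minioadmin/minioadmin for any non-loopback target.
pub fn check_credentials(endpoint: &str, access_key: &str, secret_key: &str) -> Result<(), String> {
    let dev_default = access_key == "minioadmin" && secret_key == "minioadmin";
    if dev_default && !is_loopback_endpoint(endpoint) {
        return Err(format!(
            "refusing dev-default credentials for non-loopback endpoint {endpoint}"
        ));
    }
    Ok(())
}
```

With the quickstart values (`http://127.0.0.1:9000` and minioadmin/minioadmin) the check passes; the same credentials against a LAN address such as `http://192.168.1.20:9000` are refused.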

Rust integration (embedding the crates)

Add the C-ABI surface crate (this is what C / C++ engines link against; Rust callers can use wombatkv-node directly):

[dependencies]
wombatkv-cabi = "0.1.0-alpha.pre1.0"   # alpha pin required; cargo add won't see pre-releases by default
# or for the high-level Rust API without the cdylib:
# wombatkv-node = "0.1.0-alpha.pre1.0"
# wombatkv-daemon = "0.1.0-alpha.pre1.0"

Crates

Most users want one of the top three. The rest are pulled in transitively or used only when contributing.

| Crate | Audience | What it gives you |
|---|---|---|
| wombatkv-node | Rust integrators | High-level async API |
| wombatkv-cabi | C / C++ / non-Rust engines | libwombatkv.{so,dylib} + wombatkv.h |
| wombatkv-daemon | Sidecar deployments | wombatkv-daemon binary + transports |
| wombatkv-core | transitive | Primitive types, errors, reuse helpers |
| wombatkv-format | transitive | Wire envelope, on-disk segment, CRC32C, BLAKE3 |
| wombatkv-radix | transitive | Prefix-radix metadata index |
| wombatkv-store | transitive | Object-store backend, WAL, CAS |
| wombatkv-dst | contributors | Deterministic-simulation test harness |
| wombatkv-bench | contributors | Operator and benchmark binaries |

Per-crate READMEs live in crates/*/README.md.

What's inside

  • Storage: any S3-compatible object store. KV blocks are content-addressed; no database, no coordinator.
  • Identity: BLAKE3 over (model + tokenizer + dtype + TP/PP + LoRA + prefix chain).
  • Block size: 128 tokens (token-aligned, multi-token-quantization safe).
  • Local cache: 3-tier hierarchy. See book/src/concepts/architecture.md.
  • Metadata index: in-memory radix tree; SlateDB-backed implementation for durability.
  • Compression: zstd by default.
  • Modes: embedded (linked into the engine via C ABI) or daemon (separate process).
  • Testing: 336 unit tests + wombatkv-dst for deterministic-simulation chaos.

Features

Core

  • [x] Content-addressed KV blocks (BLAKE3 chain hashing).
  • [x] Token-aligned 128-token blocks.
  • [x] Prefix-share fall-through: prompts sharing their first M blocks store M blocks under identical keys.
  • [x] Restore from any prefix via deterministic block-prefix lookup.
  • [x] Universal wire envelope (magic, version, length, CRC32C, body) used by every transport and on disk.
  • [x] zstd block compression.
  • [x] Same-model save and restore.
  • [ ] Cross-model restore (research, post-alpha).
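
Prefix sharing and restore-from-any-prefix can be illustrated with a short sketch: walk the chained block keys of a new prompt and stop at the first key that is not in the store. This assumes only full 128-token blocks are stored; `block_keys`, `restorable_tokens`, and the `root` seed (standing in for the model fingerprint) are hypothetical names, a `HashSet` stands in for the bucket, and std's `DefaultHasher` stands in for BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

pub const BLOCK_TOKENS: usize = 128;

/// Key of a block given the key of the block before it (stand-in hash).
fn chain_key(parent: u64, block: &[u32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    parent.hash(&mut hasher);
    block.hash(&mut hasher);
    hasher.finish()
}

/// Keys for every full block of `tokens`; a trailing partial block is
/// skipped in this sketch.
pub fn block_keys(root: u64, tokens: &[u32]) -> Vec<u64> {
    let mut parent = root;
    tokens
        .chunks_exact(BLOCK_TOKENS)
        .map(|block| {
            parent = chain_key(parent, block);
            parent
        })
        .collect()
}

/// Number of leading tokens whose blocks are already stored. The engine
/// would only need to prefill the tokens after this point.
pub fn restorable_tokens(stored: &HashSet<u64>, root: u64, tokens: &[u32]) -> usize {
    let hits = block_keys(root, tokens)
        .iter()
        .take_while(|key| stored.contains(*key))
        .count();
    hits * BLOCK_TOKENS
}
```

Two prompts that share their first 256 tokens produce the same first two keys, so the second prompt finds both blocks and restores 256 tokens regardless of what follows. A change in the very first token changes every key, and nothing is restored.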

Memory hierarchy & hot tiers

  • [x] L0 in-process RAM cache.
  • [x] L1 local SSD cache.
  • [x] L2 object store.
  • [x] Block prefetcher with scored hydration.
  • [x] Lookup-path memory guardrail.
  • [ ] HBM tier (engine-resident).
  • [ ] Typed CPU-RAM tier.
  • [ ] CXL-attached memory tier (CXL.mem).
  • [ ] L0.5 NVMe scratchpad.
  • [ ] Engine-native block-manager bridge.

Metadata index

  • [x] In-memory radix tree.
  • [x] SlateDB-backed durable index.
  • [x] Bootstrap from object storage.
  • [ ] Multi-region replication.

Storage backends

  • [x] AWS S3.
  • [x] MinIO.
  • [x] Cloudflare R2.
  • [x] Google Cloud Storage.
  • [ ] AWS S3 Express One Zone.
  • [ ] Tigris.
  • [ ] Azure Blob Storage.

Deployment modes

  • [x] Embedded mode (C ABI, in-process).
  • [x] Daemon / sidecar mode (separate process).
  • [x] Multi-tenant prefix isolation.
  • [ ] WombatKV Puffer Operator (Kubernetes deployment; cluster-level routing and prewarm).
  • [ ] Daemon-to-daemon replication.

Transport & wire protocols

The wire envelope is transport-agnostic. The list below tracks which protocols are wired into the daemon's listener and dial side.

  • [x] POSIX shared memory.
  • [x] TCP.
  • [x] HTTP/1.1.
  • [ ] WebSocket.
  • [ ] HTTP range reads (partial-block fetch from object store).
  • [ ] RDMA (RoCE v2 and native verbs).
  • [ ] InfiniBand.
  • [ ] NVLink (intra-node GPU-to-GPU peer-to-peer).
  • [ ] GPUDirect Storage (RDMA-to-GPU memory).
  • [ ] NVIDIA cuObject (GPUDirect Storage for Objects; KV blocks land directly in GPU memory).
  • [ ] NIXL-style transfer.
  • [ ] Mooncake-style transfer.
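
To make the transport-agnostic envelope concrete, here is a sketch of encoding and decoding a frame with the fields listed under Core (magic, version, length, CRC32C, body). The field widths, byte order, and the `WKV1` magic are assumptions for illustration only; the real layout is defined in wombatkv-format. The CRC32C routine is a plain bitwise implementation of the Castagnoli polynomial.

```rust
/// Placeholder magic; the real value is not given in this README.
const MAGIC: [u8; 4] = *b"WKV1";
/// Assumed header: 4-byte magic, u16 version, u32 length, u32 CRC32C.
const HEADER_LEN: usize = 14;

/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// Wraps `body` in the assumed little-endian envelope.
pub fn encode(version: u16, body: &[u8]) -> Vec<u8> {
    let len = u32::try_from(body.len()).expect("body exceeds u32 length");
    let mut out = Vec::with_capacity(HEADER_LEN + body.len());
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&crc32c(body).to_le_bytes());
    out.extend_from_slice(body);
    out
}

/// Validates magic, length, and checksum, then returns (version, body).
pub fn decode(frame: &[u8]) -> Result<(u16, &[u8]), &'static str> {
    if frame.len() < HEADER_LEN {
        return Err("truncated header");
    }
    if frame[..4] != MAGIC {
        return Err("bad magic");
    }
    let version = u16::from_le_bytes([frame[4], frame[5]]);
    let len = u32::from_le_bytes([frame[6], frame[7], frame[8], frame[9]]) as usize;
    let crc = u32::from_le_bytes([frame[10], frame[11], frame[12], frame[13]]);
    let body = frame
        .get(HEADER_LEN..)
        .filter(|b| b.len() == len)
        .ok_or("length mismatch")?;
    if crc32c(body) != crc {
        return Err("CRC32C mismatch");
    }
    Ok((version, body))
}
```

Because the same bytes are valid on any transport and on disk, a receiver can verify a block with `decode` no matter whether it arrived over shared memory, TCP, or HTTP.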

Engine integrations

Bindings

  • [x] C / C++.
  • [x] Rust.
  • [ ] Python.
  • [ ] Go.
  • [ ] Zig.

CLI

  • [ ] wombatkv CLI binary (bucket inspect, block dump, metadata audit, DST seed-replay, cache warmup).

Testing & quality

  • [x] 336 unit tests across the workspace.
  • [x] wombatkv-dst: BUGGIFY-style chaos, seeded fault injection, in-memory oracle.
  • [x] 20 fault classes covering S3, daemon transport, wire format, SlateDB, multi-tenant, resource exhaustion, platform divergence.
  • [x] 200 deterministic plans per sweep (20 classes × 10 seeds).
  • [x] Adversarial-roundtrip integration test on the C ABI boundary.
  • [x] Linux + macOS CI matrix.
  • [x] Drift detectors for platform-specific code and clippy warnings.
  • [ ] cargo miri lane.
  • [ ] AddressSanitizer / ThreadSanitizer lanes.
  • [ ] End-to-end ds4 cross-restart bench on Linux.
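
The idea behind a deterministic sweep can be sketched as follows: every (fault class, seed) pair drives a seeded PRNG, so the same pair always yields the same fault schedule and a failure can be replayed exactly. This is a generic illustration and not the wombatkv-dst API; `Plan`, `build_plan`, and `sweep` are hypothetical names, and SplitMix64 is used only because it is small and deterministic.

```rust
/// SplitMix64: a tiny deterministic PRNG.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// A replayable schedule: at which steps a fault of this class fires.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Plan {
    pub fault_class: u32,
    pub seed: u64,
    pub fault_at_step: Vec<bool>,
}

/// Builds the schedule for one (class, seed) pair. `fault_percent` is the
/// chance, in percent, that any given step injects a fault.
pub fn build_plan(fault_class: u32, seed: u64, steps: usize, fault_percent: u64) -> Plan {
    // Mix the class into the seed so classes do not share schedules.
    let mut rng = SplitMix64::new(seed ^ (u64::from(fault_class) << 32));
    let fault_at_step = (0..steps)
        .map(|_| rng.next_u64() % 100 < fault_percent)
        .collect();
    Plan { fault_class, seed, fault_at_step }
}

/// Enumerates classes x seeds plans, e.g. 20 x 10 = 200.
pub fn sweep(classes: u32, seeds: u64, steps: usize) -> Vec<Plan> {
    (0..classes)
        .flat_map(|class| (0..seeds).map(move |seed| build_plan(class, seed, steps, 10)))
        .collect()
}
```

`sweep(20, 10, 64)` produces 200 plans, and rebuilding any single plan from its class and seed gives an identical schedule, which is what makes a failing run reproducible.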

Operations

  • [x] make ci canonical gate.
  • [x] 10 operator and bench binaries.
  • [ ] Prometheus exporter.
  • [ ] OpenTelemetry / OTLP exporter.
  • [ ] Grafana dashboards.
  • [ ] Helm chart for the daemon Kubernetes deployment.

Platform support

  • [x] macOS (M3 Max validated end-to-end).
  • [x] Linux (transport benches validated on x86_64; end-to-end ds4 path pending).
  • [ ] Windows (no plans).

Docs

In-repo:

Status

0.1.0-alpha: ds4 adapter only, no production deployment recommended.

  • C ABI: stable at version 1.0 (crates/wombatkv-cabi/include/wombatkv.h).
  • Wire format: v1, single-format. No back-compat reader; future changes will use staged migrations.
  • Pending: end-to-end ds4 path on Linux, cloud S3 production validation, multi-client daemon load.

See also

Development

make ci                  # fmt + clippy + lib tests + DST sweep + drift detectors
./scripts/dst-sweep.sh   # 20 fault classes x 10 seeds

See CONTRIBUTING.md for the contributor lifecycle.

Acknowledgements

WombatKV stands on the shoulders of these projects and the people behind them:

Built with agentic engineering, using Codex and Claude Code.

Citation

If WombatKV helps your work, please cite:

@software{wombatkv2026,
  author       = {Venkat Raman and {WombatKV Contributors}},
  title        = {{WombatKV: object-storage-native KV cache system for LLM inference}},
  year         = {2026},
  version      = {0.1.0-alpha.pre1.0},
  url          = {https://github.com/Venkat2811/wombatkv},
}

See CITATION.cff for the machine-readable form.

Issues, feedback, discussions, and PRs are welcome & appreciated! Reach out on GitHub or Twitter @venkat_systems.

License

Apache-2.0. See LICENSE.