arkiv
Open-source AI metadata layer for DIT workflows β Resolve-native, CJK-first.
π English | ηΉι«δΈζ
arkiv sits between your media drive and DaVinci Resolve: it ingests your footage, attaches AI-generated metadata (transcript, vision tags, atmosphere, energy, edit position), and surfaces clips via semantic search in any language β Chinese, Japanese, or English. The Resolve plugin lets you search, import with clip color, and drop frame markers without leaving the NLE.
Designed for solo DITs and small crews who own their data: local-first, self-hosted, MIT license, no cloud dependency.
Architecture
βββββββββββββββ ββββββββββββββββ βββββββββββββββ
β index.html βββββΊβ server.py βββββΊβ db.py β
β (Tailwind) β β (FastAPI) β β (SQLite) β
βββββββββββββββ ββββββββ¬ββββββββ βββββββββββββββ
β
ββββββββ΄ββββββββ
β embed.py βββββΊ ChromaDB
β (Ollama) β (bge-m3)
ββββββββββββββββ
βββββββββββββββ Ingest Pipeline (2-Phase) βββββββββββββββ
Phase 1: Probe + Transcribe + LLM Polish
βββββββββββββ βββββββββββββββ ββββββββββββββββ
β ingest.py βββtranscribe.pyβββ qwen2.5:14b β
β (FFmpeg) β β(Whisper+VAD)β β (LLM polish) β
βββββββββββββ βββββββββββββββ ββββββββββββββββ
β β
β Silero VAD
β (silence filter)
βΌ
Phase 2: Vision (after unloading LLM from VRAM)
βββββββββββ ββββββββββββββββ
βframes.pyββ β vision.py β
β(extract)β β(qwen3-vl:8b) β
βββββββββββ ββββββββββββββββ
β Full pipeline (4 stages, storage layout, exit codes, maintenance modes): docs/pipeline.md
Screenshots

Features
- Semantic search β query in natural language (Chinese/English/Japanese)
- Chat RAG over your video library β 5-intent assistant for compilation searches, refinement, similarity, analytics, and general questions with persisted conversation memory
- AI transcription β Whisper large-v3-turbo + Silero VAD + LLM polish (Apple Silicon MLX / NVIDIA CUDA)
- 4-layer anti-hallucination guard β VAD silence filter β no_speech threshold β blank/repeat filter β LLM correction
- Frame analysis β qwen3-vl:8b vision descriptions with brand/object recognition
- 2-phase pipeline β transcribe first, unload LLM, then vision (avoids VRAM conflict on 12GB GPUs)
- Rating system β GOOD / NG / Review with notes + clip color in Resolve
- Tag system β auto (AI) + manual tags with autocomplete
- DaVinci Resolve UI β dark theme, 3-panel layout, filmstrip, waveform
- Export β SRT, VTT, TXT, EDL (drop-frame TC), FCPXML 1.8 (FCPX + DaVinci compatible)
- DaVinci Resolve metadata CSV β
/api/export/metadata-csvendpoint exports clip metadata (Camera/Lens/ISO/Shutter/Aperture/GPS/CreateDate) ready for Resolve'sFile β Import Metadata from CSV. Plugin auto-prompts after import - ExifTool integration β auto-extracts 12 fields per clip (Make/Model/LensModel/GPS/ColorSpace/ISO/Shutter/Aperture/FocalLength/CreateDate). Sidecar-aware for Sony XAVC
.XML, iPhone Keys group, Blackmagic Cam app per-vendor lens tags. Auto-detects exiftool binary on Windows (winget/scoop/chocolatey/Program Files) - EDL reel name β uses ExifTool ReelName with safe fallback to filename stem (8-char CMX3600 compat, control-char sanitized)
- HEVC/ProRes browser proxy β auto-builds H.264 proxy on demand for browser playback (Phase 7.7g)
- Tauri native app β desktop app with native file/folder dialogs (Windows panic hook surfaces Rust crashes to stderr)
- DaVinci Resolve plugin β search, import with clip color, add frame markers
- ASC MHL v2 hash manifests β
mhl.py create/verifyCLI emits realurn:ASC:MHL:v2.0withxxh3/md5/sha1/sha256/c4, directory + structure root hashes, chainedascmhl_chain.xml. Interop-verified with ASC reference impl 1.2 β drop-in for Silverstack / MediaVerify / Hedge / YoYotta workflows - Multi-destination offload β
offload.py --src <SD> --dst <A> --dst <B>does chunked parallel copy + per-file hash verify + 3Γ retry on mismatch + atomic rename + sidecar-aware (XAVC / ARRI / RED / iPhone Live Photo). Resumable JSON state file β kill mid-copy and pending files pick up exactly where they stopped. Emits per-dst MHL v2 - Camera report CSV β
camera_report.pywrites 20-col DIT-spec CSV (Reel / TC / Camera / Lens / ISO / Shutter / Aperture / WB / FPS / Codec / ...) for Resolve'sFile β Import Metadata from CSV. Day-summary footer aggregates clip count + runtime by camera / by card
API Authentication
All /api/* endpoints require a Bearer token with the correct scope. Scope-based tokens let you split a fleet by machine role: read-only review stations can use videos_read or media_read, ingest machines can use ingest_write, and admin machines can manage tokens.
First-time bootstrap:
export ARKIV_ADMIN_BOOTSTRAP_TOKEN=$(openssl rand -base64 32)
python server.py
On first startup, the server seeds a single admin token from that env var. Use it once to create per-machine tokens, then unset it and revoke the bootstrap token.
Create and manage tokens directly with the CLI:
python arkiv_token.py create --name "PC-dev" --scopes videos_read,videos_write --ip-allowlist 127.0.0.1/32,100.64.0.0/10 --expires-in 90
python arkiv_token.py list
python arkiv_token.py show <token-id>
python arkiv_token.py revoke <token-id>
Use the token in requests:
curl -H "Authorization: Bearer <token>" http://localhost:8501/api/media
Available scopes: videos_read, videos_write, media_read, collections_read, collections_write, projects_read, projects_write, ingest_write, chat_read, chat_write, admin
Chat API β RAG over your video library
Ask natural-language questions about your archive. The classifier routes each prompt to one of five handlers:
| Intent | Example | What it does |
|---|---|---|
compilation |
"Give me all sunset shots from May" | Semantic search β ranked scene list |
refinement |
"Only the indoor ones" | Filters the previous result, in-conversation |
similarity |
"Similar to scene 42" | Vector nearest-neighbours to a reference clip |
analytics |
"How many hours did I shoot this month?" | Aggregate query over the library |
general |
"What can you help me with?" | Plain LLM chat, no search |
Conversation history (last 10 messages) is threaded into each follow-up, so refinement acts on what the previous turn returned.
Model requirement: chat uses ARKIV_CHAT_MODEL (default qwen2.5:14b) for both intent classification and answers β a single ollama pull qwen2.5:14b covers it. Only set ARKIV_INTENT_MODEL to a smaller model (e.g. qwen2.5:7b-instruct) if that model is actually installed on the Ollama host. If the model is missing, /api/chat returns a clear "run ollama pull β¦" message instead of a 500.
Prerequisite β ingest + index first: chat queries your indexed library, not a standalone chatbot. Ingest media (Step 1) and build the index with python embed.py (Step 2) before chatting. compilation / refinement / similarity need the vector index; analytics needs ingested media only; general is the only intent that works on an empty library. On an empty/unindexed library chat does not error β it just returns "0 results".
# Create a conversation
curl -X POST http://localhost:8501/api/chat \
-H "Authorization: Bearer $ARKIV_TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "Give me all sunset shots"}'
# β {"conversation_id":"β¦", "assistant_text":"β¦", "scene_ids":[β¦], "intent":"compilation", "tokens_used":β¦, "latency_ms":β¦}
# Continue the same conversation β refinement acts on the prior result
curl -X POST http://localhost:8501/api/chat \
-H "Authorization: Bearer $ARKIV_TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "Only indoor ones", "conversation_id": "abc123"}'
# Scope a conversation to specific projects
curl -X POST http://localhost:8501/api/chat -H "Authorization: Bearer $ARKIV_TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "wide establishing shots", "project_scope": ["client-acme"]}'
Read history with GET /api/chat/history/{conversation_id} and list conversations with GET /api/chat/conversations (both need chat_read).
Quick Start
Prerequisites
| Dependency | macOS (brew) | Linux (apt) | Windows |
|---|---|---|---|
| Python 3.9+ | brew install python |
sudo apt install python3 python3-venv |
python.org |
| FFmpeg 6.0+ | brew install ffmpeg |
sudo apt install ffmpeg |
ffmpeg.org |
| Ollama | brew install ollama |
ollama.com/download | ollama.com/download |
DaVinci Resolve Plugin extra (macOS): Resolve requires the official Python 3.10 Framework installer (.pkg) from python.org β Homebrew Python is not recognized. Install path:
/Library/Frameworks/Python.framework/Versions/3.10/. Restart Resolve after install; Py3 should appear in Console and scripts load via Workspace > Scripts.
Install β macOS (brew + pip)
brew install python ffmpeg ollama
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install mlx-whisper # Apple Silicon (Metal GPU)
ollama pull bge-m3 && ollama pull qwen3-vl:8b && ollama pull qwen2.5:14b
python health.py
Install β Linux (pip)
sudo apt install python3 python3-venv ffmpeg
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install faster-whisper torch # NVIDIA CUDA GPU
# pip install faster-whisper # CPU fallback
ollama pull bge-m3 && ollama pull qwen3-vl:8b && ollama pull qwen2.5:14b
python health.py
Install β Windows (pip, PowerShell)
# Install Python 3.9+, FFmpeg, and Ollama manually first, then:
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
pip install faster-whisper torch # NVIDIA CUDA GPU
# pip install faster-whisper # CPU fallback
ollama pull bge-m3; ollama pull qwen3-vl:8b; ollama pull qwen2.5:14b
$env:PYTHONUTF8=1; python health.py
Install β Docker (all platforms)
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
docker compose up -d
# Open http://localhost:8501
Models are pulled automatically inside the Ollama container on first run (may take a few minutes).
Upgrading from v0.3.0 β v0.3.1
v0.3.1 changes the default storage layout (artifacts now live in BASE_DIR/.arkiv/ β see Phase 8.0c). One-shot migration:
cd ~/.arkiv && git pull && python ingest.py --migrate-storage
Full SOP (backup, rollback, per-project layout): docs/pipeline.md#upgrading-from-v030 Β· CHANGELOG v0.3.1
Option A: Web UI β browse, search, rate, and tag in the browser
# macOS / Linux
uvicorn server:app --host 0.0.0.0 --port 8501
# Windows (PowerShell) β UTF-8 required for CJK search
$env:PYTHONUTF8=1; uvicorn server:app --host 0.0.0.0 --port 8501
# Open http://localhost:8501 β click + to ingest media
Option B: CLI only β ingest and search without opening a browser
Both options use the same database. You can mix and match β ingest via CLI, then browse in Web UI, or vice versa.
Note: Do not run CLI and Web UI ingest at the same time. SQLite does not support concurrent writes β run one at a time.
# Step 1 β Ingest your media
python ingest.py --dir /path/to/media
# Step 2 β Build search index
python embed.py
# Step 3 β Search
python embed.py --search "interview outdoor"
Advanced CLI options
# Ingest options
python ingest.py --dir ./media --limit 10 # process first 10 files only
python ingest.py --dir ./media --skip-vision # skip AI frame descriptions
python ingest.py --dir ./media --refresh # re-process already-indexed files
# Index options
python embed.py --rebuild # drop and rebuild from scratch
# Auto-watch a folder for new media
python watch.py /path/to/footage
python watch.py ~/Movies/rushes --interval 10
# API search (requires server running)
# Linux / macOS / Git Bash
curl "http://localhost:8501/api/media?q=keyword&limit=5"
# Windows PowerShell
Invoke-RestMethod "http://localhost:8501/api/media?q=keyword&limit=5"
Configuration
Copy .env.example to .env and customize:
| Variable | Default | Description |
|---|---|---|
ARKIV_DB_PATH |
./media.db |
SQLite database path |
ARKIV_CHROMA_PATH |
./chroma_db |
ChromaDB vector store |
ARKIV_THUMBNAILS_DIR |
./thumbnails |
Thumbnail output dir |
ARKIV_OLLAMA_URL |
http://localhost:11434 |
Ollama API endpoint |
ARKIV_EMBED_MODEL |
bge-m3 |
Embedding model β do not change after indexing (see note below) |
ARKIV_VISION_MODEL |
qwen3-vl:8b |
Vision model for frame descriptions |
ARKIV_CHAT_MODEL |
qwen2.5:14b |
Chat model β answers and (by default) intent classification |
ARKIV_INTENT_MODEL |
(= ARKIV_CHAT_MODEL) |
Optional faster model for intent classification only; must be installed |
ARKIV_WHISPER_MODEL |
mlx-community/whisper-large-v3-turbo (macOS) / large-v3-turbo (other) |
Whisper model |
ARKIV_CUSTOM_VOCABULARY |
(empty) | Comma-separated hotwords (names/jargon) fed to Whisper's initial_prompt |
ARKIV_VOCABULARY_FILE |
(empty β .arkiv/vocabulary.txt if present) |
Newline-delimited hotword file (one term/line, # comments); merged with the above |
ARKIV_EXIFTOOL_PATH |
(empty β auto-detect) | Path to exiftool binary (optional) |
ARKIV_FFMPEG_PATH |
(empty β auto-detect) | Path to ffmpeg binary (optional; set on headless Windows where only a WinGet alias shim is on PATH) |
ARKIV_FFPROBE_PATH |
(empty β auto-detect) | Path to ffprobe binary (optional; same as above) |
ARKIV_HOST |
0.0.0.0 |
Server bind address |
ARKIV_PORT |
8501 |
Server port |
Embedding model is locked to your index. The vector store is built with one embedding model (
bge-m3, 1024-dim). ChangingARKIV_EMBED_MODELafter you have indexed media makes new query vectors incompatible with stored ones β search results degrade silently. To switch models, re-index from scratch.Hardware floor for chat:
qwen2.5:14bneeds ~9 GB and runs alongside the embedding model. Plan for ~12β16 GB free RAM/VRAM on the Ollama host. On tighter machines, setARKIV_CHAT_MODEL=qwen2.5:7b(~4.7 GB) for a lighter default.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Tailwind CSS + vanilla JS |
| Backend | FastAPI + Uvicorn |
| Database | SQLite (metadata) + ChromaDB (vectors) |
| Embedding | Ollama bge-m3 (1024d, cosine) |
| Transcription | mlx-whisper / faster-whisper (large-v3-turbo) |
| VAD | Silero VAD (silence filter before Whisper) |
| LLM Polish + Chat | Ollama qwen2.5:14b (transcript polish + 5-intent chat RAG) |
| Vision | Ollama qwen3-vl:8b (brand/object recognition) |
| Media | FFmpeg (probe, thumbnails, frame extraction) |
| Metadata | ExifTool (12 fields, sidecar-aware, cross-platform auto-detect) |
| Export | SRT, VTT, TXT, EDL (DF/NDF), FCPXML 1.8 |
| Desktop | Tauri (native app wrapper) |
| NLE Plugin | DaVinci Resolve (import + clip color + markers) |
FAQ
Q: Which Whisper backend should I use?
- macOS with Apple Silicon:
mlx-whisper(fastest, uses Metal GPU) - NVIDIA GPU:
faster-whisper+torch(CUDA acceleration) - CPU only:
faster-whisper(slower but works everywhere)
Q: Do I need Ollama running?
Yes, for semantic search (embedding) and optional frame descriptions. Run ollama serve before starting arkiv.
Q: How do I add media?
Use the + button in the Media Pool sidebar, or run python ingest.py --dir /path/to/media from CLI.
Q: Can I use this without Docker? Yes β the native Python install is the primary workflow. Docker is optional for deployment.
Q: What file formats are supported?
Video: .mp4, .mov, .mkv, .avi, .webm, .m4v, .mts
Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg
Smoke Test
Run the built-in smoke test to verify your setup:
# PC (Windows/macOS)
bash smoke-test.sh --platform pc
# Docker
docker exec arkiv-arkiv-1 bash smoke-test.sh --platform docker
The test has two phases: Health Check (environment) and API Smoke Test (server endpoints).
What SKIP means
SKIP items are optional dependencies β they do not affect functionality. A passing result is 0 FAIL, regardless of SKIP count.
| Check | PC (Windows) | PC (macOS) | Docker | Notes |
|---|---|---|---|---|
| Python >= 3.9 | Required | Required | Required | |
| FFmpeg / ffprobe | Required | Required | Required | |
| Ollama server | Required | Required | Required | |
| bge-m3 | Required | Required | Required | |
| qwen3-vl:8b | Optional | Optional | Optional | For frame descriptions |
| qwen2.5:14b | Optional | Optional | Optional | Transcript polish + chat (required for /api/chat) |
| ExifTool | Optional | Optional | Optional | For rich metadata |
| faster-whisper | Required | Optional | Required | CUDA/CPU whisper |
| mlx-whisper | β | Required | β | Apple Silicon only |
| NVIDIA GPU | Optional | β | β | |
| Apple Silicon | β | Required | β | |
| fastapi + uvicorn | Required | Required | Required |
Latest Results (v0.3.0)
| Platform | Health Check | Smoke Test | Date |
|---|---|---|---|
| macOS M2 Max | TBD | TBD | 2026-05-22 |
| Windows 11 (RTX 4070) | 19/19 PASS, 0 FAIL, 0 SKIP | 9/9 PASS | 2026-05-22 |
| Linux (Docker) | 14/17 PASS, 0 FAIL, 3 SKIP | 9/9 PASS | 2026-05-22 |
License
MIT