arkiv

About arkiv

Local-first media asset manager with AI-powered semantic search. DaVinci Resolve-inspired UI.

Platforms

Web Self-hosted

Languages

Python

License: MIT · Python 3.9+ · Tauri

Open-source AI metadata layer for DIT workflows: Resolve-native, CJK-first.

🌐 English | 繁體中文

arkiv sits between your media drive and DaVinci Resolve: it ingests your footage, attaches AI-generated metadata (transcript, vision tags, atmosphere, energy, edit position), and surfaces clips via semantic search in any language: Chinese, Japanese, or English. The Resolve plugin lets you search, import with clip color, and drop frame markers without leaving the NLE.

Designed for solo DITs and small crews who own their data: local-first, self-hosted, MIT license, no cloud dependency.


Architecture

┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│  index.html │◄──►│  server.py   │◄──►│   db.py      │
│  (Tailwind) │    │  (FastAPI)   │    │  (SQLite)    │
└─────────────┘    └──────┬───────┘    └──────────────┘
                          │
                   ┌──────┴───────┐
                   │  embed.py    │◄──► ChromaDB
                   │  (Ollama)    │     (bge-m3)
                   └──────────────┘

  ═══════════════ Ingest Pipeline (2-Phase) ═══════════════

  Phase 1: Probe + Transcribe + LLM Polish
  ┌───────────┐ ┌─────────────┐ ┌──────────────┐
  │ ingest.py │→│transcribe.py│→│ qwen2.5:14b  │
  │ (FFmpeg)  │ │(Whisper+VAD)│ │ (LLM polish) │
  └───────────┘ └─────────────┘ └──────────────┘
       │              ↑
       │         Silero VAD
       │        (silence filter)
       ▼
  Phase 2: Vision (after unloading LLM from VRAM)
  ┌─────────┐  ┌──────────────┐
  │frames.py│→ │  vision.py   │
  │(extract)│  │(qwen3-vl:8b) │
  └─────────┘  └──────────────┘

→ Full pipeline (4 stages, storage layout, exit codes, maintenance modes): docs/pipeline.md
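
The two-phase ordering in the diagram can be sketched as follows. This is a minimal illustration, not arkiv's actual code: `run_two_phase` and the three callables passed to it are hypothetical names standing in for whatever `ingest.py`, `transcribe.py`, and `vision.py` really expose. The point it shows is that every file finishes phase 1 before the LLM is unloaded once, and only then does the vision phase start.

```python
from typing import Callable, Iterable, List


def run_two_phase(
    files: Iterable[str],
    transcribe_and_polish: Callable[[str], None],  # hypothetical phase-1 step
    unload_llm: Callable[[], None],                # hypothetical VRAM release step
    describe_frames: Callable[[str], None],        # hypothetical phase-2 step
) -> List[str]:
    """Run phase 1 on every file, unload the LLM once, then run phase 2."""
    log: List[str] = []
    batch = list(files)

    # Phase 1: probe + transcribe + LLM polish for the whole batch.
    for path in batch:
        transcribe_and_polish(path)
        log.append(f"phase1:{path}")

    # Release the polish LLM before the vision model needs the VRAM.
    unload_llm()
    log.append("unload")

    # Phase 2: frame extraction + vision descriptions.
    for path in batch:
        describe_frames(path)
        log.append(f"phase2:{path}")
    return log
```

With Ollama, sending a generate request with `keep_alive` set to 0 is one way to unload a model; whether arkiv does exactly that is not stated here.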

Screenshots

ARKIV UI

Features

  • Semantic search: query in natural language (Chinese/English/Japanese)
  • Chat RAG over your video library: 5-intent assistant for compilation searches, refinement, similarity, analytics, and general questions with persisted conversation memory
  • AI transcription: Whisper large-v3-turbo + Silero VAD + LLM polish (Apple Silicon MLX / NVIDIA CUDA)
  • 4-layer anti-hallucination guard: VAD silence filter → no_speech threshold → blank/repeat filter → LLM correction
  • Frame analysis: qwen3-vl:8b vision descriptions with brand/object recognition
  • 2-phase pipeline: transcribe first, unload LLM, then vision (avoids VRAM conflict on 12GB GPUs)
  • Rating system: GOOD / NG / Review with notes + clip color in Resolve
  • Tag system: auto (AI) + manual tags with autocomplete
  • DaVinci Resolve UI: dark theme, 3-panel layout, filmstrip, waveform
  • Export: SRT, VTT, TXT, EDL (drop-frame TC), FCPXML 1.8 (FCPX + DaVinci compatible)
  • DaVinci Resolve metadata CSV: /api/export/metadata-csv endpoint exports clip metadata (Camera/Lens/ISO/Shutter/Aperture/GPS/CreateDate) ready for Resolve's File → Import Metadata from CSV. Plugin auto-prompts after import
  • ExifTool integration: auto-extracts 12 fields per clip (Make/Model/LensModel/GPS/ColorSpace/ISO/Shutter/Aperture/FocalLength/CreateDate). Sidecar-aware for Sony XAVC .XML, iPhone Keys group, Blackmagic Cam app per-vendor lens tags. Auto-detects exiftool binary on Windows (winget/scoop/chocolatey/Program Files)
  • EDL reel name: uses ExifTool ReelName with safe fallback to filename stem (8-char CMX3600 compat, control-char sanitized)
  • HEVC/ProRes browser proxy: auto-builds H.264 proxy on demand for browser playback (Phase 7.7g)
  • Tauri native app: desktop app with native file/folder dialogs (Windows panic hook surfaces Rust crashes to stderr)
  • DaVinci Resolve plugin: search, import with clip color, add frame markers
  • ASC MHL v2 hash manifests: mhl.py create / verify CLI emits real urn:ASC:MHL:v2.0 with xxh3 / md5 / sha1 / sha256 / c4, directory + structure root hashes, chained ascmhl_chain.xml. Interop-verified with ASC reference impl 1.2; drop-in for Silverstack / MediaVerify / Hedge / YoYotta workflows
  • Multi-destination offload: offload.py --src <SD> --dst <A> --dst <B> does chunked parallel copy + per-file hash verify + 3× retry on mismatch + atomic rename + sidecar-aware (XAVC / ARRI / RED / iPhone Live Photo). Resumable JSON state file: kill mid-copy and pending files pick up exactly where they stopped. Emits per-dst MHL v2
  • Camera report CSV: camera_report.py writes 20-col DIT-spec CSV (Reel / TC / Camera / Lens / ISO / Shutter / Aperture / WB / FPS / Codec / ...) for Resolve's File → Import Metadata from CSV. Day-summary footer aggregates clip count + runtime by camera / by card
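
The copy, verify, retry, and atomic-rename sequence named in the multi-destination offload bullet can be illustrated roughly as below. This is a sketch under stated assumptions, not `offload.py` itself: `copy_verified`, `file_sha256`, the `.part` suffix, and the 1 MiB chunk size are all hypothetical, and SHA-256 is used only because it ships with the standard library (the list above also names xxh3, md5, sha1, and c4). It handles a single destination and leaves out parallelism, sidecars, and the resumable state file.

```python
import hashlib
import os
from pathlib import Path

CHUNK = 1024 * 1024  # assumed chunk size: 1 MiB per read


def file_sha256(path: Path) -> str:
    """Hash a file from disk in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: Path, dst: Path, attempts: int = 3) -> str:
    """Copy src to dst in chunks, verify the hash, then rename into place."""
    tmp = dst.with_name(dst.name + ".part")  # hypothetical temp-file naming
    for _ in range(attempts):
        src_hash = hashlib.sha256()
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            for chunk in iter(lambda: fin.read(CHUNK), b""):
                src_hash.update(chunk)  # hash the source while copying
                fout.write(chunk)
        # Re-read the copy so the check covers what was actually written.
        if file_sha256(tmp) == src_hash.hexdigest():
            os.replace(tmp, dst)  # atomic when tmp and dst share a filesystem
            return src_hash.hexdigest()
        tmp.unlink()  # discard the bad copy before retrying
    raise IOError(f"hash mismatch after {attempts} attempts: {src}")
```

Writing to a temporary name and renaming only after the hash matches means a destination file either does not exist or is complete, which is what makes an interrupted run safe to resume.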

API Authentication

All /api/* endpoints require a Bearer token with the correct scope. Scope-based tokens let you split a fleet by machine role: read-only review stations can use videos_read or media_read, ingest machines can use ingest_write, and admin machines can manage tokens.

First-time bootstrap:

export ARKIV_ADMIN_BOOTSTRAP_TOKEN=$(openssl rand -base64 32)
python server.py

On first startup, the server seeds a single admin token from that env var. Use it once to create per-machine tokens, then unset it and revoke the bootstrap token.

Create and manage tokens directly with the CLI:

python arkiv_token.py create --name "PC-dev" --scopes videos_read,videos_write --ip-allowlist 127.0.0.1/32,100.64.0.0/10 --expires-in 90
python arkiv_token.py list
python arkiv_token.py show <token-id>
python arkiv_token.py revoke <token-id>

Use the token in requests:

curl -H "Authorization: Bearer <token>" http://localhost:8501/api/media

Available scopes: videos_read, videos_write, media_read, collections_read, collections_write, projects_read, projects_write, ingest_write, chat_read, chat_write, admin
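
For scripted clients, the same Bearer header can be attached from Python. The sketch below uses only the standard library; `media_search_request` is a hypothetical helper, the token value is a placeholder, and the `q` / `limit` query parameters are taken from the API search example later in this README. It builds the request without sending it, so it can be inspected first.

```python
import urllib.parse
import urllib.request


def media_search_request(
    base_url: str, token: str, query: str, limit: int = 5
) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET /api/media request."""
    # urlencode percent-encodes the query as UTF-8, so CJK terms are safe.
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/media?{params}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = media_search_request("http://localhost:8501", "example-token", "夕陽 海邊")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.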

Chat API: RAG over your video library

Ask natural-language questions about your archive. The classifier routes each prompt to one of five handlers:

Intent | Example | What it does
compilation | "Give me all sunset shots from May" | Semantic search → ranked scene list
refinement | "Only the indoor ones" | Filters the previous result, in-conversation
similarity | "Similar to scene 42" | Vector nearest-neighbours to a reference clip
analytics | "How many hours did I shoot this month?" | Aggregate query over the library
general | "What can you help me with?" | Plain LLM chat, no search

Conversation history (last 10 messages) is threaded into each follow-up, so refinement acts on what the previous turn returned.

Model requirement: chat uses ARKIV_CHAT_MODEL (default qwen2.5:14b) for both intent classification and answers, so a single ollama pull qwen2.5:14b covers it. Only set ARKIV_INTENT_MODEL to a smaller model (e.g. qwen2.5:7b-instruct) if that model is actually installed on the Ollama host. If the model is missing, /api/chat returns a clear "run ollama pull …" message instead of a 500.

Prerequisite: ingest + index first. Chat queries your indexed library; it is not a standalone chatbot. Ingest media (Step 1) and build the index with python embed.py (Step 2) before chatting. compilation / refinement / similarity need the vector index; analytics needs ingested media only; general is the only intent that works on an empty library. On an empty/unindexed library chat does not error; it just returns "0 results".
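
The per-intent prerequisites above fit in a small lookup. The function below is a hypothetical illustration of that rule, not part of arkiv's API; `intent_ready` and its two boolean flags are assumed names.

```python
# Which library state each chat intent depends on, per the paragraph above.
REQUIRES_INDEX = {"compilation", "refinement", "similarity"}
REQUIRES_MEDIA = {"analytics"}


def intent_ready(intent: str, has_media: bool, has_index: bool) -> bool:
    """Return True if the intent can return meaningful results."""
    if intent in REQUIRES_INDEX:
        return has_index  # needs the vector index built by embed.py
    if intent in REQUIRES_MEDIA:
        return has_media  # needs ingested media only
    if intent == "general":
        return True  # plain LLM chat works on an empty library
    raise ValueError(f"unknown intent: {intent}")
```

A client could use a check like this to explain a "0 results" answer before the user assumes the search itself is broken.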

# Create a conversation
curl -X POST http://localhost:8501/api/chat \
  -H "Authorization: Bearer $ARKIV_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Give me all sunset shots"}'
# → {"conversation_id":"…", "assistant_text":"…", "scene_ids":[…], "intent":"compilation", "tokens_used":…, "latency_ms":…}

# Continue the same conversation: refinement acts on the prior result
curl -X POST http://localhost:8501/api/chat \
  -H "Authorization: Bearer $ARKIV_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Only indoor ones", "conversation_id": "abc123"}'

# Scope a conversation to specific projects
curl -X POST http://localhost:8501/api/chat -H "Authorization: Bearer $ARKIV_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "wide establishing shots", "project_scope": ["client-acme"]}'

Read history with GET /api/chat/history/{conversation_id} and list conversations with GET /api/chat/conversations (both need chat_read).
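
A Python client would typically carry the conversation_id from one response into the next request. The sketch below builds the JSON bodies only; `chat_body` and `next_turn` are hypothetical helper names, while the field names (prompt, conversation_id, project_scope) are the ones used in the curl examples above.

```python
import json
from typing import Optional, Sequence


def chat_body(
    prompt: str,
    conversation_id: Optional[str] = None,
    project_scope: Optional[Sequence[str]] = None,
) -> bytes:
    """Encode a POST /api/chat body, omitting fields that were not given."""
    body = {"prompt": prompt}
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    if project_scope:
        body["project_scope"] = list(project_scope)
    # ensure_ascii=False keeps CJK prompts readable in logs.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def next_turn(response: dict, prompt: str) -> bytes:
    """Build a follow-up body that continues the previous conversation."""
    return chat_body(prompt, conversation_id=response["conversation_id"])
```

Omitting conversation_id starts a new conversation, matching the first curl example.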

Quick Start

Prerequisites

Dependency | macOS (brew) | Linux (apt) | Windows
Python 3.9+ | brew install python | sudo apt install python3 python3-venv | python.org
FFmpeg 6.0+ | brew install ffmpeg | sudo apt install ffmpeg | ffmpeg.org
Ollama | brew install ollama | ollama.com/download | ollama.com/download

DaVinci Resolve Plugin extra (macOS): Resolve requires the official Python 3.10 Framework installer (.pkg) from python.org; Homebrew Python is not recognized. Install path: /Library/Frameworks/Python.framework/Versions/3.10/. Restart Resolve after install; Py3 should appear in Console and scripts load via Workspace > Scripts.

Install: macOS (brew + pip)

brew install python ffmpeg ollama
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install mlx-whisper          # Apple Silicon (Metal GPU)
ollama pull bge-m3 && ollama pull qwen3-vl:8b && ollama pull qwen2.5:14b
python health.py

Install: Linux (pip)

sudo apt install python3 python3-venv ffmpeg
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install faster-whisper torch  # NVIDIA CUDA GPU
# pip install faster-whisper      # CPU fallback
ollama pull bge-m3 && ollama pull qwen3-vl:8b && ollama pull qwen2.5:14b
python health.py

Install: Windows (pip, PowerShell)

# Install Python 3.9+, FFmpeg, and Ollama manually first, then:
git clone https://github.com/vulture-s/arkiv.git
cd arkiv
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
pip install faster-whisper torch  # NVIDIA CUDA GPU
# pip install faster-whisper      # CPU fallback
ollama pull bge-m3; ollama pull qwen3-vl:8b; ollama pull qwen2.5:14b
$env:PYTHONUTF8=1; python health.py

Install: Docker (all platforms)

git clone https://github.com/vulture-s/arkiv.git
cd arkiv
docker compose up -d
# Open http://localhost:8501

Models are pulled automatically inside the Ollama container on first run (may take a few minutes).

Upgrading from v0.3.0 → v0.3.1

v0.3.1 changes the default storage layout (artifacts now live in BASE_DIR/.arkiv/; see Phase 8.0c). One-shot migration:

cd ~/.arkiv && git pull && python ingest.py --migrate-storage

Full SOP (backup, rollback, per-project layout): docs/pipeline.md#upgrading-from-v030 · CHANGELOG v0.3.1

Option A: Web UI - browse, search, rate, and tag in the browser

# macOS / Linux
uvicorn server:app --host 0.0.0.0 --port 8501

# Windows (PowerShell): UTF-8 required for CJK search
$env:PYTHONUTF8=1; uvicorn server:app --host 0.0.0.0 --port 8501

# Open http://localhost:8501 → click + to ingest media

Option B: CLI only - ingest and search without opening a browser

Both options use the same database. You can mix and match: ingest via CLI, then browse in Web UI, or vice versa.

Note: Do not run CLI and Web UI ingest at the same time. SQLite allows only one writer at a time, so run one ingest at a time.

# Step 1: Ingest your media
python ingest.py --dir /path/to/media

# Step 2: Build search index
python embed.py

# Step 3: Search
python embed.py --search "interview outdoor"

Advanced CLI options

# Ingest options
python ingest.py --dir ./media --limit 10   # process first 10 files only
python ingest.py --dir ./media --skip-vision # skip AI frame descriptions
python ingest.py --dir ./media --refresh     # re-process already-indexed files

# Index options
python embed.py --rebuild                    # drop and rebuild from scratch

# Auto-watch a folder for new media
python watch.py /path/to/footage
python watch.py ~/Movies/rushes --interval 10

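# API search (requires server running and a token; see API Authentication)
# Linux / macOS / Git Bash
curl -H "Authorization: Bearer $ARKIV_TOKEN" "http://localhost:8501/api/media?q=keyword&limit=5"
# Windows PowerShell
Invoke-RestMethod -Headers @{ Authorization = "Bearer $env:ARKIV_TOKEN" } "http://localhost:8501/api/media?q=keyword&limit=5"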

Configuration

Copy .env.example to .env and customize:

Variable | Default | Description
ARKIV_DB_PATH | ./media.db | SQLite database path
ARKIV_CHROMA_PATH | ./chroma_db | ChromaDB vector store
ARKIV_THUMBNAILS_DIR | ./thumbnails | Thumbnail output dir
ARKIV_OLLAMA_URL | http://localhost:11434 | Ollama API endpoint
ARKIV_EMBED_MODEL | bge-m3 | Embedding model; do not change after indexing (see note below)
ARKIV_VISION_MODEL | qwen3-vl:8b | Vision model for frame descriptions
ARKIV_CHAT_MODEL | qwen2.5:14b | Chat model: answers and (by default) intent classification
ARKIV_INTENT_MODEL | (= ARKIV_CHAT_MODEL) | Optional faster model for intent classification only; must be installed
ARKIV_WHISPER_MODEL | mlx-community/whisper-large-v3-turbo (macOS) / large-v3-turbo (other) | Whisper model
ARKIV_CUSTOM_VOCABULARY | (empty) | Comma-separated hotwords (names/jargon) fed to Whisper's initial_prompt
ARKIV_VOCABULARY_FILE | (empty → .arkiv/vocabulary.txt if present) | Newline-delimited hotword file (one term/line, # comments); merged with the above
ARKIV_EXIFTOOL_PATH | (empty: auto-detect) | Path to exiftool binary (optional)
ARKIV_FFMPEG_PATH | (empty: auto-detect) | Path to ffmpeg binary (optional; set on headless Windows where only a WinGet alias shim is on PATH)
ARKIV_FFPROBE_PATH | (empty: auto-detect) | Path to ffprobe binary (optional; same as above)
ARKIV_HOST | 0.0.0.0 | Server bind address
ARKIV_PORT | 8501 | Server port
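
How ARKIV_CUSTOM_VOCABULARY and ARKIV_VOCABULARY_FILE might combine into a single hotword list can be sketched as below. This is an assumption-laden illustration, not arkiv's parser: `merge_vocabulary` is a hypothetical name, and the choices to treat only whole-line `#` comments, to drop duplicates, to keep first-seen order, and to join terms with commas are guesses at reasonable behavior.

```python
from typing import Iterable, List


def merge_vocabulary(custom: str, file_lines: Iterable[str]) -> List[str]:
    """Combine comma-separated hotwords with a newline-delimited hotword file."""
    # Terms from the comma-separated variable come first.
    candidates = [part.strip() for part in custom.split(",")]
    for line in file_lines:
        line = line.strip()
        if line.startswith("#"):  # assumed: whole-line comments only
            continue
        candidates.append(line)

    terms: List[str] = []
    seen = set()
    for term in candidates:
        if term and term not in seen:  # skip blanks and duplicates
            seen.add(term)
            terms.append(term)
    return terms


hotwords = merge_vocabulary(
    "arkiv, DaVinci Resolve",
    ["# crew names", "Silverstack", "arkiv", ""],
)
initial_prompt = ", ".join(hotwords)  # assumed joining format
```

In this example the duplicate "arkiv" from the file and the comment line are both dropped.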

Embedding model is locked to your index. The vector store is built with one embedding model (bge-m3, 1024-dim). Changing ARKIV_EMBED_MODEL after you have indexed media makes new query vectors incompatible with stored ones, and search results degrade silently. To switch models, re-index from scratch with python embed.py --rebuild.
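
A guard of the following kind would turn that silent degradation into a loud error. It is a hypothetical sketch, not something arkiv is known to ship: `assert_query_compatible` and the two constants are assumed names, with values taken from the note above.

```python
from typing import Sequence

INDEX_MODEL = "bge-m3"  # model the index was built with (from the note above)
INDEX_DIM = 1024        # its vector dimension


def assert_query_compatible(model: str, vector: Sequence[float]) -> None:
    """Raise if a query vector cannot be compared with the stored ones."""
    if model != INDEX_MODEL:
        # Same-dimension vectors from another model would still compare,
        # just meaninglessly, so the model name is checked as well.
        raise ValueError(f"index built with {INDEX_MODEL}, query uses {model}")
    if len(vector) != INDEX_DIM:
        raise ValueError(f"expected {INDEX_DIM} dimensions, got {len(vector)}")
```

Checking the model name matters because two models can share a dimension while living in unrelated vector spaces, which is exactly the case a dimension check alone would miss.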

Hardware floor for chat: qwen2.5:14b needs ~9 GB and runs alongside the embedding model. Plan for ~12–16 GB free RAM/VRAM on the Ollama host. On tighter machines, set ARKIV_CHAT_MODEL=qwen2.5:7b (~4.7 GB) for a lighter default.
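
As a rough rule of thumb derived from the figures above, a setup script could choose the chat model from the memory available on the Ollama host. `pick_chat_model` is a hypothetical helper, and the 12 GB cutoff is simply the lower end of the range quoted above, not a measured threshold.

```python
def pick_chat_model(free_gb: float) -> str:
    """Choose a chat model tag from free RAM/VRAM on the Ollama host, in GB."""
    if free_gb >= 12:
        return "qwen2.5:14b"  # ~9 GB, plus headroom for the embedding model
    return "qwen2.5:7b"       # ~4.7 GB, the lighter default for tight machines


print(f"ARKIV_CHAT_MODEL={pick_chat_model(8)}")
```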

Tech Stack

Layer | Technology
Frontend | Tailwind CSS + vanilla JS
Backend | FastAPI + Uvicorn
Database | SQLite (metadata) + ChromaDB (vectors)
Embedding | Ollama bge-m3 (1024d, cosine)
Transcription | mlx-whisper / faster-whisper (large-v3-turbo)
VAD | Silero VAD (silence filter before Whisper)
LLM Polish + Chat | Ollama qwen2.5:14b (transcript polish + 5-intent chat RAG)
Vision | Ollama qwen3-vl:8b (brand/object recognition)
Media | FFmpeg (probe, thumbnails, frame extraction)
Metadata | ExifTool (12 fields, sidecar-aware, cross-platform auto-detect)
Export | SRT, VTT, TXT, EDL (DF/NDF), FCPXML 1.8
Desktop | Tauri (native app wrapper)
NLE Plugin | DaVinci Resolve (import + clip color + markers)

FAQ

Q: Which Whisper backend should I use?

  • macOS with Apple Silicon: mlx-whisper (fastest, uses Metal GPU)
  • NVIDIA GPU: faster-whisper + torch (CUDA acceleration)
  • CPU only: faster-whisper (slower but works everywhere)

Q: Do I need Ollama running? Yes. Ollama serves the embedding model for semantic search, plus the optional vision, transcript-polish, and chat models. Run ollama serve before starting arkiv.

Q: How do I add media? Use the + button in the Media Pool sidebar, or run python ingest.py --dir /path/to/media from CLI.

Q: Can I use this without Docker? Yes. The native Python install is the primary workflow; Docker is optional for deployment.

Q: What file formats are supported? Video: .mp4, .mov, .mkv, .avi, .webm, .m4v, .mts. Audio: .wav, .mp3, .m4a, .aac, .flac, .ogg.
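
To pre-filter a folder before ingest, the extension lists above translate directly into a lookup. `media_kind` is a hypothetical helper; the case-insensitive match is an assumption (camera cards often use upper-case extensions), and arkiv's own detection may differ.

```python
from pathlib import Path
from typing import Optional

VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".avi", ".webm", ".m4v", ".mts"}
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg"}


def media_kind(path: str) -> Optional[str]:
    """Return "video", "audio", or None for unsupported extensions."""
    ext = Path(path).suffix.lower()  # assumed: extensions match case-insensitively
    if ext in VIDEO_EXTS:
        return "video"
    if ext in AUDIO_EXTS:
        return "audio"
    return None
```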

Smoke Test

Run the built-in smoke test to verify your setup:

# PC (Windows/macOS)
bash smoke-test.sh --platform pc

# Docker
docker exec arkiv-arkiv-1 bash smoke-test.sh --platform docker

The test has two phases: Health Check (environment) and API Smoke Test (server endpoints).

What SKIP means

SKIP items are optional dependencies; they do not affect core functionality. A passing result is 0 FAIL, regardless of SKIP count.

Check | PC (Windows) | PC (macOS) | Docker | Notes
Python >= 3.9 | Required | Required | Required |
FFmpeg / ffprobe | Required | Required | Required |
Ollama server | Required | Required | Required |
bge-m3 | Required | Required | Required |
qwen3-vl:8b | Optional | Optional | Optional | For frame descriptions
qwen2.5:14b | Optional | Optional | Optional | Transcript polish + chat (required for /api/chat)
ExifTool | Optional | Optional | Optional | For rich metadata
faster-whisper | Required | Optional | Required | CUDA/CPU whisper
mlx-whisper | - | Required | - | Apple Silicon only
NVIDIA GPU | Optional | - | - |
Apple Silicon | - | Required | - |
fastapi + uvicorn | Required | Required | Required |
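
The pass rule stated above (zero FAIL, SKIP ignored) is small enough to write down. The function below is a hypothetical sketch, not the logic inside smoke-test.sh; `smoke_verdict` and the status strings are assumed names chosen to match the summary format used in the results.

```python
from collections import Counter
from typing import Iterable, Tuple


def smoke_verdict(statuses: Iterable[str]) -> Tuple[bool, str]:
    """Return (passed, summary); a run passes when nothing is FAIL."""
    items = list(statuses)
    counts = Counter(items)
    passed = counts["FAIL"] == 0  # SKIP never affects the verdict
    summary = (
        f"{counts['PASS']}/{len(items)} PASS, "
        f"{counts['FAIL']} FAIL, {counts['SKIP']} SKIP"
    )
    return passed, summary
```

A run with 14 PASS and 3 SKIP therefore counts as passing, the same as one with every check at PASS.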

Latest Results (v0.3.0)

Platform | Health Check | Smoke Test | Date
macOS M2 Max | TBD | TBD | 2026-05-22
Windows 11 (RTX 4070) | 19/19 PASS, 0 FAIL, 0 SKIP | 9/9 PASS | 2026-05-22
Linux (Docker) | 14/17 PASS, 0 FAIL, 3 SKIP | 9/9 PASS | 2026-05-22

License

MIT