bigRAG

Open-source, self-hostable RAG platform with Turbopuffer-backed search.

Upload documents, auto-chunk, embed, and retrieve through semantic, keyword, and hybrid search — all behind one clean REST API.

Quick Start · Architecture · API Reference · SDKs · MCP Server · Configuration

Features

Document ingestion — PDF, DOCX, PPTX, HTML, Markdown, images, and more via Docling
Embedding providers — OpenAI, OpenAI-compatible gateways, Cohere, and Voyage
Embedding presets — save named provider/model configs once, reuse across collections
Turbopuffer search — vectors, chunk text, metadata filters, BM25 keyword search, and hybrid retrieval
Namespace isolation — each collection maps to a Turbopuffer namespace for scoped writes, exports, truncation, and deletion
Reranking — Cohere reranking for improved result relevance
Multi-collection queries — search across collections in a single request
Generated chat — stateless backend-grounded playground chat with streaming and citations
Batch operations — bulk upload, delete, status checks, and queries
S3/R2 connector — mirror bucket prefixes with manual or scheduled sync
Status polling — REST endpoints for document and batch processing status
Auth, audit, scopes — admin accounts, session cookies, scoped bigrag_sk_… API keys, and full audit/access logs
Metadata controls — per-collection metadata schemas, file validation, and content-hash deduplication at ingest
Retrieval evaluation runner — catch recall@k / MRR / nDCG regressions against a golden set
Analytics — per-collection query analytics and platform-wide stats
Webhooks — HMAC-signed delivery, retries, circuit breaker, admin replay
Encrypted sensitive caches at rest — provider API keys, webhook secrets, embedding-cache rows, and Redis cache payloads sealed with Fernet (BIGRAG_MASTER_KEY)
Self-hostable — a single docker compose up runs everything
Clients — TypeScript and Python SDKs plus an MCP server for Claude Desktop, Cursor, and any MCP-aware runtime
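
The retrieval evaluation runner in the list above reports recall@k, MRR, and nDCG. As a rough illustration of what those metrics measure, here is a minimal Python sketch using binary relevance. The function names and the input shapes (a ranked list of chunk IDs plus a set of relevant IDs) are assumptions for illustration, not bigRAG's internal implementation.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of this ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Comparing these numbers between two runs over the same golden set is what turns them into a regression check: a drop after a chunking or model change is the signal to investigate.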

Quick Start

docker compose up -d

This starts the bigRAG API, worker, admin UI, Postgres, and Redis. Open localhost:3000 for the admin UI or localhost:4000/docs for the interactive API docs.

Important: Configure Turbopuffer during admin UI onboarding before ingesting or querying collections.

Once Turbopuffer is configured, create the first admin and mint an API key for HTTP clients:

export BASE="http://localhost:4000"

curl -X POST "$BASE/v1/auth/setup" \
  -H "Content-Type: application/json" \
  -c cookies.txt \
  -d '{"email": "admin@example.com", "password": "a-strong-password", "display_name": "Admin"}'

export BIGRAG_API_KEY=$(curl -s -X POST "$BASE/v1/admin/api-keys" \
  -b cookies.txt \
  -H "Content-Type: application/json" \
  -d '{"name": "local-dev", "scopes": ["*:*"]}' | jq -r .key)

# Create a collection
curl -X POST "$BASE/v1/collections" \
  -H "Authorization: Bearer $BIGRAG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "docs", "embedding_api_key": "sk-..."}'

# Upload a document
curl -X POST "$BASE/v1/collections/docs/documents" \
  -H "Authorization: Bearer $BIGRAG_API_KEY" \
  -F "file=@paper.pdf"

# Query
curl -X POST "$BASE/v1/collections/docs/query" \
  -H "Authorization: Bearer $BIGRAG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the main findings?"}'
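
The query call above can also be made from Python with only the standard library. This is a sketch, not the official SDK: `build_query_request` and `run_query` are hypothetical helper names, while the endpoint, headers, and request body are taken from the curl example.

```python
import json
import urllib.request


def build_query_request(base_url, collection, query, api_key):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/collections/{collection}/query",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_query(base_url, collection, query, api_key, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_query_request(base_url, collection, query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the stack running, `run_query("http://localhost:4000", "docs", "What are the main findings?", api_key)` would issue the same request as the last curl command.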

Development

./dev.sh  # starts Postgres, Redis, the API with hot reload, and the worker

Docker Images

docker pull yoginth/bigrag-api:latest
docker pull yoginth/bigrag-ui:latest

Release artifacts use CalVer (YYYY.M.D). The latest tag is published for quick starts; in production, pin the dated tag of the release you deploy.
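
When choosing a tag to pin, note that CalVer tags do not sort correctly as plain strings (for example, `2026.5.9` sorts after `2026.5.23`). The sketch below parses YYYY.M.D tags into dates before comparing them. The helper names are hypothetical, and it assumes tags contain only the three numeric parts.

```python
import re
from datetime import date

# YYYY.M.D, with or without zero padding on month and day.
_CALVER = re.compile(r"^(\d{4})\.(\d{1,2})\.(\d{1,2})$")


def parse_calver(tag):
    """Parse a YYYY.M.D tag into a date; raise ValueError for anything else."""
    match = _CALVER.match(tag)
    if not match:
        raise ValueError(f"not a CalVer tag: {tag!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def newest_tag(tags):
    """Return the most recent dated tag, ignoring floating tags such as latest."""
    dated = [tag for tag in tags if _CALVER.match(tag)]
    if not dated:
        raise ValueError("no dated tags found")
    return max(dated, key=parse_calver)
```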

Architecture

graph TD
    MCP([MCP client<br/>Claude / Cursor]) -->|bigrag-mcp| API
    AdminUI([Admin UI]) -->|session cookie| API
    SDK([TS / Python SDK]) -->|bigrag_sk_… key| API
    Curl([curl / any HTTP client]) -->|bigrag_sk_… key| API

    API[bigRAG API<br/>Python / FastAPI]

    API --> Auth[Auth, scopes, audit]
    API --> Collections[Collections]
    API --> Documents[Documents]
    API --> Query[Query]
    API --> Chat[Chat]
    API --> Webhooks[Webhooks]

    Documents -->|stage ingestion files| Storage[(Temporary staging<br/>Local disk)]
    Documents -->|sync object prefixes| S3[S3 / R2<br/>bucket prefix mirror]
    Documents -->|enqueue| Redis[(Redis<br/>Job queue + event bus)]
    Redis -->|process| Worker[Ingestion worker]

    Worker -->|parse| Docling[Docling<br/>PDF, DOCX, HTML, Images]
    Worker -->|embed| Embedding[Embedding provider<br/>OpenAI / compatible / Cohere / Voyage]
    Worker -->|store vectors + text| Vectors[(Turbopuffer)]

    Query -->|search| Vectors
    Query -->|embed query| Embedding
    Query -->|rerank| Reranker[Cohere Rerank]
    Chat -->|retrieve context| Query
    Chat -->|generate answer| LLM[Chat provider<br/>OpenAI / compatible]

    Auth --> Postgres
    Collections --> Postgres[(Postgres<br/>Metadata + audit + deliveries)]
    Documents --> Postgres
    Webhooks --> Postgres

API Reference

Method	Endpoint	Description
Health
`GET`	`/health`	Liveness check
`GET`	`/health/ready`	Readiness check (all dependencies)
Auth
`GET`	`/v1/auth/setup-status`	First-run setup status
`POST`	`/v1/auth/setup`	Create first admin
`POST`	`/v1/auth/login`	Session login
`POST`	`/v1/auth/logout`	Revoke current session
`POST`	`/v1/auth/logout-all`	Revoke all sessions for user
`GET`	`/v1/auth/me`	Current session
`GET`	`/v1/auth/whoami`	Current principal, auth method, scopes, and collection pin
`POST`	`/v1/auth/password`	Change password
`GET`/`PUT`	`/v1/auth/preferences`	Per-user admin UI preferences
Collections
`POST`	`/v1/collections`	Create collection
`GET`	`/v1/collections`	List collections
`GET`	`/v1/collections/{name}`	Get collection
`PUT`	`/v1/collections/{name}`	Update collection
`DELETE`	`/v1/collections/{name}`	Delete collection
`GET`	`/v1/collections/{name}/stats`	Collection stats
`POST`	`/v1/collections/{name}/truncate`	Delete all documents, keep the collection
Documents
`POST`	`/v1/collections/{name}/documents`	Upload document
`GET`	`/v1/collections/{name}/documents`	List documents
`GET`	`/v1/collections/{name}/documents/{id}`	Get document
`DELETE`	`/v1/collections/{name}/documents/{id}`	Delete document
`GET`	`/v1/collections/{name}/documents/{id}/chunks`	Get document chunks
`POST`	`/v1/collections/{name}/documents/batch/upload`	Batch upload (up to 100)
`POST`	`/v1/collections/{name}/documents/batch/status`	Batch status check
`POST`	`/v1/collections/{name}/documents/batch/get`	Batch get documents
`POST`	`/v1/collections/{name}/documents/batch/delete`	Batch delete
`GET`	`/v1/documents/{id}`	Cross-collection document lookup
`GET`	`/v1/documents/{id}/chunks`	Cross-collection chunks lookup
Connectors
`GET`/`POST`	`/v1/connectors/s3/sources`	List or create S3/R2 prefix sources
`PATCH`/`DELETE`	`/v1/connectors/s3/sources/{id}`	Update or remove an S3/R2 source
`POST`	`/v1/connectors/s3/sources/{id}/sync`	Manual S3/R2 resync
`GET`	`/v1/connectors/s3/sync-jobs`	S3/R2 sync job history
Chat
`POST`	`/v1/chat`	Create a stateless chat turn
Query
`POST`	`/v1/collections/{name}/query`	Query collection
`POST`	`/v1/query`	Multi-collection query
`POST`	`/v1/batch/query`	Batch query
Vectors
`POST`	`/v1/collections/{name}/vectors/upsert`	Upsert raw vectors
`POST`	`/v1/collections/{name}/vectors/delete`	Delete vectors by ID
Evaluation
`POST`	`/v1/evaluation`	Run a golden-set eval (recall@k, MRR, nDCG)
Webhooks (admin)
`GET`/`POST`	`/v1/admin/webhooks`	List / create webhooks
`GET`/`PUT`/`DELETE`	`/v1/admin/webhooks/{id}`	Manage a webhook
`POST`	`/v1/admin/webhooks/{id}/test`	Fire a test delivery
`GET`	`/v1/admin/webhooks/{id}/deliveries`	Delivery history
`POST`	`/v1/admin/webhooks/{id}/deliveries/{did}/replay`	Replay a past delivery
Admin
`GET`/`POST`	`/v1/admin/users`	Manage admin accounts
`PATCH`/`DELETE`	`/v1/admin/users/{id}`	Update or delete an admin/member account
`GET`/`POST`	`/v1/admin/api-keys`	Mint `bigrag_sk_…` API keys with scopes
`PATCH`/`DELETE`	`/v1/admin/api-keys/{id}`	Update, disable, or delete an API key
`GET`	`/v1/admin/audit`	Audit log
`GET`	`/v1/admin/access/overview`	Access-log rollup
`GET`	`/v1/admin/access/logs`	RAG access logs
`GET`/`POST`	`/v1/admin/embedding-presets`	Saved embedding provider configs
`PATCH`/`DELETE`	`/v1/admin/embedding-presets/{id}`	Update or delete an embedding preset
`GET`/`POST`	`/v1/admin/mcp-servers`	Manage MCP server credentials
`PATCH`/`DELETE`	`/v1/admin/mcp-servers/{id}`	Update or delete an MCP server
`POST`	`/v1/admin/mcp-servers/{id}/rotate`	Rotate an MCP server credential
`GET`	`/v1/stats`	Platform stats
`GET`	`/v1/usage`	Usage analytics
`GET`	`/v1/embeddings/models`	List embedding models
`GET`	`/v1/collections/{name}/analytics`	Collection analytics

Full interactive docs (Swagger UI) are available at /docs while the API is running.
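
The `/health/ready` endpoint from the table is a natural gate for deploy scripts and tests. Here is a minimal sketch of waiting for readiness. It assumes the endpoint answers with HTTP 200 only once all dependencies are reachable; the helper names and retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_ready(base_url, timeout=5):
    """Return True when GET /health/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as not ready.
        return False


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_ready(lambda: http_ready("http://localhost:4000"))` right after `docker compose up -d`.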

Embedding Models

Provider	Model	Dimensions
openai	`text-embedding-3-small` (default)	1536
openai	`text-embedding-3-large`	3072
cohere	`embed-english-v3.0`	1024
cohere	`embed-multilingual-v3.0`	1024
cohere	`embed-english-light-v3.0`	384
cohere	`embed-multilingual-light-v3.0`	384
voyage	`voyage-3-large`	1024
voyage	`voyage-3.5`	1024
voyage	`voyage-3.5-lite`	1024
voyage	`voyage-code-3`	1024
voyage	`voyage-finance-2`	1024
voyage	`voyage-law-2`	1024
openai_compatible	custom model at `embedding_base_url`	custom
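
A mismatch between the configured dimension and the model's output size is an easy mistake to make when switching models. The sketch below encodes the table above as a lookup and checks a configured value against it. The function names are hypothetical, and the check is deliberately conservative: it treats the listed size as the only valid one, even though some models (OpenAI's text-embedding-3 family, for instance) can also return shortened vectors.

```python
KNOWN_DIMENSIONS = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("cohere", "embed-english-v3.0"): 1024,
    ("cohere", "embed-multilingual-v3.0"): 1024,
    ("cohere", "embed-english-light-v3.0"): 384,
    ("cohere", "embed-multilingual-light-v3.0"): 384,
    ("voyage", "voyage-3-large"): 1024,
    ("voyage", "voyage-3.5"): 1024,
    ("voyage", "voyage-3.5-lite"): 1024,
    ("voyage", "voyage-code-3"): 1024,
    ("voyage", "voyage-finance-2"): 1024,
    ("voyage", "voyage-law-2"): 1024,
}


def expected_dimension(provider, model):
    """Return the listed dimension, or None for custom or unknown models."""
    return KNOWN_DIMENSIONS.get((provider, model))


def check_dimension(provider, model, dimension):
    """Raise ValueError when a known model is paired with a different dimension."""
    expected = expected_dimension(provider, model)
    if expected is None:
        return  # e.g. openai_compatible: nothing to compare against
    if dimension != expected:
        raise ValueError(
            f"{provider}/{model} is listed with {expected} dimensions, got {dimension}"
        )
```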

SDKs

TypeScript

npm install @bigrag/client

Published npm releases use CalVer (YYYY.M.D); install a specific release with npm install @bigrag/client@<version>.

import { BigRAG } from "@bigrag/client";

const client = new BigRAG({ apiKey: "your-key", baseUrl: "http://localhost:4000" });

// Upload a document (`pdf` holds the file contents, e.g. a Blob or ArrayBuffer)
const doc = await client.documents.upload("docs", new File([pdf], "paper.pdf"));

// Poll processing status
let current = doc;
while (current.status === "pending" || current.status === "processing") {
  await new Promise((resolve) => setTimeout(resolve, 2000));
  current = await client.documents.get("docs", doc.id);
  console.log(current.progress?.message ?? current.status, current.progress?.progress ?? 0);
}

// Query
const { results } = await client.queries.query("docs", { query: "What is RAG?" });

Python

pip install bigrag==2026.5.23

import asyncio

from bigrag import BigRAG


async def main():
    client = BigRAG(api_key="your-key", base_url="http://localhost:4000")

    # Upload a document
    doc = await client.documents.upload("docs", "/path/to/paper.pdf")

    # Query
    result = await client.queries.query("docs", {"query": "What is RAG?"})


# The SDK calls are awaited, so they must run inside an event loop.
asyncio.run(main())
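
The TypeScript example polls until processing finishes; the same idea in Python can be sketched independently of the SDK. The in-progress statuses `pending` and `processing` come from the TypeScript loop. `fetch_status` is a hypothetical callable that would wrap whatever call returns the document's current status, and the terminal status names are not assumed here: the function returns whatever status comes next.

```python
import time

IN_PROGRESS = {"pending", "processing"}


def wait_for_document(fetch_status, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll fetch_status() until the status leaves the in-progress states."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status
        sleep(interval)
    raise TimeoutError(f"document still in progress after {max_polls} polls")
```

Bounding the number of polls keeps a stuck ingestion job from hanging the caller forever, which the unbounded `while` loop would do.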

MCP Server

Expose bigRAG to Claude Desktop, Cursor, and any MCP-aware runtime:

BIGRAG_URL=https://bigrag.example.com \
BIGRAG_API_KEY=bigrag_sk_... \
bigrag-mcp

Drop this into claude_desktop_config.json:

{
  "mcpServers": {
    "bigrag": {
      "command": "bigrag-mcp",
      "env": {
        "BIGRAG_URL": "https://bigrag.example.com",
        "BIGRAG_API_KEY": "bigrag_sk_..."
      }
    }
  }
}

Full-workspace keys expose 8 tools — list_collections, get_collection, get_collection_stats, query, multi_collection_query, list_documents, get_document, get_document_chunks. Collection-pinned keys see 6 (no list_collections or multi_collection_query). See docs/sdks/mcp for details.
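
If `claude_desktop_config.json` already lists other MCP servers, the bigRAG entry should be merged in rather than pasted over the file. A small sketch of that merge follows. The helper names are hypothetical; the entry itself mirrors the JSON shown above.

```python
import json


def bigrag_mcp_entry(url, api_key):
    """The server entry from the example config, as a Python dict."""
    return {
        "command": "bigrag-mcp",
        "env": {"BIGRAG_URL": url, "BIGRAG_API_KEY": api_key},
    }


def add_bigrag_server(config, url, api_key):
    """Return a copy of config with mcpServers.bigrag set, keeping other servers."""
    servers = dict(config.get("mcpServers", {}))
    servers["bigrag"] = bigrag_mcp_entry(url, api_key)
    return {**config, "mcpServers": servers}


def render_config(config):
    """Serialize the config the way it would be written back to disk."""
    return json.dumps(config, indent=2)
```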

Configuration

Bootstrap settings are read from environment variables with the BIGRAG_ prefix or from bigrag.toml. Backend logging defaults to level info and format text; set BIGRAG_LOG_FORMAT=json for production log collection. Turbopuffer is configured from the admin UI and stored in Postgres alongside the other instance settings.

Server

Variable	Description	Default
`BIGRAG_PORT`	Server port	`4000`
`BIGRAG_HOST`	Bind address	`127.0.0.1`
`BIGRAG_WORKERS`	API worker processes	`1`
`BIGRAG_ENV`	`dev` or `prod` (prod enables startup safety checks)	`dev`
`BIGRAG_LOG_LEVEL`	Backend log level: `debug`, `info`, `warning`, or `error`	`info`
`BIGRAG_LOG_FORMAT`	Backend log renderer: `text` or `json`	`text`
`BIGRAG_CORS_ORIGINS`	JSON array of allowed browser origins	`[]`
`BIGRAG_TRUSTED_PROXIES`	JSON array of trusted proxy CIDRs used to honor `X-Forwarded-For` for audit and access logs	`[]`
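
To make the purpose of `BIGRAG_TRUSTED_PROXIES` concrete, here is a sketch of the usual logic behind such a setting: `X-Forwarded-For` is honored only when the direct peer is a trusted proxy, and the chain is read from right to left. This illustrates the general technique and is not bigRAG's actual code; the function names are hypothetical.

```python
import ipaddress
import json


def parse_trusted_proxies(raw):
    """Parse a JSON array of CIDR strings, e.g. '["10.0.0.0/8"]'."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in json.loads(raw or "[]")]


def client_ip(peer_ip, forwarded_for, trusted):
    """Pick the client address, trusting X-Forwarded-For only from trusted peers."""
    peer = ipaddress.ip_address(peer_ip)
    if not forwarded_for or not any(peer in network for network in trusted):
        return peer_ip
    # Walk the chain right to left; the first hop that is not a trusted
    # proxy is the closest address we can attribute the request to.
    for hop in reversed([part.strip() for part in forwarded_for.split(",")]):
        try:
            address = ipaddress.ip_address(hop)
        except ValueError:
            return peer_ip  # malformed header: fall back to the socket peer
        if not any(address in network for network in trusted):
            return hop
    return peer_ip
```

With the default empty list, the header is always ignored, so audit and access logs record the socket peer.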

Database & Redis

Variable	Description	Default
`BIGRAG_DATABASE_URL`	Postgres URL (`postgres:5432` inside docker-compose, `localhost:5432` for bare-metal dev)	`postgres://bigrag:bigrag@localhost:5432/bigrag?sslmode=disable`
`BIGRAG_DB_POOL_MIN`	Min Postgres pool size	`5`
`BIGRAG_DB_POOL_MAX`	Max Postgres pool size	`20`
`BIGRAG_MIGRATION_TIMEOUT_SECONDS`	Startup migration check timeout (`0` disables the timeout)	`60`
`BIGRAG_REDIS_URL`	Redis URL	`redis://localhost:6379/0`

Sessions & Auth

Variable	Description	Default
`BIGRAG_SESSION_EXPIRY_HOURS`	Session cookie lifetime	`168`
`BIGRAG_SESSION_COOKIE_NAME`	Session cookie name	`bigrag_session`
`BIGRAG_SESSION_COOKIE_SECURE`	HTTPS-only session cookies	`true`
`BIGRAG_SESSION_COOKIE_SAMESITE`	Session cookie SameSite policy	`lax`
`BIGRAG_SESSION_COOKIE_DOMAIN`	Optional session cookie domain	—
`BIGRAG_AUTH_PRINCIPAL_CACHE_TTL`	Principal cache TTL in seconds	`60`

Tip: ./dev.sh and the default Docker Compose setup allow the local admin UI origin http://localhost:3000. For production, set BIGRAG_CORS_ORIGINS to the exact admin UI origin. Cross-site admin UI deployments also need BIGRAG_SESSION_COOKIE_SECURE=true and usually BIGRAG_SESSION_COOKIE_SAMESITE=none.
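
The cookie settings interact: browsers reject `SameSite=None` cookies that are not also `Secure`. A pre-deploy sanity check could be sketched as follows. The function and its `cross_site` flag are hypothetical, not something bigRAG exposes.

```python
def check_session_cookie(secure, samesite, cross_site):
    """Return a list of problems with the session-cookie settings."""
    problems = []
    policy = samesite.lower()
    if policy not in {"lax", "strict", "none"}:
        problems.append(f"unknown SameSite policy: {samesite!r}")
    if policy == "none" and not secure:
        problems.append("SameSite=None requires Secure cookies")
    if cross_site and policy in {"lax", "strict"}:
        problems.append("cross-site admin UI deployments usually need SameSite=None")
    return problems
```

An empty list means the combination is consistent; for the defaults (`secure=True`, `samesite="lax"`) on a same-site deployment, no problems are reported.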

Embedding

Variable	Description	Default
`BIGRAG_EMBEDDING_API_KEY`	Default embedding API key	—
`BIGRAG_EMBEDDING_PROVIDER`	Default embedding provider	`openai`
`BIGRAG_EMBEDDING_MODEL`	Default embedding model	`text-embedding-3-small`
`BIGRAG_EMBEDDING_DIMENSION`	Default embedding vector dimension	`1536`
`BIGRAG_EMBEDDING_BASE_URL`	Base URL for OpenAI-compatible embedding endpoints	—
`BIGRAG_EMBEDDING_CONCURRENCY`	Max concurrent embedding requests	`8`
`BIGRAG_ALLOWED_EMBEDDING_BASE_URLS`	JSON allow-list for embedding base URLs	`[]`
`BIGRAG_ALLOW_PRIVATE_EMBEDDING_BASE_URLS`	Allow private-network embedding endpoints	`false`
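
The last two settings guard against pointing the server at arbitrary or internal endpoints. The sketch below shows one plausible way such a check can work. The semantics are assumptions: an empty allow-list is treated as "no restriction", matching is exact apart from a trailing slash, and only IP-literal hosts are classified as private (a real check would also resolve hostnames).

```python
import ipaddress
from urllib.parse import urlsplit


def is_base_url_allowed(url, allow_list, allow_private=False):
    """Decide whether an embedding base URL passes the allow-list and network checks."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False
    if allow_list and url.rstrip("/") not in {item.rstrip("/") for item in allow_list}:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # a hostname, not an IP literal; DNS is not resolved here
    is_internal = address.is_private or address.is_loopback or address.is_link_local
    return allow_private or not is_internal
```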

Chat

Variable	Description	Default
`BIGRAG_CHAT_PROVIDER`	Chat provider	`openai`
`BIGRAG_CHAT_MODEL`	Default chat model	`gpt-4.1`
`BIGRAG_CHAT_BASE_URL`	Base URL for OpenAI-compatible chat endpoints	—
`BIGRAG_CHAT_TEMPERATURE`	Default chat temperature	`0.2`
`BIGRAG_CHAT_MAX_CONTEXT_CHARS`	Max retrieved-context characters per chat call	`120000`
`BIGRAG_ALLOWED_CHAT_BASE_URLS`	JSON allow-list for chat base URLs	`[]`
`BIGRAG_ALLOW_PRIVATE_CHAT_BASE_URLS`	Allow private-network chat endpoints	`false`

Security

Variable	Description	Default
`BIGRAG_MASTER_KEY`	Fernet key that encrypts provider credentials, embedding cache rows, and Redis cache payloads (required in `prod`)	—
`BIGRAG_MASTER_KEY_PREVIOUS`	JSON array of old Fernet keys for staged rotation	`[]`
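
A Fernet key is 32 random bytes encoded as URL-safe base64, so one can be generated and sanity-checked with the standard library alone. The helpers below are a sketch with hypothetical names. `rotated_env` assumes that staged rotation means the new key becomes `BIGRAG_MASTER_KEY` while the old one moves to the front of `BIGRAG_MASTER_KEY_PREVIOUS`.

```python
import base64
import json
import os


def generate_fernet_key():
    """32 random bytes, URL-safe base64 encoded: the Fernet key format."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_fernet_key(value):
    """True when value decodes to exactly 32 bytes of URL-safe base64."""
    try:
        return len(base64.urlsafe_b64decode(value.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        return False


def rotated_env(current_key, previous_keys):
    """Environment values for a rotation step (assumed semantics, see above)."""
    return {
        "BIGRAG_MASTER_KEY": generate_fernet_key(),
        "BIGRAG_MASTER_KEY_PREVIOUS": json.dumps([current_key, *previous_keys]),
    }
```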

Ingestion & Uploads

Variable	Description	Default
`BIGRAG_UPLOAD_DIR`	Local ingestion staging directory	`./data/uploads`
`BIGRAG_MAX_UPLOAD_SIZE_MB`	Max single-file upload size	`64`
`BIGRAG_MAX_BATCH_UPLOAD_SIZE_MB`	Max total batch-upload size	`128`
`BIGRAG_INGESTION_BATCH_SIZE`	Vectors per embedding batch	`128`
`BIGRAG_CONVERSION_TIMEOUT`	Docling conversion timeout in seconds	`300`
`BIGRAG_CONVERSION_PDF_OCR_ENABLED`	Enable OCR for scanned PDFs	`true`
`BIGRAG_QUEUE_MAX_DEPTH`	Max pending jobs in the ingestion queue	`10000`
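
Two of these limits are easy to reason about with a little arithmetic. The sketch below shows how a document's chunks split into embedding batches of 128, and how a client could pre-check a batch upload against the documented limits (64 MB per file, 128 MB per batch, and the 100-file cap from the API reference). The helper names are hypothetical, and counting a megabyte as 1024 × 1024 bytes is an assumption.

```python
MB = 1024 * 1024  # assumption: limits are counted in mebibytes


def embedding_batches(chunks, batch_size=128):
    """Yield consecutive slices of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]


def batch_upload_errors(file_sizes, max_file_mb=64, max_batch_mb=128, max_files=100):
    """Return the reasons a batch upload would exceed the documented limits."""
    errors = []
    if len(file_sizes) > max_files:
        errors.append(f"more than {max_files} files")
    for index, size in enumerate(file_sizes):
        if size > max_file_mb * MB:
            errors.append(f"file {index} exceeds {max_file_mb} MB")
    if sum(file_sizes) > max_batch_mb * MB:
        errors.append(f"batch exceeds {max_batch_mb} MB")
    return errors
```

For example, a document that yields 300 chunks would be embedded in three batches of 128, 128, and 44.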

Caching

Variable	Description	Default
`BIGRAG_COLLECTION_CACHE_TTL`	Collection metadata cache TTL in seconds	`30`
`BIGRAG_QUERY_EMBEDDING_CACHE_TTL`	Query embedding cache TTL in seconds	`300`
`BIGRAG_QUERY_RESULT_CACHE_TTL`	Exact query-result cache TTL in seconds	`30`
`BIGRAG_EMBEDDING_CACHE_MODE`	Persistent chunk embedding cache mode (`encrypted` or `disabled`)	`encrypted`
`BIGRAG_EMBEDDING_CACHE_RETENTION_DAYS`	Days to keep persistent embedding-cache rows after last use	`30`
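
To illustrate what an exact query-result cache with a TTL does, here is a small in-memory sketch. It is only a model of the behavior: bigRAG keeps its caches in Redis, and the key derivation and class below are assumptions made for the example.

```python
import hashlib
import json
import time


def query_cache_key(collection, body):
    """Stable key: the same collection and JSON body always hash to the same key."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{collection}\n{canonical}".encode("utf-8")).hexdigest()


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
```

"Exact" here means only byte-for-byte equivalent requests hit the cache; a query that differs by one character or one filter is a miss.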

Webhooks

Variable	Description	Default
`BIGRAG_WEBHOOK_DELIVERY_TIMEOUT`	Webhook HTTP timeout in seconds	`10`
`BIGRAG_WEBHOOK_RETRY_DELAYS`	JSON array of webhook retry delays in seconds	`[10,30,90]`
`BIGRAG_WEBHOOK_MAX_COUNT`	Max configured webhooks	`50`
`BIGRAG_ALLOW_LOCAL_WEBHOOKS`	Allow webhook URLs on private/local networks	`false`
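
Webhook deliveries are HMAC-signed, so receivers should verify the signature before trusting a payload. The sketch below shows the general verification pattern. The specifics are assumptions: it uses HMAC-SHA256 over the raw request body with a hex-encoded digest, and the header that carries the signature is not specified here. Check the webhook documentation for the actual scheme.

```python
import hashlib
import hmac


def sign_payload(secret, body):
    """Hex HMAC-SHA256 of the raw request body (assumed signature scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret, body, signature):
    """Compare in constant time to avoid leaking the digest through timing."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Verification must run against the exact bytes received; re-serializing the parsed JSON can change whitespace or key order and break the comparison.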

Supported Formats

PDF, DOCX, PPTX, XLSX, HTML, Markdown, CSV, TSV, XML, JSON, PNG, JPG, TIFF, BMP, GIF. Text PDFs are extracted directly; scanned PDFs and the other rich formats are parsed with Docling. Scanned-PDF OCR is enabled by default.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Sponsor

If bigRAG is useful to you, consider sponsoring the project.

License

MIT
