DuckDB OpenTelemetry Extension
DuckDB extension for querying and storing OpenTelemetry traces, logs, and metrics with SQL.
As of v0.5, the extension has an embedded HTTP server that lets you stream live telemetry into local or remote Parquet files, DuckLake, or Iceberg catalogs like Amazon S3 Tables and Cloudflare R2 Data Catalog.
Quickstart: Read OpenTelemetry data
Install and load the extension in DuckDB v1.5.3 or later:
-- Note: v0.5.0 is still pending publication
-- Use "Install pre-release extension" steps below for latest
INSTALL otlp FROM community;
LOAD otlp;
Install pre-release extension via GitHub
To use a pre-release that is not yet on the DuckDB community site, install the unsigned build from GitHub:
-- Install unsigned extension from GitHub
-- Start DuckDB with `-unsigned` to allow this
INSTALL otlp FROM 'https://smithclay.github.io/duckdb-otlp';
LOAD otlp;
Read OTLP protobuf/JSON data from public URLs, local files, or object storage buckets:
-- Install extension to support reading over HTTP(S)
INSTALL httpfs; LOAD httpfs;
-- Read logs exported from the OpenTelemetry Collector
SELECT time_unix_nano, service_name, severity_text, body
FROM read_otlp_logs('https://github.com/smithclay/duckdb-otlp/raw/refs/heads/main/test/data/otlp_logs.pb');
-- Read traces exported from the OpenTelemetry Collector
SELECT trace_id, name, duration_time_unix_nano
FROM read_otlp_traces('https://github.com/smithclay/duckdb-otlp/raw/refs/heads/main/test/data/otlp_traces.pb')
ORDER BY duration_time_unix_nano DESC;
Quickstart: Stream OpenTelemetry data
You can start a server that accepts OpenTelemetry data from instrumented code, AI agents such as Claude Code or Codex, or OpenTelemetry Collectors.
You can either run a Docker image that runs the extension as a daemon, or run a few short commands in the DuckDB shell.
Run server as a daemon with Docker
# Bootstraps an embedded DuckDB instance with the server running
# Writes data to a local DuckLake file
mkdir -p data
export DUCKDB_OTLP_TOKEN=dev-token-123456
docker run --rm --name duckdb-otlp \
-p 4318:4318 \
-e DUCKDB_OTLP_TOKEN \
-v "$(pwd)/data:/data" \
ghcr.io/smithclay/duckdb-otlp:latest
To query the running daemon using the Quack protocol, see the docs.
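Instrumented code usually finds its OTLP endpoint through the standard OpenTelemetry exporter environment variables. The sketch below builds those variables for the daemon started above. The helper name `otlp_exporter_env` is hypothetical, and the default `http/protobuf` protocol is an assumption: check the Live Ingest Reference for the encodings the server accepts. The header value is percent-encoded because the OpenTelemetry specification defines `OTEL_EXPORTER_OTLP_HEADERS` as a comma-separated `key=value` list.

```python
import os
import urllib.parse


def otlp_exporter_env(endpoint, token, protocol="http/protobuf"):
    """Return standard OpenTelemetry exporter variables for an OTLP/HTTP server.

    `protocol` defaults to http/protobuf as an assumption; the server is
    HTTP-only, so a gRPC default in an SDK would not work.
    """
    # Percent-encode the header value so the space in "Bearer <token>" survives parsing.
    auth_value = urllib.parse.quote(f"Bearer {token}", safe="")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization={auth_value}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": protocol,
    }


# Merge into the current environment, e.g. to pass as `env=` to subprocess.run(...)
# when launching an instrumented application.
env = {**os.environ, **otlp_exporter_env("http://localhost:4318", "dev-token-123456")}
```

SDKs append the per-signal path (such as `/v1/logs`) to the base endpoint themselves, so the endpoint here stops at the port.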
Start server in the DuckDB shell
-- See instructions above for loading otlp extension
-- Inside DuckDB 1.5.3+
FROM otlp_serve(
'otlp:localhost:4318',
token := 'dev-token-123456'
);
Send one hello-world log in OTLP/HTTP format with cURL:
curl -sS http://localhost:4318/v1/logs -H 'Authorization: Bearer dev-token-123456' -H 'Content-Type: application/json' -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"curl-demo"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"1704067200000000000","severityText":"INFO","body":{"stringValue":"hello from curl"}}]}]}]}'
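The same request can be built from a script. This is a minimal sketch using only the Python standard library; it mirrors the cURL payload above, and the function names `build_log_payload` and `build_request` are hypothetical helpers, not part of the extension. In OTLP/JSON, `timeUnixNano` is a string-encoded nanosecond timestamp, so the value is converted with `str()`.

```python
import json
import time
import urllib.request


def build_log_payload(service_name, message, severity_text="INFO", time_unix_nano=None):
    """Build a one-record OTLP/JSON logs payload shaped like the cURL example."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()  # current time in nanoseconds since the epoch
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": str(time_unix_nano),  # uint64 is carried as a string in JSON
                    "severityText": severity_text,
                    "body": {"stringValue": message},
                }],
            }],
        }],
    }


def build_request(payload, base_url="http://localhost:4318", token="dev-token-123456"):
    """Wrap the payload in a POST to /v1/logs with the bearer token from the quickstart."""
    return urllib.request.Request(
        f"{base_url}/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(build_log_payload("python-demo", "hello from python"))

# To send it, the server from the previous step must be running:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.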
Wait about 5 seconds for the buffer to flush, then query the data:
SELECT time_unix_nano, service_name, severity_text, body FROM otlp_logs;
Live ingest commits buffered rows in the background once the oldest buffered row is about 5 seconds old or about 128 MiB of request-body bytes have been admitted. Use otlp_flush when readers need accepted rows to be durable and queryable right away while the server keeps running.
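The two commit triggers can be pictured with a toy model. This is an illustration of the thresholds described above, not the extension's implementation: the class `IngestBuffer` and its method names are hypothetical, and the real server's bookkeeping may differ.

```python
from dataclasses import dataclass
from typing import Optional

FLUSH_AGE_SECONDS = 5.0            # approximate age of the oldest buffered row
FLUSH_BYTES = 128 * 1024 * 1024    # approximate admitted request-body bytes (128 MiB)


@dataclass
class IngestBuffer:
    """Toy model: tracks only what the two commit triggers need."""
    oldest_row_at: Optional[float] = None
    buffered_bytes: int = 0

    def admit(self, now, body_bytes):
        # The first admitted request after a commit starts the age clock.
        if self.oldest_row_at is None:
            self.oldest_row_at = now
        self.buffered_bytes += body_bytes

    def should_commit(self, now):
        if self.oldest_row_at is None:
            return False  # nothing buffered
        too_old = now - self.oldest_row_at >= FLUSH_AGE_SECONDS
        too_big = self.buffered_bytes >= FLUSH_BYTES
        return too_old or too_big

    def commit(self):
        # After a commit the buffer is empty and both triggers reset.
        self.oldest_row_at = None
        self.buffered_bytes = 0
```

In this model a single small request becomes queryable only once the age trigger fires, which is why the quickstart waits before querying.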
For a full walkthrough, including lakehouse ingest, see the docs.
Schema
The schemas align with a normalized ClickStack-inspired version of the OpenTelemetry Arrow Data model as of extension release v0.5.0. Release v0.5.0 includes breaking schema changes from v0.4.0.
What You Can Do
- Read OTLP traces, logs, gauges, sums/counters, histograms, and exponential histograms from files.
- Stream live OTLP/HTTP exports into the default DuckDB catalog, an attached DuckLake lakehouse, or an Iceberg REST catalog such as Amazon S3 Tables or Cloudflare R2 Data Catalog.
- Convert telemetry to Parquet files and save to cloud storage.
- Query local files, globs, S3, HTTP(S), Azure Blob, and GCS paths through DuckDB file systems.
- Use the browser demo for JSON, JSONL, and protobuf exploration with DuckDB-WASM: Interactive Demo.
Documentation
- Documentation Site
- Get Started
- Live Ingest Quickstart
- Stream to DuckLake
- Stream to Amazon S3 Tables
- Stream to Cloudflare R2 Data Catalog
- Store Claude Code or Codex Traces in Local DuckLake
- How-to Guides
- API Reference
- Schema Reference
- Live Ingest Reference
- Architecture
API at a Glance
| Function | What it does |
|---|---|
| `read_otlp_traces(path)` | Read trace spans with identifiers, attributes, events, links, and duration |
| `read_otlp_logs(path)` | Read log records with severity, body, attributes, and trace correlation |
| `read_otlp_metrics_gauge(path)` | Read gauge metrics |
| `read_otlp_metrics_sum(path)` | Read sum/counter metrics |
| `read_otlp_metrics_histogram(path)` | Read standard histogram metrics |
| `read_otlp_metrics_exp_histogram(path)` | Read exponential histogram metrics |
| `otlp_serve([uri], ...)` | Start a native OTLP/HTTP ingest server |
| `otlp_flush(uri)` | Optionally force buffered ingest rows to commit to the target catalog now |
| `otlp_stop(uri)` | Stop a server after committing remaining rows |
| `otlp_server_list()` | Inspect running servers and ingest counters |
The extension registers read_otlp_metrics and read_otlp_metrics_summary, but those functions remain unsupported until the project defines stable schemas for those shapes. See the API Reference for details.
Installation
INSTALL otlp FROM community;
LOAD otlp;
For source builds, development commands, and WASM builds, see CONTRIBUTING.md. WASM supports JSON, JSONL, and protobuf file reads, but not the live ingest server.
Limits
Early-stage and single-node (one daemon, one writer — no HA or horizontal scaling). Ingest has been benchmarked at ~100k logs/s on a 4-vCPU node; querying at volume is unproven, so test on your own data.
- Durability is the seal, not the `202`. Live ingest buffers in memory and commits on a periodic group-commit ("seal"); a `202` means accepted, not durable. Call `otlp_flush` / `otlp_stop` before shutting down: a hard kill drops un-sealed rows (there is no WAL).
- Keep queries time-bounded. Data lands roughly time-ordered, so queries scoped by `timestamp` (and `service_name`) prune well; unbounded scans are slow. There is no full-text index: `body` substring/regex search and `trace_id` point lookups are brute-force scans, cheap over a short window and expensive over a wide one.
- File reads cap individual files at 100 MB. Live ingest is HTTP-only (no gRPC, not in the WASM build), bounds request bodies via `max_body_bytes`, and applies `max_buffered_bytes` backpressure (returns `503`); see the Live Ingest Reference.
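Because the server answers `503` under backpressure, a client that sends its own requests may want to retry with a delay instead of dropping the batch. The sketch below shows one way to do that with exponential backoff. The function `send_with_backoff`, its defaults, and the `send` callable (any function that performs one export attempt and returns the HTTP status code) are hypothetical; retrying on `503` is a client-side choice, not something the extension enforces.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns something other than 503 or attempts run out.

    Returns the last HTTP status seen. A 202 means the server accepted the
    rows into its buffer; it does not mean they are durable yet.
    """
    status = 503
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Double the wait after each rejected attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** attempt)
    return status
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays; production exporters in the OpenTelemetry SDKs and the Collector ship their own retry logic, so this matters mainly for hand-rolled senders.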
Need Help?
License
MIT. See LICENSE for details.