About duckdb-otlp

stream, store, and query OpenTelemetry metrics, logs, and traces (OTLP) in duckdb

s

Published by

smithclay

Visit View Profile

README.md

View on GitHub

DuckDB OpenTelemetry Extension

DuckDB extension for querying and storing OpenTelemetry traces, logs, and metrics with SQL.

As of v0.5, the extension has an embedded HTTP server that lets you stream live telemetry into local or remote parquet files, DuckLake, or Iceberg catalogs like Amazon S3 Tables and Cloudflare R2 Data Catalog.

Quickstart: Read OpenTelemetry data

Install and load the extension in duckdb v1.5.3 or higher:

-- Note: v0.5.0 is still pending publication
-- Use "Install pre-release extension" steps below for latest
INSTALL otlp FROM community;
LOAD otlp;

Install pre-release extension via GitHub

If you want to use a pre-release that's not on the duckdb community site, you can install it (unsigned) via GitHub:

-- Install unsigned extenstion from GitHub
-- You must start duckdb with `-unsigned` to allow this
INSTALL otlp from 'https://smithclay.github.io/duckdb-otlp';
LOAD otlp;

Read OTLP protobuf/JSON data from public URLs, local files, or object storage buckets:

-- Install extension to support reading over HTTP(S)
INSTALL httpfs; LOAD httpfs;

-- Read logs exported from the OpenTelemetry Collector
SELECT time_unix_nano, service_name, severity_text, body FROM read_otlp_logs('https://github.com/smithclay/duckdb-otlp/raw/refs/heads/main/test/data/otlp_logs.pb');

-- Read traces exported from the OpenTelemetry Collector
SELECT trace_id, name, duration_time_unix_nano FROM read_otlp_traces('https://github.com/smithclay/duckdb-otlp/raw/refs/heads/main/test/data/otlp_traces.pb') ORDER BY duration_time_unix_nano DESC;

Quickstart: Stream OpenTelemetry data

You can start a server that accepts OpenTelemetry data from instrumented code, AI agents such as Claude Code or Codex, or OpenTelemetry Collectors.

You can either run a Docker image that runs the extension as a daemon, or type some short commands in DuckDB shell.

Run server as a daemon with Docker

# Bootstraps an embedded DuckDB instance with the server running
# Writes data to a local DuckLake file
mkdir -p data 
export DUCKDB_OTLP_TOKEN=dev-token-123456

docker run --rm --name duckdb-otlp \
    -p 4318:4318 \
    -e DUCKDB_OTLP_TOKEN \
    -v "$(pwd)/data:/data" \
    ghcr.io/smithclay/duckdb-otlp:latest

To query the running daemon using Quack protocol, see docs here.

Start server in the DuckDB shell

-- See instructions above for loading otlp extension
-- Inside DuckDB 1.5.3+
FROM otlp_serve(
    'otlp:localhost:4318',
    token := 'dev-token-123456'
);

Send one hello-world log in OTLP/HTTP format with cURL:

curl -sS http://localhost:4318/v1/logs -H 'Authorization: Bearer dev-token-123456' -H 'Content-Type: application/json' -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"curl-demo"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"1704067200000000000","severityText":"INFO","body":{"stringValue":"hello from curl"}}]}]}]}'

Query the data after ~5 seconds for the buffer to flush:

SELECT time_unix_nano, service_name, severity_text, body FROM otlp_logs;

Live ingest commits buffered rows in the background after about 5 seconds for the oldest buffered row or about 128 MiB of admitted request-body bytes. Use otlp_flush when readers need accepted rows durable and queryable while the server keeps running.

For a full walkthrough, including lakehouse ingest, see the docs.

Schema

The schemas align with a normalized ClickStack-inspired version of the OpenTelemetry Arrow Data model as of extension release v0.5.0. Release v0.5.0 includes breaking schema changes from v0.4.0.

What You Can Do

Read OTLP traces, logs, gauges, sums/counters, histograms, and exponential histograms from files.
Stream live OTLP/HTTP exports into the default DuckDB catalog, an attached DuckLake lakehouse, or an Iceberg REST catalog such as Amazon S3 Tables or Cloudflare R2 Data Catalog.
Convert telemetry to Parquet files and save to cloud storage.
Query local files, globs, S3, HTTP(S), Azure Blob, and GCS paths through DuckDB file systems.
Use the browser demo for JSON, JSONL, and protobuf exploration with DuckDB-WASM: Interactive Demo.

Documentation

API at a Glance

Function	What it does
`read_otlp_traces(path)`	Read trace spans with identifiers, attributes, events, links, and duration
`read_otlp_logs(path)`	Read log records with severity, body, attributes, and trace correlation
`read_otlp_metrics_gauge(path)`	Read gauge metrics
`read_otlp_metrics_sum(path)`	Read sum/counter metrics
`read_otlp_metrics_histogram(path)`	Read standard histogram metrics
`read_otlp_metrics_exp_histogram(path)`	Read exponential histogram metrics
`otlp_serve([uri], ...)`	Start a native OTLP/HTTP ingest server
`otlp_flush(uri)`	Optionally force buffered ingest rows to commit to the target catalog now
`otlp_stop(uri)`	Stop a server after committing remaining rows
`otlp_server_list()`	Inspect running servers and ingest counters

The extension registers read_otlp_metrics and read_otlp_metrics_summary, but those functions remain unsupported until the project defines stable schemas for those shapes. See the API Reference for details.

Installation

INSTALL otlp FROM community;
LOAD otlp;

For source builds, development commands, and WASM builds, see CONTRIBUTING.md. WASM supports JSON, JSONL, and protobuf file reads, but not the live ingest server.

Limits

Early-stage and single-node (one daemon, one writer — no HA or horizontal scaling). Ingest has been benchmarked at ~100k logs/s on a 4-vCPU node; querying at volume is unproven, so test on your own data.

Durability is the seal, not the 202. Live ingest buffers in memory and commits on a periodic group-commit ("seal"); a 202 means accepted, not durable. Call otlp_flush/otlp_stop before shutting down — a hard kill drops un-sealed rows (there is no WAL).
Keep queries time-bounded. Data lands roughly time-ordered, so queries scoped by timestamp (and service_name) prune well; unbounded scans are slow. There is no full-text index — body substring/regex search and trace_id point lookups are brute-force scans: cheap over a short window, expensive over a wide one.
File reads cap individual files at 100 MB. Live ingest is HTTP-only (no gRPC, not in the WASM build), bounds request bodies via max_body_bytes, and applies max_buffered_bytes backpressure (returns 503); see the Live Ingest Reference.

Need Help?

License

MIT. See LICENSE for details.

duckdb-otlp