kreuzberg
Kreuzberg is a high-performance, polyglot document intelligence framework built on a Rust core. It extracts text, metadata, images, and structured data from PDFs, Office documents, images, and over 97 other formats. Bindings are available for Rust, Python, Java, Go, C#, PHP, Ruby, Elixir, R, C, TypeScript (Node, Bun, Deno, and WebAssembly), Swift, Zig, and Dart. Developers can embed document processing in their applications through the official language bindings, a command-line interface, a REST API, or a Model Context Protocol (MCP) server. Kreuzberg can be deployed in containers with Docker and orchestrated with Helm charts, which suits both cloud-native environments and local development. By leveraging efficient Rust underpinnings, Kreuzberg delivers accurate data extraction while maintaining the flexibility required by modern so