vitoplantamura

Professional software vendor delivering innovative solutions on the Softono platform. Specializes in both open-source and proprietary software development.

Open Source

OnnxStream

OnnxStream is a lightweight inference library, written in C++, designed to run large ONNX models on devices with extremely limited memory. Unlike frameworks that prioritize throughput at the cost of RAM, OnnxStream focuses on minimizing memory consumption through an architecture that decouples the inference engine from the weight provider. This allows model parameters to be streamed directly from disk or over HTTP instead of being loaded entirely into RAM. The library can run complex models such as Stable Diffusion XL 1.0 on a Raspberry Pi Zero 2 with just 298 MB of RAM, as well as large language models such as Mistral 7B on desktops and servers. It uses XNNPACK for acceleration and runs on ARM, x86, RISC-V, and WebAssembly. Key use cases include browser-based AI with YOLOv8 and Whisper, image generation on low-memory single-board computers, and deploying LLMs on resource-constrained hardware. The project offers bindings for Python, C, and JavaScript (WASM), enabling developers to integrate high-performance, low-memory AI
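
The idea of separating the inference engine from the weight provider can be illustrated with a small, self-contained Python sketch. This is not OnnxStream's API: the names `WeightProvider`, `DiskWeightProvider`, `run`, and `write_demo_weights`, the one-file-per-layer layout, and the toy "scale then ReLU" layer are all assumptions made up for illustration. The only point it shows is that the engine asks for one layer's weights at a time, so the full model never has to sit in RAM.

```python
import os
import struct
import tempfile
from typing import List, Protocol


class WeightProvider(Protocol):
    """Hypothetical interface: returns the weights of a single layer on demand."""

    def load(self, layer: int) -> List[float]: ...


class DiskWeightProvider:
    """Hypothetical provider that reads one layer's weights from its own file."""

    def __init__(self, directory: str) -> None:
        self.directory = directory

    def load(self, layer: int) -> List[float]:
        path = os.path.join(self.directory, f"layer_{layer}.bin")
        with open(path, "rb") as f:
            raw = f.read()
        # Weights are stored as little-endian 32-bit floats (an assumption).
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def run(provider: WeightProvider, num_layers: int, x: List[float]) -> List[float]:
    """Toy engine: each layer is an elementwise scale followed by ReLU."""
    for layer in range(num_layers):
        w = provider.load(layer)  # only this layer's weights are resident
        x = [max(0.0, xi * wi) for xi, wi in zip(x, w)]
        del w  # drop the reference before the next layer is fetched
    return x


def write_demo_weights(directory: str, layers: List[List[float]]) -> None:
    """Helper for the demo: writes each layer to layer_<i>.bin."""
    for i, w in enumerate(layers):
        with open(os.path.join(directory, f"layer_{i}.bin"), "wb") as f:
            f.write(struct.pack(f"<{len(w)}f", *w))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_demo_weights(d, [[2.0, -1.0], [0.5, 3.0]])
        print(run(DiskWeightProvider(d), 2, [1.0, 1.0]))
```

Because `run` depends only on the `load` method, a provider that fetches each layer over HTTP or from a cache could be swapped in without changing the engine. That is the kind of decoupling the description refers to, shown here in a deliberately simplified form.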

AI & Machine Learning ML Frameworks

2.1K GitHub Stars
