emmtrix ONNX-to-C Code Generator (emx-onnx-cgen) compiles ONNX models to portable, deterministic C code for deeply embedded systems. The generated code is designed to run without dynamic memory allocation, operating-system services, or external runtimes, making it suitable for safety-critical and resource-constrained targets.
It targets full support for the standard ONNX opset 26, based on ONNX v1.21.0, and supports nearly all Microsoft ONNX operators, based on ONNX Runtime 1.26.0.
Key characteristics:
- No dynamic memory allocation (malloc, free, heap usage)
- Static, compile-time known memory layout for parameters, activations, and temporaries
- Deterministic control flow (explicit loops, no hidden dispatch or callbacks)
- No OS dependencies, using only standard C headers (for example, stdint.h and stddef.h)
- Single-threaded execution model
- Bitwise-stable code generation for reproducible builds
- Readable, auditable C code suitable for certification and code reviews
- Generated C output format spec: docs/output-format.md
- Designed for bare-metal and RTOS-based systems
Current coverage highlights:
- ONNX opset 26 support for the standard operator set shipped with ONNX 1.21.0
- > 99% operator coverage in the generated support report at SUPPORT_OPS.md
- > 99% official ONNX backend model coverage in ONNX_SUPPORT.md
- > 99% ONNX Runtime-derived artifact coverage in ONNX_SUPPORT.md
- Listed on the official ONNX Backend Scoreboard
For PyTorch models, see the related project emx-pytorch-cgen.
Goals
- Correctness-first compilation with outputs comparable to ONNX Runtime.
- Deterministic and reproducible C code generation.
- Clean, pass-based compiler architecture (import → normalize → optimize → lower → emit).
- Minimal C runtime with explicit, predictable data movement.
Non-goals
- Aggressive performance optimizations in generated C.
- Implicit runtime dependencies or dynamic loading.
- Training/backpropagation support.
Features
- CLI for ONNX-to-C compilation and verification.
- Deterministic codegen with explicit tensor shapes and loop nests.
- Minimal C runtime templates in src/emx_onnx_cgen/templates/.
- ONNX Runtime comparison for end-to-end validation.
- Full standard ONNX opset 26 support on top of ONNX v1.21.0.
- Auto-generated operator and model coverage tracking (see SUPPORT_OPS.md, ONNX_SUPPORT.md, and ONNX_ERRORS.md).
- Broad support for ONNX Runtime test artifacts beyond the core standard operator set.
- Supported data types:
  - bfloat16, float16, float, double
  - float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz, float8e8m0 (stored as uint8_t with manual conversion to/from float)
  - float4e2m1 (stored as uint8_t with manual conversion to/from float)
  - int2, uint2, int4, uint4 (using C23 _BitInt types)
  - int8, uint8, int16, uint16, int32, uint32, int64, uint64
  - bool
  - string (fixed-size '\0'-terminated C strings; see docs/output-format.md)
  - sequence(<tensor type>) (fixed-capacity tensor sequences with presence/length metadata; see docs/output-format.md)
  - optional(<tensor type>) (optional tensors represented via an extra _Bool <name>_present flag; see docs/output-format.md)
- Not supported: complex64/complex128, and the ONNX map/sparse_tensor/opaque value types.
- Optional support for dynamic dimensions using C99 variable-length arrays (VLAs), when the target compiler supports them.
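The float8 types listed above are stored as uint8_t and converted to and from float manually. As a rough illustration of the arithmetic such a conversion performs, the following Python sketch decodes the float8e4m3fn bit layout as defined by ONNX (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities). The function name is hypothetical, and this is a model of the computation only, not the C code that emx-onnx-cgen generates.

```python
import math


def float8e4m3fn_to_float(byte: int) -> float:
    """Decode one float8e4m3fn value stored in a uint8 (hypothetical helper)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the uint8 range")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F  # 4 exponent bits, bias 7
    mantissa = byte & 0x07         # 3 mantissa bits
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan            # the only NaN encoding; the format has no infinities
    if exponent == 0:
        # Subnormal: (mantissa / 8) * 2**(1 - 7)
        return sign * mantissa * 2.0 ** -9
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)
```

With this sketch, the byte 0x38 decodes to 1.0 and 0x7E decodes to 448.0, the largest finite value of the format.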
Usage Scenarios
1. Fully Embedded, Standalone C Firmware
The generated C code can be embedded directly into a bare-metal C firmware or application where all model weights and parameters are compiled into the C source.
Typical characteristics:
- No file system or OS required.
- All weights stored as static const arrays in flash/ROM.
- Deterministic memory usage with no runtime allocation.
- Suitable for:
- Microcontrollers
- Safety-critical firmware
- Systems with strict certification requirements
This scenario is enabled via --large-weight-threshold 0, forcing all weights to be embedded directly into the generated C code.
2. Embedded or Host C/C++ Application with External Weights
The generated C code can be embedded into C or C++ applications where large model weights are stored externally and loaded from a binary file at runtime.
Typical characteristics:
- Code and control logic compiled into the application.
- Large constant tensors packed into a separate .bin file.
- Explicit, generated loader functions handle weight initialization.
- Suitable for:
- Embedded Linux or RTOS systems
- Applications with limited flash but available external storage
- Larger models where code size must be minimized
This scenario is enabled automatically once the cumulative weight size exceeds --large-weight-threshold (default: 102400 bytes).
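To estimate ahead of time which of the two scenarios a model falls into, the cumulative weight size can be compared against the threshold. The Python sketch below models only that comparison, using the default of 102400 bytes and the rule that 0 disables external storage. The helper names are hypothetical, and how the tool selects individual tensors for the binary file is not described here, so the sketch makes no claim about it.

```python
from math import prod

DEFAULT_LARGE_WEIGHT_THRESHOLD = 102400  # bytes, the documented default


def total_weight_bytes(weights):
    """Sum the byte sizes of (shape, itemsize) pairs describing constant tensors."""
    return sum(prod(shape) * itemsize for shape, itemsize in weights)


def uses_external_weights(weights, threshold=DEFAULT_LARGE_WEIGHT_THRESHOLD):
    """Return True when the cumulative size exceeds the threshold (0 disables)."""
    if threshold == 0:
        return False
    return total_weight_bytes(weights) > threshold
```

For example, a single 160x160 float32 matrix occupies exactly 102400 bytes and stays below the limit, while adding a 160-element float32 bias pushes the total over it.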
3. Target-Optimized Code Generation via emmtrix Source-to-Source Tooling
In both of the above scenarios, the generated C code can serve as input to emmtrix source-to-source compilation and optimization tools, enabling target-specific optimizations while preserving functional correctness.
Examples of applied transformations include:
- Kernel fusion and loop restructuring
- Memory layout optimization and buffer reuse
- Reduction of internal temporary memory
- Utilization of SIMD / vector instruction sets
- Offloading of large weights to external memory
- Dynamic loading of weights or activations via DMA
This workflow allows a clear separation between:
- Correctness-first, deterministic ONNX lowering, and
- Target-specific performance and memory optimization,
while keeping the generated C code readable, auditable, and traceable.
The generated C code is intentionally structured to make such transformations explicit and analyzable, rather than relying on opaque backend-specific code generation.
Installation
Install the package directly from PyPI (recommended):
pip install emx-onnx-cgen
To use the verification workflow with ONNX Runtime, install the verification extra:
pip install "emx-onnx-cgen[verify]"
The pinned verification runtime is:
- onnxruntime==1.26.0 on Python 3.11+
- onnxruntime==1.23.2 on Python 3.10
Minimum Python version: 3.10.
Development
For local setup, testing, and contributor workflows, see docs/development.md.
Quickstart
Compile an ONNX model into a C source file:
emx-onnx-cgen compile path/to/model.onnx build/model.c
Verify an ONNX model end-to-end against ONNX Runtime (default):
emx-onnx-cgen verify path/to/model.onnx
Models that require extra representative inputs to resolve dynamic shapes are not supported for code generation. Export them with static shapes instead.
--test-data-dir supplies input/output data for verification only. It does not change the generated C code.
Use emx-onnx-cgen as an importable ONNX backend:
import onnx
from onnx.backend import prepare
import emx_onnx_cgen.onnx_backend as emx_backend
model = onnx.load("path/to/model.onnx")
rep = prepare(model, backend=emx_backend)
outputs = rep.run(inputs)  # inputs: list of NumPy arrays, one per model input
The backend module is emx_onnx_cgen.onnx_backend. It compiles the ONNX model
to C on demand, builds a temporary executable, and runs that executable through
the standard ONNX backend interface.
You can also call it directly without onnx.backend.prepare:
import onnx
from emx_onnx_cgen.onnx_backend import run_model
model = onnx.load("path/to/model.onnx")
outputs = run_model(model, inputs)
CLI Reference
emx-onnx-cgen provides two subcommands: compile and verify.
Common options
These options are accepted by both compile and verify:
- --model-base-dir: Base directory for resolving the model path (and related paths).
- --color: Colorize CLI output (auto, always, never; default: auto).
- --verbose / -v: Enable verbose logging (includes codegen timing).
- --truncate-weights-after: Truncate inline weight initializers after N values and insert ... placeholders.
- --large-weight-threshold: Store weights in a binary file once the cumulative byte size exceeds this threshold (default: 102400; set to 0 to disable).
- --large-temp-threshold: Mark local arrays larger than this threshold as static (default: 1024). This applies to generated model temporaries and to generated testbench input/output buffers.
- --restrict-arrays / --no-restrict-arrays: Enable or disable restrict qualifiers on generated array parameters.
- --fp32-accumulation-strategy: Accumulation strategy for float32 inputs (simple uses float32, fp64 uses double; default: simple).
- --fp16-accumulation-strategy: Accumulation strategy for float16 inputs (simple uses float16, fp32 uses float; default: fp32).
- --replicate-ort-bugs: Compatibility switch for verification/debugging. Enables emulation of known behavior differences of the ONNX Runtime version pinned in requirements-ci.txt.
- --sequence-element-shape: Declare rank and per-axis maxima for sequence inputs with variable element shapes.
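The accumulation strategies matter because a narrow accumulator rounds after every addition. The Python sketch below imitates the two float32 strategies on a plain sum, using struct to round to float32. The function names and the summation order are assumptions made for illustration; they do not describe the loops that the generator actually emits.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_simple(values):
    """Model of 'simple': accumulate in float32, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_fp64(values):
    """Model of 'fp64': accumulate in double, round to float32 once at the end."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)
```

For the inputs [16777216.0, 1.0, 1.0], the float32 accumulator stays at 16777216.0 because each added 1.0 is rounded away, whereas the double accumulator yields 16777218.0.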
compile
emx-onnx-cgen compile <model.onnx> [output.c] [options]
Options:
- --model-name: Override the generated model name (default: output file stem).
- --emit-testbench: Emit a JSON-producing main() testbench for validation.
- --testbench-output-format: Choose the generated testbench output format (json, txt, txt-emmtrix, or txt-emmtrix:<float>).
- --testbench-file: Emit the testbench into a separate C file at the given path (implies --emit-testbench). If not set, the testbench is embedded in the main output C file (legacy behavior).
- --emit-data-file: Emit constant data arrays into a companion _data C file.
verify
emx-onnx-cgen verify <model.onnx> [options]
Options:
- --cc: Explicit C compiler command for building the testbench binary.
- --sanitize: Enable sanitizer instrumentation when compiling the verification binary (-fsanitize=address,undefined). If EMX_ENABLE_SANITIZE is set, it overrides this flag.
- --per-node-accuracy: Also compare intermediate tensor outputs and print max error per node.
- --test-data-dir: Seed verification inputs from input_*.pb files instead of generating random testbench inputs.
- --test-data-inputs-only: Read only input_*.pb from --test-data-dir and still compare outputs against the selected runtime.
- --max-ulp: Maximum allowed ULP distance for floating outputs (default: 100).
- --atol-eps: Absolute tolerance as a multiple of machine epsilon for floating outputs (default: 1.0).
- --runtime: Runtime backend for verification (onnxruntime or onnx-reference, default: onnxruntime).
- --expected-checksum: Exit early with CHECKSUM when the generated C checksum matches the expected SHA-256.
- --replicate-ort-bugs: Verification-only compatibility mode to reproduce known behavior differences of the ONNX Runtime version pinned in requirements-ci.txt.
- --temp-dir-root: Root directory in which to create a temporary verification directory (default: system temp dir).
- --temp-dir: Exact directory to use for temporary verification files (default: create a temporary directory).
- --keep-temp-dir: Keep the temporary verification directory instead of deleting it.
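When verification is driven from a script, the command line can be assembled programmatically. The following Python sketch builds an argument list from a few of the options listed above and runs it with subprocess. The wrapper functions are hypothetical, and the sketch assumes the usual space-separated `--flag value` syntax and that emx-onnx-cgen is available on PATH.

```python
import subprocess


def build_verify_command(model_path, *, cc=None, runtime=None,
                         max_ulp=None, keep_temp_dir=False):
    """Assemble an 'emx-onnx-cgen verify' argument list (hypothetical helper)."""
    cmd = ["emx-onnx-cgen", "verify", str(model_path)]
    if cc is not None:
        cmd += ["--cc", cc]
    if runtime is not None:
        cmd += ["--runtime", runtime]
    if max_ulp is not None:
        cmd += ["--max-ulp", str(max_ulp)]
    if keep_temp_dir:
        cmd.append("--keep-temp-dir")
    return cmd


def run_verify(model_path, **options):
    """Run the verification and return the process exit code."""
    cmd = build_verify_command(model_path, **options)
    return subprocess.run(cmd, check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues with model paths that contain spaces.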
How verification works:
- Compile with a testbench: the compiler is invoked with --emit-testbench, generating a C program that runs the model and prints inputs/outputs as JSON.
- Build and execute: the testbench is compiled with the selected C compiler (--cc, CC, or a detected cc/gcc/clang) and executed in a temporary directory.
- Run runtime backend: the JSON inputs from the testbench are fed to the selected runtime (onnxruntime or onnx-reference) using the same model. The compiler no longer ships a Python runtime evaluator.
- Compare outputs: floating-point outputs are compared by maximum ULP distance. Absolute differences of up to --atol-eps × machine epsilon of the evaluated floating-point type are ignored, and such values are treated as equal. For values with a larger absolute difference, the ULP distance is computed and the maximum ULP distance is reported. Non-floating outputs must match exactly. Missing outputs or mismatches are treated as failures.
- ORT unsupported models: when using onnxruntime, if ORT reports NOT_IMPLEMENTED, verification is skipped with a warning (exit code 0).
Official ONNX test coverage
emx-onnx-cgen tracks support using generated coverage reports checked into the repository and is listed on the official ONNX Backend Scoreboard.
- Standard ONNX operator support: SUPPORT_OPS.md consistently reports > 99% verified operator coverage. The remaining unsupported entry is the non-standard contrib operator com.microsoft::SparseToDenseMatMul.
- Official ONNX backend models: ONNX_SUPPORT.md reports > 99% verified ONNX file coverage against ONNX 1.21.0.
- ONNX Runtime artifact corpus: ONNX_SUPPORT.md also reports > 99% verified coverage for the exported ONNX Runtime artifact set.

- ONNX_SUPPORT.md: overview of ONNX models and their current verification status.
- ONNX_ERRORS.md: summary of the most common verification outcomes and failure reasons.
- SUPPORT_OPS.md: list of ONNX operators and whether they are currently supported.
Related Projects
- emx-pytorch-cgen
  A PyTorch-to-C compiler following the same design principles as emx-onnx-cgen, but operating directly on PyTorch models instead of ONNX graphs.
  https://github.com/emmtrix/emx-pytorch-cgen
- onnx2c
  An ONNX-to-C code generator with a different design focus and code generation approach.
  https://github.com/kraiskil/onnx2c
Supporting Projects
- emx-regex-cgen
  A regex-to-C code generator used to implement the ONNX RegexFullMatch operator in emx-onnx-cgen.
  https://github.com/emmtrix/emx-regex-cgen
- emx-ort-test-artifacts
  Repository containing exported ONNX test artifacts (*.onnx / *.pb files) produced by the ONNX Runtime test infrastructure.
  https://github.com/emmtrix/emx-ort-test-artifacts
Maintained by
This project is maintained by emmtrix Technologies GmbH.