awesome-tinyml

Open source MIT Python

About awesome-tinyml

TinyML & Edge AI: On-device inference, model quantization, embedded ML, ultra-low-power AI for microcontrollers and IoT devices.


🚀 AI Edge Computing & TinyML

Comprehensive Guide to State-of-the-Art Edge AI



🌟 Latest Update: January 2025

Production-Ready Python Implementation with modern tooling (Hatch, Ruff, Mypy)
62/62 Tests Passing • 81.76% Coverage • Zero Security Issues
State-of-the-Art Algorithms & Trends for Edge AI and Embedded Systems


📋 Table of Contents

🚀 Getting Started

🔥 Core Topics

🛠️ Frameworks & Tools

📚 Documentation

📚 Resources

🎓 Community


🚀 Quick Start & Development

📦 Installation

This project uses modern Python tooling with Hatch for dependency management and development workflows.

# Clone the repository
git clone https://github.com/umitkacar/ai-edge-computing-tiny-embedded.git
cd ai-edge-computing-tiny-embedded

# Install dependencies (using hatch)
pip install hatch

# Run tests
hatch run test

# Run full CI pipeline
hatch run ci

🛠️ Development Setup

Modern Python Stack:

  • Build System: Hatch - Modern Python project manager
  • Linting: Ruff - Ultra-fast Python linter (its documentation cites 10-100x speedups over linters such as Flake8)
  • Formatting: Black - The uncompromising code formatter
  • Type Checking: Mypy - Static type checker (strict mode)
  • Testing: Pytest - Comprehensive test framework
  • Security: Bandit - Security vulnerability scanner
  • Pre-commit: Automated quality checks on commit/push

Available Commands:

# Linting & Formatting
hatch run lint          # Run Ruff linter
hatch run format        # Format code with Black
hatch run format-check  # Check formatting without changes

# Type Checking
hatch run type-check    # Run Mypy strict type checking

# Testing
hatch run test                    # Run tests (sequential)
hatch run test-parallel           # Run tests with auto workers
hatch run test-parallel-cov       # Parallel tests with coverage

# Security
hatch run security      # Run Bandit security audit

# Complete CI Pipeline
hatch run ci           # Run all checks (format, lint, type-check, security, test)

📊 Project Structure

ai-edge-computing-tiny-embedded/
├── src/ai_edge_tinyml/          # Source code (src layout)
│   ├── __init__.py              # Package initialization
│   ├── quantization.py          # INT8/INT4/FP16 quantization
│   ├── model_optimizer.py       # Model optimization pipeline
│   ├── utils.py                 # Utility functions
│   └── py.typed                 # PEP 561 marker (typed package)
├── tests/                       # Test suite (62 tests, 81.76% coverage)
│   ├── conftest.py              # Pytest configuration & fixtures
│   ├── test_quantization.py     # Quantization tests (21 tests)
│   ├── test_model_optimizer.py  # Optimizer tests (19 tests)
│   └── test_utils.py            # Utility tests (22 tests)
├── pyproject.toml               # Project configuration (single source of truth)
├── .pre-commit-config.yaml      # Pre-commit hooks configuration
├── CHANGELOG.md                 # Detailed change history
├── LESSONS-LEARNED.md           # Best practices & insights
├── DEVELOPMENT.md               # Development guidelines
└── README.md                    # This file

✅ Quality Assurance

This project maintains production-ready code quality:

| Check | Status | Details |
|---|---|---|
| Ruff Linting | ✅ PASS | 50+ rules, zero errors |
| Black Formatting | ✅ PASS | Line length: 100 |
| Mypy Type Check | ✅ PASS | Strict mode enabled |
| Bandit Security | ✅ PASS | 0 vulnerabilities |
| Test Suite | ✅ PASS | 62/62 tests passing |
| Code Coverage | ✅ PASS | 81.76% (exceeds 80%) |
| Pre-commit Hooks | ✅ PASS | 15+ automated checks |

Test Results:

tests/test_quantization.py      21 passed
tests/test_model_optimizer.py   19 passed
tests/test_utils.py             22 passed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total: 62 passed in 0.50s ✅
Coverage: 81.76% (exceeds 80% threshold) ✅

🔒 Security

  • Bandit Security Audit: Zero vulnerabilities detected
  • Type Safety: Full type annotations with mypy strict mode
  • Dependency Scanning: Automated security checks in CI
  • Pre-commit Hooks: Security validations before commit

📚 Documentation

  • CHANGELOG.md - Detailed version history and changes
  • LESSONS-LEARNED.md - Best practices, insights, and technical decisions
  • DEVELOPMENT.md - Comprehensive development guidelines
  • API Documentation: Auto-generated from Google-style docstrings

🎯 Features

Quantization Support:

  • ✅ INT8 Quantization (8-bit integers)
  • ✅ INT4 Quantization (4-bit integers)
  • ✅ FP16 Quantization (16-bit floats)
  • ✅ Dynamic Quantization
  • ✅ Symmetric & Asymmetric modes
  • ✅ Per-tensor & per-channel quantization
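
To make the symmetric and asymmetric modes listed above concrete, here is a minimal pure-Python sketch of per-tensor 8-bit quantization. It is not the package's implementation: the function names `quantize_symmetric`, `quantize_asymmetric`, and `dequantize` are hypothetical and chosen only for illustration. Per-channel quantization would apply the same routine to each output channel separately, so each channel gets its own scale.

```python
def quantize_symmetric(values, bits=8):
    """Symmetric per-tensor quantization: real zero maps to integer 0."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_asymmetric(values, bits=8):
    """Asymmetric (affine) quantization: a zero point shifts the integer range."""
    qmin, qmax = 0, 2 ** bits - 1                   # 0..255 for 8 bits
    lo = min(min(values), 0.0)                      # keep 0.0 representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map integers back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical example tensor (illustrative values only).
weights = [-0.5, 0.0, 0.25, 1.0]
q_sym, s_sym = quantize_symmetric(weights)
q_asym, s_asym, zp = quantize_asymmetric(weights)
```

The symmetric form wastes part of the integer range when the data is skewed (here the negative side only reaches -0.5), while the asymmetric form uses the full range at the cost of storing a zero point.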

Model Optimization:

  • ✅ Weight quantization with 6 different modes
  • ✅ Compression ratio analysis
  • ✅ Model size calculation
  • ✅ Type-safe APIs with full annotations
  • ✅ Comprehensive error handling
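
The arithmetic behind model size and compression ratio can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `model_size_bytes` and `compression_ratio` are hypothetical helpers, and the calculation ignores the small overhead of stored scales and zero points.

```python
def model_size_bytes(num_params, bits_per_param):
    """Storage needed for the weights alone, in bytes."""
    return num_params * bits_per_param / 8


def compression_ratio(num_params, original_bits, quantized_bits):
    """How many times smaller the quantized weights are."""
    original = model_size_bytes(num_params, original_bits)
    quantized = model_size_bytes(num_params, quantized_bits)
    return original / quantized


# A 100x100 weight matrix, as an illustrative example.
params = 100 * 100
fp32_size = model_size_bytes(params, 32)   # 40,000 bytes
int8_size = model_size_bytes(params, 8)    # 10,000 bytes
int8_ratio = compression_ratio(params, 32, 8)
int4_ratio = compression_ratio(params, 32, 4)
```

Under these assumptions FP32 to INT8 gives a 4x reduction and FP32 to INT4 gives 8x; real ratios are slightly lower once quantization metadata is counted.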

Example Usage:

import numpy as np
from ai_edge_tinyml import Quantizer, QuantizationConfig, QuantizationMode
from ai_edge_tinyml.utils import calculate_compression_ratio

# Create quantization config
config = QuantizationConfig(
    mode=QuantizationMode.INT8,
    symmetric=True,
    per_channel=False
)

# Initialize quantizer
quantizer = Quantizer(config)

# Quantize weights
weights = np.random.randn(100, 100).astype(np.float32)
quantized = quantizer.quantize(weights)

# Dequantize for inference
dequantized = quantizer.dequantize(quantized)

# Calculate compression
ratio = calculate_compression_ratio(weights, quantized)
print(f"Compression ratio: {ratio:.2f}x")

🔥 SOTA Models & Algorithms (2024-2025)

🎯 Object Detection Models

🥇 YOLOv11 (YOLO11)

🚀 State-of-the-art real-time object detection with transformer-based improvements

✨ Key Features:

  • ⚡ Transformer-based backbone with C3k2 blocks
  • 🎯 Partial Self-Attention (PSA) mechanism
  • 🔥 NMS-free training with dual label assignment
  • 📉 25-40% lower latency vs YOLOv10
  • 📊 10-15% improvement in mAP
  • ⚡ 60+ FPS processing capability

📚 Resources:

📖 Ultralytics Docs → https://docs.ultralytics.com/models/
📄 YOLO Evolution → https://arxiv.org/html/2510.09653v2

🥈 YOLOv10

⚡ Eliminates NMS for end-to-end real-time detection

📊 Performance Metrics:

  • 🔸 YOLOv10s: 1.8x faster than RT-DETR-R18 at similar AP
  • 🔸 YOLOv10b: 46% less latency and 25% fewer parameters than YOLOv9-C at the same accuracy
  • 🔸 mAP Range: 38.5 - 54.4
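
For context on what "eliminating NMS" removes from the pipeline, the following is a minimal sketch of classic greedy non-maximum suppression, the post-processing step that conventional detectors run after inference. It is a generic textbook version, not code from YOLOv10 or Ultralytics; the boxes, scores, and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The pairwise comparison loop is sequential and data-dependent, which is why removing it helps end-to-end latency on edge hardware.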

📚 Resources:

📄 Paper → https://arxiv.org/pdf/2405.14458
📖 Docs → https://docs.ultralytics.com/models/yolov10/

🤖 RT-DETR & RT-DETRv2

🎯 First practical real-time detection transformer

| Model | AP Score | FPS | Device |
|---|---|---|---|
| RT-DETR | 53.1% | 108 | NVIDIA T4 |
| RT-DETRv2 | >55% | 108+ | NVIDIA T4 |

🔗 Resources:


📱 Efficient Vision Models for Edge

graph LR
    A[🖼️ Input Image] --> B[📱 MobileNetV4]
    A --> C[⚡ EfficientViT]
    B --> D[🎯 87% Accuracy]
    C --> E[🔥 3.8ms Latency]
    D --> F[📲 Edge TPU]
    E --> F
    style A fill:#e1f5ff
    style B fill:#ffe1f5
    style C fill:#f5ffe1
    style D fill:#ffe1e1
    style E fill:#e1ffe1
    style F fill:#ffd700

📱 MobileNetV4

ECCV Mobile

🌐 Universal efficient architecture for mobile ecosystem

🎨 Innovations:

  • 🔹 Universal Inverted Bottleneck (UIB) block
  • ⚡ Mobile MQA attention (39% speedup)
  • 🎯 Optimized NAS recipe
  • 🏆 87% ImageNet accuracy @ 3.8ms (Pixel 8 EdgeTPU)

📚 Resources:

⚡ EfficientViT

ViT 2024

🧠 Lightweight multi-scale attention for high-resolution tasks

✨ Features:

  • 🔸 Memory-efficient Vision Transformer
  • 🔸 Cascaded group attention
  • 🔸 Dense prediction tasks optimized
  • 🔸 High-resolution image processing

🤖 Small Language Models (SLMs) for Edge

🧠 Microsoft Phi-3

📊 Variants:

Model: Phi-3-mini
Parameters: 3.8B
Context: Up to 128K tokens
Deployment: GPU, CPU, Mobile
Status: ✅ Production Ready

🎯 Optimized For:

  • 💻 GPU acceleration
  • 🖥️ CPU inference
  • 📱 Mobile deployment

🔗 Resources:

🦙 TinyLlama

📊 Specifications:

Parameters: 1.1B
Target: Mobile/Edge devices
Performance: High for size class
Year: 2024
Status: ✅ Active

✨ Highlights:

  • 🔸 Compact architecture
  • 🔸 Edge-optimized
  • 🔸 Strong performance/size ratio

🌟 Google Gemini Nano

📱 On-device AI for Smartphones

Variants:

  • 📊 1.8B parameters (lightweight)
  • 📊 3.25B parameters (standard)

🎯 Capabilities:

  • ✅ Context-aware reasoning
  • ✅ Real-time translation
  • ✅ Text summarization
  • ✅ Edge-optimized for phones/IoT

🦙 Meta Llama 3.2

🖼️ Edge AI & Vision Capabilities

Features:

  • ⚡ Edge deployment optimized
  • 👁️ Vision-language capabilities
  • 📱 Mobile-friendly variants
  • 🔥 Latest architecture

🔗 Resources:

📷 MobileVLM

🎨 Efficient vision-language model for mobile devices

Specifications:

  • 🔹 MobileLLaMA: 2.7B parameters
  • 🔹 Trained from scratch on open datasets
  • 🔹 Fully optimized for mobile deployment
  • 🔹 Vision + Language capabilities

⚡ State Space Models - Efficient Transformers

🐍 Mamba

⚡ Linear-time sequence modeling with selective state spaces

🚀 Performance Highlights:

| Metric | Performance |
|---|---|
| Throughput | 5x higher than Transformers |
| Scaling | Linear in sequence length |
| Comparison | Mamba-3B > Transformers (same size) |
| Power | Matches Transformers 2x its size |

📊 Advantages:

+ ✅ Linear time complexity
+ ✅ 5x throughput improvement
+ ✅ Efficient long sequences
+ ✅ Lower memory footprint
- ❌ Newer architecture (less tested)
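
The linear-time claim follows from the recurrent form of a state space model: each token updates a fixed-size state once, so cost grows with sequence length rather than with its square. The sketch below is a deliberately simplified scalar recurrence with fixed, made-up parameters `a`, `b`, and `c`. It is not Mamba itself, which makes these parameters input-dependent ("selective") and uses a hardware-aware scan.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a 1-D sequence.

    One constant-cost update per token, so the whole pass is O(len(inputs)),
    and only the current state h needs to be kept in memory.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


# Impulse input: the response decays geometrically with factor a.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

The fixed-size state is also what keeps memory low on long sequences, in contrast to an attention cache that grows with every token.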

📚 Resources:

📱 eMamba

🔧 Edge-optimized Mamba acceleration framework

✨ Features:

Design: End-to-end hardware acceleration
Target: Edge platforms
Complexity: Linear time
Status: 2024 Release

🎯 Optimizations:

  • 🔹 Hardware-aware design
  • 🔹 Edge platform specific
  • 🔹 Leverages linear complexity
  • 🔹 Memory efficient

📚 Resources:


🚀 Inference Frameworks & Runtimes

⚡ TensorRT-LLM

🏆 High-performance LLM inference on NVIDIA GPUs

📊 Performance:

+ 70% faster than llama.cpp on RTX 4090
+ State-of-the-art optimizations
+ Quality maintained across precisions

✨ Features:

  • 🔸 Python & C++ API
  • 🔸 Multi-precision support
  • 🔸 Advanced kernel optimization
  • 🔸 Production-grade quality

🔗 Resources:

📄 vLLM

💡 High-throughput LLM serving with PagedAttention

🎯 Innovations:

  • ⚡ PagedAttention memory management
  • 🔸 Optimized KV cache handling
  • 🌐 Multi-platform support

🖥️ Supported Hardware:

AMD: GPU support
Google: TPU support
AWS: Inferentia support
Base: PyTorch

🔗 Resources:

🦙 ExecuTorch

📱 Efficient LLM execution on edge devices

Features:

  • 🔹 Lightweight edge runtime
  • 🔹 Static memory planning
  • 🔹 Multi-platform support
  • 🔹 TorchAO quantization

💻 Hardware Support:

  • ✅ CPU
  • ✅ GPU
  • ✅ AI Accelerators
  • ✅ Mobile devices

🔗 Resources:

💻 llama.cpp

⚡ CPU-optimized LLM inference

Advantages:

+ ✅ Lower memory usage
+ ✅ No GPU required
+ ✅ Fast generation
+ ✅ Cross-platform
+ ✅ Wide model support

🔗 Comparison:


🔧 Model Compression & Optimization

📉 Advanced Quantization Techniques

🏆 AWQ

Activation-aware Weight Quantization

🎯 MIT HAN Lab Innovation

Key Concept:

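# Not all weights are equal: protect the salient ones.
# Simplified view; `quantize_row` is an illustrative helper, not AWQ's API.
# AWQ itself rescales salient channels before quantizing instead of skipping them.
def quantize_row(weights, salient, step=0.1):
    """Round non-salient weights to a grid of width `step`; keep salient ones."""
    return [w if keep else round(w / step) * step
            for w, keep in zip(weights, salient)]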

Features:

  • ⚡ Protects critical weights
  • 🎯 Activation-aware
  • 🔥 State-of-the-art results

🔗 Resources:

💎 GPTQ

GPU-Focused Quantization

Features:

  • 🔸 Row-wise quantization
  • 🔸 Hessian optimization
  • 🔸 GPU inference focused
  • 🔸 175B models supported

Achievements:

Models: BLOOM, OPT-175B
Precision: 4-bit
Platform: GPU optimized

🔬 QLoRA

Efficient Fine-tuning

Innovations:

  • ✨ 4-bit NormalFloat (NF4)
  • ✨ Double quantization
  • ✨ LoRA adapters
  • ✨ Single GPU fine-tuning

Capability:

+ Fine-tune 65B model
+ On single GPU
+ Maintain quality
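
The LoRA adapter part of QLoRA can be sketched without any framework. The frozen base weight `W` stays untouched (in QLoRA it is stored in 4-bit NF4, which this sketch omits), and only two small matrices `A` and `B` are trained, giving an effective weight of `W + (alpha / r) * B @ A`. The helper names and the tiny matrices below are illustrative assumptions, not code from the QLoRA or PEFT libraries.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    rank = len(a)
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, rank):
    """Adapter parameters: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


w = [[1.0, 0.0], [0.0, 1.0]]     # frozen base weight (2x2)
a = [[0.5, -0.5]]                # A: rank 1 x d_in
b_init = [[0.0], [0.0]]          # B starts at zero, so the adapter is a no-op
b_trained = [[1.0], [2.0]]       # made-up values standing in for a trained B

w_start = lora_effective_weight(w, b_init, a, alpha=2.0)
w_tuned = lora_effective_weight(w, b_trained, a, alpha=2.0)
```

For a 4096x4096 layer at rank 16, `trainable_params` gives 131,072 adapter parameters against 16,777,216 in the full matrix, which is why fine-tuning fits on a single GPU.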

🆕 Unsloth Dynamic 4-bit

🔥 Latest quantization innovation

Features:

  • Built on bitsandbytes
  • Dynamic parameter quantization
  • Per-parameter optimization

📚 Comprehensive Guides:


🔬 Neural Architecture Search (NAS)

🤖 Automate neural network architecture design

🎯 Once-for-All (OFA)

Concept: Train once, deploy everywhere

graph TD
    A[🌐 Supernet Training] --> B[📦 Weight Sharing]
    B --> C[📱 Mobile]
    B --> D[💻 Desktop]
    B --> E[⚡ Edge]
    style A fill:#e1f5ff
    style B fill:#ffe1f5
    style C fill:#f5ffe1
    style D fill:#ffe1e1
    style E fill:#ffd700

Features:

  • 🔹 Weight-sharing supernetwork
  • 🔹 Represents any architecture in search space
  • 🔹 Massive computational savings
  • 🔹 Applied to ImageNet with ProxylessNAS & MobileNetV3
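
To give a feel for how large a weight-sharing search space is, the sketch below counts and samples sub-networks from an elastic search space. The option sets (kernel sizes 3/5/7, expansion ratios 3/4/6, depths 2/3/4, five units) follow the description in the OFA paper and should be treated as assumptions here; the function names are hypothetical and this is not the OFA codebase.

```python
import random

# Assumed search space (see lead-in): per-layer choices and per-unit depth.
KERNEL_SIZES = (3, 5, 7)
EXPAND_RATIOS = (3, 4, 6)
DEPTHS = (2, 3, 4)
NUM_UNITS = 5


def count_subnets():
    """Number of distinct sub-networks the supernet can represent."""
    per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)
    per_unit = sum(per_layer ** depth for depth in DEPTHS)
    return per_unit ** NUM_UNITS


def sample_subnet(rng):
    """Pick one sub-network: a list of units, each a list of (kernel, expand) layers."""
    config = []
    for _ in range(NUM_UNITS):
        depth = rng.choice(DEPTHS)
        layers = [(rng.choice(KERNEL_SIZES), rng.choice(EXPAND_RATIOS))
                  for _ in range(depth)]
        config.append(layers)
    return config


total = count_subnets()
example = sample_subnet(random.Random(0))
```

Under these assumptions the count exceeds 10^19, so training each candidate separately is infeasible; sharing weights lets every sampled configuration reuse the same trained supernet.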

🔗 Resources:


🎓 Knowledge Distillation & Pruning

🔬 TinyBERT

📚 Two-stage distillation approach

Performance Metrics:

Accuracy: 96.8% of BERT-base
Size: 7.5x smaller (4 layers)
Energy: Lowest variability (0.1032 kWh SD)
Stages: Task-agnostic + Task-specific

Advantages:

  • ✅ Dual-stage distillation
  • ✅ Ultra-low energy variability
  • ✅ Compact architecture
  • ✅ High performance retention

📖 DistilBERT

⚡ Single-phase task-agnostic distillation

Performance Metrics:

Accuracy: 97% of BERT
Size Reduction: 40% smaller
Speed: 60% faster
Use Case: General-purpose
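
The core of response-based distillation is a loss that pulls the student's softened output distribution toward the teacher's. The sketch below is the generic temperature-scaled soft-target loss, not DistilBERT's full training objective (which combines it with other terms); the logits and the temperature of 2.0 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 1.0, 0.0]                       # made-up teacher logits
loss_matched = distillation_loss(teacher, teacher)
loss_untrained = distillation_loss([0.0, 0.0, 0.0], teacher)
```

A higher temperature flattens both distributions, so the student also learns from the relative ranking of the wrong classes rather than only the top prediction.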

Recent Research (2025):

  • 🔸 32% energy reduction with pruning
  • 🔸 Iterative distillation + adaptive pruning
  • 🔸 Nature Scientific Reports

📚 Resources:


🎯 TinyML & MCU-specific Advances

🧠 MCUNet Series - MIT HAN Lab

📱 MCUNetV1

Foundation:

  • 🔸 Neural architecture for MCUs
  • 🔸 Co-designed model + inference engine
  • 🔸 Ultra-low memory footprint

🚀 MCUNetV2

Achievements:

ImageNet: 71.8% accuracy
Visual Wake: >90% (32kB SRAM)
Capability: Object detection
Platform: Tiny devices

⚡ MCUNetV3

Latest:

  • 🔸 Enhanced efficiency
  • 🔸 State-of-the-art MCU AI
  • 🔸 Production ready

🎓 Additional MCU Tools

🔧 TinyTL

  • Tiny transfer learning for MCUs
  • On-device learning capabilities
  • Minimal resource overhead

⚙️ PockEngine

  • Inference engine optimization
  • MCU-specific acceleration
  • Memory-efficient execution

📚 Resources:


🔬 TinyDL (Tiny Deep Learning)

🎯 Evolution from TinyML to deep learning on edge

Focus Areas:

  • 🔹 Deep learning on ultra-constrained hardware
  • 🔹 Power consumption in mW range
  • 🔹 On-device sensor analytics
  • 🔹 Real-time inference

📄 Resources:


🔩 Hardware Acceleration & Platforms

🖥️ Edge AI Platforms

🟢 NVIDIA Jetson Orin Nano Super

Specifications:

Compute: 67 INT8 TOPS
Performance: 1.7x vs previous Orin
Price: $249
Release: Late 2024
Status: ✅ Available

Features:

  • ⚡ Generative AI optimized
  • 🎯 Edge AI development kit
  • 💰 Affordable price point

🔷 Edge TPU & Neural Accelerators

Hardware Platforms:

Google

  • Google Pixel EdgeTPU
  • Coral Dev Board

Apple

  • Apple Neural Engine
  • A-series chips

Generic

  • Specialized NPUs
  • Custom ASICs

📱 Mobile Deployment Targets

| Platform | Architecture | Use Case |
|---|---|---|
| 🔧 ARM CPUs | ARM Cortex | General compute |
| 📡 Mobile DSPs | Qualcomm/MediaTek | Signal processing |
| 🎮 Mobile GPUs | Mali/Adreno | Graphics + AI |
| 🧠 NPUs | Custom ASICs | Neural processing |

🛠️ Implementation Resources & Tools

🔷 ONNX Runtime

Cross-platform inference with ONNX models

📚 Documentation & Tutorials

🔧 Compatibility

💻 Example Implementations

📦 Model Repositories

📉 ONNX Runtime Quantization

Tools & Resources:

🎯 YOLO Implementations

🟣 YOLO-NAS with ONNX

🟢 YOLO + TensorRT (Detection, Pose, Segmentation)

🔵 YOLO + ONNXRuntime (All Tasks)

🌐 Community Resources


⚡ TensorRT

🚀 NVIDIA's high-performance deep learning inference optimizer

Resources:


🌐 Edge Deployment Frameworks

🚀 FastDeploy - PaddlePaddle

📦 Easy-to-use deployment toolbox for AI models

Resources:

💎 DeepSparse & SparseML - Neural Magic

🖥️ CPU-optimized inference with sparsity

Features:

  • ⚡ CPU inference acceleration
  • 🔸 Sparsity-aware optimization
  • 📊 YOLOv5 CPU benchmarks

Resources:

📱 NCNN - Tencent

🎯 High-performance neural network inference for mobile

Resources:


🔧 MACE - Xiaomi

🤖 Mobile AI Compute Engine

Resources:

🍎 CoreML - Apple

🎨 Machine learning framework for iOS/macOS

🎨 Model Collections

🛠️ Tools & Documentation

🎨 Stable Diffusion on CoreML


⚙️ Compilers & Low-Level Frameworks

🔧 TVM - Apache

🎯 End-to-end deep learning compiler stack

Resources:

🔨 LLVM

⚙️ Compiler infrastructure project

Resources:

⚡ XNNPACK - Google

🚀 High-efficiency floating-point neural network operators

Resources:

🔷 ARM-NN

💪 Inference engine for ARM platforms

Resources:

🧠 CMSIS-NN

📱 Efficient neural network kernels for ARM Cortex-M

Resources:

📱 Samsung ONE

🔧 On-device Neural Engine compiler

Resources:


💼 Industry & Commercial Solutions

🚀 Deeplite

🎯 AI-Driven Optimizer for Deep Neural Networks

Focus:

  • ⚡ Faster Inference
  • 📦 Smaller Models
  • 🔋 Energy Efficient
  • ☁️ Cloud to Edge
  • 🎯 Maintain Accuracy

🔗 Resources:


🔧 Utility Frameworks & Tools

👁️ OpenCV

📷 Computer vision library with C++ support

Resources:

🎬 VQRF - Video Compression

📹 Vector Quantized Radiance Fields

Resources:


🖼️ Additional Model Architectures

🎯 PP-PicoDet

📱 Lightweight real-time object detector for mobile

Resources:

🔬 EtinyNet

🎯 Extremely tiny network for TinyML

Resources:


🧠 Computing Architectures & APIs

  • ARM: Mobile & Embedded
  • RISC-V: Open-Source ISA
  • CUDA: NVIDIA GPU
  • Metal: Apple GPU
  • OpenCL: Cross-Platform
  • Vulkan: Graphics & Compute


📚 Research Papers & Academic Resources

📖 Foundational Surveys (2024-2025)

🌐 Edge Computing & Deep Learning

🔬 TinyML Specific

⚡ State Space Models & Efficient Architectures

👁️ Vision Models

🔧 Model Compression & Optimization

📚 Collections


🎓 Contributing & Community

This repository serves as a comprehensive resource for AI edge computing and TinyML practitioners.

Contributions, updates, and corrections are welcome! 🚀




🏷️ Keywords

TinyML • Edge AI • Embedded ML • Model Compression • Quantization • Neural Architecture Search • YOLO • MobileNet • Transformer • State Space Models • ONNX Runtime • TensorRT • Inference Optimization • MCU • IoT • Real-Time AI


📅 Last Updated

January 2025