AI Edge Computing & TinyML
Comprehensive Guide to State-of-the-Art Edge AI
Latest Update: January 2025
Production-Ready Python Implementation with modern tooling (Hatch, Ruff, Mypy)
62/62 Tests Passing • 81.76% Coverage • Zero Security Issues
State-of-the-Art Algorithms & Trends for Edge AI and Embedded Systems
Table of Contents
Getting Started | Core Topics | Frameworks & Tools | Documentation | Resources | Community
Quick Start & Development
Installation
This project uses modern Python tooling with Hatch for dependency management and development workflows.
# Clone the repository
git clone https://github.com/umitkacar/ai-edge-computing-tiny-embedded.git
cd ai-edge-computing-tiny-embedded
# Install Hatch (it creates the project environment and installs dependencies on first run)
pip install hatch
# Run tests
hatch run test
# Run full CI pipeline
hatch run ci
Development Setup
Modern Python Stack:
- Build System: Hatch - Modern Python project manager
- Linting: Ruff - Fast Python linter written in Rust (its documentation reports 10-100x speedups over linters such as Flake8)
- Formatting: Black - The uncompromising code formatter
- Type Checking: Mypy - Static type checker (strict mode)
- Testing: Pytest - Comprehensive test framework
- Security: Bandit - Security vulnerability scanner
- Pre-commit: Automated quality checks on commit/push
Available Commands:
# Linting & Formatting
hatch run lint # Run Ruff linter
hatch run format # Format code with Black
hatch run format-check # Check formatting without changes
# Type Checking
hatch run type-check # Run Mypy strict type checking
# Testing
hatch run test # Run tests (sequential)
hatch run test-parallel # Run tests with auto workers
hatch run test-parallel-cov # Parallel tests with coverage
# Security
hatch run security # Run Bandit security audit
# Complete CI Pipeline
hatch run ci # Run all checks (format, lint, type-check, security, test)
Project Structure
ai-edge-computing-tiny-embedded/
├── src/ai_edge_tinyml/          # Source code (src layout)
│   ├── __init__.py              # Package initialization
│   ├── quantization.py          # INT8/INT4/FP16 quantization
│   ├── model_optimizer.py       # Model optimization pipeline
│   ├── utils.py                 # Utility functions
│   └── py.typed                 # PEP 561 marker (typed package)
├── tests/                       # Test suite (62 tests, 81.76% coverage)
│   ├── conftest.py              # Pytest configuration & fixtures
│   ├── test_quantization.py     # Quantization tests (21 tests)
│   ├── test_model_optimizer.py  # Optimizer tests (19 tests)
│   └── test_utils.py            # Utility tests (22 tests)
├── pyproject.toml               # Project configuration (single source of truth)
├── .pre-commit-config.yaml      # Pre-commit hooks configuration
├── CHANGELOG.md                 # Detailed change history
├── LESSONS-LEARNED.md           # Best practices & insights
├── DEVELOPMENT.md               # Development guidelines
└── README.md                    # This file
Quality Assurance
This project maintains production-ready code quality:
| Check | Status | Details |
|---|---|---|
| Ruff Linting | PASS | 50+ rules, zero errors |
| Black Formatting | PASS | Line length: 100 |
| Mypy Type Check | PASS | Strict mode enabled |
| Bandit Security | PASS | 0 vulnerabilities |
| Test Suite | PASS | 62/62 tests passing |
| Code Coverage | PASS | 81.76% (exceeds 80%) |
| Pre-commit Hooks | PASS | 15+ automated checks |
Test Results:
tests/test_quantization.py 21 passed
tests/test_model_optimizer.py 19 passed
tests/test_utils.py 22 passed
--------------------------------------------
Total: 62 passed in 0.50s
Coverage: 81.76% (exceeds 80% threshold)
Security
- Bandit Security Audit: Zero vulnerabilities detected
- Type Safety: Full type annotations with mypy strict mode
- Dependency Scanning: Automated security checks in CI
- Pre-commit Hooks: Security validations before commit
Documentation
- CHANGELOG.md - Detailed version history and changes
- LESSONS-LEARNED.md - Best practices, insights, and technical decisions
- DEVELOPMENT.md - Comprehensive development guidelines
- API Documentation: Auto-generated from Google-style docstrings
Features
Quantization Support:
- INT8 Quantization (8-bit integers)
- INT4 Quantization (4-bit integers)
- FP16 Quantization (16-bit floats)
- Dynamic Quantization
- Symmetric & Asymmetric modes
- Per-tensor & per-channel quantization
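The symmetric/asymmetric and per-tensor/per-channel options listed above can be illustrated with plain NumPy. This is a minimal sketch of the general techniques, not the code in `quantization.py`; the function names `quantize_symmetric`, `quantize_asymmetric`, and `dequantize` are hypothetical and chosen only for this example.

```python
from typing import Optional, Tuple

import numpy as np


def quantize_symmetric(
    x: np.ndarray, bits: int = 8, axis: Optional[int] = None
) -> Tuple[np.ndarray, np.ndarray]:
    """Symmetric scheme: 0.0 maps to integer 0, range is [-qmax, qmax].

    axis=None uses one scale for the whole tensor (per-tensor);
    axis=1 on an (out, in) matrix uses one scale per row (per-channel).
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    scale = np.maximum(max_abs, 1e-12) / qmax  # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, np.asarray(scale, dtype=np.float32)


def quantize_asymmetric(x: np.ndarray, bits: int = 8) -> Tuple[np.ndarray, float, int]:
    """Asymmetric (affine) scheme: a zero-point shifts the range to [0, 2^bits - 1]."""
    qmax = 2**bits - 1
    x_min = min(float(x.min()), 0.0)  # keep 0.0 inside the representable range
    x_max = max(float(x.max()), 0.0)
    scale = max(x_max - x_min, 1e-12) / qmax
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q: np.ndarray, scale, zero_point: int = 0) -> np.ndarray:
    """Map integers back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale


rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 16)).astype(np.float32)
weights[0] *= 10.0  # one row with a much wider range than the others

q_tensor, s_tensor = quantize_symmetric(weights)            # per-tensor
q_channel, s_channel = quantize_symmetric(weights, axis=1)  # per-channel
q_asym, s_asym, zp = quantize_asymmetric(weights)

err_tensor = np.abs(dequantize(q_tensor, s_tensor) - weights)
err_channel = np.abs(dequantize(q_channel, s_channel) - weights)
print("per-tensor  max error on narrow rows:", err_tensor[1:].max())
print("per-channel max error on narrow rows:", err_channel[1:].max())
```

With a single scale, the wide row dictates the step size for every row; per-channel scales let the narrow rows use a finer step, which is the usual motivation for that mode.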
Model Optimization:
- Weight quantization with 6 different modes
- Compression ratio analysis
- Model size calculation
- Type-safe APIs with full annotations
- Comprehensive error handling
Example Usage:
import numpy as np
from ai_edge_tinyml import Quantizer, QuantizationConfig, QuantizationMode
from ai_edge_tinyml.utils import calculate_compression_ratio

# Create quantization config
config = QuantizationConfig(
    mode=QuantizationMode.INT8,
    symmetric=True,
    per_channel=False,
)

# Initialize quantizer
quantizer = Quantizer(config)

# Quantize weights
weights = np.random.randn(100, 100).astype(np.float32)
quantized = quantizer.quantize(weights)

# Dequantize for inference
dequantized = quantizer.dequantize(quantized)

# Calculate compression
ratio = calculate_compression_ratio(weights, quantized)
print(f"Compression ratio: {ratio:.2f}x")
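How `calculate_compression_ratio` computes its result is not shown in this README. As a rough guide to what to expect, the sketch below works out the theoretical ratio from bit widths and shows one common way to store INT4 values two per byte. The helper names (`theoretical_compression_ratio`, `pack_int4`, `unpack_int4`) are hypothetical and not part of the package.

```python
import numpy as np


def theoretical_compression_ratio(
    num_params: int,
    src_bits: int = 32,
    dst_bits: int = 8,
    num_scales: int = 1,
    scale_bits: int = 32,
) -> float:
    """Original size divided by quantized size, counting the stored scales."""
    original = num_params * src_bits
    compressed = num_params * dst_bits + num_scales * scale_bits
    return original / compressed


def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack signed 4-bit values (range [-8, 7]) into bytes, two per byte."""
    flat = q.astype(np.int8).ravel()
    if flat.size % 2:  # pad to an even count
        flat = np.append(flat, np.int8(0))
    nibbles = (flat & 0x0F).astype(np.uint8)  # two's-complement low nibble
    return nibbles[0::2] | (nibbles[1::2] << 4)


def unpack_int4(packed: np.ndarray, size: int) -> np.ndarray:
    """Inverse of pack_int4; `size` is the original element count."""
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = (packed & 0x0F).astype(np.int8)
    out[1::2] = (packed >> 4).astype(np.int8)
    out = np.where(out > 7, out - 16, out)  # sign-extend the nibble
    return out[:size]


# FP32 -> INT8 for a 100x100 matrix with one scale: slightly below 4x
ratio_int8 = theoretical_compression_ratio(100 * 100, dst_bits=8)
# FP32 -> INT4 with one scale: slightly below 8x
ratio_int4 = theoretical_compression_ratio(100 * 100, dst_bits=4)
print(f"INT8: {ratio_int8:.3f}x, INT4: {ratio_int4:.3f}x")

values = np.array([-8, -1, 0, 7, 3], dtype=np.int8)
packed = pack_int4(values)
restored = unpack_int4(packed, values.size)
```

Per-channel quantization stores more scales, so its ratio is a little lower than the per-tensor figure for the same bit width.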
SOTA Models & Algorithms (2024-2025)
Object Detection Models
YOLOv11 (YOLO11)
Key Features:
Resources:
YOLOv10
Performance Metrics:
Resources:
RT-DETR & RT-DETRv2
First practical real-time detection transformer
| Model | AP Score | FPS | Device |
|---|---|---|---|
| RT-DETR | 53.1% | 108 | NVIDIA T4 |
| RT-DETRv2 | >55% | 108+ | NVIDIA T4 |
Resources:
Efficient Vision Models for Edge
graph LR
A[Input Image] --> B[MobileNetV4]
A --> C[EfficientViT]
B --> D[87% Accuracy]
C --> E[3.8ms Latency]
D --> F[Edge TPU]
E --> F
style A fill:#e1f5ff
style B fill:#ffe1f5
style C fill:#f5ffe1
style D fill:#ffe1e1
style E fill:#e1ffe1
style F fill:#ffd700
MobileNetV4
Innovations:
Resources:
EfficientViT
Features:
Small Language Models (SLMs) for Edge
Microsoft Phi-3
Variants:
Optimized For:
Resources:
TinyLlama
Specifications:
Highlights:
Google Gemini Nano
On-device AI for Smartphones
Variants:
Capabilities:
Meta Llama 3.2
Edge AI & Vision Capabilities
Features:
Resources:
MobileVLM
Efficient vision-language model for mobile devices
Specifications:
- MobileLLaMA: 2.7B parameters
- Trained from scratch on open datasets
- Fully optimized for mobile deployment
- Vision + Language capabilities
State Space Models: Efficient Alternatives to Transformers
Mamba
Performance Highlights:
Advantages:
Resources:
eMamba
Features:
Optimizations:
Resources:
Inference Frameworks & Runtimes
TensorRT-LLM
Performance:
Features:
Resources:
vLLM
Innovations:
Supported Hardware:
Resources:
ExecuTorch
Features:
Hardware Support:
Resources:
llama.cpp
Advantages:
Comparison:
Model Compression & Optimization
Advanced Quantization Techniques
AWQ
Activation-aware Weight Quantization
Key Concept:
Features:
Resources:
GPTQ
GPU-Focused Quantization
Features:
Achievements:
QLoRA
Efficient Fine-tuning
Innovations:
Capability:
Unsloth Dynamic 4-bit
Latest quantization innovation
Features:
- Built on bitsandbytes
- Dynamic parameter quantization
- Per-parameter optimization
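The idea of deciding precision per parameter can be illustrated with a toy NumPy experiment: simulate 4-bit quantization for each tensor, measure the error it introduces, and keep a tensor in higher precision when the error is too large. This is only a sketch of that general idea under assumed settings (block size 64, absmax scaling, a 0.2 relative-error threshold). It is not Unsloth's algorithm and does not use the bitsandbytes API; `fake_quant_4bit` and `choose_precision` are hypothetical names.

```python
from typing import Dict

import numpy as np


def fake_quant_4bit(w: np.ndarray, block: int = 64) -> np.ndarray:
    """Simulate blockwise absmax 4-bit quantization and return dequantized values."""
    flat = w.astype(np.float32).ravel()
    pad = (-flat.size) % block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block)
    scale = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-12) / 7
    deq = np.clip(np.round(blocks / scale), -7, 7) * scale
    return deq.ravel()[: flat.size].reshape(w.shape)


def choose_precision(
    tensors: Dict[str, np.ndarray], max_rel_error: float = 0.2
) -> Dict[str, str]:
    """Assign '4bit' where the simulated error is small, 'fp16' otherwise."""
    plan = {}
    for name, w in tensors.items():
        err = np.linalg.norm(w - fake_quant_4bit(w)) / (np.linalg.norm(w) + 1e-12)
        plan[name] = "4bit" if err <= max_rel_error else "fp16"
    return plan


rng = np.random.default_rng(0)
well_behaved = rng.standard_normal((64, 64)).astype(np.float32)
outlier_heavy = rng.standard_normal((64, 64)).astype(np.float32)
outlier_heavy[:, 0] = 25.0  # one large outlier in every 64-value block

plan = choose_precision({"well_behaved": well_behaved, "outlier_heavy": outlier_heavy})
print(plan)
```

The outlier stretches each block's scale so far that most ordinary values round to zero, which is the kind of tensor a per-parameter scheme would leave unquantized.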
Comprehensive Guides:
- Quantization Comparison
- GPTQ vs GGUF vs AWQ
Neural Architecture Search (NAS)
Automate neural network architecture design
Once-for-All (OFA)
Concept: Train once, deploy everywhere
graph TD
A[Supernet Training] --> B[Weight Sharing]
B --> C[Mobile]
B --> D[Desktop]
B --> E[Edge]
style A fill:#e1f5ff
style B fill:#ffe1f5
style C fill:#f5ffe1
style D fill:#ffe1e1
style E fill:#ffd700
Features:
- Weight-sharing supernetwork
- Represents any architecture in the search space
- Massive computational savings, since sub-networks are extracted without retraining from scratch
- Applied to ImageNet with ProxylessNAS & MobileNetV3 search spaces
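Weight sharing can be pictured as sub-networks reading slices of one set of supernet weights. The toy NumPy sketch below shows only that slicing idea for a two-layer network with an elastic hidden width. It omits training, progressive shrinking, and channel sorting, and the names `SharedLinear`, `run_subnet`, and `subnet_param_count` are hypothetical.

```python
import numpy as np

IN_FEATURES, MAX_HIDDEN, OUT_FEATURES = 16, 64, 10


class SharedLinear:
    """Toy elastic-width layer: every sub-network uses a leading slice of one matrix."""

    def __init__(self, max_in: int, max_out: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((max_out, max_in)).astype(np.float32)
        self.bias = np.zeros(max_out, dtype=np.float32)

    def forward(self, x: np.ndarray, out_features: int) -> np.ndarray:
        in_features = x.shape[-1]
        w = self.weight[:out_features, :in_features]  # a view, not a copy
        return x @ w.T + self.bias[:out_features]


layer1 = SharedLinear(IN_FEATURES, MAX_HIDDEN, seed=0)
layer2 = SharedLinear(MAX_HIDDEN, OUT_FEATURES, seed=1)


def run_subnet(x: np.ndarray, hidden: int) -> np.ndarray:
    """Run the sub-network that keeps only the first `hidden` hidden units."""
    h = np.maximum(layer1.forward(x, hidden), 0.0)  # ReLU
    return layer2.forward(h, OUT_FEATURES)


def subnet_param_count(hidden: int) -> int:
    """Weights and biases actually used by a sub-network of this width."""
    return IN_FEATURES * hidden + hidden + hidden * OUT_FEATURES + OUT_FEATURES


x = np.ones((2, IN_FEATURES), dtype=np.float32)
outputs = {}
for target, hidden in {"desktop": 64, "mobile": 32, "edge": 16}.items():
    outputs[target] = run_subnet(x, hidden)
    print(target, outputs[target].shape, subnet_param_count(hidden), "params")
```

All three deployment targets read from the same two weight matrices, so choosing a smaller target costs no additional training in this picture.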
Resources:
Knowledge Distillation & Pruning
TinyBERT
Performance Metrics:
Advantages:
DistilBERT
Performance Metrics:
Recent Research (2025):
Resources:
TinyML & MCU-specific Advances
MCUNet Series - MIT HAN Lab
MCUNetV1
Foundation:
MCUNetV2
Achievements:
MCUNetV3
Latest:
Additional MCU Tools
TinyTL
PockEngine
Resources:
- MCUNet Official
- MCUNet GitHub
- TinyML Projects
TinyDL (Tiny Deep Learning)
Evolution from TinyML to deep learning on edge
Focus Areas:
- Deep learning on ultra-constrained hardware
- Power consumption in mW range
- On-device sensor analytics
- Real-time inference
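A quick way to reason about a milliwatt-range power budget is to combine active power, inference latency, and how often inference runs. The sketch below is plain arithmetic under a simple duty-cycle model. All numbers in the example (50 mW active, 0.05 mW sleep, 20 ms latency, one inference per second, a 220 mAh cell at 3 V) are hypothetical and not measurements of any device.

```python
def energy_per_inference_mj(active_power_mw: float, latency_ms: float) -> float:
    """Energy of one inference in millijoules (mW * ms = microjoules)."""
    return active_power_mw * latency_ms / 1000.0


def average_power_mw(
    active_power_mw: float, sleep_power_mw: float, latency_ms: float, period_ms: float
) -> float:
    """Average power when the device wakes once per period and sleeps otherwise."""
    if latency_ms > period_ms:
        raise ValueError("inference does not fit in the period (not real-time)")
    duty = latency_ms / period_ms
    return duty * active_power_mw + (1.0 - duty) * sleep_power_mw


def battery_life_hours(capacity_mah: float, voltage_v: float, avg_power_mw: float) -> float:
    """Idealized battery life: capacity in mWh divided by average power in mW."""
    return capacity_mah * voltage_v / avg_power_mw


# Hypothetical figures, for illustration only
energy = energy_per_inference_mj(active_power_mw=50.0, latency_ms=20.0)
avg = average_power_mw(50.0, 0.05, latency_ms=20.0, period_ms=1000.0)
life = battery_life_hours(capacity_mah=220.0, voltage_v=3.0, avg_power_mw=avg)
print(f"{energy:.2f} mJ per inference, {avg:.3f} mW average, {life:.0f} h battery life")
```

The latency check doubles as a real-time test: if one inference takes longer than the sampling period, the device cannot keep up regardless of its power draw.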
Resources:
Hardware Acceleration & Platforms
Edge AI Platforms
NVIDIA Jetson Orin Nano Super
Specifications:
Features:
Edge TPU & Neural Accelerators
Hardware Platforms:
Mobile Deployment Targets
| Platform | Architecture | Use Case |
|---|---|---|
| ARM CPUs | ARM Cortex | General compute |
| Mobile DSPs | Qualcomm/MediaTek | Signal processing |
| Mobile GPUs | Mali/Adreno | Graphics + AI |
| NPUs | Custom ASICs | Neural processing |
Implementation Resources & Tools
ONNX Runtime
Cross-platform inference with ONNX models
Documentation & Tutorials
Compatibility
Example Implementations
Model Repositories
ONNX Runtime Quantization
Tools & Resources:
- Quantization Tools
- Float16 Optimization
- Quantization Examples
YOLO Implementations
YOLO-NAS with ONNX
- YOLO-NAS ONNXRuntime
YOLO + TensorRT (Detection, Pose, Segmentation)
- YOLOv8-TensorRT-CPP
- TensorRT C++ API
- YOLOv8-TensorRT (Python + C++)
- YOLO Pose C++
- TensorRT Samples
- YOLOv8 TensorRT Tutorial
YOLO + ONNXRuntime (All Tasks)
- YOLOv8-ONNX-CPP
- YOLOv8 Pose Implementation
- YOLOv8 TensorRT Pose
- YOLO-ONNXRuntime-CPP
- YOLOv8-OpenCV-ONNXRuntime-CPP
- Ultralytics YOLOv8 C++
- YOLOv6-OpenCV-ONNXRuntime
- YOLOv5 Pose OpenCV
Community Resources
- hpc203 Repositories
- YOLO Issue Discussions
- YOLOv5 Fixed Bugs
- Chinese Tutorial
- ONNX Runtime Install Guide
TensorRT
NVIDIA's high-performance deep learning inference optimizer
Resources:
Edge Deployment Frameworks
FastDeploy - PaddlePaddle
Resources:
DeepSparse & SparseML - Neural Magic
Features:
Resources:
NCNN - Tencent
Resources:
MACE - Xiaomi
Resources:
CoreML - Apple
Machine learning framework for iOS/macOS
Model Collections
- Semantic Segmentation CoreML
- CoreML Models Collection
- Awesome CoreML Models
- Awesome CoreML Models 2
- RobustVideoMatting
Tools & Documentation
- PyTorch to CoreML
- CoreML Helpers
- Apple ML API
- CoreML Performance Tool
Stable Diffusion on CoreML
- Apple ML-4M
- Apple ML Stable Diffusion
- Stable Diffusion 2 Base
- Stability AI SD
- Stable Diffusion v1.4
- RunwayML SD
- Automatic1111 WebUI
Compilers & Low-Level Frameworks
TVM - Apache
Resources:
LLVM
Resources:
XNNPACK - Google
Resources:
ARM-NN
Resources:
CMSIS-NN
Resources:
Samsung ONE
Resources:
Industry & Commercial Solutions
Deeplite
AI-Driven Optimizer for Deep Neural Networks
Focus:
- Faster Inference
- Smaller Models
- Energy Efficient
- Cloud to Edge
- Maintain Accuracy
Resources:
Utility Frameworks & Tools
OpenCV
Resources:
VQRF - Radiance Field Compression
Resources:
Additional Model Architectures
PP-PicoDet
Resources:
EtinyNet
Resources:

Computing Architectures & APIs
Mobile & |
Open-Source |
NVIDIA |
Apple |
Cross- |
Graphics & |
Research Papers & Academic Resources
Foundational Surveys (2024-2025)
Edge Computing & Deep Learning
- Deep Learning With Edge Computing: A Review
- Convergence of Edge Computing and Deep Learning
- Machine Learning at the Network Edge
- Edge Deep Learning in CV & Medical Diagnostics
TinyML Specific
- From Tiny ML to Tiny DL: A Survey (2024)
- EtinyNet: Extremely Tiny Network
- Ultra-low Power TinyML System
State Space Models & Efficient Architectures
- Mamba: Linear-Time Sequence Modeling
- Mamba-360: Survey of SSMs
- eMamba: Efficient Edge Acceleration
Vision Models
- MobileNetV4 (ECCV 2024)
- ViT for Mobile/Edge Devices
- YOLO Evolution: v5 to YOLO26
- YOLOv10: Real-Time Detection
Model Compression & Optimization
- Model Compression for Carbon Efficient AI (2025)
- NAS Systematic Review (2024)
- Advances in Neural Architecture Search
Collections
Contributing & Community
This repository serves as a comprehensive resource for AI edge computing and TinyML practitioners.
Contributions, updates, and corrections are welcome!
Repository Stats
Keywords
TinyML • Edge AI • Embedded ML • Model Compression • Quantization • Neural Architecture Search • YOLO • MobileNet • Transformer • State Space Models • ONNX Runtime • TensorRT • Inference Optimization • MCU • IoT • Real-Time AI
Last Updated
January 2025