Awesome-On-Device-AI-Systems

Open source | Stars: 158 | Forks: 12 | Issues: 2 | Watchers: 6 | Last commit: 2 weeks ago

About Awesome-On-Device-AI-Systems

Awesome On-Device AI Systems is a curated repository focused on efficient on-device AI inference for mobile and edge devices. It serves as a bridge between academic systems research and practical engineering deployment, covering optimization techniques for machine learning models such as large language models, vision-language models, and vision transformers running on resource-constrained hardware. The repository is organized into two main sections. The first covers inference engines and runtimes, including general-purpose frameworks like LiteRT, ExecuTorch, ONNX Runtime, MNN, and NCNN, along with vendor-specific SDKs for platforms such as Qualcomm Snapdragon NPUs, Apple Core ML, NVIDIA TensorRT, Intel OpenVINO, and MediaTek NeuroPilot. It also features LLM and generative AI specialized engines like llama.cpp, MLC LLM, TensorRT-LLM, mllm, and MLX LM. The second section compiles research papers grouped into topics including LLM inference on mobile SoCs, processor characterization and optimization, compiler-based ML optimization, attention acceleration, quantization/sparsity, application-centric on-device AI systems, multi-DNN/heterogeneous runtime scheduling, on-device training and model adaptation, and profilers.

Platforms: Web, Self-hosted

Awesome On-Device AI Systems

A curated list of efficient on-device AI systems, including practical inference engines, benchmarks, and state-of-the-art research papers for mobile and edge devices.

This repository bridges the gap between Systems Research (academic papers) and Practical Deployment (engineering frameworks), focusing on optimizing ML models (e.g., LLM/VLMs, ViTs, etc.) on resource-constrained hardware.
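To make "resource-constrained" concrete, here is a rough sketch of the memory needed just to store an LLM's weights at different bit-widths. The 7B parameter count is a hypothetical example. The estimate deliberately ignores the KV cache, activations, and runtime overhead, so actual memory use is higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Storage for num_params weights at bits_per_weight bits each, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model, for illustration only
    for bits in (16, 8, 4):
        # Weights only: KV cache, activations, and runtime buffers are not counted.
        print(f"{bits:>2}-bit weights: {weight_memory_gib(params, bits):.1f} GiB")
```

For a 7B-parameter model this comes to roughly 13.0 GiB at 16-bit, 6.5 GiB at 8-bit, and 3.3 GiB at 4-bit. The gap between these figures is one reason this list has a dedicated Quantization/Sparsity section.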

📂 Table of Contents

🚀 Inference Engines

Frameworks and runtimes designed for deploying models on edge devices.

General ML Workloads

  • LiteRT (formerly TensorFlow Lite) - Google's framework for on-device inference.
  • ExecuTorch - PyTorch’s end-to-end solution for enabling on-device AI.
  • ONNX Runtime - Cross-platform inference engine for ONNX models.
  • MNN - Lightweight deep learning framework by Alibaba.
  • NCNN - High-performance NN inference framework by Tencent.

Vendor-Specific SDKs

  • Qualcomm QNN - Qualcomm AI Stack for Snapdragon NPUs/DSPs.
  • Apple Core ML - Framework to integrate ML models into iOS/macOS apps.
  • FluidAudio - Local audio AI SDK for Apple platforms with ASR, speaker diarization, VAD, and TTS, optimized for the Apple Neural Engine.
  • NVIDIA TensorRT - SDK for high-performance deep learning inference on NVIDIA GPUs (including Jetson).
  • Intel OpenVINO - Toolkit for optimizing and deploying AI inference on Intel hardware (CPU/GPU/NPU).
  • MediaTek NeuroPilot - AI ecosystem and SDK for MediaTek NPUs.

LLM & GenAI Specialized

  • llama.cpp - LLM inference in C/C++ with minimal dependencies.
  • MLC LLM - Universal solution for deploying LLMs on any hardware (based on TVM).
  • TensorRT-LLM - NVIDIA GPU-optimized LLM inference library, relevant for Jetson-class edge devices.
  • mllm - A fast and lightweight LLM inference engine for mobile and edge devices.
  • MLX LM - LLM inference and fine-tuning toolkit built on MLX for Apple silicon.
  • OmniInfer - High-performance, on-device VLM inference with hybrid NPU acceleration.
  • RunAnywhere - Open-source SDK for running LLMs and multimodal models on-device across iOS, Android, and cross-platform apps.
  • Off Grid - Open-source iOS/Android app running LLMs (Llama, Qwen, Gemma, Phi, DeepSeek) entirely on-device via llama.cpp. Includes voice (whisper.cpp), vision, on-device image generation, and tool calling.

📝 Research Papers

Note: Some of these works target inference acceleration on cloud/server infrastructure, which has far greater computational resources. I include them here when their ideas could potentially generalize to on-device inference.

LLM Inference on Mobile SoCs

Mobile Processor Characterization & Optimization

Compiler-based ML Optimization

Attention Acceleration
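A common building block in attention acceleration is the "online softmax" rescaling trick used by FlashAttention. Keys are processed in blocks while a running maximum and a running denominator are maintained, so the full row of attention scores never has to be stored at once. The sketch below handles a single query with scalar values for brevity. Real attention uses value vectors, and the same rescaling applies element-wise. It illustrates the technique only and is not code from any listed paper or engine.

```python
import math


def attention_naive(scores, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def attention_online(scores, values, block_size=2):
    """Same result, consuming scores/values one block at a time.

    Keeps a running max (m), running softmax denominator (l), and running
    unnormalized output (acc). When a block contains a larger score, earlier
    partial sums are rescaled by exp(old_max - new_max).
    """
    m, l, acc = -math.inf, 0.0, 0.0
    for start in range(0, len(scores), block_size):
        block_s = scores[start:start + block_size]
        block_v = values[start:start + block_size]
        new_m = max(m, max(block_s))
        scale = math.exp(m - new_m)  # exp(-inf) == 0.0 on the first block
        l *= scale
        acc *= scale
        for s, v in zip(block_s, block_v):
            w = math.exp(s - new_m)
            l += w
            acc += w * v
        m = new_m
    return acc / l
```

Only `m`, `l`, and `acc` persist across blocks. That small state is what lets tiled attention kernels keep working data in fast on-chip memory.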

Quantization/Sparsity
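As a minimal illustration of the two ideas named in this heading, the sketch below applies two techniques. The first is symmetric per-tensor int8 quantization, in which each weight is stored as an 8-bit integer times one shared float scale. The second is unstructured magnitude pruning, which zeroes the smallest-magnitude fraction of weights. Both are simplified textbook variants chosen for clarity. Production runtimes commonly use finer-grained scales (per-channel or per-group) and structured sparsity patterns that hardware can exploit.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127].

    weights must be a non-empty list of floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * x for x in q]


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)
    pruned = list(weights)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    for i in smallest:
        pruned[i] = 0.0
    return pruned
```

Each dequantized weight differs from the original by at most half of `scale`. A single large-magnitude outlier therefore inflates the error for every other weight in the tensor, which is one motivation for per-channel or per-group scales.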

Application-centric On-device AI Systems

Multi-DNN / Heterogeneous Runtime Scheduling
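At its core, multi-DNN scheduling on a heterogeneous SoC means deciding which processor (CPU, GPU, or NPU) runs each model. The following is a minimal greedy sketch built on several assumptions. The models are independent and all ready at time zero. The latency table is hypothetical and measured in isolation. Data-transfer costs are ignored. The sketch places the longest jobs first and assigns each one to the processor where it would finish earliest. Real schedulers must also account for effects this sketch ignores, such as contention for shared memory bandwidth and processor switching costs.

```python
# Hypothetical per-model latencies in ms on each processor; None = unsupported op set.
LATENCY_MS = {
    "detector":   {"CPU": 40.0, "GPU": 12.0, "NPU": 6.0},
    "classifier": {"CPU": 15.0, "GPU": 6.0,  "NPU": 3.0},
    "segmenter":  {"CPU": 90.0, "GPU": 25.0, "NPU": None},
}


def greedy_schedule(latency):
    """Longest-job-first, earliest-finish-time assignment of models to processors."""
    def best_case(model):
        return min(v for v in latency[model].values() if v is not None)

    busy_until = {}  # processor -> time (ms) at which it becomes free
    plan = {}        # model -> (processor, finish time in ms)
    for model in sorted(latency, key=best_case, reverse=True):
        best = None
        for proc, lat in latency[model].items():
            if lat is None:
                continue  # this processor cannot run the model
            finish = busy_until.get(proc, 0.0) + lat
            if best is None or finish < best[1]:
                best = (proc, finish)
        if best is None:
            raise ValueError(f"no processor can run {model}")
        busy_until[best[0]] = best[1]
        plan[model] = best
    return plan, max(busy_until.values())
```

With the table above, the segmenter lands on the GPU (finishing at 25 ms), while the detector and classifier share the NPU (finishing at 6 ms and 9 ms), for a makespan of 25 ms.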

On-device Training, Model Adaptation

Profilers