awesome-ai-hardware

About awesome-ai-hardware

AI accelerators, edge inference devices, compilers, runtimes, benchmarks, and research for building and evaluating machine-learning systems.

Awesome AI Hardware

Hardware Platforms

  • NVIDIA CUDA - Parallel programming platform for NVIDIA GPUs and accelerators.
  • AMD ROCm - Open GPU compute stack for AMD accelerators.
  • Intel oneAPI - Cross-architecture programming model for CPUs, GPUs, FPGAs, and accelerators.
  • Google TPU - Tensor Processing Units for training and serving large machine-learning workloads.
  • Microsoft Azure Maia - Microsoft first-party inference accelerators for Azure, serving Copilot and the in-house MAI model family.
  • Cerebras WSE - Wafer-scale accelerator architecture for dense neural network compute.
  • Groq LPU - Inference processor architecture designed around deterministic token generation.
  • Tenstorrent - RISC-V-based AI processor company with open software tools and developer boards.
  • SambaNova - Reconfigurable dataflow systems for enterprise AI training and inference.
  • Etched Sohu - Transformer-focused inference ASIC for high-throughput language model serving.

Edge and Embedded Hardware

  • NVIDIA Jetson - Edge AI modules for robotics, vision systems, industrial automation, and local inference.
  • Qualcomm AI Engine Direct SDK - Low-level access to Qualcomm Hexagon, CPU, and GPU inference paths.
  • Hailo-8 - M.2 and mini-PCIe accelerator family for low-power computer vision inference.
  • Google Coral - Edge TPU modules and boards for quantized TensorFlow Lite workloads.
  • Luxonis OAK-D - Depth camera with on-device neural inference through the DepthAI stack.
  • Kneron KL720 - Low-power neural processing unit for USB modules and embedded vision products.
  • Raspberry Pi AI Kit - Raspberry Pi 5 M.2 accelerator kit built around the Hailo inference processor.
  • AMD Versal AI Edge - Adaptive SoC family combining programmable logic, CPU cores, and AI Engines.
  • STMicroelectronics STM32N6 - Microcontroller series with an integrated neural acceleration block for edge AI.
  • Espressif ESP32-P4 - Application processor for vision, display, and AI-enabled embedded products.
  • Arduino UNO Q - Hybrid edge AI board pairing a Qualcomm Dragonwing QRB2210 Linux processor with an STM32U585 real-time microcontroller in the UNO form factor.

Silicon and Product Families

  • Apple Core ML - Model deployment framework for Apple Neural Engine, GPU, and CPU execution.
  • Qualcomm Snapdragon X Series - Laptop-class Arm processors with integrated Hexagon NPU acceleration.
  • Intel Core Ultra - AI PC processor family with integrated Intel AI Boost NPU blocks.
  • AMD Ryzen AI - Consumer processor family with XDNA neural processing units.
  • MediaTek NeuroPilot - Mobile AI platform for Dimensity and related MediaTek SoCs.
  • Samsung Exynos - Mobile processor family with integrated neural processing units.
  • Arm Ethos-U - MicroNPU IP for Cortex-M and Cortex-A edge inference designs.
  • Synaptics Astra - Edge AI processor platform for vision, audio, and multimodal embedded systems.
  • SiMa.ai MLSoC - Machine-learning SoC platform aimed at industrial edge deployment.
  • Axelera Metis - Edge AI platform built around the Metis accelerator architecture.

Emerging AI Silicon

  • Furiosa AI - Tensor contraction processor architecture for transformer inference, with a published microarchitecture and open compiler stack.
  • Rebellions - ATOM and REBEL AI accelerators targeting datacenter inference with a programmable software stack.
  • Lightmatter - Photonic compute and chip-to-chip interconnect platform for large-scale neural network workloads.
  • d-Matrix - Microsoft-backed company shipping the Corsair digital in-memory compute accelerator for low-latency generative AI inference.
  • MatX - Custom silicon for large language model training, designed around a bare-metal kernel programming model.
  • Lemurian Labs - Spatial processor architecture co-designed with a software-defined hardware compiler stack.

Compilers and Runtimes

  • XLA - Accelerated Linear Algebra compiler for TensorFlow, JAX, and other ML frameworks.
  • MLIR - Multi-Level Intermediate Representation for reusable compiler infrastructure.
  • Triton - Python-like language and compiler for writing custom GPU kernels.
  • Apache TVM - Open deep-learning compiler stack for CPUs, GPUs, and accelerators.
  • IREE - Intermediate Representation Execution Environment for deploying ML programs.
  • NVIDIA TensorRT - Inference optimizer and runtime for NVIDIA GPUs and Jetson modules.
  • ONNX Runtime - Cross-platform inference runtime with provider backends for multiple accelerators.
  • OpenVINO - Intel toolkit for optimizing and deploying inference on CPUs, GPUs, and NPUs.
  • Vitis AI - Compiler, runtime, and model zoo for AMD adaptive SoCs and Alveo cards.
  • HailoRT - Runtime and driver stack for Hailo AI accelerators.
  • LiteRT - Google runtime for on-device inference across mobile and embedded targets.
  • ExecuTorch - PyTorch runtime for deploying models to phones, wearables, and embedded devices.

Benchmarking and Profiling

  • MLPerf - Industry-standard benchmark suites for training, inference, storage, and edge ML systems.
  • AI-Benchmark - Deep-learning benchmark suite for mobile, desktop, and accelerator comparisons.
  • Geekbench AI - Cross-platform inference score browser with CPU, GPU, and NPU results.
  • LLMPerf - Benchmark harness for large language model serving throughput and latency.
  • NVIDIA Nsight Systems - System-wide performance analysis tool for CPU, GPU, and operating-system timelines.
  • NVIDIA Nsight Compute - Interactive CUDA kernel profiler for occupancy, memory, and instruction analysis.
  • PyTorch Profiler - Built-in profiler for PyTorch model execution and operator-level timing.
  • Perfetto - Production-grade tracing and profiling platform for systems performance analysis.

Open-Source Deployment Projects

  • Jetson Containers - Containerized CUDA, PyTorch, ROS, and ML stacks for NVIDIA Jetson development.
  • Jetson Inference - End-to-end classification, detection, pose, and segmentation examples for Jetson modules.
  • DeepStream Python Apps - Python bindings and examples for multi-camera DeepStream pipelines.
  • Isaac ROS Common - Docker and build infrastructure for NVIDIA Isaac ROS acceleration packages.
  • Hailo Model Zoo - Pretrained models, compilation scripts, and deployment flows for Hailo accelerators.
  • Hailo Raspberry Pi 5 Examples - Reference pipelines for Raspberry Pi 5 systems using Hailo AI modules.
  • Edge TPU - Userspace runtime, tests, and examples for Google Coral Edge TPU devices.
  • RKNN Model Zoo - Deployment examples and model zoo for Rockchip NPU boards.
  • Texas Instruments TIDL Tools - Model conversion and deployment tools for TI deep-learning accelerators.
  • OpenVINO Notebooks - Practical notebooks for model conversion, optimization, and inference on Intel hardware.
  • Qualcomm Linux Sample Apps - Detection and classification examples for Qualcomm Linux evaluation kits.
  • Qualcomm Intelligent Development Kit - Android samples using the Qualcomm AI Engine and QNN stack.
  • Ryzen AI Software - AMD examples and deployment tools for XDNA and XDNA 2 NPUs.
  • OpenVINO Toolkit - Open-source runtime, model optimizer, and samples for Intel inference deployment.
  • LeRobot - Robot learning library for imitation learning and reinforcement learning on local hardware.
  • MLCommons Tiny - TinyML benchmark suite for keyword spotting, image classification, and anomaly detection.
  • TensorFlow Lite Micro - Microcontroller inference runtime with optimized kernels for embedded targets.
  • Edge Impulse Standalone Inferencing - Portable C++ inference examples generated from Edge Impulse projects.
  • ESP-WHO - Face detection, recognition, and camera AI examples for ESP32 devices.
  • ESP-DL - Quantization and inference library for deploying neural networks on Espressif chips.
  • OpenMV - MicroPython machine-vision firmware and examples for camera microcontroller boards.
  • MaixPy - MicroPython AI framework for Sipeed K210, K230, and related RISC-V boards.
  • Openpilot - Open-source driver assistance stack running production workloads on automotive AI hardware.
  • Autoware - ROS 2 autonomous driving stack used for research and industrial vehicle development.
  • Apollo - Autonomous driving platform with perception, planning, simulation, and deployment examples.

Mobile and AI PC Inference

  • ncnn - Mobile neural network inference framework optimized for Arm CPUs and Vulkan GPUs.
  • MNN - Lightweight mobile inference engine used in Alibaba production applications.
  • Tencent TNN - Cross-platform inference framework for Android, iOS, and embedded deployments.
  • Xiaomi MACE - Mobile AI compute engine for heterogeneous CPU, GPU, DSP, and NPU execution.
  • llama.cpp - Portable C and C++ inference engine for quantized language models.
  • MLX - Array framework for Apple silicon with unified-memory model execution.
  • Core ML Tools - Conversion and compression tools for packaging models into Core ML format.
  • MediaPipe - Cross-platform graph framework for on-device vision, audio, and multimodal pipelines.
  • Transformers.js - Browser and server-side transformer inference through WebAssembly and WebGPU.
  • Candle - Minimal Rust ML framework for small binaries and local inference applications.
  • Ollama - Local language model runner for CPU and GPU backends.
  • LocalAI - Self-hosted OpenAI-compatible API server for local text, audio, and vision models.
  • Open WebUI - Local-first chat and retrieval interface commonly paired with Ollama.

Research Papers

Books and Courses

  • Dive into Deep Learning - Open textbook with runnable notebooks for modern deep-learning workloads.
  • Efficient Deep Learning - Practical techniques for efficient model training and inference.
  • GPU Puzzles - Puzzle-based introduction to GPU programming concepts.
  • CUDA MODE - Community lecture series on CUDA, GPU kernels, and accelerator programming.
  • Triton Tutorials - Hands-on examples for writing custom kernels with Triton.
  • TVM Tutorial - End-to-end introduction to model compilation with Apache TVM.
  • TPU Research Cloud - Program for researchers and learners to access Google TPU resources.

Community

  • CUDA MODE Discord - Community for GPU kernel programming, profiling, and performance engineering.
  • NVIDIA Developer Forums - Official forum for CUDA development and troubleshooting.
  • MLCommons - Engineering consortium for ML benchmarks, datasets, and best practices.
  • r/MachineLearning - Research community covering ML systems, models, and hardware trends.
  • SemiAnalysis - Technical analysis of AI chips, datacenter systems, and semiconductor supply chains.
  • tinyML Foundation - Community for ultra-low-power machine learning on embedded devices.

Contributing

Contributions welcome! Read the contribution guidelines first.