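I put this repository together purely out of laziness, as a collection for my own convenience; to my surprise, it has kept gathering stars. Our group (homepage) is doing research in this area, and I will keep releasing the code for related papers under my account fangvv. I hope we can exchange ideas and learn from each other.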
Please note that I have only collected these links from the original sites for research purposes. You are welcome to join us in discussing interesting ideas on efficient DNN training/inference.
https://zhuanlan.zhihu.com/p/58705979
http://blog.csdn.net/wspba/article/details/75671573
https://www.ctolib.com/ZhishengWang-Embedded-Neural-Network.html
https://blog.csdn.net/touch_dream/article/details/78441332
https://zhuanlan.zhihu.com/p/28439056
https://blog.csdn.net/QcloudCommunity/article/details/77719498
https://www.cnblogs.com/zhonghuasong/p/7493475.html
https://blog.csdn.net/jackytintin/article/details/53445280
https://zhuanlan.zhihu.com/p/27747628
https://blog.csdn.net/shuzfan/article/category/6271575
https://blog.csdn.net/cookie_234
https://www.jianshu.com/u/f5c90c3856bb
https://github.com/sun254/awesome-model-compression-and-acceleration
awesome-model-compression-and-acceleration
Paper
Overview
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- A Survey of Model Compression and Acceleration for Deep Neural Networks
Structure
- Dynamic Capacity Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Residual Attention Network for Image Classification
- SEP-Nets: Small and Effective Pattern Networks
- Deep Networks with Stochastic Depth
- Learning Infinite Layer Networks Without the Kernel Trick
- Coordinating Filters for Faster Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- Squeezedet: Unified, small, low power fully convolutional neural networks
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
Distillation
- Dark knowledge
- FitNets: Hints for Thin Deep Nets
- Net2net: Accelerating learning via knowledge transfer
- Distilling the Knowledge in a Neural Network
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Sequence-Level Knowledge Distillation
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
Binarization
- Local Binary Convolutional Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Quantization
- Quantize weights and activations in Recurrent Neural Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Quantized Convolutional Neural Networks for Mobile Devices
- Compressing Deep Convolutional Networks using Vector Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Loss-aware Binarization of Deep Networks
- Towards the Limit of Network Quantization
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Trained Ternary Quantization
Pruning
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Pruning Filters for Efficient ConvNets
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Soft Weight-Sharing for Neural Network Compression
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Learning both Weights and Connections for Efficient Neural Networks
- Dynamic Network Surgery for Efficient DNNs
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Low Rank Approximation
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Convolutional neural networks with low-rank regularization
- Speeding up convolutional neural networks with low rank expansions
https://github.com/memoiry/Awesome-model-compression-and-acceleration
Awesome-model-compression-and-acceleration
Some papers I have collected and consider well worth reading, which are also the ones I am about to read. Please raise a PR or issue if you have any suggestions regarding the list. Thank you.
Survey
- A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
Model and structure
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv '18, Google]
- NASNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video [arXiv '17]
- Shift-based Primitives for Efficient Convolutional Neural Networks [WACV'18]
Quantization
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
- Loss-aware Binarization of Deep Networks [ICLR'17]
- Towards the Limit of Network Quantization [ICLR'17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
- Training and Inference with Integers in Deep Neural Networks [ICLR'18]
- Deep Learning with Limited Numerical Precision [ICML '15]
Pruning
- Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
- Pruning Filters for Efficient ConvNets [ICLR'17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
- Soft Weight-Sharing for Neural Network Compression [ICLR'17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
- Dynamic Network Surgery for Efficient DNNs [NIPS'16]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Learning Structured Sparsity in Deep Neural Networks
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Channel pruning for accelerating very deep neural networks [ICCV'17]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices [ECCV'18]
- RePr: Improved Training of Convolutional Filters [arXiv'18]
Binarized neural network
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
Low-rank Approximation
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (Extended version of above one)
- Convolutional neural networks with low-rank regularization [arXiv'15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
- High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
- Speeding up convolutional neural networks with low rank expansions
Distilling
- Dark knowledge
- FitNets: Hints for Thin Deep Nets
- Net2net: Accelerating learning via knowledge transfer
- Distilling the Knowledge in a Neural Network
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Sequence-Level Knowledge Distillation
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
System
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
- DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
- EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys '16]
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App '15]
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]
Some optimization techniques
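- Eliminate redundant computation
- Unroll loops
- Use SIMD instructions
- OpenMP
- Convert to fixed-point arithmetic
- Avoid non-contiguous memory reads and writes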
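As a rough illustration of the fixed-point item (定点化) in the list above, here is a minimal sketch of a dot product computed with integer arithmetic only. The function names (`to_fixed`, `from_fixed`, `fixed_dot`) and the choice of 8 fractional bits are assumptions made for this example; they do not come from any of the listed papers or libraries, and a real implementation would also need to handle overflow and saturation.

```python
def to_fixed(x, frac_bits=8):
    """Convert a float to a signed fixed-point integer (hypothetical helper).

    The value is scaled by 2**frac_bits and rounded to the nearest integer.
    """
    return int(round(x * (1 << frac_bits)))


def from_fixed(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(weights, inputs, frac_bits=8):
    """Dot product that uses only integer multiply-accumulate.

    Each product carries 2 * frac_bits fractional bits, so the
    accumulator is shifted right once at the end to return to
    frac_bits fractional bits.
    """
    qw = [to_fixed(w, frac_bits) for w in weights]
    qx = [to_fixed(x, frac_bits) for x in inputs]
    acc = 0
    for a, b in zip(qw, qx):
        acc += a * b
    return acc >> frac_bits  # arithmetic shift, rounds toward -infinity


if __name__ == "__main__":
    w = [0.5, -0.25, 1.0]
    x = [2.0, 4.0, -1.5]
    print(from_fixed(fixed_dot(w, x)))
```

Values that are exact multiples of 2^-8 survive the round trip without error; other values pick up a small quantization error, which is the accuracy cost traded for cheaper integer arithmetic.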
References
- An overview of lightweight convolutional neural networks: SqueezeNet, MobileNet, ShuffleNet, Xception
- An Introduction to different Types of Convolutions in Deep Learning
- A roundup of the many varieties of convolution used in CNNs
https://github.com/chester256/Model-Compression-Papers
Model-Compression-Papers
Papers for neural network compression and acceleration. Partly based on link.
Survey
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
- A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
Quantization
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
- Loss-aware Binarization of Deep Networks [ICLR'17]
- Towards the Limit of Network Quantization [ICLR'17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
- Training and Inference with Integers in Deep Neural Networks [ICLR'18]
- Deep Learning with Limited Numerical Precision [ICML '15]
- Model compression via distillation and quantization [ICLR '18]
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR '18]
- On the Universal Approximability of Quantized ReLU Neural Networks [arXiv '18]
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [CVPR '18]
Pruning
- Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
- Pruning Filters for Efficient ConvNets [ICLR'17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
- Soft Weight-Sharing for Neural Network Compression [ICLR'17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
- Dynamic Network Surgery for Efficient DNNs [NIPS'16]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
- Data-Driven Sparse Structure Selection for Deep Neural Networks [arXiv '17]
- Learning Structured Sparsity in Deep Neural Networks [NIPS '16]
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism [ISCA '17]
- Channel Pruning for Accelerating Very Deep Neural Networks [ICCV '17]
- Learning Efficient Convolutional Networks through Network Slimming [ICCV '17]
- NISP: Pruning Networks using Neuron Importance Score Propagation [CVPR '18]
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [ICLR '18]
- MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks [arXiv '17]
- Efficient Sparse-Winograd Convolutional Neural Networks [ICLR '18]
- “Learning-Compression” Algorithms for Neural Net Pruning [CVPR '18]
Binarized Neural Network
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [NIPS '16]
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks [ECCV '16]
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration [CVPR '17]
Low-rank Approximation
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (Extended version of above one)
- Convolutional neural networks with low-rank regularization [arXiv'15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
- High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
- Speeding up convolutional neural networks with low rank expansions
- Coordinating Filters for Faster Deep Neural Networks [ICCV '17]
Knowledge Distillation
- Dark knowledge
- FitNets: Hints for Thin Deep Nets [ICLR '15]
- Net2net: Accelerating learning via knowledge transfer [ICLR '16]
- Distilling the Knowledge in a Neural Network [NIPS '15]
- MobileID: Face Model Compression by Distilling Knowledge from Neurons [AAAI '16]
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [arXiv '17]
- Deep Model Compression: Distilling Knowledge from Noisy Teachers [arXiv '16]
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer [ICLR '17]
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer [arXiv '17]
- Learning Efficient Object Detection Models with Knowledge Distillation [NIPS '17]
- Data-Free Knowledge Distillation For Deep Neural Networks [NIPS '17]
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning [CVPR '17]
- Moonshine: Distilling with Cheap Convolutions [arXiv '17]
- Model compression via distillation and quantization [ICLR '18]
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR '18]
Miscellaneous
- Beyond Filters: Compact Feature Map for Portable Deep Model [ICML '17]
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization [ICML '17]
https://github.com/ZhishengWang/Embedded-Neural-Network
Papers Reading List.
- This is a collection of papers on reducing model size and on ASIC/FPGA accelerators for machine learning, especially for applications related to deep neural networks. (Inspired by Neural-Networks-on-Silicon)
- Tutorials:
- Our Contributions
- Network Compression
- Parameter Sharing
- Teacher-Student Mechanism (Distilling)
- Fixed-precision training and storage
- Sparsity regularizers & Pruning
- Tensor Decomposition
- Conditional (Adaptive) Computing
- Hardware Accelerator
- Benchmark and Platform Analysis
- Recurrent Neural Networks
- Conference Papers
- TODO
Network Compression
This field is changing rapidly; the entries below may be somewhat outdated.
Parameter Sharing
- structured matrices
- Structured Convolution Matrices for Energy-efficient Deep learning. (IBM Research–Almaden)
- Structured Transforms for Small-Footprint Deep Learning. (Google Inc)
- An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
- Hashing
- Functional Hashing for Compressing Neural Networks. (Baidu Inc)
- Compressing Neural Networks with the Hashing Trick. (Washington University + NVIDIA)
- Learning compact recurrent neural networks. (University of Southern California + Google)
Teacher-Student Mechanism (Distilling)
- Distilling the Knowledge in a Neural Network. (Google Inc)
- Sequence-Level Knowledge Distillation. (Harvard University)
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. (TuSimple)
Fixed-precision training and storage
- Binary/Ternary Neural Networks
- XNOR-Net, Ternary Weight Networks (TWNs), Binary-net and their variants.
- Deep neural networks are robust to weight binarization and other non-linear distortions. (IBM Research–Almaden)
- Recurrent Neural Networks With Limited Numerical Precision. (ETH Zurich + Montréal@Yoshua Bengio)
- Neural Networks with Few Multiplications. (Montréal@Yoshua Bengio)
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. (Tsinghua University + Microsoft)
- Towards the Limit of Network Quantization. (Samsung US R&D Center)
- Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. (Intel Labs China)
- Loss-aware Binarization of Deep Networks. (Hong Kong University of Science and Technology)
- Trained Ternary Quantization. (Tsinghua University + Stanford University + NVIDIA)
Sparsity regularizers & Pruning
- Learning both Weights and Connections for Efficient Neural Networks. (SongHan, Stanford University)
- Deep Compression, EIE. (SongHan, Stanford University)
- Dynamic Network Surgery for Efficient DNNs. (Intel)
- Compression of Neural Machine Translation Models via Pruning. (Stanford University)
- Accelerating Deep Convolutional Networks using low-precision and sparsity. (Intel)
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning. (Intel)
- Exploring Sparsity in Recurrent Neural Networks. (Baidu Research)
- Pruning Convolutional Neural Networks for Resource Efficient Inference. (NVIDIA)
- Pruning Filters for Efficient ConvNets. (University of Maryland + NEC Labs America)
- Soft Weight-Sharing for Neural Network Compression. (University of Amsterdam, reddit discussion)
- Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks. (McGill University)
- Training Compressed Fully-Connected Networks with a Density-Diversity Penalty. (University of Washington)
- Bayesian Compression
- Bayesian Sparsification of Recurrent Neural Networks
- Bayesian Compression for Deep Learning
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Tensor Decomposition
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. (Samsung, etc)
- Learning compact recurrent neural networks. (University of Southern California + Google)
- Tensorizing Neural Networks. (Skolkovo Institute of Science and Technology, etc)
- Ultimate tensorization: compressing convolutional and FC layers alike. (Moscow State University, etc)
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks. (@CVPR2015)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. (New York University, etc.)
- Convolutional neural networks with low-rank regularization. (Princeton University, etc.)
- Learning with Tensors: Why Now and How? (Tensor-Learn Workshop @ NIPS'16)
Conditional (Adaptive) Computing
- Adaptive Computation Time for Recurrent Neural Networks. (Google DeepMind@Alex Graves)
- Variable Computation in Recurrent Neural Networks. (New York University + Facebook AI Research)
- Spatially Adaptive Computation Time for Residual Networks. (github link, Google, etc.)
- Hierarchical Multiscale Recurrent Neural Networks. (Montréal)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. (Google Brain, etc.)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution. (University of Michigan)
- Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. (@Yoshua Bengio)
- Multi-Scale Dense Convolutional Networks for Efficient Prediction. (Cornell University, etc)
Hardware Accelerator
Benchmark and Platform Analysis
- Fathom: Reference Workloads for Modern Deep Learning Methods. (Harvard University)
- DeepBench: Open-Source Tool for benchmarking DL operations. (svail.github.io-Baidu)
- BENCHIP: Benchmarking Intelligence Processors.
- DAWNBench: An End-to-End Deep Learning Benchmark and Competition. (Stanford)
- MLPerf: A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.
Recurrent Neural Networks
- FPGA-based Low-power Speech Recognition with Recurrent Neural Networks. (Seoul National University)
- Accelerating Recurrent Neural Networks in Analytics Servers: Comparison of FPGA, CPU, GPU, and ASIC. (Intel)
- ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA. (FPGA 2017, Best Paper Award)
- DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks. (KAIST, ISSCC 2017)
- Hardware Architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition. (University of Kaiserslautern, etc)
- Efficient Hardware Mapping of Long Short-Term Memory Neural Networks for Automatic Speech Recognition. (Master Thesis@Georgios N. Evangelopoulos)
- Hardware Accelerators for Recurrent Neural Networks on FPGA. (Purdue University, ISCAS 2017)
- Accelerating Recurrent Neural Networks: A Memory Efficient Approach. (Nanjing University)
- A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
- An Energy-Efficient Reconfigurable Architecture for RNNs Using Dynamically Adaptive Approximate Computing.
- A Systolically Scalable Accelerator for Near-Sensor Recurrent Neural Network Inference.
- A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications
- E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs (FPGA 2018, Peking Univ, Syracuse Univ, CUNY)
- DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. (FPGA 2018, ETHZ, BenevolentAI)
- Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs (MICRO 2018)
- E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (HPCA 2019)
Convolutional Neural Networks
- Please refer to Neural-Networks-on-Silicon
Conference Papers
NIPS 2016
- Dynamic Network Surgery for Efficient DNNs. (Intel Labs China)
- Memory-Efficient Backpropagation Through Time. (Google DeepMind)
- PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions. (Moscow State University, etc.)
- Learning Structured Sparsity in Deep Neural Networks. (University of Pittsburgh)
- LightRNN: Memory and Computation-Efficient Recurrent Neural Networks. (Nanjing University + Microsoft Research)
ICASSP 2017
- LogNet: Energy-Efficient Neural Networks Using Logarithmic Computation. (Stanford University)
- Extended Low Rank Plus Diagonal Adaptation for Deep and Recurrent Neural Networks. (Microsoft)
- Fixed-Point Optimization of Deep Neural Networks with Adaptive Step Size Retraining. (Seoul National University)
- Implementation of Efficient, Low Power Deep Neural Networks on Next-Generation Intel Client Platforms (Demos). (Intel)
- Knowledge Distillation for Small-Footprint Highway Networks. (TTI-Chicago, etc)
- Automatic Node Selection for Deep Neural Networks Using Group Lasso Regularization. (Doshisha University, etc)
- Accelerating Deep Convolutional Networks Using Low-Precision and Sparsity. (Intel Labs)
CVPR 2017
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (MIT)
- Network Sketching: Exploiting Binary Structure in Deep CNNs. (Intel Labs China + Tsinghua University)
- Spatially Adaptive Computation Time for Residual Networks. (Google, etc)
- A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation. (University of Pittsburgh, etc)
ICML 2017
- Deep Tensor Convolution on Multicores. (MIT)
- Beyond Filters: Compact Feature Map for Portable Deep Model. (Peking University + University of Sydney)
- Combined Group and Exclusive Sparsity for Deep Neural Networks. (UNIST)
- Delta Networks for Optimized Recurrent Network Computation. (Institute of Neuroinformatics, etc)
- MEC: Memory-efficient Convolution for Deep Neural Network. (IBM Research)
- Deciding How to Decide: Dynamic Routing in Artificial Neural Networks. (California Institute of Technology)
- Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. (ETH Zurich, etc)
- Analytical Guarantees on Numerical Precision of Deep Neural Networks. (University of Illinois at Urbana-Champaign)
- Variational Dropout Sparsifies Deep Neural Networks. (Skoltech, etc)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank. (The City University of New York, etc)
ICCV 2017
- Channel Pruning for Accelerating Very Deep Neural Networks. (Xi’an Jiaotong University + Megvii Inc.)
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. (Nanjing University, etc)
- Learning Efficient Convolutional Networks through Network Slimming. (Intel Labs China, etc)
- Performance Guaranteed Network Acceleration via High-Order Residual Quantization. (Shanghai Jiao Tong University + Peking University)
- Coordinating Filters for Faster Deep Neural Networks. (University of Pittsburgh + Duke University, etc, github link)
NIPS 2017
- Towards Accurate Binary Convolutional Neural Network. (DJI)
- Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. (ETH Zurich)
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. (Duke University, etc, github link)
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks. (Intel)
- Bayesian Compression for Deep Learning. (University of Amsterdam, etc)
- Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. (Nanyang Technological Univ)
- Training Quantized Nets: A Deeper Understanding. (University of Maryland)
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise. (Yandex, etc)
- Runtime Neural Pruning. (Tsinghua University)
- The Reversible Residual Network: Backpropagation Without Storing Activations. (University of Toronto, github link)
- Compression-aware Training of Deep Networks. (Toyota Research Institute + EPFL)
ICLR 2018
- Oral
- Training and Inference with Integers in Deep Neural Networks. (Tsinghua University)
- Poster
- Learning Sparse NNs Through L0 Regularization
- Learning Intrinsic Sparse Structures within Long Short-Term Memory
- Variational Network Quantization
- Alternating Multi-bit Quantization for Recurrent Neural Networks
- Mixed Precision Training
- Multi-Scale Dense Networks for Resource Efficient Image Classification
- Efficient Sparse-Winograd CNNs
- Compressing Word Embeddings via Deep Compositional Code Learning
- Mixed Precision Training of Convolutional Neural Networks using Integer Operations
- Adaptive Quantization of Neural Networks
- Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
- WRPN: Wide Reduced-Precision Networks
- Deep Rewiring: Training very sparse deep networks
- Loss-aware Weight Quantization of Deep Networks
- Learning to share: simultaneous parameter tying and sparsification in deep learning
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
- Large scale distributed neural network training through online distillation
- Learning Discrete Weights Using the Local Reparameterization Trick
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- Training wide residual networks for deployment using a single bit for each weight
- The High-Dimensional Geometry of Binary Neural Networks
- workshop
- To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression
CVPR 2018
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- BlockDrop: Dynamic Inference Paths in Residual Networks
- SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
- Two-Step Quantization for Low-Bit Neural Networks
- Towards Effective Low-Bitwidth Convolutional Neural Networks
- Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
- CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
- “Learning-Compression” Algorithms for Neural Net Pruning
- Wide Compression: Tensor Ring Nets
- NestedNet: Learning Nested Sparse Structures in Deep Neural Networks
- Interleaved Structured Sparse Convolutional Neural Networks
- NISP: Pruning Networks Using Neuron Importance Score Propagation
- Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
- HydraNets: Specialized Dynamic Architectures for Efficient Inference
- Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks
ECCV 2018
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
- Learning Compression from Limited Unlabeled Data
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices
- Training Binary Weight Networks via Semi-Binary Decomposition
- Clustering Convolutional Kernels to Compress Deep Neural Networks
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Coreset-Based Neural Network Compression
- Convolutional Networks with Adaptive Inference Graphs
- Value-aware Quantization for Training and Inference of Neural Networks
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- Deep Expander Networks: Efficient Deep Networks from Graph Theory
- Extreme Network Compression via Filter Group Approximation
- Constraint-Aware Deep Neural Network Compression
ICML 2018
- Compressing Neural Networks using the Variational Information Bottleneck
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
- High Performance Zero-Memory Overhead Direct Convolutions
- Kronecker Recurrent Units
- Learning Compact Neural Networks with Regularization
- StrassenNets: Deep Learning with a Multiplication Budget
- Weightless: Lossy Weight Encoding for Deep Neural Network Compression
- WSNet: Compact and Efficient Networks Through Weight Sampling
NIPS 2018
- workshops
- Scalable Methods for 8-bit Training of Neural Networks
- Frequency-Domain Dynamic Pruning for Convolutional Neural Networks
- Sparsified SGD with Memory
- Training Deep Neural Networks with 8-bit Floating Point Numbers
- KDGAN: Knowledge Distillation with Generative Adversarial Networks
- Knowledge Distillation by On-the-Fly Native Ensemble
- Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices
- Moonshine: Distilling with Cheap Convolutions
- HitNet: Hybrid Ternary Recurrent Neural Network
- FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
- Training DNNs with Hybrid Block Floating Point
- Reversible Recurrent Neural Networks
- Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks
- Synaptic Strength for Convolutional Neural Network
- TETRIS: TilE-matching the TRemendous Irregular Sparsity
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization
- Pelee: A Real-Time Object Detection System on Mobile Devices
- Learning Versatile Filters for Efficient Convolutional Neural Networks
- Multi-Task Zipping via Layer-wise Neuron Sharing
- A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- ATOMO: Communication-efficient Learning via Atomic Sparsification
- Gradient Sparsification for Communication-Efficient Distributed Optimization
ICLR 2019
- Poster:
- SNIP: Single-shot Network Pruning based on Connection Sensitivity
- Rethinking the Value of Network Pruning
- Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
- Dynamic Channel Pruning: Feature Boosting and Suppression
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Slimmable Neural Networks
- RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
- Dynamic Sparse Graph for Efficient Deep Learning
- Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Learning Recurrent Binary/Ternary Weights
- Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
- Relaxed Quantization for Discretized Neural Networks
- Integer Networks for Data Compression with Latent-Variable Models
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
- A Systematic Study of Binary Neural Networks' Optimisation
- Analysis of Quantized Models
- Oral:
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
CVPR 2019
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
- others to be added
https://github.com/cedrickchee/awesome-ml-model-compression
Awesome ML Model Compression 
An awesome style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools and more. PRs are welcome!
Contents
Papers
General
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
Architecture
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- AddressNet: Shift-based Primitives for Efficient Convolutional Neural Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- Residual Attention Network for Image Classification
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks
- SEP-Nets: Small and Effective Pattern Networks
- Dynamic Capacity Networks
- Learning Infinite Layer Networks Without the Kernel Trick
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Coordinating Filters for Faster Deep Neural Networks
- Deep Networks with Stochastic Depth
Quantization
- Quantized Convolutional Neural Networks for Mobile Devices
- Towards the Limit of Network Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Compressing Deep Convolutional Networks using Vector Quantization
- Trained Ternary Quantization
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- Loss-aware Binarization of Deep Networks
- Quantize weights and activations in Recurrent Neural Networks
- Fixed-Point Performance Analysis of Recurrent Neural Networks
Binarization
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Local Binary Convolutional Neural Networks
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
Pruning
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Pruning Filters for Efficient ConvNets
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning both Weights and Connections for Efficient Neural Networks
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Soft Weight-Sharing for Neural Network Compression
- Dynamic Network Surgery for Efficient DNNs
- Channel pruning for accelerating very deep neural networks
- AMC: AutoML for model compression and acceleration on mobile devices
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Distillation
- Distilling the Knowledge in a Neural Network
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Sequence-Level Knowledge Distillation
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Dark knowledge
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- FitNets: Hints for Thin Deep Nets
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Low Rank Approximation
- Speeding up convolutional neural networks with low rank expansions
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Convolutional neural networks with low-rank regularization
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
Articles
Content published on the Web.
Howtos
Assorted
- Why the Future of Machine Learning is Tiny
- Deep Learning Model Compression for Image Analysis: Methods and Architectures
Reference
Blogs
Tools
Libraries
- TensorFlow Model Optimization Toolkit. Accompanied blog post, TensorFlow Model Optimization Toolkit — Pruning API
Frameworks
Videos
Talks
Training & tutorials
License
To the extent possible under law, Cedric Chee has waived all copyright and related or neighboring rights to this work.
https://github.com/jnjaby/Model-Compression-Acceleration
Model-Compression-Acceleration
Papers
Quantization
- Product Quantization for Nearest Neighbor Search, TPAMI, 2011 [paper]
- Compressing Deep Convolutional Networks using Vector Quantization, ICLR, 2015 [paper]
- Deep Learning with Limited Numerical Precision, ICML, 2015 [paper]
- Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks, ArXiv, 2016 [paper]
- Fixed Point Quantization of Deep Convolutional Networks, ICML, 2016 [paper]
- Quantized Convolutional Neural Networks for Mobile Devices, CVPR, 2016 [paper]
- Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, ICLR, 2017 [paper]
- BinaryConnect: Training Deep Neural Networks with binary weights during propagations, NIPS, 2015 [paper]
- BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1, ArXiv, 2016 [paper]
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, ECCV, 2016 [paper]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, ArXiv, 2016 [paper]
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, ArXiv, 2016 [paper]
Pruning
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR, 2016 [paper]
- Optimal Brain Damage, NIPS, 1990 [paper]
- Learning both Weights and Connections for Efficient Neural Networks, NIPS, 2015 [paper]
- Pruning Filters for Efficient ConvNets, ICLR, 2017 [paper]
- Sparsifying Neural Network Connections for Face Recognition, CVPR, 2016 [paper]
- Learning Structured Sparsity in Deep Neural Networks, NIPS, 2016 [paper]
- Pruning Convolutional Neural Networks for Resource Efficient Inference, ICLR, 2017 [paper]
Knowledge Distillation
- Distilling the Knowledge in a Neural Network, ArXiv, 2015 [paper]
- FitNets: Hints for Thin Deep Nets, ICLR, 2015 [paper]
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR, 2017 [paper]
- Face Model Compression by Distilling Knowledge from Neurons, AAAI, 2016 [paper]
- In Teacher We Trust: Learning Compressed Models for Pedestrian Detection, ArXiv, 2016 [paper]
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, ArXiv, 2017 [paper]
Network Architecture
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, ArXiv, 2016 [paper]
- Convolutional Neural Networks at Constrained Time Cost, CVPR, 2015 [paper]
- Flattened Convolutional Neural Networks for Feedforward Acceleration, ArXiv, 2014 [paper]
- Going deeper with convolutions, CVPR, 2015 [paper]
- Rethinking the Inception Architecture for Computer Vision, CVPR, 2016 [paper]
- Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure, ArXiv, 2016 [paper]
- Xception: Deep Learning with Depthwise Separable Convolutions, ArXiv, 2017 [paper]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, ArXiv, 2017 [paper]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, ArXiv, 2017 [paper]
Matrix Factorization (Low-rank Approximation)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, NIPS, 2014 [paper]
- Speeding up Convolutional Neural Networks with Low Rank Expansions, BMVC, 2014 [paper]
- Deep Fried Convnets, ICCV, 2015 [paper]
- Accelerating Very Deep Convolutional Networks for Classification and Detection, TPAMI, 2016 [paper]
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, ICLR, 2015 [paper]
https://github.com/mapleam/model-compression-and-acceleration-4-DNN (worth opening and browsing)
