Summary: This is a non-exhaustive list of references for this component.
Table of Contents
2013
2015
2016
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- EIE: Efficient Inference Engine on Compressed Deep Neural Network
- Dynamic Network Surgery for Efficient DNNs
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Learning Structured Sparsity in Deep Neural Networks
2017
- Soft Weight-Sharing for Neural Network Compression
- Variational Dropout Sparsifies Deep Neural Networks
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise
- Bayesian Compression for Deep Learning
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- To prune, or not to prune: exploring the efficacy of pruning for model compression
- A Survey of Model Compression and Acceleration for Deep Neural Networks
2018
- Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
- Bayesian Compression for Natural Language Processing
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Rethinking the Value of Network Pruning
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices
2019
- Stabilizing the Lottery Ticket Hypothesis
- Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask
- Weight Agnostic Neural Networks
- The State of Sparsity in Deep Neural Networks
2020
- A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions
- An Overview of Neural Network Compression
2021
- Pruning and Quantization for Deep Neural Network Acceleration: A Survey
- Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
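For orientation, the sketch below shows one-shot unstructured magnitude pruning, the baseline that several of the papers above start from (e.g., "To prune, or not to prune"). The toy model, layer selection, and sparsity level are illustrative assumptions, not code from any of the cited works.

```python
# Hedged sketch: one-shot unstructured magnitude pruning in PyTorch.
# The model, target layers, and sparsity level are illustrative only.
import torch
import torch.nn as nn

def magnitude_prune_(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights of every Linear/Conv2d layer, in place."""
    for layer in model.modules():
        if isinstance(layer, (nn.Linear, nn.Conv2d)):
            weight = layer.weight.data
            k = int(sparsity * weight.numel())
            if k == 0:
                continue
            # k-th smallest absolute value becomes the pruning threshold
            threshold = weight.abs().flatten().kthvalue(k).values
            weight.mul_((weight.abs() > threshold).float())

# Hypothetical toy model, used only to demonstrate the call.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
magnitude_prune_(model, sparsity=0.8)

weights = [p for p in model.parameters() if p.dim() > 1]
total = sum(p.numel() for p in weights)
zeros = sum(int((p == 0).sum()) for p in weights)
print(f"weight sparsity: {zeros / total:.2f}")  # roughly 0.80; biases are left untouched
```

In practice the papers above typically prune gradually during training and follow up with fine-tuning; this one-shot version only illustrates the thresholding step.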
See NAS.
Note: Representation Learning Models (RLMs) may or may not be pretrained; in either case, their training (or reuse) takes place during the Preprocessing stage. RLMs can come from NLP, Computer Vision, or any other domain.
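To make the note above concrete, here is a minimal sketch of reusing a pretrained RLM during a preprocessing step to turn raw text into fixed-size features. The checkpoint name, the mean-pooling choice, and the Hugging Face `transformers` dependency are illustrative assumptions about tooling, not part of this component.

```python
# Hedged sketch: reusing a pretrained representation model during Preprocessing.
# Checkpoint and pooling strategy are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # assumption: any pretrained encoder would do

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()  # reuse only: no fine-tuning happens during preprocessing

@torch.no_grad()
def preprocess(texts):
    """Map raw strings to mean-pooled sentence embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # ignore padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, dim)

embeddings = preprocess(["model compression", "network pruning"])
print(embeddings.shape)  # e.g., torch.Size([2, 768])
```

The same pattern applies to a non-pretrained RLM: the encoder would simply be trained during Preprocessing before being used to produce features.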
See also Awesome Efficient PLM Papers.
2019
2021
- NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search
- Towards Efficient Post-training Quantization of Pre-trained Language Models
- Compression of Generative Pre-trained Language Models via Quantization
- Synergistic Self-supervised and Quantization Learning
- Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
2022