We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to that of floating-point models. Along with the results, we provide recipes for quantizing floating-point models using the AI Model Efficiency ToolKit (AIMET).
Quantized inference is significantly faster than floating-point inference and enables models to run power-efficiently on mobile and edge devices. We use AIMET, a library that includes state-of-the-art quantization techniques, to quantize various models available in the TensorFlow and PyTorch frameworks. The models are listed in the sections below.
Each original FP32 source model is quantized using either the post-training quantization (PTQ) or the quantization-aware training (QAT) technique available in AIMET. An example evaluation script is provided for each model. When PTQ is needed, the evaluation script performs PTQ before evaluation. Wherever QAT is used, the fine-tuned model checkpoint is also provided.
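Both PTQ and QAT rely on simulating fixed-point rounding in floating point: each tensor gets a scale and an integer offset, values are rounded onto that integer grid, and then mapped back to float. The pure-Python sketch below illustrates that quantize-dequantize step with a simple min/max range estimate. It is not AIMET's implementation or API; the names `compute_encoding` and `quantize_dequantize` are hypothetical, and min/max calibration is only one of several ways to pick the range. In real PTQ the range is estimated by running representative data through the model (the short list below stands in for that), while QAT additionally fine-tunes the weights with this rounding simulated in the forward pass.

```python
from typing import List, Tuple


def compute_encoding(values: List[float], bitwidth: int = 8) -> Tuple[float, int]:
    """Pick (scale, offset) for an unsigned integer grid from calibration min/max."""
    lo = min(min(values), 0.0)  # extend the range to include 0.0 so zero is exactly representable
    hi = max(max(values), 0.0)
    num_steps = 2 ** bitwidth - 1
    scale = (hi - lo) / num_steps if hi > lo else 1.0
    offset = round(-lo / scale)  # integer zero-point on the grid
    return scale, offset


def quantize_dequantize(x: float, scale: float, offset: int, bitwidth: int = 8) -> float:
    """Simulate fixed-point rounding: float -> integer grid -> float."""
    q = round(x / scale) + offset
    q = max(0, min(2 ** bitwidth - 1, q))  # clamp to the representable integer range
    return (q - offset) * scale


# Usage: calibrate on a few sample values (hypothetical data), then simulate INT8 for them.
calibration = [-0.8, -0.1, 0.0, 0.35, 1.2]
scale, offset = compute_encoding(calibration, bitwidth=8)
simulated = [quantize_dequantize(v, scale, offset) for v in calibration]
```

Every value inside the calibrated range lands within half a quantization step (`scale / 2`) of its original value, which is the source of the small FP32-to-INT8 accuracy gaps reported in the tables.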
TensorFlow models
Network | Model Source [1] | Floating Pt (FP32) Model [2] | Quantized Model [3] | Results [4] | Documentation |
---|---|---|---|---|---|
ResNet-50 (v1) | GitHub Repo | Pretrained Model | See Documentation | (ImageNet) Top-1 Accuracy FP32: 75.21% INT8: 74.96% | ResNet50.md |
MobileNet-v2-1.4 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy FP32: 75% INT8: 74.21% | MobileNetV2.md |
EfficientNet Lite | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy FP32: 74.93% INT8: 74.99% | EfficientNetLite.md |
SSD MobileNet-v2 | GitHub Repo | Pretrained Model | See Example | (COCO) Mean Avg. Precision (mAP) FP32: 0.2469 INT8: 0.2456 | SSDMobileNetV2.md |
RetinaNet | GitHub Repo | Pretrained Model | See Example | (COCO) mAP FP32: 0.35 INT8: 0.349 (see detailed results below) | RetinaNet.md |
Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | (COCO) mAP FP32: 0.383 INT8: 0.379, Mean Avg. Recall (mAR) FP32: 0.452 INT8: 0.446 | PoseEstimation.md |
SRGAN | GitHub Repo | Pretrained Model | See Example | (BSD100) PSNR/SSIM FP32: 25.45/0.668 INT8: 24.78/0.628 INT8W/INT16Act.: 25.41/0.666 (see detailed results below) | SRGAN.md |
[1] Original FP32 model source
[2] FP32 model checkpoint
[3] Quantized Model: For models quantized with the post-training technique, refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations (INT8W/INT16Act.) are used to further improve the accuracy of post-training quantization.
[4] Results comparing float and quantized performance
[5] Script for quantized evaluation using the model referenced in “Quantized Model” column
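Footnote [3] mentions keeping activations at 16 bits (INT8W/INT16Act.) for some models. One way to see why this helps is to compare the grid spacing for the same value range at 8 and 16 bits. The sketch below is a minimal illustration; the function name `step_size` is hypothetical, and the activation range of 0 to 6 is an assumed example (e.g. a ReLU6 output), not taken from any model in the table.

```python
def step_size(lo: float, hi: float, bitwidth: int) -> float:
    """Spacing between adjacent values of a uniform quantization grid over [lo, hi]."""
    return (hi - lo) / (2 ** bitwidth - 1)


# Assumed activation range from calibration (hypothetical; not from any model above).
ACT_MIN, ACT_MAX = 0.0, 6.0

for bits in (8, 16):
    step = step_size(ACT_MIN, ACT_MAX, bits)
    # With round-to-nearest, the worst-case rounding error inside the range is half a step.
    print(f"{bits}-bit activations: step = {step:.3g}, max rounding error = {step / 2:.3g}")
```

Over the same range, the 16-bit grid is (2^16 - 1) / (2^8 - 1) = 257 times finer, which is consistent with the SRGAN row, where INT16 activations recover most of the PSNR/SSIM gap between INT8 and FP32 while the weights stay at 8 bits.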
RetinaNet detailed results (COCO dataset)
Average Precision/Recall | IoU | Area | maxDets | FP32 | INT8 |
---|---|---|---|---|---|
Average Precision | 0.50:0.95 | all | 100 | 0.350 | 0.349 |
Average Precision | 0.50 | all | 100 | 0.537 | 0.536 |
Average Precision | 0.75 | all | 100 | 0.374 | 0.372 |
Average Precision | 0.50:0.95 | small | 100 | 0.191 | 0.187 |
Average Precision | 0.50:0.95 | medium | 100 | 0.383 | 0.381 |
Average Precision | 0.50:0.95 | large | 100 | 0.472 | 0.472 |
Average Recall | 0.50:0.95 | all | 1 | 0.306 | 0.305 |
Average Recall | 0.50:0.95 | all | 10 | 0.491 | 0.490 |
Average Recall | 0.50:0.95 | all | 100 | 0.533 | 0.532 |
Average Recall | 0.50:0.95 | small | 100 | 0.345 | 0.341 |
Average Recall | 0.50:0.95 | medium | 100 | 0.577 | 0.577 |
Average Recall | 0.50:0.95 | large | 100 | 0.681 | 0.679 |
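In the RetinaNet results, an IoU value of 0.50:0.95 means the metric is averaged over the ten COCO IoU thresholds 0.50, 0.55, ..., 0.95, which is why it is lower than AP at IoU 0.50 alone. The sketch below shows only the matching criterion behind those thresholds: a detection counts as a match at threshold t when its IoU with the ground-truth box is at least t. Full COCO AP also involves score ranking, precision-recall interpolation, and per-category averaging, which are omitted; the box coordinates are made-up examples.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 that AP/AR@[0.50:0.95] averages over.
thresholds: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical ground-truth and predicted boxes, offset by one unit in x and y.
gt = (0.0, 0.0, 10.0, 10.0)
pred = (1.0, 1.0, 11.0, 11.0)
score = iou(gt, pred)  # 81 / 119, roughly 0.68
matched_at = [t for t in thresholds if score >= t]
```

This prediction counts as a match at only the four lowest thresholds, so it contributes fully to AP at IoU 0.50 but only partially to AP averaged over 0.50:0.95.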
SRGAN detailed results
Model | Dataset | PSNR | SSIM |
---|---|---|---|
FP32 | Set5/Set14/BSD100 | 29.17/26.17/25.45 | 0.853/0.719/0.668 |
INT8/ACT8 | Set5/Set14/BSD100 | 28.31/25.55/24.78 | 0.821/0.684/0.628 |
INT8/ACT16 | Set5/Set14/BSD100 | 29.12/26.15/25.41 | 0.851/0.719/0.666 |
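The SRGAN results are reported as PSNR (in dB) and SSIM, where higher is better for both. PSNR is conventionally defined as 10 * log10(MAX^2 / MSE), with MAX the largest possible pixel value. The sketch below implements that formula for flat lists of pixel values, assuming an 8-bit pixel range (MAX = 255); the SRGAN evaluation scripts may differ in details such as color space (for example, evaluating only the luminance channel) or border cropping, and SSIM is not shown. The function name `psnr` and the pixel values are illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstructed: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: squared errors 4, 0, 1, 4 give MSE = 2.25.
reference = [52, 55, 61, 59]
reconstructed = [54, 55, 60, 57]
value = psnr(reference, reconstructed)
```

Because PSNR is logarithmic in MSE, halving the mean squared error adds about 3 dB, so gaps of a few tenths of a dB between FP32 and quantized models correspond to modest changes in pixel error.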
PyTorch models
Network | Model Source [1] | Floating Pt (FP32) Model [2] | Quantized Model [3] | Results [4] | Documentation |
---|---|---|---|---|---|
MobileNetV2 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy FP32: 71.67% INT8: 71.14% | MobileNetV2.md |
EfficientNet-lite0 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy FP32: 75.42% INT8: 74.44% | EfficientNet-lite0.md |
DeepLabV3+ | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mIoU FP32: 72.62% INT8: 72.22% | DeepLabV3.md |
MobileNetV2-SSD-Lite | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mAP FP32: 68.7% INT8: 68.6% | MobileNetV2-SSD-lite.md |
Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | (COCO) mAP FP32: 0.364 INT8: 0.359, mAR FP32: 0.436 INT8: 0.432 | PoseEstimation.md |
SRGAN | GitHub Repo | Pretrained Model (older version) | See Example | (BSD100) PSNR/SSIM FP32: 25.51/0.653 INT8: 25.5/0.648 (see detailed results below) | SRGAN.md |
DeepSpeech2 | GitHub Repo | Pretrained Model | See Example | (LibriSpeech Test Clean) WER FP32: 9.92% INT8: 10.22% | DeepSpeech2.md |
[1] Original FP32 model source
[2] FP32 model checkpoint
[3] Quantized Model: For models quantized with the post-training technique, refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations are used to further improve the accuracy of post-training quantization.
[4] Results comparing float and quantized performance
[5] Script for quantized evaluation using the model referenced in “Quantized Model” column
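The DeepSpeech2 row reports word error rate (WER), where lower is better. WER is commonly defined as the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference transcript, divided by the number of reference words. The plain-Python sketch below implements that definition; it assumes simple whitespace tokenization with no text normalization, which may differ from the DeepSpeech2 evaluation script. The function name `word_error_rate` and the sentences are illustrative.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, the FP32 and INT8 DeepSpeech2 results (9.92% vs. 10.22%) differ by about 0.3 word errors per 100 reference words.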
SRGAN detailed results
Model | Dataset | PSNR | SSIM |
---|---|---|---|
FP32 | Set5/Set14/BSD100 | 29.93/26.58/25.51 | 0.851/0.709/0.653 |
INT8 | Set5/Set14/BSD100 | 29.86/26.59/25.55 | 0.845/0.705/0.648 |
Before you can run the example script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. See the Getting Started page for an overview, then install AIMET and its dependencies by following the Installation instructions.
NOTE: To obtain the exact version of AIMET that was used to test this model zoo, install release 1.13.0 when following the above instructions.
Download the datasets and code required to run the example for the model of interest. The examples run quantized evaluation and, if necessary, apply AIMET techniques to improve quantized model performance. They generate the final accuracy results noted in the tables above. Refer to the Docs folder for TensorFlow or PyTorch to access the documentation and procedures for a specific model.
AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.
Please see the LICENSE file for details.