Validated Models

Validated MLPerf Models

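| Model | Framework | Support | Example |
|---|---|---|---|
| ResNet50 V1.5 | TensorFlow | Yes | Link |
| ResNet50 V1.5 | PyTorch | Yes | Link |
| DLRM | PyTorch | Yes | Link |
| BERT large | TensorFlow | Yes | Link |
| BERT large | PyTorch | Yes | Link |
| SSD ResNet34 | TensorFlow | Yes | Link |
| SSD ResNet34 | PyTorch | Yes | Link |
| RNN-T | PyTorch | Yes | Link |
| 3D-UNet | TensorFlow | WIP | |
| 3D-UNet | PyTorch | Yes | Link |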

Validated Quantization Examples

Performance results were measured on 06/07/2022 with an Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores per instance, 10 instances, and batch size 1.

Performance varies by use, configuration and other factors. See platform configuration for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
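
The stated layout of 4 cores per instance and 10 instances uses 40 cores in total, which matches the 40 physical cores of one Xeon Platinum 8380 socket. The exact launcher and core pinning used for these measurements are not given here, so the following is only a minimal sketch of that arithmetic. It assumes contiguous core numbering starting at 0, and `instance_core_ranges` is a hypothetical helper, not part of any tool named in this document.

```python
def instance_core_ranges(cores_per_instance, num_instances, first_core=0):
    """Return one inclusive (start, end) core ID range per instance.

    Assumption: core IDs on the socket are contiguous, starting at first_core.
    """
    ranges = []
    for i in range(num_instances):
        start = first_core + i * cores_per_instance
        ranges.append((start, start + cores_per_instance - 1))
    return ranges


# Configuration stated above: 4 cores/instance, 10 instances, 1 socket.
layout = instance_core_ranges(cores_per_instance=4, num_instances=10)
total_cores = sum(end - start + 1 for start, end in layout)

for idx, (start, end) in enumerate(layout):
    print(f"instance {idx}: cores {start}-{end}")
print("total cores:", total_cores)
```

Each range could then be handed to whatever pinning mechanism is in use (for example a CPU affinity mask), so that instances do not share cores.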

TensorFlow models with Intel TensorFlow 2.9.1

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] | Example |
|---|---|---|---|---|---|---|---|
| BERT large SQuAD | 92.39 | 92.99 | -0.64% | 25.32 | 12.53 | 2.02x | pb |
| DenseNet121 | 73.57% | 72.89% | 0.93% | 370.52 | 329.74 | 1.12x | pb |
| DenseNet161 | 76.24% | 76.29% | -0.07% | 219.46 | 180.75 | 1.21x | pb |
| DenseNet169 | 74.40% | 74.65% | -0.33% | 301.33 | 259.88 | 1.16x | pb |
| Faster R-CNN Inception ResNet V2 | 37.98% | 38.33% | -0.91% | 3.96 | 2.34 | 1.69x | pb |
| Faster R-CNN Inception ResNet V2 | 37.84% | 38.33% | -1.28% | 3.98 | 2.31 | 1.72x | SavedModel |
| Faster R-CNN ResNet101 | 30.28% | 30.39% | -0.36% | 70 | 19.98 | 3.50x | pb |
| Faster R-CNN ResNet101 | 30.37% | 30.39% | -0.07% | 70.26 | 16.98 | 4.14x | SavedModel |
| Inception ResNet V2 | 80.44% | 80.40% | 0.05% | 281.79 | 137.91 | 2.04x | pb |
| Inception V1 | 70.48% | 69.74% | 1.06% | 2193.17 | 975.6 | 2.25x | pb |
| Inception V2 | 74.36% | 73.97% | 0.53% | 1835.35 | 838.82 | 2.19x | pb |
| Inception V3 | 77.28% | 76.75% | 0.69% | 973.42 | 376.3 | 2.59x | pb |
| Inception V4 | 80.40% | 80.27% | 0.16% | 575.9 | 200.55 | 2.87x | pb |
| Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.51 | 50.3 | 2.63x | pb |
| Mask R-CNN Inception V2 | 28.53% | 28.73% | -0.70% | 132.89 | 50.97 | 2.61x | ckpt |
| MobileNet V1 | 71.79% | 70.96% | 1.17% | 3545.79 | 1191.94 | 2.97x | pb |
| MobileNet V2 | 71.89% | 71.76% | 0.18% | 2431.66 | 1420.11 | 1.71x | pb |
| ResNet101 | 77.50% | 76.45% | 1.37% | 877.91 | 355.49 | 2.47x | pb |
| ResNet50 Fashion | 77.80% | 78.12% | -0.41% | 3977.5 | 2150.68 | 1.85x | pb |
| ResNet50 V1.0 | 74.11% | 74.27% | -0.22% | 1509.64 | 472.66 | 3.19x | pb |
| ResNet50 V1.5 | 76.82% | 76.46% | 0.47% | 1260.01 | 415.83 | 3.03x | pb |
| ResNet V2 101 | 72.67% | 71.87% | 1.11% | 436.52 | 318.3 | 1.37x | pb |
| ResNet V2 152 | 73.03% | 72.37% | 0.91% | 306.82 | 221.4 | 1.39x | pb |
| ResNet V2 50 | 70.33% | 69.64% | 0.99% | 749.85 | 574.19 | 1.31x | pb |
| SSD MobileNet V1 | 22.97% | 23.13% | -0.69% | 952.9 | 582.87 | 1.63x | pb |
| SSD MobileNet V1 | 22.99% | 23.13% | -0.61% | 954.92 | 413.24 | 2.31x | ckpt |
| SSD ResNet34 | 21.69% | 22.09% | -1.81% | 44.46 | 11.81 | 3.76x | pb |
| SSD ResNet50 V1 | 37.86% | 38.00% | -0.37% | 69.5 | 26.04 | 2.67x | pb |
| SSD ResNet50 V1 | 37.81% | 38.00% | -0.50% | 69.27 | 21.17 | 3.27x | ckpt |
| VGG16 | 72.66% | 70.89% | 2.50% | 660.46 | 177.85 | 3.71x | pb |
| VGG19 | 72.72% | 71.01% | 2.41% | 562.04 | 147.61 | 3.81x | pb |
| Wide & Deep | 77.62% | 77.67% | -0.07% | 21332.47 | 19714.08 | 1.08x | pb |
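
The two ratio columns follow directly from the formulas in the table header: the accuracy ratio is (INT8-FP32)/FP32 and the performance ratio is INT8/FP32. As a small illustration, the sketch below recomputes both for the ResNet50 V1.5 row; the function names are illustrative only and do not come from any library.

```python
def accuracy_ratio(int8_acc, fp32_acc):
    """Relative accuracy change of INT8 versus FP32: (INT8 - FP32) / FP32."""
    return (int8_acc - fp32_acc) / fp32_acc


def performance_ratio(int8_throughput, fp32_throughput):
    """Throughput speedup of INT8 over FP32: INT8 / FP32."""
    return int8_throughput / fp32_throughput


# ResNet50 V1.5 (TensorFlow, pb): accuracy 76.82% vs 76.46%,
# throughput 1260.01 vs 415.83 samples/sec.
acc = accuracy_ratio(76.82, 76.46)
perf = performance_ratio(1260.01, 415.83)

print(f"accuracy ratio:    {acc:.2%}")
print(f"performance ratio: {perf:.2f}x")
```

A negative accuracy ratio means the INT8 model scores below the FP32 baseline; a performance ratio above 1 means the INT8 model is faster.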

PyTorch models with Torch 1.11.0+cpu in PTQ mode

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] | Example |
|---|---|---|---|---|---|---|---|
| ALBERT base MRPC | 88.06% | 88.50% | -0.50% | 34.28 | 29.54 | 1.16x | eager |
| Barthez MRPC | 82.99% | 83.81% | -0.97% | 166.84 | 89.56 | 1.86x | eager |
| BERT base COLA | 58.80% | 58.84% | -0.07% | 260 | 126.47 | 2.06x | fx |
| BERT base MRPC | 90.28% | 90.69% | -0.45% | 251.79 | 126.46 | 1.99x | fx |
| BERT base RTE | 69.31% | 69.68% | -0.52% | 252.14 | 126.45 | 1.99x | fx |
| BERT base SST2 | 91.97% | 91.86% | 0.12% | 258.98 | 126.42 | 2.05x | fx |
| BERT base STSB | 89.13% | 89.75% | -0.68% | 249.57 | 126.39 | 1.97x | fx |
| BERT large COLA | 62.88% | 62.57% | 0.49% | 88.75 | 36.7 | 2.42x | fx |
| BERT large MRPC | 89.93% | 90.38% | -0.49% | 89.43 | 36.62 | 2.44x | fx |
| BERT large QNLI | 90.96% | 91.82% | -0.94% | 91.27 | 37 | 2.47x | fx |
| BERT large RTE | 71.84% | 72.56% | -1.00% | 77.62 | 36.01 | 2.16x | fx |
| CamemBERT base MRPC | 86.56% | 86.82% | -0.30% | 241.39 | 124.77 | 1.93x | eager |
| DeBERTa MRPC | 91.17% | 90.91% | 0.28% | 152.09 | 85.13 | 1.79x | eager |
| DistilBERT base MRPC | 88.66% | 89.16% | -0.56% | 415.09 | 246.9 | 1.68x | eager |
| DistilBERT base MRPC | 88.74% | 89.16% | -0.47% | 459.93 | 245.33 | 1.87x | fx |
| FlauBERT MRPC | 81.01% | 80.19% | 1.01% | 644.05 | 457.32 | 1.41x | eager |
| Inception V3 | 69.43% | 69.52% | -0.13% | 454.3 | 213.7 | 2.13x | eager |
| Longformer MRPC | 90.59% | 91.46% | -0.95% | 21.51 | 17.45 | 1.23x | eager |
| Mask R-CNN | 37.70% | 37.80% | -0.26% | 17.61 | 5.76 | 3.06x | eager |
| mBart WNLI | 56.34% | 56.34% | 0.00% | 65.05 | 31.26 | 2.08x | eager |
| MobileNet V2 | 70.54% | 71.84% | -1.81% | 740.97 | 535.54 | 1.38x | eager |
| lvwerra/pegasus-samsum | 42.21 | 42.67 | -1.09% | 3.89 | 1.14 | 3.41x | eager |
| PeleeNet | 71.64% | 72.10% | -0.64% | 502.01 | 391.31 | 1.28x | eager |
| ResNet18 | 69.57% | 69.76% | -0.27% | 800.43 | 381.27 | 2.10x | eager |
| ResNet18 | 69.57% | 69.76% | -0.28% | 811.09 | 389.36 | 2.08x | fx |
| ResNet50 | 75.98% | 76.15% | -0.21% | 507.55 | 200.52 | 2.53x | eager |
| ResNeXt101_32x8d | 79.08% | 79.31% | -0.29% | 203.54 | 73.85 | 2.76x | eager |
| RNN-T | 92.45 | 92.55 | -0.10% | 79.21 | 20.47 | 3.87x | eager |
| RoBERTa base MRPC | 87.88% | 88.18% | -0.34% | 250.21 | 124.92 | 2.00x | eager |
| Se_ResNeXt50_32x4d | 78.98% | 79.08% | -0.13% | 358.63 | 173.03 | 2.07x | eager |
| SqueezeBERT MRPC | 87.77% | 87.65% | 0.14% | 249.89 | 207.43 | 1.20x | eager |
| Transfo-xl MRPC | 81.97% | 81.20% | 0.94% | 11.25 | 8.34 | 1.35x | eager |
| YOLOv3 | 24.60% | 24.54% | 0.21% | 108.09 | 40.02 | 2.70x | eager |

PyTorch models with Torch 1.11.0+cpu in QAT mode

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] | Example |
|---|---|---|---|---|---|---|---|
| ResNet18 | 69.74% | 69.76% | -0.03% | 804.76 | 388.67 | 2.07x | eager |
| ResNet18 | 69.73% | 69.76% | -0.04% | 806.44 | 386.59 | 2.09x | fx |
| BERT base MRPC QAT | 89.60% | 89.50% | 0.11% | 258.89 | 125.79 | 2.06x | fx |
| ResNet50 | 76.04% | 76.15% | -0.14% | 490.64 | 203.49 | 2.41x | eager |

PyTorch models with IPEX 1.11.0

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] | Example |
|---|---|---|---|---|---|---|---|
| bert-large-uncased-whole-word-masking-finetuned-squad | 92.9 | 93.16 | -0.28% | 37.13 | 11.45 | 3.24x | ipex |
| ResNeXt101_32x16d_wsl | 84.02% | 84.17% | -0.18% | 163.45 | 28.9 | 5.66x | ipex |
| ResNet50 | 76.00% | 76.15% | -0.20% | 707.86 | 202.02 | 3.51x | ipex |
| SSD ResNet34 | 19.97% | 20.00% | -0.15% | 30.84 | 8.55 | 3.61x | ipex |

ONNX models with ONNX Runtime 1.11.0

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] | Example |
|---|---|---|---|---|---|---|---|
| AlexNet | 54.74% | 54.79% | -0.09% | 1518.97 | 676.74 | 2.24x | qlinearops |
| AlexNet | 54.74% | 54.79% | -0.09% | 1411.3 | 652.6 | 2.16x | qdq |
| BERT base MRPC DYNAMIC | 85.54% | 86.03% | -0.57% | 379.71 | 156.16 | 2.43x | qlinearops |
| BERT base MRPC STATIC | 85.29% | 86.03% | -0.86% | 756.33 | 316.36 | 2.39x | qlinearops |
| BERT SQuAD | 80.44 | 80.67 | -0.29% | 115.58 | 64.71 | 1.79x | qlinearops |
| BERT SQuAD | 80.44 | 80.67 | -0.29% | 115.4 | 64.68 | 1.78x | qdq |
| CaffeNet | 56.19% | 56.30% | -0.20% | 2786.79 | 802.7 | 3.47x | qlinearops |
| CaffeNet | 56.19% | 56.30% | -0.20% | 2726.86 | 819.41 | 3.33x | qdq |
| DenseNet | 60.20% | 60.96% | -1.25% | 404.83 | 340.63 | 1.19x | qlinearops |
| DistilBERT base MRPC | 84.56% | 84.56% | 0.00% | 1630.41 | 596.68 | 2.73x | qlinearops |
| EfficientNet | 77.58% | 77.70% | -0.15% | 1985.35 | 1097.33 | 1.81x | qlinearops |
| Faster R-CNN | 33.99% | 34.37% | -1.11% | 10.02 | 4.32 | 2.32x | qlinearops |
| Faster R-CNN | 33.94% | 34.37% | -1.25% | 10.41 | 4.28 | 2.43x | qdq |
| FCN | 64.66% | 64.98% | -0.49% | 44.31 | 14.2 | 3.12x | qlinearops |
| FCN | 64.66% | 64.98% | -0.49% | 18.11 | 14.19 | 1.28x | qdq |
| GoogleNet | 67.61% | 67.79% | -0.27% | 1165.84 | 810.65 | 1.44x | qlinearops |
| GoogleNet | 67.61% | 67.79% | -0.27% | 1165.73 | 809.98 | 1.44x | qdq |
| Inception V1 | 67.23% | 67.24% | -0.01% | 1205.89 | 838.71 | 1.44x | qlinearops |
| Inception V1 | 67.23% | 67.24% | -0.01% | 1204.93 | 843.16 | 1.43x | qdq |
| Mask R-CNN | 33.40% | 33.72% | -0.95% | 8.56 | 3.76 | 2.27x | qlinearops |
| Mask R-CNN | 33.33% | 33.72% | -1.16% | 8.4 | 3.81 | 2.20x | qdq |
| MobileBERT MRPC | 86.03% | 86.27% | -0.28% | 790.11 | 686.35 | 1.15x | qlinearops |
| MobileBERT SQuAD MLPerf | 89.84 | 90.03 | -0.20% | 102.92 | 95.19 | 1.08x | qlinearops |
| MobileNet V2 | 65.47% | 66.89% | -2.12% | 5133.84 | 3394.73 | 1.51x | qlinearops |
| MobileNet V2 | 65.47% | 66.89% | -2.12% | 5066.31 | 3386.3 | 1.50x | qdq |
| MobileNet V3 MLPerf | 75.59% | 75.74% | -0.20% | 4133.22 | 2132.92 | 1.94x | qlinearops |
| MobileNetV2 (ONNX Model Zoo) | 68.30% | 69.48% | -1.70% | 5349.42 | 3373.29 | 1.59x | qlinearops |
| ResNet50 V1.5 MLPerf | 76.13% | 76.46% | -0.43% | 1139.56 | 549.88 | 2.07x | qlinearops |
| ResNet50 V1.5 | 72.28% | 72.29% | -0.01% | 1165.35 | 556.02 | 2.10x | qlinearops |
| ResNet50 V1.5 | 72.28% | 72.29% | -0.01% | 1319.32 | 543.44 | 2.43x | qdq |
| ResNet50 V1.5 (ONNX Model Zoo) | 74.76% | 74.99% | -0.31% | 1363.39 | 573.1 | 2.38x | qlinearops |
| RoBERTa base MRPC | 90.44% | 89.95% | 0.54% | 811.05 | 312.71 | 2.59x | qlinearops |
| ShuffleNet V2 | 66.13% | 66.36% | -0.35% | 4948.77 | 2847.66 | 1.74x | qlinearops |
| SqueezeNet | 56.55% | 56.87% | -0.56% | 6296.79 | 4340.51 | 1.45x | qlinearops |
| SqueezeNet | 56.55% | 56.87% | -0.56% | 6227.76 | 4383.8 | 1.42x | qdq |
| SSD MobileNet V1 | 22.20% | 23.10% | -3.90% | 917.64 | 709.48 | 1.29x | qlinearops |
| SSD MobileNet V1 | 22.20% | 23.10% | -3.90% | 840.99 | 655.99 | 1.28x | qdq |
| SSD MobileNet V1 (ONNX Model Zoo) | 22.88% | 23.03% | -0.65% | 845.17 | 666.25 | 1.27x | qlinearops |
| SSD MobileNet V1 (ONNX Model Zoo) | 22.88% | 23.03% | -0.65% | 790.06 | 624.2 | 1.27x | qdq |
| SSD MobileNet V2 | 23.83% | 24.68% | -3.44% | 703.55 | 506.6 | 1.39x | qlinearops |
| SSD | 18.68% | 18.98% | -1.58% | 41.99 | 11.12 | 3.78x | qdq |
| Tiny YOLOv3 | 12.08% | 12.43% | -2.82% | 836.21 | 659.69 | 1.27x | qlinearops |
| VGG16 | 66.60% | 66.69% | -0.13% | 312.48 | 128.98 | 2.42x | qlinearops |
| VGG16 (ONNX Model Zoo) | 72.28% | 72.40% | -0.17% | 446.13 | 131.04 | 3.40x | qlinearops |
| YOLOv3 | 26.88% | 28.74% | -6.47% | 157.39 | 66.72 | 2.36x | qlinearops |
| YOLOv4 | 33.18% | 33.71% | -1.57% | 58.55 | 38.09 | 1.54x | qlinearops |
| ZFNet | 55.89% | 55.96% | -0.13% | 664.37 | 358.62 | 1.85x | qlinearops |
| ZFNet | 55.89% | 55.96% | -0.13% | 666.99 | 354.38 | 1.88x | qdq |

MXNet models with MXNet 1.7.0

| Model | Accuracy: INT8 | Accuracy: FP32 | Accuracy Ratio [(INT8-FP32)/FP32] | Throughput (samples/sec): INT8 | Throughput (samples/sec): FP32 | Performance Ratio [INT8/FP32] |
|---|---|---|---|---|---|---|
| Inception V3 | 77.80% | 77.65% | 0.20% | 920.74 | 276.73 | 3.33x |
| MobileNet V1 | 71.60% | 72.23% | -0.86% | 6585.19 | 2529.21 | 2.60x |
| MobileNet V2 | 70.80% | 70.87% | -0.10% | 5230.32 | 1996.47 | 2.62x |
| ResNet V1 152 | 78.28% | 78.54% | -0.33% | 574.85 | 156.2 | 3.68x |
| ResNet50 V1.0 | 75.91% | 76.33% | -0.55% | 1567.9 | 427.99 | 3.66x |
| SqueezeNet | 56.80% | 56.97% | -0.28% | 4704.51 | 1332.29 | 3.53x |
| SSD MobileNet V1 | 74.94% | 75.54% | -0.79% | 769.26 | 193.03 | 3.99x |

Validated Pruning Examples

| Tasks | Framework | Model | FP32 Baseline | Gradient Sensitivity with 20% Sparsity: Accuracy% | Gradient Sensitivity with 20% Sparsity: Drop | Gradient Sensitivity with 20% Sparsity: Perf Gain (sample/s) | +ONNX Dynamic Quantization on Pruned Model: Accuracy% | +ONNX Dynamic Quantization on Pruned Model: Drop | +ONNX Dynamic Quantization on Pruned Model: Perf Gain (sample/s) |
|---|---|---|---|---|---|---|---|---|---|
| SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.97 | -0.38 | 1.30x | accuracy = 92.20 | -0.13 | 1.86x |
| QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [89.97, 86.54] | [-1.24, -1.71] | 1.32x | [accuracy, f1] = [89.75, 86.60] | [-1.48, -1.65] | 1.81x |

| Tasks | Framework | Model | FP32 Baseline | Pattern Lock on 70% Unstructured Sparsity: Accuracy% | Pattern Lock on 70% Unstructured Sparsity: Drop | Pattern Lock on 50% 1:2 Structured Sparsity: Accuracy% | Pattern Lock on 50% 1:2 Structured Sparsity: Drop |
|---|---|---|---|---|---|---|---|
| MNLI | PyTorch | BERT base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | [m, mm] = [83.20, 84.11] | [-1.62, -0.80] |
| SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | accuracy = 92.20 | -0.13 |
| QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | [accuracy, f1] = [90.92, 87.78] | [-0.20, -0.31] |
| QNLI | PyTorch | BERT base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | accuracy = 90.87 | -0.73 |
| QnA | PyTorch | BERT base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | [em, f1] = [78.03, 86.50] | [-1.65, -0.69] |

| Framework | Model | FP32 Baseline | Compression | Dataset | Accuracy% (Drop) |
|---|---|---|---|---|---|
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Magnitude | ImageNet | 69.47 (-0.42) |
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Gradient Sensitivity | ImageNet | 68.85 (-1.30) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude | ImageNet | 76.11 (-0.03) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Post Training Quantization | ImageNet | 76.01 (-0.16) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Quantization Aware Training | ImageNet | 75.90 (-0.30) |
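
The Drop values in the pruning results appear to be relative changes in percent against the FP32 baseline, not absolute point differences: for SST-2, 92.32 to 91.97 is a 0.35-point difference but is listed as -0.38. The sketch below reproduces two of the listed values under that reading; `relative_drop` is an illustrative name, not a function from any library.

```python
def relative_drop(baseline, compressed):
    """Relative accuracy change in percent; negative means accuracy was lost."""
    return (compressed - baseline) / baseline * 100


# SST-2, BERT base, gradient sensitivity pruning with 20% sparsity.
sst2_drop = relative_drop(92.32, 91.97)
print(f"SST-2 drop: {sst2_drop:.2f}")

# QQP reports [accuracy, f1], so the drop is computed per metric.
qqp_baseline = [91.10, 88.05]
qqp_pruned = [89.97, 86.54]
qqp_drop = [relative_drop(b, p) for b, p in zip(qqp_baseline, qqp_pruned)]
print("QQP drop:", [f"{d:.2f}" for d in qqp_drop])
```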

Validated Knowledge Distillation Examples

| Example Name | Dataset | Student (Accuracy) | Teacher (Accuracy) | Student With Distillation (Accuracy Improvement) |
|---|---|---|---|---|
| ResNet example | ImageNet | ResNet18 (0.6739) | ResNet50 (0.7399) | 0.6845 (0.0106) |
| BlendCNN example | MRPC | BlendCNN (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
| BiLSTM example | SST-2 | BiLSTM (0.7913) | RoBERTa-Base (0.9404) | 0.8085 (0.0172) |
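
Unlike the relative drops in the pruning results, the accuracy improvement listed for distillation is consistent with a plain absolute difference between the distilled student and the original student. A minimal sketch that recomputes the three listed improvements under that reading (the helper name `distillation_gain` is illustrative only):

```python
def distillation_gain(student_acc, distilled_acc):
    """Absolute accuracy improvement of the distilled student over the student."""
    return round(distilled_acc - student_acc, 4)


# (student accuracy, student-with-distillation accuracy) from the results above.
results = {
    "ResNet18": (0.6739, 0.6845),
    "BlendCNN": (0.7034, 0.7034),
    "BiLSTM": (0.7913, 0.8085),
}

gains = {name: distillation_gain(s, d) for name, (s, d) in results.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
```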