Step-by-Step

This document provides step-by-step instructions for reproducing the PyTorch PeleeNet tuning results with Intel® Neural Compressor.

Note

  • The PyTorch quantization implementation in the imperative (eager) path cannot be executed fully automatically. It requires manually adding QuantStub and DeQuantStub for quantizable ops, and it also requires manually performing fusion operations (see the Prepare section below).
  • Intel® Neural Compressor assumes the user has completed these two steps before invoking the Intel® Neural Compressor interface. For details, please refer to https://pytorch.org/docs/stable/quantization.html

Prerequisite

1. Installation

# Install
cd examples/pytorch/image_recognition/peleenet/quantization/ptq/eager
pip install -r requirements.txt

2. Prepare Dataset

Download the raw ImageNet images to a directory such as /path/to/imagenet. The directory should include the following folders:

ls /path/to/imagenet
train  val
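
As a quick, optional sanity check (a sketch assuming torchvision is installed; it is not part of the official steps), both splits should load as standard ImageFolder datasets:

from torchvision.datasets import ImageFolder

train_set = ImageFolder("/path/to/imagenet/train")   # one sub-folder per class
val_set = ImageFolder("/path/to/imagenet/val")
print(len(train_set.classes), "classes,", len(val_set), "validation images")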

3. Prepare pretrained model

Download the pretrained weights to examples/pytorch/image_recognition/peleenet/quantization/ptq/eager/weights.

Run

cd examples/pytorch/image_recognition/peleenet/quantization/ptq/eager
python main.py --tune --pretrained -j 1 /path/to/imagenet

Examples of enabling Intel® Neural Compressor auto tuning on PyTorch PeleeNet

This is a tutorial of how to enable a PyTorch classification model with Intel® Neural Compressor.

User Code Analysis

Intel® Neural Compressor supports three usages:

  1. The user provides only the fp32 "model" and configures the calibration dataset, evaluation dataset, and metric in a model-specific yaml config file.
  2. The user provides the fp32 "model", a calibration dataset "q_dataloader", and an evaluation dataset "eval_dataloader", and configures the metric in the tuning.metric field of the model-specific yaml config file.
  3. The user specifies the fp32 "model", a calibration dataset "q_dataloader", and a custom "eval_func" which encapsulates the evaluation dataset and metric by itself (sketched below).
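
For example, the third usage looks roughly like the following sketch (validate, my_eval_dataloader, my_calib_dataloader, and model are hypothetical placeholders; the quantizer attributes follow the neural_compressor.experimental API used later in this document):

from neural_compressor.experimental import Quantization, common

def eval_func(model):
    # Hypothetical evaluation loop: must return a single accuracy
    # number that the tuner can compare against the fp32 baseline.
    return validate(model, my_eval_dataloader)

quantizer = Quantization("./conf.yaml")
quantizer.model = common.Model(model)               # fp32 model
quantizer.calib_dataloader = my_calib_dataloader    # "q_dataloader" from item 3
quantizer.eval_func = eval_func                     # replaces the yaml evaluation section
q_model = quantizer.fit()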

Since PeleeNet is a typical classification model, it uses the Top-K metric and the ImageNet dataset, both of which are supported out of the box by Intel® Neural Compressor. So here we integrate PyTorch PeleeNet with Intel® Neural Compressor via the first usage, for simplicity.

Write Yaml Config File

In the examples directory, there is a template.yaml. We can remove most of the items and keep only the items mandatory for tuning.

#conf.yaml
model:                                               # mandatory. used to specify model specific information.
  name: peleenet
  framework: pytorch                                 # mandatory. supported values are tensorflow, pytorch, pytorch_ipex, onnxrt_integer, onnxrt_qlinear or mxnet; allow new framework backend extension.

quantization:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
  calibration:
    sampling_size: 256                               # optional. default value is 100. used to set how many samples should be used in calibration.
    dataloader:
      batch_size: 256
      dataset:
        ImageFolder:
          root: /path/to/calibration/dataset         # NOTE: modify to calibration dataset location if needed
      transform:
        RandomResizedCrop:
          size: 224
        RandomHorizontalFlip:
        ToTensor:
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]

evaluation:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
  accuracy:                                          # optional. required if user doesn't provide eval_func in neural_compressor.Quantization.
    metric:
      topk: 1                                        # built-in metrics are topk, map, f1, allow user to register new metric.
    dataloader:
      batch_size: 256
      dataset:
        ImageFolder:
          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
      transform:
        Resize:
          size: 256
        CenterCrop:
          size: 224
        ToTensor:
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
  performance:                                       # optional. used to benchmark performance of passing model.
    configs:
      cores_per_instance: 4
      num_of_instance: 7
    dataloader:
      batch_size: 1
      dataset:
        ImageFolder:
          root: /path/to/evaluation/dataset          # NOTE: modify to evaluation dataset location if needed
      transform:
        Resize:
          size: 256
        CenterCrop:
          size: 224
        ToTensor:
        Normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]

tuning:
  accuracy_criterion:
    relative:  0.01                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
  exit_policy:
    timeout: 0                                       # optional. tuning timeout (seconds). default value is 0 which means early stop. combine with max_trials field to decide when to exit.
  random_seed: 9527                                  # optional. random seed for deterministic tuning.

Here we choose the built-in topk metric, set the calibration and evaluation dataloaders, and set the accuracy target to tolerate at most a 1% relative accuracy loss from the fp32 baseline. The default tuning strategy is the basic strategy. The timeout of 0 means tuning runs without a time limit and stops as soon as a tuning config meets the accuracy target.
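
In other words, a candidate int8 model is accepted as soon as its relative accuracy loss against the fp32 baseline stays within 1%. A small arithmetic illustration (the accuracy values are made up, not measured results):

baseline_acc = 0.752                     # fp32 top-1 accuracy (example value)
candidate_acc = 0.748                    # int8 top-1 accuracy (example value)
relative_loss = (baseline_acc - candidate_acc) / baseline_acc
meets_target = relative_loss <= 0.01     # True here: loss is about 0.53%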

Prepare

PyTorch quantization requires two manual steps:

  1. Add QuantStub and DeQuantStub for all quantizable ops.
  2. Fuse possible patterns, such as Conv + Relu and Conv + BN + Relu.

This is an intrinsic limitation of the PyTorch quantization imperative (eager) path; these steps cannot be automated in code.
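
A minimal sketch of these two steps on a toy module (ToyNet is hypothetical and only for illustration; the actual PeleeNet changes are in the file referenced below):

from torch import nn
from torch.quantization import QuantStub, DeQuantStub, fuse_modules

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()        # step 1: marks where activations get quantized
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()    # step 1: marks where activations get dequantized

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        return self.dequant(x)

    def fuse(self):
        # step 2: fuse Conv + BN + ReLU into one quantizable op
        # (call .eval() before fusing for post-training quantization)
        fuse_modules(self, [["conv", "bn", "relu"]], inplace=True)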

For the related code changes, please refer to examples/pytorch/image_recognition/peleenet/quantization/ptq/eager/peleenet.py.

Code Update

After the prepare step is done, we just need to update main.py as shown below.

model.module.fuse()                                  # fuse the patterns added in the prepare step
from neural_compressor.experimental import Quantization, common
quantizer = Quantization("./conf.yaml")              # load the yaml config written above
quantizer.model = common.Model(model)                # wrap the fp32 model
q_model = quantizer.fit()                            # run calibration, quantization, and tuning

The quantizer.fit() function returns the best quantized model found within the timeout constraint.
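
The returned model can then be persisted for deployment. A short sketch using the save/load helpers of this experimental API (the "./saved" path is illustrative):

q_model.save("./saved")                  # stores the int8 configuration and weights

# Later, rebuild the int8 model on top of a fresh fp32 model instance:
from neural_compressor.utils.pytorch import load
int8_model = load("./saved", model)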