Evaluate performance of ONNX Runtime (DistilBERT)

ONNX Runtime quantization is under active development. Please use onnxruntime 1.6.0 or later to get more quantization support.

This example loads a language translation model and checks its accuracy and speed on GLUE data.

Environment

onnx: 1.7.0
onnxruntime: 1.6.0+
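As a quick sanity check of the version requirement above, the following is a minimal sketch that compares the installed onnxruntime version against 1.6.0. The helper name `version_ge` is hypothetical, and the sketch assumes a `python` executable on the PATH and a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Returns 0 when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed onnxruntime version; stays empty if it cannot be imported.
ort_version=$(python -c 'import onnxruntime; print(onnxruntime.__version__)' 2>/dev/null || true)

if [ -z "$ort_version" ]; then
    echo "onnxruntime is not installed"
elif version_ge "$ort_version" "1.6.0"; then
    echo "onnxruntime $ort_version satisfies >= 1.6.0"
else
    echo "onnxruntime $ort_version is older than 1.6.0"
fi
```

The same helper can be reused for the onnx package by querying `onnx.__version__` instead.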

Prepare dataset

Download the GLUE data with the prepare_data.sh script.

export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

bash prepare_data.sh --data_dir=$GLUE_DIR --task_name=$TASK_NAME
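After the download finishes, it can help to confirm that the task directory is populated before starting a long fine-tuning run. The sketch below is one way to do that. The helper `check_files` is hypothetical, and the file names `train.tsv` and `dev.tsv` are an assumption about what the script writes for MRPC, so adjust them to the actual output.

```shell
# Print every expected file that is missing from a directory.
# Returns 0 when all files are present, 1 otherwise.
check_files() {
    cf_dir=$1
    shift
    cf_missing=0
    for cf_name in "$@"; do
        if [ ! -f "$cf_dir/$cf_name" ]; then
            echo "missing: $cf_dir/$cf_name"
            cf_missing=1
        fi
    done
    return "$cf_missing"
}

# Assumed file names for MRPC; change them if prepare_data.sh writes others.
GLUE_DIR=${GLUE_DIR:-/path/to/glue_data}
TASK_NAME=${TASK_NAME:-MRPC}
if check_files "$GLUE_DIR/$TASK_NAME" train.tsv dev.tsv; then
    echo "dataset looks complete"
fi
```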

Prepare model

Please refer to the Bert-GLUE_OnnxRuntime_quantization guide for detailed model export instructions. The following is a simple example.

Use Huggingface Transformers to fine-tune the model based on the MRPC example with a command like:

export OUT_DIR=/path/to/out_dir/
python ./run_glue.py \
    --model_type distilbert \
    --model_name_or_path distilbert-base-uncased \
    --task_name $TASK_NAME \
    --do_train \
    --do_eval \
    --do_lower_case \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_gpu_eval_batch_size=8   \
    --per_gpu_train_batch_size=8   \
    --learning_rate 2e-5 \
    --num_train_epochs 3.0 \
    --save_steps 100000 \
    --output_dir $OUT_DIR

Run the prepare_model.sh script

Usage:

cd examples/onnxrt/language_translation/distilbert/

bash prepare_model.sh --input_dir=$OUT_DIR \
                      --task_name=$TASK_NAME \
                      --output_model=path/to/model # model path as *.onnx
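Since the later steps expect the exported model to be an *.onnx file, a small check after export can catch a wrong path early. This is a minimal sketch: the helper `is_onnx_file` and the `MODEL` variable are hypothetical names introduced for illustration, and the check only looks at the file name and size, not at the model contents.

```shell
# Succeeds only when the path ends in .onnx and names a non-empty file.
is_onnx_file() {
    case "$1" in
        *.onnx) [ -s "$1" ] ;;
        *) return 1 ;;
    esac
}

# MODEL is a placeholder; point it at the file passed to --output_model.
MODEL=${MODEL:-path/to/model}
if is_onnx_file "$MODEL"; then
    echo "found exported model: $MODEL"
else
    echo "no .onnx file at: $MODEL"
fi
```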

Quantization

Dynamic quantization:

# --input_model is the path to the exported *.onnx model
bash run_tuning.sh --config=distilbert.yaml \
                   --input_model=path/to/model \
                   --output_model=path/to/model_tune

QDQ mode:

# --input_model is the path to the exported *.onnx model
bash run_tuning.sh --config=distilbert_qdq.yaml \
                   --input_model=path/to/model \
                   --output_model=path/to/model_tune
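One simple way to see the effect of quantization is to compare the file sizes of the original and the tuned model. The sketch below does that with standard tools. The helpers `file_bytes` and `size_percent` and the variables `ORIG` and `TUNED` are hypothetical, and the resulting ratio depends on the model and the quantization mode, so no particular number should be expected.

```shell
# Size of a file in bytes (wc -c is POSIX; tr strips padding some platforms add).
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Size of file $2 as an integer percentage of the size of file $1.
size_percent() {
    echo $(( $(file_bytes "$2") * 100 / $(file_bytes "$1") ))
}

# Placeholder paths; use the values given to --input_model and --output_model.
ORIG=${ORIG:-path/to/model}
TUNED=${TUNED:-path/to/model_tune}
if [ -s "$ORIG" ] && [ -s "$TUNED" ]; then
    echo "quantized model is $(size_percent "$ORIG" "$TUNED")% of the original size"
else
    echo "model files not found, skipping size comparison"
fi
```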

Benchmark

# --input_model is the path to the *.onnx model; --mode is performance or accuracy
bash run_benchmark.sh --config=distilbert.yaml \
                      --input_model=path/to/model \
                      --mode=performance
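To compare the original and the quantized model, the benchmark can be run over both models and both modes. The following is a sketch of such a loop, using only the flags shown above. The wrapper function `run_benchmark` and the `DRY_RUN` variable are hypothetical, and it is an assumption that the tuned model can be benchmarked with the same config file. By default the wrapper only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Print (DRY_RUN=1, the default here) or run one benchmark invocation.
# Arguments: config file, model path, mode.
run_benchmark() {
    set -- bash run_benchmark.sh --config="$1" --input_model="$2" --mode="$3"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Benchmark the original and the tuned model in both modes.
for model in path/to/model path/to/model_tune; do
    for mode in accuracy performance; do
        run_benchmark distilbert.yaml "$model" "$mode"
    done
done
```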