Step-by-Step

Prerequisite

1. Installation

1.1 Install python environment

conda create -n <env name> python=3.7
conda activate <env name>
cd <nc_folder>/examples/baremetal/nlp/bert_large
Install Intel-optimized TensorFlow 1.15.0 up2 from the link below:
pip install https://storage.googleapis.com/intel-optimized-tensorflow/intel_tensorflow-1.15.0up2-cp37-cp37m-manylinux2010_x86_64.whl
pip install -r requirements.txt
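
To confirm that the Intel-optimized TensorFlow wheel is active in the environment, a quick sanity check (assuming the wheel above installed cleanly; it should report a 1.15.0 build):

python -c "import tensorflow as tf; print(tf.__version__)"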

Preloading libiomp5.so can improve performance when the batch size is 1.

export LD_PRELOAD=<path_to_libiomp5.so>

Preloading libjemalloc.so can also improve performance. A prebuilt copy is available in third_party/jemalloc/lib.

export LD_PRELOAD=<path_to_libjemalloc.so>
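
LD_PRELOAD accepts a colon-separated list, so both libraries can be preloaded at once. The sketch below is illustrative: the find command assumes libiomp5.so ships inside the active conda environment, and both paths are placeholders to substitute.

find $CONDA_PREFIX -name "libiomp5.so"   # one way to locate libiomp5.so in the conda env
export LD_PRELOAD=<path_to_libiomp5.so>:<path_to_libjemalloc.so>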

2. Prepare Dataset and Model

2.1 Prepare Dataset

bash prepare_dataset.sh

Note: If you change the data storage location, update the data path in bert_static.yaml and bert_dynamic.yaml accordingly.
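
For illustration only: assuming the default path string inside those files is build/data (the dataset_location used by the commands in the Run section below), a single substitution updates both configs; the target directory here is a placeholder. Check the files before running it.

sed -i 's|build/data|/path/to/new/dataset/location|g' bert_static.yaml bert_dynamic.yaml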

2.2 Download the TensorFlow model (the model will be placed in the build/data/bert_tf_v1_1_large_fp32_384_v2 folder):

bash prepare_model.sh

2.3 Generate the frozen pb model (the resulting model.pb will be in build/data):

python tf_freeze_bert.py
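
Optionally, a quick sanity check (a sketch, assuming the TensorFlow 1.15 environment from the installation step) that the frozen graph was written and parses correctly:

ls -lh build/data/model.pb
python -c "import tensorflow as tf; gd = tf.compat.v1.GraphDef(); gd.ParseFromString(open('build/data/model.pb', 'rb').read()); print(len(gd.node), 'nodes')"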

Run

1. To get the tuned model and its accuracy:

Run with Python:

GLOG_minloglevel=2 python run_engine.py --tune

or run with the shell script:

bash run_tuning.sh --config=bert_static.yaml --input_model=build/data/model.pb --output_model=ir --dataset_location=build/data
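
After tuning finishes, the generated IR directory can be inspected; based on the C++ run step further below, it is expected to contain conf.yaml and model.bin.

ls ./ir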

2. To benchmark the tuned model:

2.1 For accuracy, run with Python:

GLOG_minloglevel=2 python run_engine.py --input_model=./ir --benchmark --mode=accuracy --batch_size=1

or run with the shell script:

bash run_benchmark.sh --config=bert_static.yaml --input_model=ir --dataset_location=build/data --batch_size=1 --mode=accuracy

2.2 For performance, run with Python:

GLOG_minloglevel=2 python run_engine.py --input_model=./ir --benchmark --mode=performance --batch_size=1

or run with the shell script:

bash run_benchmark.sh --config=bert_static.yaml --input_model=ir --dataset_location=build/data --batch_size=1 --mode=performance

or run with the C++ inferencer. The warmup count below is recommended to be 1/10 of the iteration count and no less than 3.

export GLOG_minloglevel=2
export OMP_NUM_THREADS=<cpu_cores>
export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX
numactl -C 0-<cpu_cores-1> <neural_compressor_folder>/engine/bin/inferencer --batch_size=<batch_size> --iterations=<iterations> --w=<warmup> --seq_len=384 --config=./ir/conf.yaml --weight=./ir/model.bin
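
A worked example of the warmup rule above (a sketch; the iteration count, batch size, and inferencer path are placeholders to adapt): warmup is one tenth of the iterations, floored at 3, and the core count is taken from nproc.

export GLOG_minloglevel=2
CORES=$(nproc)                                # adjust if pinning to a single socket
ITERATIONS=100
WARMUP=$(( ITERATIONS / 10 ))                 # 1/10 of iterations ...
[ "$WARMUP" -lt 3 ] && WARMUP=3               # ... but never fewer than 3
export OMP_NUM_THREADS=$CORES
# export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX     # only if the CPU supports AMX
numactl -C 0-$((CORES - 1)) <neural_compressor_folder>/engine/bin/inferencer --batch_size=1 --iterations=$ITERATIONS --w=$WARMUP --seq_len=384 --config=./ir/conf.yaml --weight=./ir/model.bin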