Accelerate embedding vector generation by using an ONNX model
conda create -n vo python=3.10
conda activate vo
pip install -r requirements.txt
pip install "optimum[onnxruntime-gpu]"
python generate.py
python generate_optimum.py
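The scripts print a per-inference latency in milliseconds. Their timing code is not shown here, so the following is only a minimal sketch of how such a number could be measured. The helper `time_inference_ms`, its `warmup` and `runs` parameters, and the stand-in workload `fake_encode` are hypothetical names, not taken from `generate.py` or `generate_optimum.py`.

```python
import time
from statistics import mean


def time_inference_ms(fn, warmup=3, runs=20):
    """Return the mean wall-clock latency of fn() in milliseconds.

    Hypothetical helper: the real scripts may measure time differently.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def fake_encode():
    # Stand-in workload (assumption); replace with the real model call.
    return sum(i * i for i in range(10_000))


latency = time_inference_ms(fake_encode)
print(f"Inference time = {latency:.2f} ms")
```

When the timed call runs on a GPU through PyTorch, CUDA kernels are launched asynchronously, so the timed function should wait for the device (for example with `torch.cuda.synchronize()`) before the clock is read. Otherwise the measured value may only cover kernel launch time.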
# ONNX model inference is roughly 5x faster than sentence_transformers
# Model used: https://huggingface.co/amu/tao-8k
OnnxModel Runtime gpu Inference time = 4.52 ms
Sentence Transformer gpu Inference time = 22.19 ms
# On one A100 GPU
[Optimum] OnnxModel Runtime gpu Inference time = 3.22 ms
Sentence Transformer gpu Inference time = 17.63 ms
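To put a number on "much faster", the speedup can be computed directly from the latencies reported above. This is a small arithmetic sketch. The dictionary keys and the `speedup` function are illustrative names, and the hardware for the first result block is not stated in this README.

```python
# Latencies in milliseconds, copied from the two result blocks above.
results = {
    "first block (hardware not stated)": {"onnx": 4.52, "sentence_transformers": 22.19},
    "one A100 GPU": {"onnx": 3.22, "sentence_transformers": 17.63},
}


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized path is than the baseline."""
    return baseline_ms / optimized_ms


speedups = {
    name: speedup(r["sentence_transformers"], r["onnx"])
    for name, r in results.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
# first block (hardware not stated): 4.91x
# one A100 GPU: 5.48x
```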