# vector_by_onnxmodel

Accelerate embedding (vector) generation by running the model with ONNX Runtime instead of calling it through sentence_transformers.

## Install

```bash
conda create -n vo python=3.10
conda activate vo
pip install -r requirements.txt
pip install "optimum[onnxruntime-gpu]"
```
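As an optional sanity check (not part of the repo's scripts), you can confirm that the GPU build of ONNX Runtime is picked up:

```python
import onnxruntime as ort

# CUDAExecutionProvider should be listed when onnxruntime-gpu is installed correctly
print(ort.get_available_providers())
```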

## How to use

```bash
# benchmark plain ONNX Runtime against sentence_transformers
python generate.py
# benchmark ONNX Runtime via Hugging Face Optimum
python generate_optimum.py
```
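The scripts aren't reproduced here; as a rough sketch, the Optimum path usually boils down to something like this (the model id is taken from the results below; the mean-pooling step is an assumption, not necessarily what generate_optimum.py does):

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "amu/tao-8k"  # the model used in the results below
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id, export=True, provider="CUDAExecutionProvider"
)

inputs = tokenizer("hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs)
# mean-pool token embeddings into one sentence vector (pooling choice is assumed)
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)
```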

## Result (~4x faster)

```
# the ONNX model's inference time is much lower than sentence_transformers'
# model used: https://huggingface.co/amu/tao-8k
OnnxModel Runtime gpu Inference time = 4.52 ms
Sentence Transformer gpu Inference time = 22.19 ms
```
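How generate.py actually times the models isn't shown here; a typical pattern behind latency numbers like these is a warm-up phase followed by averaging many runs, roughly:

```python
import time

def time_inference(fn, warmup=10, runs=100):
    # warm up so one-time costs (CUDA init, graph optimization) don't skew the average
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000  # average latency in ms
```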

## Result with Optimum (~5x faster)

```
# the ONNX model's inference time is much lower than sentence_transformers'
# model used: https://huggingface.co/amu/tao-8k
# measured on a single A100 GPU
[Optimum] OnnxModel Runtime gpu Inference time = 3.22 ms
Sentence Transformer gpu Inference time = 17.63 ms
```