Unable to execute GPT-2 onnx model #783
Comments
Hi, the fatal error message indicates that the model contains INT64 tensors, a data type our ONNX parser does not support. The ONNX parser is very outdated and has been marked for future deprecation, so unless you're willing to contribute the work yourself, I'm afraid this model won't work. Colm.
@Colm-in-Arm: Does TfLite support INT64 types, or do you recommend any other parser? My objective is to run inference with any LLM on a Mali G710 GPU using ExecuteNetwork. Is there a successful use case available? If so, could you kindly point me to a working model?
Hi, the TfLite runtime does support INT64 in some limited cases. I don't know of other ONNX runtimes you could use. In Arm NN we have not done any work on LLMs. The work I have seen tends to target the CPU rather than the GPU. LLMs tend to be memory bound rather than compute bound, so there's not as much potential for a performance increase from using GPUs. Colm.
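The memory-bound point can be illustrated with a rough back-of-the-envelope calculation. This is only a sketch under stated assumptions: GPT-2 small has about 124 million parameters, each forward pass costs roughly 2 FLOPs per parameter per token, and every weight is read from memory once per pass. The function names and the hardware numbers are hypothetical placeholders, not Mali G710 specifications.

```python
# Rough roofline-style estimate. All hardware numbers are placeholders.
GPT2_SMALL_PARAMS = 124_000_000  # approximate parameter count of GPT-2 small
FP16_BYTES = 2                   # bytes per weight in an fp16 model


def weight_bytes(n_params, bytes_per_param):
    """Bytes needed to store (and stream once) all weights."""
    return n_params * bytes_per_param


def arithmetic_intensity(n_params, bytes_per_param, tokens_per_pass=1):
    """FLOPs per byte of weights read, assuming ~2 FLOPs per parameter per token."""
    flops = 2 * n_params * tokens_per_pass
    return flops / weight_bytes(n_params, bytes_per_param)


def is_memory_bound(intensity, peak_flops, peak_bandwidth):
    """A workload is memory bound when its intensity is below the machine balance."""
    return intensity < peak_flops / peak_bandwidth


# Hypothetical device: 1 TFLOP/s compute, 50 GB/s memory bandwidth.
PEAK_FLOPS = 1e12
PEAK_BANDWIDTH = 50e9

single_token = arithmetic_intensity(GPT2_SMALL_PARAMS, FP16_BYTES)
print(weight_bytes(GPT2_SMALL_PARAMS, FP16_BYTES))                # bytes of fp16 weights
print(single_token)                                               # FLOPs per byte
print(is_memory_bound(single_token, PEAK_FLOPS, PEAK_BANDWIDTH))  # True on this placeholder device
```

Under these assumptions, generating one token at a time does about one FLOP per byte of weights read, so extra compute throughput helps little. Processing many tokens in a single pass raises the intensity, so the conclusion depends on how the model is run.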
@Colm-in-Arm: I'd like to inform you that I was able to successfully execute a GPT-2 TfLite model on the Mali G710 GPU. The "gpt2-64-fp16.tflite" model worked. Now Arm can add LLMs to its portfolio :)
Wow! Well done. Can you outline the steps needed to make the model small enough to push through Arm NN? Did you use ExecuteNetwork or your own application? I presume some layers were handled by the TfLite runtime? What kind of inference times were you getting? How about CpuAcc, did you try it? Colm.
@Colm-in-Arm: I haven't reduced the size of the model. The file size of "gpt2-64-fp16.tflite" was 248 MB. I'd like to know why we would need to reduce the model size.
Yes, I used ExecuteNetwork with both the CpuAcc and GpuAcc backends. CpuAcc took about 25 minutes and GpuAcc about 2 hours 30 minutes. Note: I am executing the model on a hybrid emulated platform (using Zebu), where the CPU is on the virtual side and the GPU runs on the RTL side, so it is not straightforward to compare the execution times.
Some layers were handled by the TfLite runtime? How do we verify this? Do you mean that a few of the unsupported operations are handled by the TfLite runtime?
Also, I would like to check the GPU core, memory and power consumption. Are there any commands that I can run from the Linux terminal to check these?
@Colm-in-Arm: Awaiting your response.
Hello Team,
I am trying to execute the GPT-2 model (link given below) on a Mali G710 GPU. During execution I get the error below:
./ExecuteNetwork -c GpuAcc -f onnx-binary -d /mnt/dropbox/MobileNetV2/llm.txt -m /mnt/dropbox/LLM/gpt2-10.onnx -i input1 -s 1,4,16
Warning: DEPRECATED: The program option 'input-name' is deprecated and will be removed soon. The input-names are now automatically set.
Warning: DEPRECATED: The program option 'model-format' is deprecated and will be removed soon. The model-format is now automatically set.
Info: ArmNN v33.1.0
Info: Initialization time: 298.10 ms.
Fatal: Datatype INT64 is not valid for tensor 'input1' of node 'Reshape_11', not in {onnx::TensorProto::FLOAT}. at function ParseReshape [/devenv/armnn/src/armnnOnnxParser/OnnxParser.cpp:2319]
Info: Shutdown time: 129.43 ms.
model link: https://github.com/onnx/models/blob/main/validated/text/machine_comprehension/gpt-2/model/gpt2-10.onnx
@FrancisMurtagh-arm: I tried passing both int and float values as input, but that did not help. Can you please suggest a fix?
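The error refers to the data type declared for 'input1' inside the model file, so changing the values in the input data file would not be expected to affect it. One way to confirm which tensors are declared as INT64 is to inspect the graph before running it. The following is a minimal sketch, assuming the third-party `onnx` Python package is installed; `find_int64_tensors` and `load_graph` are hypothetical helper names, and the stand-in graph at the end exists only so the example runs without the real model.

```python
from types import SimpleNamespace

ONNX_INT64 = 7  # numeric value of onnx.TensorProto.INT64


def find_int64_tensors(graph):
    """Return (kind, name) pairs for graph inputs and initializers declared as INT64."""
    hits = []
    for inp in graph.input:
        if inp.type.tensor_type.elem_type == ONNX_INT64:
            hits.append(("input", inp.name))
    for init in graph.initializer:
        if init.data_type == ONNX_INT64:
            hits.append(("initializer", init.name))
    return hits


def load_graph(path):
    """Load a real model, e.g. load_graph('gpt2-10.onnx'). Requires the onnx package."""
    import onnx  # imported lazily so the rest of the sketch runs without it
    return onnx.load(path).graph


# Stand-in graph mimicking the attribute layout of an ONNX GraphProto.
fake_graph = SimpleNamespace(
    input=[
        SimpleNamespace(
            name="input1",
            type=SimpleNamespace(tensor_type=SimpleNamespace(elem_type=ONNX_INT64)),
        )
    ],
    initializer=[SimpleNamespace(name="weight", data_type=1)],  # 1 is FLOAT
)
print(find_int64_tensors(fake_graph))
```

With the real model, `find_int64_tensors(load_graph("/mnt/dropbox/LLM/gpt2-10.onnx"))` would list the INT64 inputs and initializers. It does not cover intermediate tensors produced inside the graph.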