Unable to execute GPT-2 onnx model #783

Open
somasundaram1702 opened this issue Jul 18, 2024 · 6 comments

Comments

@somasundaram1702

somasundaram1702 commented Jul 18, 2024

Hello Team,

I am trying to run the GPT-2 model (link below) on a Mali G710 GPU. During execution I get the following error:

./ExecuteNetwork -c GpuAcc -f onnx-binary -d /mnt/dropbox/MobileNetV2/llm.txt -m /mnt/dropbox/LLM/gpt2-10.onnx -i input1 -s 1,4,16
Warning: DEPRECATED: The program option 'input-name' is deprecated and will be removed soon. The input-names are now automatically set.
Warning: DEPRECATED: The program option 'model-format' is deprecated and will be removed soon. The model-format is now automatically set.
Info: ArmNN v33.1.0
Info: Initialization time: 298.10 ms.
Fatal: Datatype INT64 is not valid for tensor 'input1' of node 'Reshape_11', not in {onnx::TensorProto::FLOAT}. at function ParseReshape [/devenv/armnn/src/armnnOnnxParser/OnnxParser.cpp:2319]
Info: Shutdown time: 129.43 ms.

model link: https://github.com/onnx/models/blob/main/validated/text/machine_comprehension/gpt-2/model/gpt2-10.onnx

@FrancisMurtagh-arm: I tried passing both int and float values as input, but neither helped. Could you please suggest a fix?

@Colm-in-Arm
Collaborator

Hi,

The fatal error message indicates that the model contains INT64 tensors, which our ONNX parser does not support. The parser is very outdated and has been marked for future deprecation, so unless you're willing to contribute the work yourself, I'm afraid this model won't work.
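
For readers who hit the same error, one way to see which tensors trigger it is to scan the model for INT64 element types before handing it to the parser. The following is a minimal sketch, not part of Arm NN: the helper names `find_int64_tensors` and `report` are hypothetical, it assumes the `onnx` Python package is installed, and it only inspects graph inputs and initializers (not intermediate tensors).

```python
# Sketch: list graph inputs and initializers whose element type is INT64.
INT64 = 7  # numeric value of onnx.TensorProto.INT64


def find_int64_tensors(graph):
    """Return the names of graph inputs and initializers stored as INT64."""
    names = []
    for value_info in graph.input:
        if value_info.type.tensor_type.elem_type == INT64:
            names.append(value_info.name)
    for initializer in graph.initializer:
        if initializer.data_type == INT64:
            names.append(initializer.name)
    return names


def report(model_path):
    """Load an ONNX file and print its INT64 tensors (requires the onnx package)."""
    import onnx  # imported lazily so find_int64_tensors has no hard dependency

    model = onnx.load(model_path)
    for name in find_int64_tensors(model.graph):
        print(name)
```

Calling `report("gpt2-10.onnx")` on the model from this issue would show whether `input1` and any constant tensors are stored as INT64.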

Colm.

@somasundaram1702
Author

somasundaram1702 commented Jul 22, 2024

@Colm-in-Arm: Does TfLite support INT64 types, or would you recommend another parser? My objective is to run inference for any LLM on a Mali G710 GPU using ExecuteNetwork. Is there a known successful use case? If so, could you point me to a working model?

@Colm-in-Arm
Collaborator

Hi,

The TfLite runtime does support INT64 in some limited cases. I don't know of other ONNX runtimes you could use.

In Arm NN we have not done any work on LLMs. The work I have seen tends to target the CPU rather than the GPU. LLMs tend to be memory bound rather than compute bound, so there's not as much potential for a performance increase from using GPUs.
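
To make the memory-bound argument concrete: when generating token by token, every weight has to be read from memory roughly once per token, so memory bandwidth divided by model size gives a rough upper bound on tokens per second, regardless of how much compute is available. The sketch below is a back-of-the-envelope estimate only. The 10 GB/s bandwidth is a purely hypothetical placeholder, and the 248 MB figure is the file size of the TfLite model reported later in this thread.

```python
def max_tokens_per_second(model_bytes, bandwidth_bytes_per_s):
    """Upper bound on decode rate when each token needs one full pass over the weights."""
    return bandwidth_bytes_per_s / model_bytes


model_bytes = 248e6  # 248 MB model file, used here as an approximation of weight size
bandwidth = 10e9     # hypothetical 10 GB/s memory bandwidth (assumption, not measured)

rate = max_tokens_per_second(model_bytes, bandwidth)
print(f"{rate:.1f} tokens/s upper bound")
```

Under this simple model, adding more arithmetic throughput does not raise the bound; only higher bandwidth or a smaller model does.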

Colm.

@somasundaram1702
Author

@Colm-in-Arm: I would like to let you know that I was able to successfully execute a GPT-2 TfLite model on the Mali G710 GPU. The "gpt2-64-fp16.tflite" model worked.

Now Arm can add LLMs to its portfolio :)

@Colm-in-Arm
Collaborator

Wow! Well done.

Can you outline the steps needed to make the model small enough to push through Arm NN? Did you use ExecuteNetwork or your own application? I presume some layers were handled by the TfLite runtime? What kind of inference times were you getting? How about CpuAcc, did you try it?

Colm.

@somasundaram1702
Author

somasundaram1702 commented Aug 2, 2024

@Colm-in-Arm: I haven't reduced the size of the model. The file size of "gpt2-64-fp16.tflite" is 248 MB. Why would we need to reduce the model size? Yes, I used ExecuteNetwork with both the CpuAcc and GpuAcc backends. CpuAcc took 25 minutes and GpuAcc took 2 hours 30 minutes (approx.).

Note: I am executing the model on a hybrid emulated platform (using Zebu), where the CPU is on the virtual side and the GPU runs on the RTL side, so comparing the execution times is not straightforward.

Were some layers handled by the TfLite runtime? How do we verify this? Do you mean that a few unsupported operations are handled by the TfLite runtime?

Also, I would like to check the GPU core, memory, and power consumption. Are there any commands I can run from the Linux terminal to check these?
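
As a possible starting point for the frequency part of that question: on Linux, devices that register with the kernel's devfreq framework expose a `cur_freq` attribute under `/sys/class/devfreq/`. The sketch below simply reads that attribute for every device it finds. Whether the Mali kernel driver on a given platform registers a devfreq node is an assumption that depends on the kernel and driver build, the helper name `read_devfreq` is hypothetical, and this does not cover memory or power consumption.

```python
from pathlib import Path


def read_devfreq(root="/sys/class/devfreq"):
    """Return {device_name: current frequency in Hz} for every devfreq device found."""
    freqs = {}
    base = Path(root)
    if not base.is_dir():
        return freqs  # devfreq not available on this system
    for device in sorted(base.iterdir()):
        try:
            freqs[device.name] = int((device / "cur_freq").read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing, unreadable, or not a number
    return freqs


if __name__ == "__main__":
    for name, hz in read_devfreq().items():
        print(f"{name}: {hz / 1e6:.0f} MHz")
```

If the script prints nothing, no devfreq devices are exposed and a vendor-specific tool or driver interface would be needed instead.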

@Colm-in-Arm: Awaiting your response.
