Another error on another model and by now it's two out of three :( #762
Comments
another model with problems #758 |
Hello Federico, Unfortunately I'm not permitted to test using your model without knowing the source and license under which it is shared. However, from your description of the problem it appears that whichever backend you are targeting says the layer is valid but then subsequently prevents a workload being created to execute the layer. Can you tell me which backend you are trying to use? You could also try using the CpuRef backend. It will not be performant but if it runs the layer it will give a strong indication that it is the backend that's at fault. Colm. |
First of all, please be more precise about what exactly you need in order to be able to execute the models I'm sharing here and in #758. In both cases these are open source models: the one for the current bug report is located here https://github.com/aselsan-research-imaging-team/flight-net and the one for #758 is located here https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b Regarding the backend, I tried both CpuAcc and GpuAcc, separately and together (in both orders). Neither the ARM64 release nor my own compilation of ARMNN supports CpuRef (at least not when run on an Orange Pi 5B), so I can't test that unless you give me a different way. Please check the links I just shared to confirm that both models (the one from this bug report and the one from my other bug report) are openly licensed, and please try them yourselves. ARMNN, when it works, is really performant, but if you can't make it work with more models then it will fall into oblivion. Thank you, |
Hello Federico, Thank you for pointing me to the source of the models. I can see they are creative commons which is good. However, you converting them to tflite is considered a derivative work. Two options:
If you want to try CpuRef you can find a binary release of Arm NN for aarch64 platforms that includes the CpuRef here Colm. |
Hi, thank you for the detail. I want to assert here that the converted tflite model I shared above, which is based on https://github.com/aselsan-research-imaging-team/flight-net, was shared here under license CC BY-NC-SA 4.0 I'll go now to the other bug report (#758) and do the same so that you can test that model as well. I'll try the CpuRef ASAP and report back as well. All my best, |
Hello Federico, I tried feather_32bit.tflite and it shows the same ComparisonQueueDescriptor error in CpuRef as you saw with CpuAcc. I'll start investigating it now. Colm. |
Hello Federico, I have a review up to resolve the first problem I encountered: https://review.mlplatform.org/c/ml/armnn/+/11379 You can cherry pick this patch on top of Arm NN main branch if you want to experiment with it. The fault came down to inconsistent handling of broadcast in the Greater_Equal layer. I have verified byte level accuracy of results between TfLite runtime and CpuRef backends. However, there appears to be a further problem with the CpuAcc and GpuAcc backends that I'm investigating now. Colm. |
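For readers unfamiliar with the broadcast rule behind the ComparisonQueueDescriptor error, here is a minimal sketch of rank alignment in plain Python. It is not the code from the review; `align_ranks` and `broadcast_shape` are hypothetical helper names, and the shapes are made up for illustration. The idea is that a lower-rank operand is padded with leading 1s so both inputs have the same number of dimensions before they are compared.

```python
def align_ranks(shape_a, shape_b):
    """Pad the lower-rank shape with leading 1s so both shapes have equal rank."""
    rank = max(len(shape_a), len(shape_b))

    def pad(shape):
        return (1,) * (rank - len(shape)) + tuple(shape)

    return pad(shape_a), pad(shape_b)


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast output shape, or raise if the shapes are incompatible."""
    a, b = align_ranks(shape_a, shape_b)
    out = []
    for dim_a, dim_b in zip(a, b):
        if dim_a != dim_b and 1 not in (dim_a, dim_b):
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # A dimension of size 1 stretches to match the other operand.
        out.append(dim_b if dim_a == 1 else dim_a)
    return tuple(out)


# Hypothetical shapes: an NHWC image tensor compared against a per-channel threshold.
image_shape = (1, 400, 600, 3)
threshold_shape = (3,)
print(align_ranks(image_shape, threshold_shape))
print(broadcast_shape(image_shape, threshold_shape))
```

This also explains why the error can appear on a model with a single image input: the second operand of the comparison is typically a constant inside the graph, not a model input.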
Hello Federico, I've resolved the problems in CpuAcc and GpuAcc now. This model contains kTfLiteBool which is a data type we don't often encounter. I've updated the review, https://review.mlplatform.org/c/ml/armnn/+/11379, to include all the necessary changes. Colm. |
Amazing! I'll try this today! Please look at the other case #758 whenever you have the time, I added the license for that model as well in the thread. |
so weird! I applied the patch on the latest pull of the main branch of armnn, using git apply 48eefee.diff (after downloading your patch), I checked the files locally to see the patch was indeed applied - it was; I then went to the build tool script and built armnn, and... I get exactly the same error somehow! RuntimeError: TfLiteArmnnDelegate: Exception (TfLiteArmnnDelegate: Network could not be loaded: An error occurred when preparing the network workloads: ComparisonQueueDescriptor: Tensors input_0 & input_1 must have the same number of dimensions in order to be broadcasted) caught from LoadNetwork. what could it possibly be? I'm sure the build tool is somehow not using the patched files... |
You do need to check your build. The change to AddBroadcastReshapeLayer.hpp will resolve that specific error. Colm. |
Oh, as I said, I applied the patch and then built, so all the files, including the one you just mentioned, have all the changes you made. Could it be that the build tool uses a specific branch or another source? Otherwise I'm lost :( |
Running build-armnn.sh should reuse the version of Arm NN previously cloned by setup-armnn.sh. Any changes you've made to the cloned repository should be built. I've no idea why you're not seeing the changes. If you deliberately break the code and rebuild does the build fail? You could also try the --clean option to force a clean build? Colm. |
OK, I see the issue now: I was cloning the main branch, applying the patch, then going to its build-tool/script directory and running setup-armnn and build-armnn, on the understanding that they would use the source code that was already cloned. I hadn't noticed that setup-armnn created a new source folder and that it was THAT folder that needed patching. I applied the patch to that folder after doing a pull, then built and... I still have errors :(

I'm using this Python code to call it, with the feather_16bit.tflite model:

armnn_delegate = tflite.experimental.load_delegate( library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
# Delegates/Executes all operations supported by Arm NN to/with Arm NN
interpreter = tflite.Interpreter(model_path="../models/feather_16bit.tflite", experimental_delegates=[armnn_delegate])

These are the outputs (log level trace) with each backend used independently:

GpuAcc:
Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
Info: ArmnnSubgraph creation
Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
and then:
RuntimeError Traceback (most recent call last)
File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.__init__(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
RuntimeError: TfLiteArmnnDelegate: Exception (Failed to assign a backend to each layer) caught from optimize.
ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
Info: Optimize ArmnnSubgraph time: 3.50 ms
Info: ArmnnSubgraph creation
Warning: WARNING: Layer of type Convolution2d is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: ArmNN ClDepthwiseConv2dWorkload does not support non constant bias.), falling back to the next backend.

CpuAcc:
Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type
Info: ArmnnSubgraph creation
Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer()
INFO: TfLiteArmnnDelegate: Added backend CpuAcc
RuntimeError Traceback (most recent call last)
File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.__init__(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
RuntimeError: TfLiteArmnnDelegate: Exception (Failed to assign a backend to each layer) caught from optimize.
type:Float32
Info: Optimize ArmnnSubgraph time: 3.51 ms
Info: ArmnnSubgraph creation
Warning: WARNING: Layer of type Convolution2d is not supported on requested backend CpuAcc for input data type Float32 and output data type Float32 (reason: in validate src/runtime/NEON/functions/NEConvolutionLayer.cpp:134: Dynamic weights are not supported), falling back to the next backend.

CpuRef: This one WORKS.
Yet there are warnings: Info: ArmnnSubgraph creation Info: ConvertConstDequantisationLayersToConstLayersImpl::ReplaceConstDequantisationLayer() Info: Optimize ArmnnSubgraph time: 3.32 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.46 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.49 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 1.25 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.29 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.58 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.30 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.44 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.31 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.54 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.52 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.45 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.30 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.50 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.53 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.47 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.30 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.21 ms Debug: RuntimeImpl::UnloadNetwork(): Unloaded network with ID: 19 WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: 
CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference 
cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type WARNING: CAST: not supported by armnn: Reference cast: input is not a supported type Again CpuRef even with all these messages, does work. CpuAcc and GpuAcc do not work. |
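For anyone reproducing the per-backend runs above, the following is a rough sketch of how the delegate could be loaded with an explicit backend list. The helper names `delegate_options` and `load_with_backends` are hypothetical, and the paths are placeholders. The `backends` and `logging-severity` option keys are, to my understanding, the ones documented for the Arm NN TfLite delegate, but check them against the release you built.

```python
def delegate_options(backends, severity="trace"):
    """Build the options dict passed to the Arm NN delegate.

    Backends are tried in the order given, e.g. ["GpuAcc", "CpuAcc", "CpuRef"].
    """
    return {"backends": ",".join(backends), "logging-severity": severity}


def load_with_backends(model_path, delegate_lib, backends):
    """Create a TfLite interpreter that delegates supported ops to Arm NN.

    Assumes TensorFlow is installed and delegate_lib points at libarmnnDelegate.so.
    """
    import tensorflow as tf  # imported lazily so the helper above works without it

    delegate = tf.lite.experimental.load_delegate(
        library=delegate_lib,
        options=delegate_options(backends),
    )
    return tf.lite.Interpreter(
        model_path=model_path,
        experimental_delegates=[delegate],
    )


# Option sets for testing each backend on its own, as described in the comment above.
for backend in ("GpuAcc", "CpuAcc", "CpuRef"):
    print(delegate_options([backend]))
```

Running one backend at a time, as done here, makes it easier to tell whether a failure belongs to a specific backend or to the delegate itself.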
Using instead the feather_8bit_dynamic.tflite, both CpuAcc and CpuRef work (and indeed CpuAcc works pretty fast on my orange pi 5b !). But GpuAcc still has the errors: Error: ArmNN Failed to visit node with error: in data_size_from_type ./arm_compute/core/utils/DataTypeUtils.h:67: Invalid data type Info: Optimize ArmnnSubgraph time: 0.51 ms Info: ArmnnSubgraph creation Info: Optimize ArmnnSubgraph time: 0.65 ms RuntimeError Traceback (most recent call last) File ~/miniconda3/envs/mlc-chat-venv/lib/python3.11/site-packages/tensorflow/lite/python/interpreter.py:513, in Interpreter.init(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering) RuntimeError: TfLiteArmnnDelegate: Exception (TfLiteArmnnDelegate: Network could not be loaded: An error occurred when preparing the network workloads: Convolution2dQueueDescriptor: input & weight must have identical data types.) caught from LoadNetwork. |
OK, the feather_32bit.tflite does work with all of GpuAcc, CpuAcc and CpuRef, great work! However, comparing the inference speed of the 32, 16 and 8 bit versions of the model using CpuRef (the only backend with which all 3 work), one can see that the 8 bit version is much, much faster, so presumably the 16 and 8 bit versions would also be faster if they ran (instead of crashing) on the CpuAcc and GpuAcc backends. So if you could check the errors, particularly those related to the GpuAcc backend on the feather_8bit_dynamic.tflite version, that would be so helpful! Thank you! |
Hello Federico, Just a point on feather_16bit.tflite. This is not a FP16 model. The input is FP32. The weights are quantized FP16 with the first operator on them being dequantize to FP32. TfLite have the concept of post training quantization. It is up to the backend/accelerator to identify this structure and to fully modify the model to use FP16 kernels. I believe the TfLite GPU backend does support this but Arm NN does not. The result is that you will not see any performance increase between FP16 and FP32 with this model on Arm NN. Colm. |
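To illustrate the structure described above, here is a small stand-in sketch in plain Python (not Arm NN or TfLite code). The weights are stored as 16-bit floats, which halves their size on disk, but they are expanded back to full-precision values before any arithmetic happens, so the compute kernels are unchanged. The function names are hypothetical, and Python floats stand in for FP32 here.

```python
import struct


def store_fp16(weights):
    """Serialise weights as little-endian IEEE 754 half-precision values."""
    return struct.pack(f"<{len(weights)}e", *weights)


def dequantize(blob):
    """Expand stored FP16 weights back to full-precision floats (the Dequantize step)."""
    count = len(blob) // 2  # two bytes per half-precision value
    return list(struct.unpack(f"<{count}e", blob))


weights = [0.5, -1.25, 3.0]      # exactly representable in FP16
blob = store_fp16(weights)
print(len(blob))                 # 2 bytes per weight instead of 4 for FP32
restored = dequantize(blob)

# The multiply-accumulate below runs on the dequantized values, so it costs
# the same as it would for a model whose weights were never stored as FP16.
inputs = [2.0, 4.0, 1.0]
print(sum(w * x for w, x in zip(restored, inputs)))
```

A speed-up would only appear if the runtime recognised this pattern and switched the downstream operators to FP16 kernels, which is the step described above as not happening in Arm NN.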
Thanks @Colm-in-Arm . Three follow ups:
Thank you!!! Federico |
1: Yes Arm NN will accelerate an 8 bit quantized model using the GpuAcc and CpuAcc backends. The usual restrictions on supported layers apply. 2: I've never attempted the kinds of conversions you're doing. I would hope that Tensorflow would honour an FP16 input model and create a native FP16 tflite model. I would be happy to try it if you get the conversion to work. 3: Something is going wrong with the ACL layer validation here. It is returning that this CONV2D layer is supported but in Arm NN we have 3 different reasons why this workload should not be created. When I remove these restrictions the layer fails in ACL. I'll have to check with @morgolock feather_16bit.tflite on CpuAcc and GpuAcc I can see the model is executing but the results are garbage. This will require a layer by layer comparison which will take some time. Colm. |
Thank you! given the 16 bit model will not improve speed with regards to the 32 bit model, then the most important model version to make work is the 8bit version. As a general rule: what is faster in ARMNN (in mali GpuAcc)? 8bit models or 16/32 bit models? thanks! |
Hey @Colm-in-Arm good news is: I created a new version of the converted model, using these guidelines https://www.tensorflow.org/lite/performance/post_training_integer_quant#convert_using_integer-only_quantization (except that I did not quantize the inputs and outputs), leading to this model: The good news: ARMNN with your patch shared in this bug report does work with this 8bit model! It reports several errors both in GpuAcc and CpuAcc but it does work anyhow (errors below). Now the bad news: it's significantly slower than the float version. And it is slower with GpuAcc and faster with CpuAcc. This observation I have made before: I found several times that float models work faster than int models in ARMNN and that int models run faster in CpuAcc than GpuAcc, which I took to mean that int models were useless in the sense that Mali was not prepared to accelerate them or not as much as float operations. Can you confirm this? I'm trying to focus my effort on creating the fastest possible models for Mali GPU. Here are the errors I see with this model when loading with CpuAcc (first) and GpuAcc (second): CpuAcc errors: GpuAcc: I repeat, both still do work, they just work slower than the float versions on either backend |
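As background for the integer quantization mentioned above, this is a minimal sketch of the affine int8 mapping, where a real value is approximated as scale * (q - zero_point). It is a generic illustration, not the converter's implementation; the function names and the scale and zero point values are made up.

```python
def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to int8 using an affine scheme, clamping to the int8 range."""
    q = round(value / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its quantized form."""
    return scale * (q - zero_point)


scale, zero_point = 0.5, 0  # hypothetical quantization parameters
for real in (0.0, 1.0, 1000.0):
    q = quantize(real, scale, zero_point)
    print(real, q, dequantize(q, scale, zero_point))
```

Leaving the model inputs and outputs in float, as described above, means the graph gains quantize and dequantize steps at its boundaries, which is one of several factors that can affect measured speed.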
Hello Federico, It's never as simple as INT8 is slower on GpuAcc than FP32. There are a multitude of factors involved. Something to consider: the first time a GPU inference happens the kernels are compiled. You should probably disregard the first inference. If you run 10 iterations, ignore the first and compare the execution to CpuAcc you'll get a better impression of the relative speeds. You can avoid this initial overhead by caching a previous tuning level data (see save-cached-network, cached-network-filepath, tuning-level and tuning-path options.) There is a script delivered with ExecuteNetwork, evaluate_network.sh, that will try to find the fastest way to run a model inference. From memory, it requires the model to be supported by the parser so not much use for this model. The most likely cause of the errors from ./arm_compute/core/utils/DataTypeUtils.h:67 is a Boolean Datatype. Part of the review was to allow the Boolean data type to propagate down to ACL for it to be rejected by the validate method. This is the main reason I've not progressed this review. I need to work on a better way to do this. Once the layer is rejected by ACL it will fall back to TfLite runtime. Colm. |
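The measurement advice above (run 10 iterations and ignore the first) can be sketched as a small timing helper. `mean_latency_ms` is a hypothetical name; it takes any zero-argument callable, so with a TfLite interpreter one would pass `interpreter.invoke` after setting the input tensor.

```python
import time


def mean_latency_ms(run_once, iterations=10, warmup=1):
    """Average wall-clock time per call in milliseconds, skipping warm-up runs.

    The first GPU inference includes kernel compilation, so it is excluded.
    """
    if iterations <= warmup:
        raise ValueError("iterations must be greater than warmup")
    samples = []
    for i in range(iterations):
        start = time.perf_counter()
        run_once()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            samples.append(elapsed_ms)
    return sum(samples) / len(samples)


# Stand-in workload; replace with interpreter.invoke for a real model.
print(mean_latency_ms(lambda: sum(range(10000))))
```

Comparing this number for the same model on CpuAcc and GpuAcc gives a fairer picture than timing a single call.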
feather_lite.zip
All three versions of this very simple model (which brightens images) work fine with the default interpreter but fail with the armnn delegate (latest version) with this error:
Error: An error occurred when preparing the network workloads: ComparisonQueueDescriptor: Tensors input_0 & input_1 must have the same number of dimensions in order to be broadcasted
The error makes no sense, as the model has only one input (an image, 400x600).
But more deeply I'm a little pissed with this armnn delegate: you advertise the library as falling back to the standard interpreter to support all ops - and yet this is the second model that I run that works fine with the standard interpreter yet fails with the ARMNN delegate.
Please! Make it so that if any part of the model fails when running with ARMNN, that part runs with the standard interpreter instead, so that there really is full support!
I attached the models so that you can try them yourself.