Hello Community,
I am using the i.MX8QM MEK with BSP 5.4.3_2.0.0. I have a custom C++ application for running inference using TfLite and OpenCV. With TfLite, the application was able to use GPU acceleration. Now I would like to use ArmNN as my inference engine.
However, the Linux User's Guide does not cover an example of SSD MobileNet inference. When I tried the TfLiteMobileNetSsd-Armnn demo application, I got the following error:
ArmNN v20190800
Failed to parse operator #0 within subgraph #0 error: Operator not supported. subgraph:0 operator:0 opcode_index:3 opcode:6 / DEQUANTIZE at function ParseUnsupportedOperator [/usr/src/debug/armnn/19.08-]
Armnn Error: Buffer #176 has 0 bytes. For tensor: [1,300,300,3] expecting: 1080000 bytes and 270000 elements. at function CreateConstTensor [/usr/src/debug/armnn/19.08-r1/git/src/armnnTfLiteParser/TfLit]
So, is it possible to run any of the SSD MobileNet models using ArmNN on the GPU? Is there sample code to do that?
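For reference, the demo binaries wrap roughly the following ArmNN C++ flow (a minimal sketch — the model filename and backend preference list here are assumptions, not taken from the demo source); the DEQUANTIZE error above is thrown at the parsing step:

```cpp
#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

int main()
{
    // Parse the .tflite model; the ArmNN 19.08 parser throws a
    // ParseException here for unsupported operators such as DEQUANTIZE.
    auto parser = armnnTfLiteParser::ITfLiteParser::Create();
    armnn::INetworkPtr network =
        parser->CreateNetworkFromBinaryFile("ssd_mobilenet_v1.tflite");

    // Create a runtime and optimize the network for the preferred
    // backends, falling back from the NPU/GPU to the CPU.
    armnn::IRuntime::CreationOptions options;
    armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);
    std::vector<armnn::BackendId> backends = {"VsiNpu", "CpuAcc", "CpuRef"};
    armnn::IOptimizedNetworkPtr optNet =
        armnn::Optimize(*network, backends, runtime->GetDeviceSpec());

    armnn::NetworkId networkId;
    runtime->LoadNetwork(networkId, std::move(optNet));
    return 0;
}
```
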
Best Regards
Ullas Bharadwaj
As per Alifer's tests below, we don't see the same numbers you are seeing. Can you confirm that you are using the latest released BSP? Are you running the same tests we ran?
Your model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 5.20 ms
TfLite ---> 2.25 ms
root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms
Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms
TfLite ---> 12.28 ms
root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms
Performance differences between the ArmNN and TFLite runtimes can be attributed to various parameters: the list of operators supported by the runtime environment, the version currently supported, etc.
-Manish
Hi manishbajaj,
I am using BSP version 5.4.3_2.0.0. Please find the attached screenshots of the inference times on the target (imx8qmmek). I am not getting your numbers with ArmNN.
So can we say that TfLite is better optimized than ArmNN? Or does it always depend on the model?
Best Regards
Ullas Bharadwaj
We don't see the same numbers you are seeing. TFLite performance might be a bit better than ArmNN's, and it may also depend on the model and on the supported versions of TFLite and ArmNN.
I would suggest trying the new BSP version, 5.4.24, as well.
-Manish
Thank you. I will give it a try too.
Hi manishbajaj,
As per your suggestion, I have dropped SSD MobileNet and am now trying just the MobileNet variant.
Comparing TfLite and ArmNN: with CPU acceleration, ArmNN performs better than TfLite; on the GPU, TfLite performs better than ArmNN.
Do you know if this is the expected behavior? Since GPU acceleration for both TfLite and ArmNN was added in recent releases, are there optimizations still pending from NXP?
Please share your model and the numbers you are seeing. Performance need not be the same across different inference engines; various factors can cause the difference.
-Manish
Model: mobilenet_v1_0.25_128_quant (attached)
Results:
ArmNN : Cpu Acc -> 3.105ms
ArmNN : VsiNpu Acc -> 3.203ms
This indicates CpuAcc is better than VsiNpu for ArmNN; there is no improvement with VsiNpu for this model. But for mobilenet_v1_1.0_224_quant, results seem to improve with VsiNpu (13 ms) compared to CpuAcc (52 ms).
TfLite: CpuAcc -> 3.6 ms
TfLite: VsiNpu -> 1.8 ms
This indicates VsiNpu is better than CpuAcc for TfLite.
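As a quick check of those ratios (a small Python sketch using the timings reported above):

```python
# VsiNpu speedup over CpuAcc for mobilenet_v1_0.25_128_quant,
# using the A72-pinned timings reported above (in ms).
armnn_cpu, armnn_npu = 3.105, 3.203
tflite_cpu, tflite_npu = 3.6, 1.8

print(f"ArmNN  speedup: {armnn_cpu / armnn_npu:.2f}x")   # below 1: no gain from VsiNpu
print(f"TfLite speedup: {tflite_cpu / tflite_npu:.2f}x")  # clear gain from VsiNpu
```
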
Question: Why does TfLite perform better than ArmNN on the GPU but not on the CPU?
Hello ullasbharadwaj,
I've run some tests using the same models you attached on an i.MX8QM MEK using BSP 5.4.24-2.1.0 (the newest).
It seems like each Tflite-Armnn example uses a specific model, so the examples that use the models you shared are:
TfLiteMobileNetQuantizedSoftmax-Armnn -> model: mobilenet_v1_0.25_128_quant.tflite
TfLiteMobilenetQuantized-Armnn -> model: mobilenet_v1_1.0_224_quant.tflite
Are they the same you tried?
In the tests I ran, the inference time using VsiNpu was always faster than with the CPU backend; I got the following logs (I attached the complete log):
root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 139.598 ms
root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms
root@imx8qmmek:~# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 11.714 ms
root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms
-Alifer
Hi nxf60449,
Thanks for taking your time to run these tests.
Yes, the models I used are the same as you mentioned.
The results I mentioned were obtained by running on the two A72 cores using "taskset -c 4-5". I am sorry I missed mentioning that detail.
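For completeness, the A72-pinned runs were invoked along these lines (the model/data paths are placeholders; this must be run on the board itself):

```shell
# Pin the ArmNN benchmark to the two A72 cores (4-5 on the i.MX8QM MEK)
taskset -c 4-5 TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c CpuAcc
```
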
So I ran tests similar to yours, without taskset. The results I got are as follows:
*************************************************
Acceleration : CpuAcc
*************************************************
Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 138.2 ms
TfLite ---> 116.6 ms
Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 9.338 ms
TfLite ---> 6.44ms
*************************************************
Acceleration : VsiNpu
*************************************************
Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms
TfLite ---> 12.28 ms
Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 5.20 ms
TfLite ---> 2.25 ms
**************************************************
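To put the tables above in relative terms, the VsiNpu speedup for each engine/model pair can be computed directly (a minimal Python sketch over the reported timings):

```python
# VsiNpu speedup over CpuAcc, per engine and model (timings in ms, from above).
timings = {
    ("ArmNN",  "mobilenet_v1_1.0_224_quant"):  (138.2, 14.23),
    ("TfLite", "mobilenet_v1_1.0_224_quant"):  (116.6, 12.28),
    ("ArmNN",  "mobilenet_v1_0.25_128_quant"): (9.338, 5.20),
    ("TfLite", "mobilenet_v1_0.25_128_quant"): (6.44,  2.25),
}
for (engine, model), (cpu_ms, npu_ms) in timings.items():
    print(f"{engine:6} {model}: {cpu_ms / npu_ms:.1f}x faster on VsiNpu")
```

Both engines gain roughly 9-10x from VsiNpu on the large model, while TfLite stays ahead of ArmNN in absolute time on each backend.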
1. When the tests are not pinned exclusively to the A72 cores, you can see a performance improvement with VsiNpu.
But on the A72 cores with mobilenet_v1_0.25_128_quant.tflite, I do not see an improvement with VsiNpu using ArmNN. May I know whether you also see this behavior and perhaps know the cause?
2. However, the TfLite interpreter always performs better than ArmNN. Should ArmNN not be more optimized than TfLite?
Maybe I am wrong, but I believed ArmNN would perform better than TfLite.
Best Regards
Can you please share information about your product use case? Please share the model and the exact command you used.
As indicated, we are adding sample examples using PyeIQ; the new release should have an ArmNN example too.
-Manish
Hi manishbajaj,
I am evaluating multiple object detection models using TfLite, ArmNN, and OpenCV, hence I am trying to use the ArmNN sample C++ application.
I am using the TfLiteMobileNetSsd-Armnn sample application (I may have the exact spelling wrong).
$: TfLiteMobileNetSsd-Armnn -m /path_to_any_ssd_mobilenet_model/ -d /path_to_test_images/ -c VsiNpu -l labels.txt
Can you please confirm whether I can run this sample application on the GPU?
Best Regards
Ullas Bharadwaj
No, I was not able to run it even on CPU.
The current BSP version supports ArmNN 19.08.
The ArmNN 20.02 TfLite parser added support for DEQUANTIZE.
I will update here once our BSP moves to 20.02.
-Manish
Yes, I got it. Please update here. Thank you :-)