SSD MobileNet Inference using ArmNN on i.MX8qmmek


ullasbharadwaj
Contributor III

Hello Community,

I am using the i.MX8qmmek with BSP 5.4.3_2.0.0. I have a custom C++ application for running inference using TfLite and OpenCV. With TfLite, the application was able to use GPU acceleration. Now, I would like to use ArmNN as my inference engine.

However, the Linux user guide does not provide an example for SSD MobileNet inference. When I tried the TfLiteMobileNetSsd-Armnn demo application, I got the following error:

ArmNN v20190800
Failed to parse operator #0 within subgraph #0 error: Operator not supported. subgraph:0 operator:0 opcode_index:3 opcode:6 / DEQUANTIZE at function ParseUnsupportedOperator [/usr/src/debug/armnn/19.08-]
Armnn Error: Buffer #176 has 0 bytes. For tensor: [1,300,300,3] expecting: 1080000 bytes and 270000 elements. at function CreateConstTensor [/usr/src/debug/armnn/19.08-r1/git/src/armnnTfLiteParser/TfLit]

So, is it possible to run any of the SSD MobileNet models using ArmNN on the GPU? Is there sample code to do that?
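
For reference, here is a rough, untested sketch of the ArmNN C++ path I am trying to set up (the model file name and the choice of the VsiNpu backend are placeholders on my side; the calls are based on my reading of the armnn / armnnTfLiteParser headers shipped with the BSP):

#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

int main()
{
    // Parse the .tflite model into an ArmNN network.
    auto parser = armnnTfLiteParser::ITfLiteParser::Create();
    armnn::INetworkPtr network =
        parser->CreateNetworkFromBinaryFile("ssd_mobilenet_v1_quant.tflite");

    // Create the runtime and optimize the network for the requested backend.
    armnn::IRuntime::CreationOptions options;
    armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);
    armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(
        *network, { armnn::BackendId("VsiNpu") }, runtime->GetDeviceSpec());

    // Load the optimized network; input/output tensor binding and
    // runtime->EnqueueWorkload(...) would follow here.
    armnn::NetworkId netId;
    runtime->LoadNetwork(netId, std::move(optNet));
    return 0;
}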

Best Regards

Ullas Bharadwaj

manish_bajaj
NXP Employee

ullasbharadwaj‌,

As per Alifer's tests, we don't see the same numbers you are seeing. Can you confirm that you are using the latest released BSP? Are you running the same tests we ran?

Your numbers for model mobilenet_v1_0.25_128_quant.tflite:
ArmNN ---> 5.20 ms

TfLite ---> 2.25 ms

root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms

 

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms

TfLite ---> 12.28 ms

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms

The performance difference between the ArmNN and TFLite runtimes can be attributed to various parameters: the list of operators supported by the runtime environment, the version currently supported, etc.

-Manish 

ullasbharadwaj
Contributor III

Hi manishbajaj,

I am using BSP version 5.4.3_2.0.0. Please find the attached screenshots of the inference times on the target (imx8qmmek). I am not getting your numbers with ArmNN.

So can we say that TfLite is better optimized than ArmNN? Or does it always depend on the model?

Best Regards

Ullas Bharadwaj

manish_bajaj
NXP Employee

ullasbharadwaj‌,

We don't see the same numbers you are seeing. TFLite performance might be a bit better than ArmNN, and it might also depend on the model and on the supported versions of TFLite and ArmNN.

I suggest also trying the new BSP version, 5.4.24.

-Manish

ullasbharadwaj
Contributor III

Thank you. I will give it a try too.

ullasbharadwaj
Contributor III

Hi manishbajaj,

As per your suggestion, I have dropped SSD MobileNet for now and am trying just the MobileNet variant.

When I compare TfLite and ArmNN with CpuAcc, ArmNN performs better than TfLite. On the GPU, TfLite performs better than ArmNN.

Do you know if this is the expected behavior? Since GPU acceleration for both TfLite and ArmNN was added in recent releases, are there optimizations still pending from NXP?
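
For context, in my application the backend is chosen through the preference list passed to armnn::Optimize(), roughly as in the fragment below (continuing the sketch from my first post; as I understand it, layers not supported by the first backend are assigned to the next one in the list):

// Backend preference order for armnn::Optimize(); 'network' and 'runtime'
// are the objects created in the earlier sketch.
std::vector<armnn::BackendId> backends = {
    armnn::BackendId("VsiNpu"),   // GPU/NPU backend provided by the BSP
    armnn::BackendId("CpuAcc"),   // NEON-accelerated CPU backend
    armnn::BackendId("CpuRef")    // reference CPU fallback
};
armnn::IOptimizedNetworkPtr optNet =
    armnn::Optimize(*network, backends, runtime->GetDeviceSpec());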

manish_bajaj
NXP Employee

ullasbharadwaj‌,

Please share your model and the numbers you are seeing. Performance need not be the same across different inference engines; there are various factors that can cause the difference.

-Manish

ullasbharadwaj
Contributor III

manishbajaj

Model: mobilenet_v1_0.25_128_quant (attached)

Results:

ArmNN: CpuAcc -> 3.105 ms

ArmNN: VsiNpu -> 3.203 ms

This indicates that CpuAcc is better than VsiNpu for ArmNN; there is no improvement with VsiNpu for this model. But for mobilenet_v1_1.0_224_quant, results do improve with VsiNpu (13 ms) compared to CpuAcc (52 ms).

TfLite: CpuAcc -> 3.6 ms

TfLite: VsiNpu -> 1.8 ms

This indicates that VsiNpu is better than CpuAcc for TfLite.

Question: Why does TfLite perform better than ArmNN on the GPU but not on the CPU?

manish_bajaj
NXP Employee

nxf60449‌,

Can you look into it and update the ticket?

-Manish

Alifer_Moraes
NXP Employee

Hello ullasbharadwaj‌,


I've run some tests on the i.MX8QM MEK using the same models you attached, with BSP 5.4.24-2.1.0 (the newest).
It seems that each TfLite-ArmNN example uses a specific model, so the examples that use the models you shared are:

TfLiteMobileNetQuantizedSoftmax-Armnn -> model: mobilenet_v1_0.25_128_quant.tflite
TfLiteMobilenetQuantized-Armnn -> model: mobilenet_v1_1.0_224_quant.tflite

Are they the same ones you tried?

In the tests I ran, the inference time using VsiNpu was always faster than CpuAcc; I got the following logs (the complete log is attached):

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 139.598 ms

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms

root@imx8qmmek:~# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 11.714 ms
root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms

-Alifer

ullasbharadwaj
Contributor III

Hi nxf60449,

Thanks for taking the time to run these tests.

Yes, the models I used are the same as you mentioned.

The results I mentioned were obtained by pinning the runs to the two A72 cores using "taskset -c 4-5"; I am sorry that I missed mentioning that detail.
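
For reference, the pinned runs were launched along these lines (reconstructed; model and data paths as in your setup):

taskset -c 4-5 TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu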

So I ran tests similar to yours, without using taskset. The results I got are as follows:

*************************************************

Acceleration : CpuAcc

*************************************************

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 138.2 ms

TfLite ---> 116.6 ms

Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 9.338 ms

TfLite ---> 6.44ms

 

*************************************************

Acceleration : VsiNpu

*************************************************

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms

TfLite ---> 12.28 ms

Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 5.20 ms

TfLite ---> 2.25 ms

**************************************************

1. When the tests are not run exclusively on the A72 cores, you can see a performance improvement with VsiNpu.

But on the A72 cores with mobilenet_v1_0.25_128_quant.tflite, I do not see an improvement with VsiNpu using ArmNN. Can you reproduce this behavior, and do you perhaps know the cause?

2. However, the TfLite interpreter always performs better than ArmNN. Shouldn't ArmNN be more optimized than TfLite?

Maybe I am wrong, but I expected ArmNN to perform better than TfLite.

Best Regards

manish_bajaj
NXP Employee

ullasbharadwaj‌,

Can you please share information on your product use case? Please share the model and the exact command you used.

As indicated, we are adding sample examples using PyeIQ; the new release should have an example for ArmNN too.

-Manish

ullasbharadwaj
Contributor III

Hi manishbajaj,

I am evaluating multiple object detection models using TfLite, ArmNN, and OpenCV. Hence, I am trying to use the ArmNN sample C++ application.

I am using the TfLiteMobileNetSsd-Armnn sample application (I may have the exact spelling wrong).

$: TfLiteMobileNetSsd-Armnn -m /path_to_any_ssd_mobilenet_model/ -d /path_to_test_images/ -c VsiNpu -l labels.txt

Can you please confirm whether I can run this sample application on the GPU?

Best Regards

Ullas Bharadwaj

manish_bajaj
NXP Employee
NXP Employee

ullasbharadwaj

Are you able to run the above example on the CPU?

-Manish

ullasbharadwaj
Contributor III

No, I was not able to run it even on CPU.

manish_bajaj
NXP Employee
(Accepted Solution)

ullasbharadwaj‌,

The current BSP version supports ArmNN 19.08.

The ArmNN 20.02 TfLite parser added support for DEQUANTIZE.

I will update once our BSP moves to 20.02.

-Manish
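
In the meantime, on the 19.08 parser a custom application can catch the parser error and fall back to its existing TfLite path. A rough sketch follows (the runTfLiteInterpreter() helper is hypothetical and stands in for the TfLite + OpenCV code mentioned earlier in this thread):

#include <iostream>
#include <string>
#include <armnn/Exceptions.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>

// Hypothetical fallback implemented by the application (the existing
// TfLite + OpenCV inference path).
void runTfLiteInterpreter(const std::string& modelPath);

void loadModelWithFallback(const std::string& modelPath)
{
    try
    {
        auto parser = armnnTfLiteParser::ITfLiteParser::Create();
        armnn::INetworkPtr network =
            parser->CreateNetworkFromBinaryFile(modelPath.c_str());
        // ... continue with armnn::Optimize() / LoadNetwork() as usual ...
    }
    catch (const armnn::ParseException& e)
    {
        // ArmNN 19.08 rejects models containing DEQUANTIZE at parse time.
        std::cerr << "ArmNN parser error: " << e.what() << std::endl;
        runTfLiteInterpreter(modelPath);
    }
}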

ullasbharadwaj
Contributor III

Yes, I got it. Please update here. Thank you :-)