SSD MobileNet Inference using ArmNN on i.MX8qmmek

Solved

17,264 views
ullasbharadwaj
Contributor III

Hello Community,

I am using the i.MX8qmmek with BSP 5.4.3_2.0.0. I have a custom C++ application that runs inference using TfLite and OpenCV. With TfLite, the application was able to use GPU acceleration. Now I would like to use ArmNN as my inference engine.

However, the Linux user guide does not include an example of SSD MobileNet inference. When I tried the TfLiteMobileNetSsd-Armnn demo application, I got the following error:

ArmNN v20190800
Failed to parse operator #0 within subgraph #0 error: Operator not supported. subgraph:0 operator:0 opcode_index:3 opcode:6 / DEQUANTIZE at function ParseUnsupportedOperator [/usr/src/debug/armnn/19.08-]
Armnn Error: Buffer #176 has 0 bytes. For tensor: [1,300,300,3] expecting: 1080000 bytes and 270000 elements. at function CreateConstTensor [/usr/src/debug/armnn/19.08-r1/git/src/armnnTfLiteParser/TfLit]

So, is it possible to run any of the SSD MobileNet models using ArmNN on the GPU? Is there sample code that does that?

Best Regards

Ullas Bharadwaj
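
The numbers in the second error message can be checked by hand. The sketch below computes the element and byte counts for a dense tensor of shape [1,300,300,3]. The helper name `tensor_size` and the type table are illustrative assumptions, not part of ArmNN. The expected 1080000 bytes match 4 bytes per element, which is consistent with a float32 input; that is an inference from the arithmetic, not something the log states.

```python
from math import prod

# Bytes per element for two common TfLite tensor types (illustrative table).
BYTES_PER_ELEMENT = {"float32": 4, "uint8": 1}


def tensor_size(shape, dtype):
    """Return (element_count, byte_count) for a dense tensor."""
    elements = prod(shape)
    return elements, elements * BYTES_PER_ELEMENT[dtype]


shape = [1, 300, 300, 3]  # input tensor shape taken from the error message
for dtype in ("float32", "uint8"):
    elements, size = tensor_size(shape, dtype)
    print(f"{dtype}: {elements} elements, {size} bytes")
```

Only the float32 row reproduces the 1080000 bytes and 270000 elements reported by `CreateConstTensor`.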

1 Solution
16,731 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

The current BSP version supports ArmNN 19.08.

ArmNN 20.02 adds DEQUANTIZE support to the TfLite parser.

I will post an update once our BSP moves to 20.02.

-Manish
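
A quick way to tell whether a given image is affected is to look at the banner the demo prints, such as `ArmNN v20190800`. The sketch below assumes the banner encodes the version as YYYYMMPP (year, month, patch) and uses the 20.02 threshold stated above. The function names are hypothetical helpers, and the `ArmNN v20200200` string in the usage lines is an assumed example, not output from a real board.

```python
import re


def parse_armnn_version(banner):
    """Parse a banner like 'ArmNN v20190800' into (year, month, patch).

    Assumes the YYYYMMPP encoding; raises ValueError on anything else.
    """
    match = re.fullmatch(r"ArmNN v(\d{4})(\d{2})(\d{2})", banner.strip())
    if match is None:
        raise ValueError(f"unrecognized ArmNN banner: {banner!r}")
    year, month, patch = (int(group) for group in match.groups())
    return year, month, patch


def tflite_parser_has_dequantize(banner):
    """True if the banner is at or after 20.02, per the answer above."""
    year, month, _patch = parse_armnn_version(banner)
    return (year, month) >= (2020, 2)


print(tflite_parser_has_dequantize("ArmNN v20190800"))  # banner from the question
print(tflite_parser_has_dequantize("ArmNN v20200200"))  # assumed 20.02 banner
```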


0 Kudos
16 Replies
16,731 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

According to Alifer's test, we don't see the same numbers you are seeing. Can you confirm that you are using the latest released BSP? Are you running the same test that we ran?

Your data:

Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 5.20 ms

TfLite ---> 2.25 ms

root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms

 

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms

TfLite ---> 12.28 ms

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms

The performance difference between the ArmNN and TfLite runtimes can be attributed to several factors, such as the list of operators each runtime supports and the version of each runtime in the BSP.

-Manish 

0 Kudos
16,722 views
ullasbharadwaj
Contributor III

Hi manishbajaj,

I am using BSP version 5.4.3_2.0.0. Please find the attached screenshots of the inference times on the target (imx8qmmek). I am not getting your numbers with ArmNN.

So can we say that TfLite is better optimized than ArmNN? Or does it always depend on the model?

Best Regards

Ullas Bharadwaj

0 Kudos
16,720 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

We don't see the same numbers you are seeing. TfLite performance might be a bit better than ArmNN, and the difference can also depend on the model and on the supported versions of TfLite and ArmNN.

I suggest also trying the newer BSP version, 5.4.24.

-Manish

0 Kudos
16,720 views
ullasbharadwaj
Contributor III

Thank you. I will give it a try too.

0 Kudos
16,721 views
ullasbharadwaj
Contributor III

Hi manishbajaj,

As per your suggestion, I have set SSD MobileNet aside and am now trying to use just the MobileNet variant.

When I compare TfLite and ArmNN with CPU acceleration, ArmNN performs better than TfLite. On the GPU, TfLite performs better than ArmNN.

Do you know if this is the expected behavior? Since GPU acceleration for both TfLite and ArmNN was added in recent releases, are there optimizations still pending from NXP?

0 Kudos
16,721 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

Please share your model and the numbers you are seeing. Performance need not be the same on different inference engines. Various factors can cause the difference.

-Manish

0 Kudos
16,720 views
ullasbharadwaj
Contributor III

manishbajaj

Model: mobilenet_v1_0.25_128_quant (attached)

Results:

ArmNN: CpuAcc -> 3.105 ms

ArmNN: VsiNpu -> 3.203 ms

This indicates that CpuAcc is slightly faster than VsiNpu for ArmNN, so there is no improvement with VsiNpu for this model. For mobilenet_v1_1.0_224_quant, however, the results improve with VsiNpu (13 ms) compared to CpuAcc (52 ms).

TfLite: CpuAcc -> 3.6 ms

TfLite: VsiNpu -> 1.8 ms

This indicates that VsiNpu is faster than CpuAcc for TfLite.

Question: Why does TfLite perform better than ArmNN on the GPU but not on the CPU?

0 Kudos
16,720 views
manish_bajaj
NXP Employee

nxf60449‌,

Can you look into it and update the ticket?

-Manish

0 Kudos
16,723 views
Alifer_Moraes
NXP Employee

Hello ullasbharadwaj‌,


I've run some tests using the same models you attached on the i.MX8QM MEK with BSP 5.4.24-2.1.0 (the newest).
It seems that each TfLite-ArmNN example uses a specific model, so the examples that use the models you shared are:

TfLiteMobileNetQuantizedSoftmax-Armnn -> model: mobilenet_v1_0.25_128_quant.tflite
TfLiteMobilenetQuantized-Armnn -> model: mobilenet_v1_1.0_224_quant.tflite

Are they the same you tried?

In the tests I ran, inference using VsiNpu was always faster than using CpuAcc. I got the following logs (the complete log is attached):

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 139.598 ms

root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms

root@imx8qmmek:~# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 11.714 ms
root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms

-Alifer
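
To compare the two backends directly, the logs above can be reduced to a speedup factor per example. The sketch below is a minimal log parser, assuming the console format shown in this post (a shell prompt line with `-c <backend>`, followed by an `Average time per test case` line). The function names `average_times` and `speedups` are illustrative helpers, not part of any ArmNN tool.

```python
import re

# Console output copied from the post above.
LOG = """\
root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 139.598 ms
root@imx8qmmek:~# TfLiteMobilenetQuantized-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 12.214 ms
root@imx8qmmek:~# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c CpuAcc
ArmNN v20190801
Average time per test case: 11.714 ms
root@imx8qmmek:~/armnntest# TfLiteMobileNetQuantizedSoftmax-Armnn -m model/ -d data/ -c VsiNpu
ArmNN v20190801
Average time per test case: 2.464 ms
"""


def average_times(log):
    """Map (example, backend) to the reported average time in ms."""
    results = {}
    current = None
    for line in log.splitlines():
        command = re.search(r"# (\S+-Armnn) .*-c (\w+)", line)
        if command:
            current = (command.group(1), command.group(2))
            continue
        timing = re.search(r"Average time per test case: ([\d.]+) ms", line)
        if timing and current:
            results[current] = float(timing.group(1))
    return results


def speedups(times):
    """Return example -> CpuAcc time divided by VsiNpu time."""
    examples = sorted({example for example, _backend in times})
    return {e: times[(e, "CpuAcc")] / times[(e, "VsiNpu")] for e in examples}


for example, factor in speedups(average_times(LOG)).items():
    print(f"{example}: VsiNpu is {factor:.2f}x faster than CpuAcc")
```

On these logs the factor is about 11.4 for TfLiteMobilenetQuantized-Armnn and about 4.75 for TfLiteMobileNetQuantizedSoftmax-Armnn.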

0 Kudos
16,723 views
ullasbharadwaj
Contributor III

Hi nxf60449,

Thanks for taking your time to run these tests.

Yes, the models I used are the same as you mentioned.

The results I mentioned earlier were obtained by running the examples on the two A72 cores using "taskset -c 4-5". I am sorry that I did not mention this detail.
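
For a custom benchmark, the same pinning can be applied from inside the process rather than through `taskset`. The sketch below is a Python illustration of the idea, assuming a Linux target where CPUs 4 and 5 are the A72 cores, as in the command above. `parse_cpu_list` and `pin_to` are hypothetical helper names; `os.sched_setaffinity` is the standard-library call and is available on Linux only.

```python
import os


def parse_cpu_list(spec):
    """Turn a taskset-style list such as '4-5' or '0,2-3' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to(spec):
    """Restrict the current process to the given CPUs (Linux only).

    Fails with OSError if none of the requested CPUs exist on the machine.
    """
    os.sched_setaffinity(0, parse_cpu_list(spec))


# Parsing is safe anywhere; call pin_to("4-5") only on the target board.
print(sorted(parse_cpu_list("4-5")))
```

Calling `pin_to("4-5")` before starting the inference loop should have the same effect on the process as launching it with `taskset -c 4-5`.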

So I ran tests similar to yours without using taskset. The results I got are as follows:

*************************************************

Acceleration : CpuAcc

*************************************************

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 138.2 ms

TfLite ---> 116.6 ms

Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 9.338 ms

TfLite ---> 6.44 ms

 

*************************************************

Acceleration : VsiNpu

*************************************************

Model: mobilenet_v1_1.0_224_quant.tflite
ArmNN ---> 14.23 ms

TfLite ---> 12.28 ms

Model: mobilenet_v1_0.25_128_quant.tflite
ArmNN ---> 5.20 ms

TfLite ---> 2.25 ms

**************************************************
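
The four measurements above can be summarized as the ratio of ArmNN time to TfLite time per backend and model. The sketch below only restates the numbers from this post; the dictionary layout and the function name `armnn_over_tflite` are illustrative choices.

```python
# (backend, model) -> (ArmNN ms, TfLite ms), copied from the results above.
RESULTS = {
    ("CpuAcc", "mobilenet_v1_1.0_224_quant.tflite"): (138.2, 116.6),
    ("CpuAcc", "mobilenet_v1_0.25_128_quant.tflite"): (9.338, 6.44),
    ("VsiNpu", "mobilenet_v1_1.0_224_quant.tflite"): (14.23, 12.28),
    ("VsiNpu", "mobilenet_v1_0.25_128_quant.tflite"): (5.20, 2.25),
}


def armnn_over_tflite(results):
    """Return (backend, model) -> ArmNN time divided by TfLite time."""
    return {key: armnn / tflite for key, (armnn, tflite) in results.items()}


for (backend, model), ratio in armnn_over_tflite(RESULTS).items():
    print(f"{backend:6s} {model}: ArmNN takes {ratio:.2f}x the TfLite time")
```

In these runs ArmNN takes between roughly 1.16 and 2.31 times as long as TfLite, with the largest gap on the small 0.25_128 model under VsiNpu.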

1.  When the examples are not run exclusively on the A72 cores, VsiNpu does improve performance.

But on the A72 cores with mobilenet_v1_0.25_128_quant.tflite, I do not see an improvement with VsiNpu using ArmNN. Do you see this behavior as well, and do you know the cause?

2.  However, the TfLite interpreter always performs better than ArmNN. Shouldn't ArmNN be better optimized than TfLite?

Maybe I am wrong, but I expected ArmNN to perform better than TfLite.

Best Regards

0 Kudos
16,723 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

Can you please share information on your product use case? Please also share the model and the exact command you used.

As indicated, we are adding sample examples using PyeIQ. The new release should have an example for ArmNN too.

-Manish

0 Kudos
16,723 views
ullasbharadwaj
Contributor III

Hi manishbajaj,

I am evaluating multiple object detection models using TfLite, Arm NN and OpenCV. Hence I am trying to use the ArmNN sample C++ application.

I am using TfLiteMobileNetSsd-Armnn (maybe I am wrong with exact spelling) sample application.

$: TfLiteMobileNetSsd-Armnn -m /path_to_any_ssd_mobilenet_model/ -d /path_to_test_images/ -c VsiNpu -l labels.txt

Can you please confirm whether I can run this sample application on the GPU?

Best Regards

Ullas Bharadwaj

0 Kudos
16,723 views
manish_bajaj
NXP Employee

ullasbharadwaj

Are you able to run the above example on the CPU cores?

-Manish

0 Kudos
16,723 views
ullasbharadwaj
Contributor III

No, I was not able to run it even on CPU.

0 Kudos
16,732 views
manish_bajaj
NXP Employee

ullasbharadwaj‌,

The current BSP version supports ArmNN 19.08.

ArmNN 20.02 adds DEQUANTIZE support to the TfLite parser.

I will post an update once our BSP moves to 20.02.

-Manish

0 Kudos
16,723 views
ullasbharadwaj
Contributor III

Yes, I got it. Please update here. Thank you :-)