yolov5n-int8.tflite can not be run on the npu of I.MX8M Plus

3,364 Views
hnu_lw
Contributor I

Dear NXP,

I'm trying to run a YOLOv5n model on the i.MX8M Plus NPU. The problem is that no matter what I try, the model runs on the CPU but not on the NPU.

Below are some details and my logs, so hopefully someone can tell me what I'm doing wrong here.

 

root@imx8mpevk:/usr/bin/tensorflow-lite-2.4.1/examples# ./benchmark_model --graph=/home/yolov5test/yolov5n-int8-250.tflite --use_nnapi=true
STARTING!
Log parameter values verbosely: [0]
Graph: [/home/yolov5test/yolov5n-int8-250.tflite]
Use NNAPI: [1]
NNAPI accelerators available: [vsi-npu]
Loaded model /home/yolov5test/yolov5n-int8-250.tflite
INFO: Created TensorFlow Lite delegate for NNAPI.
WARNING: Operator RESIZE_NEAREST_NEIGHBOR (v3) refused by NNAPI delegate: NNAPI does not support half_pixel_centers == true.
WARNING: Operator RESIZE_NEAREST_NEIGHBOR (v3) refused by NNAPI delegate: NNAPI does not support half_pixel_centers == true.
Explicitly applied NNAPI delegate, and the model graph will be partially executed by the delegate w/ 3 delegate kernels.
The input model file size (MB): 2.16466
Initialized session in 36.587ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
Segmentation fault
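
The model itself seems fine on the CPU; for example, a minimal check along these lines (just an illustration, assuming the python3 tflite_runtime package from the BSP image is installed) runs through without errors:

# Run one inference on the CPU only, to confirm the model file itself is valid.
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="/home/yolov5test/yolov5n-int8-250.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
# Feed an all-zeros tensor of the expected shape/dtype just to exercise the graph.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()

for out in interpreter.get_output_details():
    print(out["name"], interpreter.get_tensor(out["index"]).shape)

So the problem seems to appear only once the NNAPI delegate gets involved.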

0 Kudos
5 Replies

3,352 Views
Bio_TICFSL
NXP TechSupport

Hello hnu_lw,

Please clarify the following:

- Which Yocto BSP is the customer using? It seems to be an older one, since the TFLite version is 2.4. Can they upgrade to a newer release?

- Did the customer try vx_delegate (if it was available on their release)? A short Python sketch of loading it is included after this list.

- Can you share details on how they obtained the YOLOv5 TFLite model, or share the model itself?
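
If the release includes it, the VX delegate can also be exercised directly from Python. A rough sketch (the /usr/lib/libvx_delegate.so path below is the usual location on our images and may need adjusting; the model path is taken from your log):

# Attach the VX delegate so supported ops run on the NPU, then run one inference.
import numpy as np
import tflite_runtime.interpreter as tflite

vx = tflite.load_delegate("/usr/lib/libvx_delegate.so")  # assumed install path, adjust if needed
interpreter = tflite.Interpreter(
    model_path="/home/yolov5test/yolov5n-int8-250.tflite",
    experimental_delegates=[vx],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()  # the first invoke is slow: the graph is compiled for the NPU here
print("VX delegate inference completed")

Note that the first inference is much slower than steady state, because the NPU graph compilation (warm-up) happens at that point.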

 

Regards

0 Kudos

3,347 Views
hnu_lw
Contributor I

Thanks for your response. I will try it. 

The Yocto BSP is LF5.10.35_2.0.0.

I use the new version (v6.0) of YOLOv5 from https://github.com/ultralytics/yolov5

It provides TFLite and TF.js model export.

Can you also help me test it, if that is convenient?
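
As far as I understand, their TFLite export performs a full-integer post-training quantization roughly like the sketch below (an illustration only: the SavedModel path, input size, and random calibration data are placeholders, not the actual export.py code):

# Full-integer post-training quantization to produce an int8 TFLite model.
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Placeholder calibration loop; a real export feeds resized images from the dataset.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("yolov5n_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("yolov5n-int8.tflite", "wb") as f:
    f.write(converter.convert())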


0 Kudos

3,330 Views
Bio_TICFSL
NXP TechSupport

Hi,

I was able to run the YOLOv5 TFLite model successfully on the NPU using BSP 5.10.52; see the results below. Is it possible for the customer to move to this version?

./benchmark_model --graph=yolov5n-int8-250.tflite --use_nnapi=true

Inference timings in us: Init: 81565, First inference: 8031740, Warmup (avg): 8.03174e+06, Inference (avg): 74208.3
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=4.16016 overall=69.3984

./benchmark_model --graph=yolov5n-int8-250.tflite --use_vxdelegate=true

Inference timings in us: Init: 5663, First inference: 12104657, Warmup (avg): 1.21047e+07, Inference (avg): 23733.7
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=3.4375 overall=59.7578

 

Regards

0 Kudos

2,535 Views
dennis3
Contributor V

Hello @Bio_TICFSL ,

I realize this post is fairly old, but it still seems relevant to a situation I'm hoping to resolve.

I can reproduce your benchmark results with NNAPI, as you posted, when we use the Linux BSP. However, I've not been able to reproduce any accelerated results on Android, except with the older MobileNet model described in the TensorFlow Lite on Android User's Guide.

Why is it possible to accelerate YOLOv5 on Linux but not on Android? Is this a bug? We've tried various BSP versions of Android 11 and 12, but I've never seen any of them work.

0 Kudos

3,210 Views
Mawriyo
Contributor I

Can you create a document on how to do this?

0 Kudos