i.MX 8M Plus: yolox_s.onnx converted to TFLite using the eIQ tool, but performance is not good.

woohyoungshin
Contributor II

I downloaded yolox_s.onnx from https://yolox.readthedocs.io/en/latest/demo/onnx_readme.html

and converted it to TFLite using the eIQ program.

The conversion works fine, and I run inference with the TFLite model on the NXP i.MX 8M Plus.

The CPU delegate gives good performance,

but libvx_delegate (NPU) gives poor performance.

What is the reason?

I have attached the converted TFLite file (yolox.tflite).

CPU INFERENCE RESULT: cpu infer.jpg

NPU INFERENCE RESULT: eIQ_converted_tflite.jpg

When I ran inference on the NPU, the following ERROR occurred:

E [/usr/src/debug/tim-vx/1.1.39-r0/git/src/tim/vx/internal/src/vsi_nn_graph.c:vsi_nn_SetupGraph: 770] CHECK STATUS(-10: the supplied parameter information does not match the kernel contract. )
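
For reference, this is roughly how I run the model with each delegate (a minimal sketch using tflite_runtime; /usr/lib/libvx_delegate.so is where the VX delegate is installed on my Yocto image, adjust if yours differs):

# Sketch of how yolox.tflite is run with each delegate on the board.
import numpy as np
import tflite_runtime.interpreter as tflite

def run(model_path, use_npu=False):
    delegates = None
    if use_npu:
        # VX delegate path on the default i.MX Yocto image (assumption).
        delegates = [tflite.load_delegate("/usr/lib/libvx_delegate.so")]
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Dummy input just to exercise the graph; real code feeds a
    # preprocessed image here.
    interpreter.set_tensor(inp["index"],
                           np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

run("yolox.tflite", use_npu=False)  # CPU: results look correct
run("yolox.tflite", use_npu=True)   # NPU: poor results, error printed above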

8 Replies

brian14
NXP TechSupport

Hi @woohyoungshin

Thank you for contacting NXP Support.

Could you please tell me your BSP and eIQ version?

I will try to replicate this issue on my side and verify it.

Have a great day!

woohyoungshin
Contributor II
Here is the information:

BSP: Yocto kirkstone-5.15.32

eIQ version: 2.9.9


brian14
NXP TechSupport

Thank you for your reply.

I will test and contact you as soon as possible.

Have a great day!

woohyoungshin
Contributor II
How is the work going? Is it going well?

brian14
NXP TechSupport

Hi @woohyoungshin

Sorry for the delayed reply.

I have been working on your case, and I found that in our latest BSP release the benchmarks for the i.MX 8M Plus show the following:

For CPU using 4 cores:

brian14_0-1700180438218.png

For NPU:

brian14_1-1700180447233.png

With these results, we can see the inference time decrease from 1.427 seconds to 121.617 milliseconds (the NPU is roughly 11x faster than the CPU).
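
If you want a quick cross-check of CPU vs. NPU latency on your own board, here is a minimal Python timing sketch using tflite_runtime (just a sketch, not the tool used for the numbers above; the delegate path assumes the default BSP location, and the first NPU inference includes graph compilation, so a warm-up run is needed before timing):

# Rough latency comparison with tflite_runtime (sketch).
import time
import numpy as np
import tflite_runtime.interpreter as tflite

def avg_ms(model_path, delegates=None, runs=50):
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"],
                           np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()  # warm-up: the first NPU run includes graph compilation
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000.0  # ms per inference

print("CPU:", avg_ms("yolox.tflite"))
print("NPU:", avg_ms("yolox.tflite",
                     [tflite.load_delegate("/usr/lib/libvx_delegate.so")]))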

In addition, after the ONNX to TFLite conversion we can see that there are many TRANSPOSE and CONV_2D operators that significantly affect CPU inference time. Here is the op profiling:

brian14_2-1700180486288.png

We can see that CONV_2D and TRANSPOSE take around 1 second.
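
As a side note, you can inspect the converted model yourself on a host machine with the TFLite Analyzer API from the full tensorflow package (a minimal sketch; it prints the graph structure so you can count the TRANSPOSE and CONV_2D nodes the converter inserted, it does not profile timing):

# Host-side inspection of the converted model (requires the full
# tensorflow package, TF >= 2.7, not tflite_runtime).
import tensorflow as tf
tf.lite.experimental.Analyzer.analyze(model_path="yolox.tflite")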

In contrast, those operators are fully supported on the NPU, which accelerates the inference.

Based on the i.MX Machine Learning release notes, bugs in those operators have been fixed between your BSP version and the latest BSP version.

Therefore, I would like to suggest upgrading your BSP version to the latest release and testing your model.

I hope this information will be helpful.

Have a great day!

woohyoungshin
Contributor II
From my understanding, the execution speed was fast, but there were issues with the results themselves. I'm curious whether the detection results come out correctly when you run inference on images with that setup. Could you clarify this?

woohyoungshin
Contributor II
As seen in the attached picture (eIQ_converted_tflite.jpg), there was an issue with accuracy.

woohyoungshin
Contributor II
Could you tell me the BSP version in which the bug was fixed?