I downloaded yolox_s.onnx from https://yolox.readthedocs.io/en/latest/demo/onnx_readme.html
and converted it to TFLite using the eIQ Toolkit.
The conversion works fine, and I run inference on the NXP i.MX 8M Plus with TFLite.
Performance with the CPU delegate is good,
but performance with libvx_delegate (NPU) is poor.
What is the reason?
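For reference, the CPU-vs-NPU comparison described above can be set up with the TFLite Python API. A minimal sketch, assuming tflite_runtime is installed on the board and the VX delegate library sits at /usr/lib/libvx_delegate.so (its usual location in NXP BSP images; adjust if yours differs):

```python
# Sketch: running the converted model on CPU vs NPU (VX delegate) on the
# i.MX 8M Plus. Paths are assumptions based on typical NXP BSP images.
import os

VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"  # assumed BSP location

def delegate_paths(use_npu):
    """Delegate libraries to load: the VX delegate for the NPU,
    or none for plain CPU execution."""
    return [VX_DELEGATE_PATH] if use_npu else []

def make_interpreter(model_path, use_npu=False):
    # Imported lazily so the helper above stays usable off-device.
    from tflite_runtime.interpreter import Interpreter, load_delegate
    delegates = [load_delegate(p) for p in delegate_paths(use_npu)]
    return Interpreter(model_path=model_path,
                       experimental_delegates=delegates,
                       num_threads=4)  # the i.MX 8M Plus has 4 Cortex-A53 cores

# On the board:
#   cpu = make_interpreter("yolox.tflite")                 # CPU
#   npu = make_interpreter("yolox.tflite", use_npu=True)   # libvx_delegate
```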
I attached the converted TFLite file (yolox.tflite).
CPU INFERENCE RESULT: cpu infer.jpg
NPU INFERENCE RESULT: eIQ_converted_tflite.jpg
When I ran inference on the NPU, the following error occurred:
E [/usr/src/debug/tim-vx/1.1.39-r0/git/src/tim/vx/internal/src/vsi_nn_graph.c:vsi_nn_SetupGraph: 770] CHECK STATUS(-10: the supplied parameter information does not match the kernel contract. )
Hi @woohyoungshin,
Thank you for contacting NXP Support.
Could you please tell me your BSP and eIQ version?
I will try to replicate the issue on my side and verify it.
Have a great day!
Thank you for your reply.
I will test and contact you as soon as possible.
Have a great day!
Hi @woohyoungshin,
Sorry for the delayed reply.
I have been working on your case, and I found that in our latest BSP release the benchmarks for the i.MX8MP show the following:
For CPU using 4 cores:
For NPU:
With these results we can see the inference time decrease from 1.427 seconds on the CPU to 121.617 milliseconds on the NPU (the NPU is ~11x faster than the CPU).
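As a quick sanity check of the speedup implied by those two numbers:

```python
# Speedup implied by the benchmark figures above (CPU 4 cores vs NPU).
cpu_ms = 1427.0    # 1.427 s total inference time on the CPU
npu_ms = 121.617   # inference time with the VX (NPU) delegate
speedup = cpu_ms / npu_ms
print(f"NPU is ~{speedup:.1f}x faster than the CPU")
# prints: NPU is ~11.7x faster than the CPU
```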
In addition to that, after the ONNX to TFLite conversion we can see that there are many TRANSPOSE and Conv2D operators that significantly affect the CPU inference time. Here is the op profiling:
We can see that CONV_2D and TRANSPOSE take around 1 second.
In contrast, on the NPU those operators are fully supported and accelerate the inference.
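This kind of per-op profiling comes from TFLite's benchmark_model tool, which also accepts an external delegate for NPU runs. A small sketch of how the invocation could be assembled; the binary name and delegate path are assumptions based on typical NXP BSP images:

```python
# Sketch: building a benchmark_model command line for op profiling,
# on CPU or via the VX (NPU) external delegate. Paths are assumptions.
BENCHMARK_BIN = "benchmark_model"            # shipped with the BSP's TFLite examples
VX_DELEGATE = "/usr/lib/libvx_delegate.so"   # assumed VX delegate location

def benchmark_cmd(model, use_npu=False, profile=True):
    """Argument list for timing a model, optionally on the NPU, with per-op profiling."""
    cmd = [BENCHMARK_BIN, f"--graph={model}", "--num_threads=4"]
    if use_npu:
        cmd.append(f"--external_delegate_path={VX_DELEGATE}")
    if profile:
        cmd.append("--enable_op_profiling=true")
    return cmd

# On the board this corresponds to e.g.:
#   benchmark_model --graph=yolox.tflite --num_threads=4 --enable_op_profiling=true
```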
Based on the release notes for i.MX Machine Learning, bugs affecting those operators were fixed between your BSP version and the latest BSP version.
Therefore, I would like to suggest upgrading your BSP version to the latest release and testing your model.
I hope this information will be helpful.
Have a great day!