Tensorflow Lite demo on iMX8MQuad


gsavaton
Contributor II

This question has already been asked on this forum in 2020 and 2021 for previous versions of the OS and demo software, but the linked conversations do not propose a solution.

I am following the i.MX Machine Learning User's Guide and trying to run the TensorFlow Lite "label_image" example on the GPU.

My setup is:

  • An i.MX 8M EVK Board
  • The official Yocto image "imx-image-full-imx8mqevk.wic", version 5.15.5-1.0.0

Running label_image on the CPU gives the following result:

root@imx8mqevk:/usr/bin/tensorflow-lite-2.6.0/examples# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 49.08 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit

Attempting to use the GPU gives a longer inference time. I'm assuming that it fails and falls back to the CPU.

Using the option "--use_nnapi":

root@imx8mqevk:/usr/bin/tensorflow-lite-2.6.0/examples# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --use_nnapi=true 
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
INFO: Applied NNAPI delegate.
W [query_hardware_caps:71]Unsupported evis version
INFO: invoked
INFO: average time: 102.578 ms
INFO: 0.784314: 653 military uniform
INFO: 0.105882: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.00784314: 466 bulletproof vest
INFO: 0.00392157: 835 suit

Using the "libvx" delegate:

root@imx8mqevk:/usr/bin/tensorflow-lite-2.6.0/examples# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
EXTERNAL delegate created.
INFO: Applied EXTERNAL delegate.
W [query_hardware_caps:71]Unsupported evis version
W [HandleLayoutInfer:266]Op 18: default layout inference pass.
INFO: invoked
INFO: average time: 102.432 ms
INFO: 0.784314: 653 military uniform
INFO: 0.105882: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.00784314: 466 bulletproof vest
INFO: 0.00392157: 835 suit
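For completeness, the same delegate can also be applied from Python, which makes it easier to script checks around the delegate. This is only a minimal sketch, not taken from the guide: it assumes the image ships the `tflite_runtime` Python package, and it reuses the model, labels, and delegate paths from the commands above. The inference part is guarded so it only runs on a board where the delegate library actually exists.

```python
import os
import numpy as np

def top_k(scores, labels, k=5):
    """Return the k (score, label) pairs with the highest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), labels[i]) for i in order]

# Guarded: only runs on a board where the VX delegate library is installed.
if __name__ == "__main__" and os.path.exists("/usr/lib/libvx_delegate.so"):
    from tflite_runtime.interpreter import Interpreter, load_delegate
    interpreter = Interpreter(
        model_path="mobilenet_v1_1.0_224_quant.tflite",
        experimental_delegates=[load_delegate("/usr/lib/libvx_delegate.so")],
    )
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Placeholder input; real use would load and resize grace_hopper.bmp.
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    # Dequantize the uint8 output of the quantized MobileNet to [0, 1] scores.
    scores = interpreter.get_tensor(out["index"])[0].astype(np.float32) / 255.0
    labels = [line.strip() for line in open("labels.txt")]
    for score, label in top_k(scores, labels):
        print("%f: %s" % (score, label))
```

Note that `load_delegate` raises an exception if the library fails to load outright; a per-operator fallback to the CPU (which is what a slow but successful run suggests) happens silently inside the delegate.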

In the other threads, I have found no explanation concerning the message:

W [query_hardware_caps:71]Unsupported evis version

Has anybody found a solution to this problem? Or can anybody suggest a procedure to debug the issue?

Additional details:

  • Graphical GPU demos available from the desktop environment work correctly.
  • I have also tried to compile the example label_image manually and got the same results.

joanxie
NXP TechSupport

Got a reply from the expert team:

"Regarding "Unsupported evis version": this is possibly a driver limitation. The 8M Quad does not support EVIS. The reason this error does not occur in the user guide is that the guide's example uses the 8M Plus."

 

gsavaton
Contributor II

Thanks for your answer.

Yes, the guide explicitly states that the examples run on the i.MX 8M Plus. However, figure 1 in chapter 1 suggests that TensorFlow Lite should be able to use the GPU of the i.MX 8M Quad as well.

Can you please clarify what "evis" is, how it relates to the OpenVX driver, and what options are available to work around the problem? I have tried to recompile the label_image example myself but got the same result.

joanxie
NXP TechSupport

Vivante OpenVX is not based on OpenCL. Most other GPU vendors implement OpenVX on top of OpenCL, but the Vivante implementation is different: Vivante OpenCL does not use EVIS intrinsics, while the Vivante OpenVX implementation does.

 

gsavaton
Contributor II

Thanks for the explanation. However, you did not answer this specific question:

what options are available to work around the problem?

By "the problem", I mean the longer execution time when running the TensorFlow Lite demos with NNAPI or the libvx delegate (see my first post in this thread).

As you seem to confirm that this chip and the software stack support GPU-based acceleration for ML, what can explain this behavior?

joanxie
NXP TechSupport

This explanation tells you why inference takes longer; it is not something software can fix, it is a HW limitation:

"The user guide gives inference times for the 8M Plus board using the CPU and NPU. The 8M Quad board does not contain an NPU, and the user guide gives no GPU inference results.

Inference time using the CPU is similar on the 8M Plus and 8M Quad boards. When using the GPU, it is possible that the average running time is longer, as there are four CPU cores on the board and the GPU used on the 8M Quad is not that strong."

 

gsavaton
Contributor II

Thanks for the explanation. I will run a few experiments to evaluate the benefit of the GPU for ML on this platform.
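One way to run such experiments is to time the same model with and without the delegate in a single script. The following is a sketch under stated assumptions (the `tflite_runtime` Python package is installed on the board; model and delegate paths are taken from the first post), guarded so it only runs where the delegate library exists. The first invoke through the VX delegate includes graph compilation, so it is discarded as warm-up.

```python
import os
import time

def mean_ms(timings):
    """Mean of per-run timings (in seconds), expressed in milliseconds."""
    return sum(timings) / len(timings) * 1000.0

def timed_invokes(interpreter, runs=10):
    """Invoke the interpreter `runs` times; return per-run wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.monotonic()
        interpreter.invoke()
        times.append(time.monotonic() - start)
    return times

# Guarded: only runs on a board where the VX delegate library is installed.
if __name__ == "__main__" and os.path.exists("/usr/lib/libvx_delegate.so"):
    from tflite_runtime.interpreter import Interpreter, load_delegate
    MODEL = "mobilenet_v1_1.0_224_quant.tflite"
    for name, delegates in [
        ("CPU", None),
        ("VX delegate", [load_delegate("/usr/lib/libvx_delegate.so")]),
    ]:
        interpreter = Interpreter(model_path=MODEL, experimental_delegates=delegates)
        interpreter.allocate_tensors()
        timed_invokes(interpreter, runs=2)  # warm-up; delegate compiles on first invoke
        print("%-12s average time: %.2f ms" % (name, mean_ms(timed_invokes(interpreter))))
```

A real comparison would also set a representative input tensor before timing; with a quantized MobileNet the invoke cost is largely input-independent, but this is an assumption worth checking per model.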

I also think that it would be useful if the documentation included a paragraph on this topic.

Since the documentation claims that the i.MX 8M Quad supports GPU-based hardware acceleration, it would be great if NXP could explain in which situations it is relevant.
