Dear All,
We are currently trying to enable GPU acceleration for TFLite inference on the i.MX8M Nano board, but the performance is not as expected. I have summarized the TFLite performance on the i.MX8M Nano board below.
The test results are based on running the label_image sample code.
For the firmware built by Arrow (L5.4.3-1.0.0, TFLite ver = 1.13.2), the sample program gave the same result (~80 ms) whether or not GPU acceleration was enabled. As you mentioned, this TFLite build was not compiled with GPU acceleration, so this result should be expected.
For the latest stock firmware from NXP (L5.4.47-2.2.0, TFLite ver = 2.2.0), performance was actually worse with NNAPI (GPU acceleration) enabled (48 ms without vs. 400 ms with).
In summary, in CPU mode TFLite ran faster on ver 2.2 than on ver 1.13, while the GPU gave a negative performance gain.
The attached text file contains the detailed log. Could you give some comments? Thanks.
I just found some information about accelerating TensorFlow Lite models; maybe you can test it a second time:
The first iteration of model inference using the NN API always takes many times longer,
because of the model graph initialization needed by the GPU module. The iterations
following the graph initialization will be performed many times faster.

But I have already excluded this factor.
In fact, this initialization takes around 4 seconds.
Please refer to case 3 in my log.
That test case uses Python sample code which reports the warm-up time and the inference time separately.
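For reference, the measurement pattern in that test case can be sketched as below: time the first invocation (which includes the one-time graph/delegate initialization) separately from the steady-state average of the following runs. This is only an illustrative sketch; `run_inference` is a hypothetical stand-in for the real `interpreter.invoke()` call in the TFLite Python API, and the workload at the bottom is a dummy computation, not an actual model.

```python
import time

def benchmark(run_inference, iterations=10):
    """Time the first call (which includes one-time graph/delegate
    initialization) separately from the steady-state average."""
    start = time.perf_counter()
    run_inference()  # warm-up: with NNAPI this includes graph init
    warmup_ms = (time.perf_counter() - start) * 1000.0

    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()  # steady-state inference
        times.append((time.perf_counter() - start) * 1000.0)
    avg_ms = sum(times) / len(times)
    return warmup_ms, avg_ms

# Dummy workload standing in for interpreter.invoke() on a real model.
warmup, avg = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"warm-up: {warmup:.1f} ms, steady-state avg: {avg:.1f} ms")
```

With this split, a large warm-up time but a small steady-state average would point to graph initialization; in my case 3 the steady-state average itself stayed slow, which is why I excluded initialization as the cause.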