Image : LF_v5.15.5-1.0.0_images_IMX8MQEVK
Hi @Bio_TICFSL, the GPU seems to be underperforming while running the prebuilt model file: the same model runs much faster on the CPU. The average inference times for CPU and GPU are listed below.
For CPU ==>
thread = 1:
./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite -t 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 179.697 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit
thread = 2:
./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite -t 2
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 92.645 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit
thread = 3:
./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite -t 3
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 64.785 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit
thread = 4:
./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite -t 4
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 48.975 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit
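As a quick sanity check on how well the CPU path scales, here is a small Python snippet (plain arithmetic, using only the average times from the logs above) that computes the speedup and parallel efficiency per thread count:

```python
# Average inference times (ms) from the label_image CPU runs above,
# keyed by thread count (-t).
cpu_ms = {1: 179.697, 2: 92.645, 3: 64.785, 4: 48.975}

base = cpu_ms[1]  # single-thread baseline
for threads, ms in cpu_ms.items():
    speedup = base / ms          # how much faster than 1 thread
    efficiency = speedup / threads  # fraction of ideal linear scaling
    print(f"-t {threads}: {ms:7.3f} ms, speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The four A53 cores scale nearly linearly (efficiency stays above 90% up to 4 threads), so the CPU numbers look healthy.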
For GPU ==>
./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite -a 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
INFO: Applied NNAPI delegate.
W [query_hardware_caps:71]Unsupported evis version
INFO: invoked
INFO: average time: 103.217 ms
INFO: 0.784314: 653 military uniform
INFO: 0.105882: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.00784314: 466 bulletproof vest
INFO: 0.00392157: 835 suit
For any thread count greater than 1, the GPU (NNAPI delegate) is slower than the CPU. Is there a way to accelerate GPU inference so that it is faster than the CPU?
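To put numbers on the gap, this snippet compares the logged GPU time against each CPU configuration (values taken from the runs above):

```python
# Average times (ms) from the runs above.
gpu_ms = 103.217  # NNAPI/GPU path (-a 1)
cpu_ms = {1: 179.697, 2: 92.645, 3: 64.785, 4: 48.975}

for threads, ms in cpu_ms.items():
    ratio = gpu_ms / ms  # < 1 means GPU is faster
    winner = "GPU faster" if ratio < 1 else "CPU faster"
    print(f"vs -t {threads}: GPU/CPU = {ratio:.2f} ({winner})")
```

So the GPU only wins against a single CPU thread; at 4 threads the CPU is roughly 2x faster. Note also that these are averages reported by `label_image`; since the NNAPI delegate typically has a one-time graph compilation cost, looping more iterations (the `-c` flag, if your build supports it) may be worth trying to make sure warmup overhead is not inflating the GPU average, though it likely does not explain a 2x steady-state gap.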