My Environment
Hardware: NXP i.MX8MP EVK A01
Software: Android 10
Model: insightface_quant
  Input: type: uint8[1,112,112,3]
  Output: type: float32[1,512]
I am trying to use NNAPI to load insightface for inference on Android.
When I load the model, the NPU runs VsiPreparedModel::initialize() three times.
Then when I run prediction, the NPU runs compute three times.
So the total cost ends up about the same as using the CPU.
Even with a smaller model, insightface_r32 (34.5 MB), the issue remains.
Please refer to the attached file.
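For context, the model's I/O contract above means the input image has to be quantized to uint8 and the float32[1,512] output is an embedding. A minimal numpy sketch, with hypothetical quantization parameters (the real scale/zero-point come from the TFLite interpreter's input details):

```python
import numpy as np

# Hypothetical quantization parameters for the uint8 input tensor;
# on a real model, read them from interpreter.get_input_details().
scale, zero_point = 1.0 / 128.0, 128

# Quantize a float image in [-1, 1] to the uint8[1, 112, 112, 3] input layout.
img_float = np.zeros((1, 112, 112, 3), dtype=np.float32)
img_uint8 = np.clip(np.round(img_float / scale) + zero_point, 0, 255).astype(np.uint8)

# The model returns a float32[1, 512] embedding; insightface embeddings are
# usually L2-normalized before cosine-similarity comparison.
embedding = np.random.default_rng(0).standard_normal((1, 512)).astype(np.float32)
embedding /= np.linalg.norm(embedding, axis=1, keepdims=True)

print(img_uint8.shape, img_uint8.dtype)  # (1, 112, 112, 3) uint8
print(float(np.linalg.norm(embedding)))  # ≈ 1.0 after normalization
```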
The reason you observed VsiPreparedModel::initialize() running three times is that your model was split into 3 sub-graphs, and those sub-graphs are executed separately by VsiNpu. Could you please use the following commands to enable NPU profiling?
On Target
Tap 10 times on the About Tablet option in Settings to become a developer.
Choose Settings -> Developer Options -> OEM Unlocking to enable OEM unlocking.
In Android terminal (UART terminal) enter the following command:
$ reboot bootloader
On Host
device connected via USB-C:
$ sudo fastboot oem unlock
disable the DM-verity
$ adb root
$ adb disable-verity
$ adb reboot
disable SELinux; run the command below from the U-Boot command line:
# setenv append_bootargs androidboot.selinux=permissive
or
$ setenforce 0
After unlocking Android, run the following to enable the profiling service:
setprop VSI_NN_LOG_LEVEL 5
Hi @Geo ,
How did you obtain the InsightFace model? Can you share it? Did you use the 'benchmark_model' eIQ TFLite app or custom code?
Thanks,
Raluca
Update on the current state of my issue:
benchmark download from https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/...
Attached are reports for runs with and without NNAPI.
insightface_r100_quant_4_1_50_profiling.txt ===>
$ ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --enable_op_profiling=true > insightface_r100_quant_4_1_50_profiling.txt
insightface_r100_quant_4_1_50_nnapi_profiling.txt ===>
$ ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --use_nnapi=true --enable_op_profiling=true > insightface_r100_quant_4_1_50_nnapi_profiling.txt
The inference time with NNAPI is 491 ms, versus 988 ms without NNAPI.
Is this reasonable? I originally expected the NPU to come in under 400 ms.
Another problem is that even though the benchmark inference time is 491 ms, TensorFlow Lite on Android takes nearly 1000 ms per inference, with a warmup time of 4950 ms.
Please refer to the attached file Android_TensorFlow_Lite_debug.nn.vlog==1.txt.
Is this reasonable? I thought the inference time using NNAPI in TensorFlow Lite should be about 491 ms, as in the benchmark.
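One relevant detail: the benchmark's reported average excludes warmup runs (--warmup_runs=1), while an app's first inference pays one-time costs such as NNAPI graph compilation, which is consistent with the 4950 ms warmup figure. A toy Python sketch (a stand-in with simulated sleeps, not TFLite code) of separating warmup from measured runs the way the benchmark does:

```python
import time

def run_inference():
    # Stand-in for interpreter.invoke(); the first call simulates an
    # expensive one-time setup (e.g. driver-side graph compilation),
    # later calls simulate steady-state inference.
    if not hasattr(run_inference, "warmed"):
        run_inference.warmed = True
        time.sleep(0.05)   # expensive first run
    time.sleep(0.005)      # steady-state compute

def benchmark(fn, warmup_runs=1, num_runs=10):
    for _ in range(warmup_runs):   # warmup runs are not timed
        fn()
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    return (time.perf_counter() - start) / num_runs

avg = benchmark(run_inference)
print(f"steady-state average: {avg * 1e3:.1f} ms")  # ~5 ms; warmup excluded
```

Measuring only post-warmup runs in the app should bring its numbers closer to the benchmark's.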
My environment
Python 3.7.0
tensorflow 2.4.0
The model is from https://github.com/deepinsight/insightface; I converted it to pb format with MMdnn and then to tflite with TensorFlow.
I uploaded insightface_r100_quant.tflite to a WeTransfer link: https://we.tl/t-Mdz4PKLYJv
insightface_r100_quant.tflite
Input: name: data, type: uint8[1,112,112,3]
Output: name: output, type: float32[1,512]
The attached file shows insightface_r100_quant.tflite run through the benchmark on the NXP i.MX8MP EVK.
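For reference, the mixed uint8-input / float32-output combination falls out of standard post-training full-integer quantization. A minimal sketch (not the author's actual script; the tiny graph is a stand-in for the real insightface pb model), where setting inference_input_type to uint8 while leaving the output type at its float32 default gives exactly this I/O contract:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in graph; the real model is the converted insightface pb graph.
rng = np.random.default_rng(0)
w = tf.constant(rng.standard_normal((3, 8)).astype(np.float32))

@tf.function(input_signature=[tf.TensorSpec([1, 112, 112, 3], tf.float32)])
def toy_model(x):
    pooled = tf.reduce_mean(x, axis=[1, 2])  # [1, 3]
    return tf.matmul(pooled, w)              # [1, 8] toy "embedding"

def representative_dataset():
    # Calibration samples drive the quantization ranges.
    for _ in range(8):
        yield [rng.random((1, 112, 112, 3), dtype=np.float32)]

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [toy_model.get_concrete_function()])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Quantized uint8 input; leaving inference_output_type at its float32 default
# makes the converter append a dequantize op, so the two dtypes differ.
converter.inference_input_type = tf.uint8
tflite_model = converter.convert()

interp = tf.lite.Interpreter(model_content=tflite_model)
print(interp.get_input_details()[0]["dtype"])   # <class 'numpy.uint8'>
print(interp.get_output_details()[0]["dtype"])  # <class 'numpy.float32'>
```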
Dear @Geo ,
Could you please upload your TFLite file again? The WeTransfer link has now expired.
And would you mind sharing your Python code for converting the InsightFace model to TFLite with uint8 input and float32 output? I have no idea how to convert to a TFLite model whose input and output have different data types.
Thank you so much!
Bao