NXP i.MX8MP EVK: Running InsightFace with NNAPI on Android

19,627 views
Geo
Contributor I

My environment
Hardware: NXP i.MX8MP EVK A01
Software: Android 10
Model: insightface_quant, input type uint8[1,112,112,3], output type float32[1,512]

I am trying to use NNAPI to load InsightFace and run inference on Android.
When I load the model, the NPU calls VsiPreparedModel::initialize() three times.
Then, when I run a prediction, the NPU runs compute three times.
As a result, the total time is about the same as when using the CPU.
Even with the smaller model insightface_r32 (34.5 MB), the same issue occurs.

Please refer to the attached file.

 

0 Kudos
6 replies

19,538 views
xiaofengren
NXP Employee

The reason you see VsiPreparedModel::initialize() called three times is that your model was split into 3 sub-graphs, and those sub-graphs are executed separately by VsiNpu. Could you please follow the commands below to enable NPU profiling?
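As an aside on the sub-graph split mentioned above: one common reason a model ends up in several sub-graphs is that some operations are not accepted by the accelerator driver, so the graph is cut around them. Whether that is the cause here would need to be confirmed from the profiling log. The sketch below is a simplification that treats the model as a linear chain of ops; the function name and the op list are hypothetical and only illustrate how the number of separately prepared sub-graphs can be counted.

```python
from itertools import groupby


def count_accelerator_partitions(op_supported):
    """Count contiguous runs of accelerator-supported ops in a linear op list."""
    return sum(1 for supported, _ in groupby(op_supported) if supported)


# Hypothetical op list: True = op accepted by the NPU driver,
# False = op that falls back to the CPU.
ops = [True, True, False, True, True, True, False, True]
print(count_accelerator_partitions(ops))  # 3
```

With two unsupported ops in the middle of the chain, the supported ops form three separate runs, which would match three initialize() calls and three compute calls.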

 

On Target

In Settings, tap the About Tablet option 10 times to enable developer mode.

Choose Settings -> Developer Options -> OEM Unlocking to enable OEM unlocking.

In the Android terminal (UART terminal), enter the following command:

$ reboot bootloader

On Host

With the device connected via USB-C:

$ sudo fastboot oem unlock

Disable DM-verity:

 

$  adb root

$  adb disable-verity

$  adb reboot

 

Disable SELinux by running the command below at the U-Boot prompt:

# setenv append_bootargs androidboot.selinux=permissive

or

$ setenforce 0

 

After unlocking Android, run the following steps to enable the profiling service:

  1. Rename the service binary in /vendor/bin/hw/ from android.neural.network***vsi-npu*** to another name.
  2. Kill the current service: run ps -ef | grep vsi-npu, then kill the process.
  3. Start the service from /vendor/bin/hw
  4. setprop VSI_NN_LOG_LEVEL 5
  5. Collect the log with logcat
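Step 2 above can also be done from the host by parsing the output of `adb shell ps -ef`. The following is a minimal sketch, assuming the usual `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD). The helper name and the sample output are made up for illustration, and the service name in the sample is a placeholder rather than the real binary name.

```python
def find_pids(ps_output, needle="vsi-npu"):
    """Return PIDs of processes whose `ps -ef` command column contains `needle`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header and malformed lines
        command = " ".join(fields[7:])
        # Ignore the grep process itself if the output was piped through grep.
        if needle in command and "grep" not in command:
            pids.append(int(fields[1]))
    return pids


# Made-up sample output for illustration only.
sample = """\
UID            PID  PPID C STIME TTY          TIME CMD
root             1     0 0 10:00:00 ?     00:00:02 init
system         345     1 0 10:00:05 ?     00:00:00 /vendor/bin/hw/example-vsi-npu-service
root          1200  1100 0 10:05:00 pts/0 00:00:00 grep vsi-npu
"""
print(find_pids(sample))  # [345]
```

Each returned PID can then be passed to `kill` in the Android shell before starting the renamed service.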

 

0 Kudos

19,541 views
raluca_popa
NXP Employee

Hi @Geo ,

How did you obtain the InsightFace model? Can you share it? Did you use the 'benchmark_model' eIQ TFLite app or custom code?

Thanks,

Raluca

0 Kudos

19,294 views
Geo
Contributor I

An update on the current state of my issue.

I downloaded the benchmark tool from https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/...

Attached are the reports with and without NNAPI.

insightface_r100_quant_4_1_50_profiling.txt ===> ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --enable_op_profiling=true > insightface_r100_quant_4_1_50_profiling.txt
insightface_r100_quant_4_1_50_nnapi_profiling.txt ===> ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --use_nnapi=true --enable_op_profiling=true > insightface_r100_quant_4_1_50_nnapi_profiling.txt

The inference time with NNAPI is 491 ms, and the inference time without NNAPI is 988 ms.
Is this reasonable? I originally expected that inference on the NPU would take less than 400 ms.
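To put the reported numbers in perspective, the short calculation below derives the measured speedup and the speedup that the 400 ms expectation would require. It only uses the three figures quoted above; the variable names are illustrative.

```python
cpu_ms = 988.0     # reported inference time without NNAPI
nnapi_ms = 491.0   # reported inference time with NNAPI
target_ms = 400.0  # latency originally expected from the NPU

speedup = cpu_ms / nnapi_ms
needed_speedup = cpu_ms / target_ms
gap_ms = nnapi_ms - target_ms

print(f"measured speedup: {speedup:.2f}x")                   # 2.01x
print(f"speedup needed for target: {needed_speedup:.2f}x")   # 2.47x
print(f"gap to target: {gap_ms:.0f} ms")                     # 91 ms
```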

Another problem is that, even though the benchmark inference time is 491 ms, the inference time with TensorFlow Lite on Android is nearly 1000 ms, and the warm-up time is 4950 ms.
Please refer to the attached file Android_TensorFlow_Lite_debug.nn.vlog==1.txt
Is this reasonable? I thought the inference time using NNAPI with TensorFlow Lite should be about 491 ms, as in the benchmark.

0 Kudos

19,294 views
Geo
Contributor I

An update on the current state of my issue.

I downloaded the benchmark tool from https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/...

I used the benchmark tool to run the model on the NXP i.MX8MP EVK A01.
Attached are the reports with and without NNAPI.
insightface_r100_quant_4_1_50_nnapi_profiling.txt ==> ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --use_nnapi=true --enable_op_profiling=true > insightface_r100_quant_4_1_50_nnapi_profiling.txt
insightface_r100_quant_4_1_50_profiling.txt ===> ./android_aarch64_benchmark_model_plus_flex --num_threads=4 --graph=insightface_r100_quant.tflite --warmup_runs=1 --num_runs=50 --enable_op_profiling=true > insightface_r100_quant_4_1_50_profiling.txt

Inference with NNAPI (491 ms) is faster than without NNAPI (988 ms).
Is this reasonable? I originally expected that inference on the NPU would take less than 400 ms.

Another question: even though the benchmark result is 491 ms, the total time with TensorFlow Lite on Android is still close to 1000 ms, and the warm-up time is 4950 ms.
Please refer to the attached file Android_TensorFlow_Lite_debug.nn.vlog==1.txt
Is this reasonable? I thought the inference time should be about 490 ms in TensorFlow Lite.
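Part of the gap between the benchmark figure and the in-app figure may come from how the time is measured, for example whether the first, slow invocation is counted. The sketch below times warm-up runs separately from steady-state runs, mirroring the --warmup_runs and --num_runs flags used above. It is only an illustration of the measurement method: `measure` and `fake_inference` are hypothetical names, and `fake_inference` stands in for a single model invocation.

```python
import time


def measure(run_once, warmup_runs=1, num_runs=50):
    """Return (warmup_ms, average_ms), timing warm-up runs separately."""
    start = time.perf_counter()
    for _ in range(warmup_runs):
        run_once()
    warmup_ms = (time.perf_counter() - start) * 1000.0

    start = time.perf_counter()
    for _ in range(num_runs):
        run_once()
    average_ms = (time.perf_counter() - start) * 1000.0 / num_runs
    return warmup_ms, average_ms


def fake_inference():
    # Placeholder for one inference call; sleeps for about 1 ms.
    time.sleep(0.001)


warmup_ms, average_ms = measure(fake_inference, warmup_runs=1, num_runs=5)
print(f"warmup: {warmup_ms:.1f} ms, average: {average_ms:.1f} ms")
```

Applying the same split in the Android app would make its steady-state number directly comparable with the benchmark's.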

0 Kudos

19,472 views
Geo
Contributor I

My environment
Python 3.7.0
TensorFlow 2.4.0

The model is from https://github.com/deepinsight/insightface. I used MMdnn to convert it to pb format and then used TensorFlow to convert it to TFLite.
I uploaded insightface_r100_quant.tflite to WeTransfer: https://we.tl/t-Mdz4PKLYJv

insightface_r100_quant.tflite
Input: name: data, type: uint8[1,112,112,3]
Output: name: output, type: float32[1,512]
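For anyone allocating input and output buffers for this model, the tensor shapes and types listed above translate into byte sizes as follows. This is a small sketch: the helper name and the lookup table are illustrative and cover only the two data types that appear here.

```python
from functools import reduce
from operator import mul

# Element sizes for the two data types used by this model.
BYTES_PER_ELEMENT = {"uint8": 1, "float32": 4}


def tensor_bytes(shape, dtype):
    """Return the buffer size in bytes for a dense tensor of the given shape."""
    return reduce(mul, shape, 1) * BYTES_PER_ELEMENT[dtype]


input_bytes = tensor_bytes([1, 112, 112, 3], "uint8")
output_bytes = tensor_bytes([1, 512], "float32")
print(input_bytes, output_bytes)  # 37632 2048
```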

The attached file is the result of running the benchmark with insightface_r100_quant.tflite on the NXP i.MX8MP EVK.

0 Kudos

19,273 views
PhamHoangBao
Contributor I

Dear @Geo ,

Could you please upload your TFLite file to WeTransfer again? The link has expired.

Also, would you mind sharing the Python code you used to convert the InsightFace model to TFLite format with uint8 input and float32 output? I do not know how to produce a TFLite model whose input and output have different data types.

Thank you so much!

Bao

0 Kudos