Issues with Using Neutron Delegate on i.MX95

HH2
Contributor I

Hello,
I followed the instructions in application note AN14411, "Enabling eIQ Core NPU Delegates for i.MX Android Applications" (Rev. 1.0, 23 September 2024):
https://www.nxp.com/docs/en/application-note/AN14411.pdf

Below are my results:

***************************************************************************************************
1. Execution with CPU (for both target platforms – i.MX 8MP and i.MX 95):

$ adb shell am start -S -n \
org.tensorflow.lite.benchmark/.BenchmarkModelActivity --es args '" \
--graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite "'

04-02 09:26:12.009 3302 3302 I tflite_BenchmarkModelActivity: Running TensorFlow Lite benchmark with args: --graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite
04-02 09:26:12.037 3302 3302 I tflite : Log parameter values verbosely: [0]
04-02 09:26:12.038 3302 3302 I tflite : Graph: [/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite]
04-02 09:26:12.039 3302 3302 I tflite : Loaded model /data/local/tmp/mobilenet_v1_1.0_224_quant.tflite
04-02 09:26:12.040 3302 3302 I tflite : Initialized TensorFlow Lite runtime.
04-02 09:26:12.042 3302 3302 I tflite : Created TensorFlow Lite XNNPACK delegate for CPU.
04-02 09:26:12.043 3302 3302 I tflite : Replacing 29 out of 31 node(s) with delegate (TfLiteXNNPackDelegate) node, yielding 4 partitions for the whole graph.
04-02 09:26:12.100 3302 3302 I tflite : The input model file size (MB): 4.27635
04-02 09:26:12.100 3302 3302 I tflite : Initialized session in 61.498ms.
04-02 09:26:12.116 3302 3302 I tflite : Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
04-02 09:26:12.662 3302 3302 I tflite : count=8 first=77168 curr=66236 min=64736 max=77168 avg=68169.6 std=3548
04-02 09:26:12.662 3302 3302 I tflite : Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
04-02 09:26:15.975 3302 3302 I tflite : count=50 first=66354 curr=68568 min=53342 max=70667 avg=66200 std=2821
04-02 09:26:15.975 3302 3302 I tflite : Inference timings in us: Init: 61498, First inference: 77168, Warmup (avg): 68169.6, Inference (avg): 66200
04-02 09:26:15.975 3302 3302 I tflite : Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
04-02 09:26:15.975 3302 3302 I tflite : Memory footprint delta from the start of the tool (MB): init=9.875 overall=16.625
***************************************************************************************************

***************************************************************************************************
3. Execution with Delegates on i.MX 95
• Execution with GPU Delegate:
$ adb shell am start -S -n \
org.tensorflow.lite.benchmark/.BenchmarkModelActivity --es args '" \
--graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite \
--use_gpu=true "'

04-02 09:25:04.545 3262 3262 I tflite_BenchmarkModelActivity: Running TensorFlow Lite benchmark with args: --graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite --use_gpu=true
04-02 09:25:04.565 3262 3262 I tflite : Log parameter values verbosely: [0]
04-02 09:25:04.565 3262 3262 I tflite : Graph: [/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite]
04-02 09:25:04.565 3262 3262 I tflite : Use gpu: [1]
04-02 09:25:04.566 3262 3262 I tflite : Loaded model /data/local/tmp/mobilenet_v1_1.0_224_quant.tflite
04-02 09:25:04.567 3262 3262 I tflite : Initialized TensorFlow Lite runtime.
04-02 09:25:04.567 3262 3262 I tflite : Created TensorFlow Lite delegate for GPU.
04-02 09:25:04.567 3262 3262 I tflite : GPU delegate created.
04-02 09:25:04.652 3262 3262 I tflite : Replacing 31 out of 31 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
04-02 09:25:06.436 3262 3262 I tflite : Initialized OpenCL-based API.
04-02 09:25:06.681 3262 3262 I tflite : Created 1 GPU delegate kernels.
04-02 09:25:06.682 3262 3262 I tflite : Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
04-02 09:25:06.682 3262 3262 I tflite : The input model file size (MB): 4.27635
04-02 09:25:06.682 3262 3262 I tflite : Initialized session in 2116.47ms.
04-02 09:25:06.700 3262 3262 I tflite : Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
04-02 09:25:07.215 3262 3262 I tflite : count=15 first=51082 curr=26609 min=26199 max=51082 avg=34237.6 std=10532
04-02 09:25:07.215 3262 3262 I tflite : Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
04-02 09:25:08.575 3262 3262 I tflite : count=50 first=26913 curr=28449 min=26067 max=30930 avg=27133.5 std=1070
04-02 09:25:08.575 3262 3262 I tflite : Inference timings in us: Init: 2116469, First inference: 51082, Warmup (avg): 34237.6, Inference (avg): 27133.5
04-02 09:25:08.575 3262 3262 I tflite : Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
04-02 09:25:08.575 3262 3262 I tflite : Memory footprint delta from the start of the tool (MB): init=93.1172 overall=93.1172
***************************************************************************************************
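To put the CPU and GPU numbers above side by side, here is a minimal Python sketch that parses the two "Inference timings in us" summary lines copied from the logs. The function name parse_timings and the regular expression are my own for illustration; they are not part of the benchmark tool.

```python
import re

# Summary lines copied verbatim from the CPU and GPU runs above.
CPU_LINE = ("Inference timings in us: Init: 61498, First inference: 77168, "
            "Warmup (avg): 68169.6, Inference (avg): 66200")
GPU_LINE = ("Inference timings in us: Init: 2116469, First inference: 51082, "
            "Warmup (avg): 34237.6, Inference (avg): 27133.5")

# Matches each "<field name>: <number>" pair in the summary line.
_FIELD = re.compile(
    r"(Init|First inference|Warmup \(avg\)|Inference \(avg\)): ([0-9.]+)"
)


def parse_timings(line: str) -> dict:
    """Return the timing fields of a summary line, in microseconds."""
    return {name: float(value) for name, value in _FIELD.findall(line)}


cpu = parse_timings(CPU_LINE)
gpu = parse_timings(GPU_LINE)

# Ratio of average inference times (CPU / GPU) and of init times (GPU / CPU).
speedup = cpu["Inference (avg)"] / gpu["Inference (avg)"]
init_ratio = gpu["Init"] / cpu["Init"]

print(f"Average inference: CPU/GPU ratio = {speedup:.2f}")
print(f"Session init: GPU/CPU ratio = {init_ratio:.1f}")
```

On these numbers the GPU delegate cuts the average inference time to well under half of the CPU figure, at the cost of a much longer session initialization.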

***************************************************************************************************
• Execution with Neutron Delegate:
$ adb shell am start -S -n \
org.tensorflow.lite.benchmark/.BenchmarkModelActivity --es args '" \
--graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite \
--external_delegate=libneutron_delegate.so "'

04-02 09:27:17.573 3343 3343 I tflite_BenchmarkModelActivity: Running TensorFlow Lite benchmark with args: --graph=/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite --external_delegate_path=/data/app/~~AM2f1zAPZV4ds7NRDDs1fQ==/org.tensorflow.lite.benchmark-BMmnebX2C-AhxF1RvbK6Ow==/lib/arm64/libneutron_delegate.so
04-02 09:27:17.592 3343 3343 I tflite : Log parameter values verbosely: [0]
04-02 09:27:17.592 3343 3343 I tflite : Graph: [/data/local/tmp/mobilenet_v1_1.0_224_quant.tflite]
04-02 09:27:17.593 3343 3343 I tflite : External delegate path: [/data/app/~~AM2f1zAPZV4ds7NRDDs1fQ==/org.tensorflow.lite.benchmark-BMmnebX2C-AhxF1RvbK6Ow==/lib/arm64/libneutron_delegate.so]
04-02 09:27:17.593 3343 3343 I tflite : Loaded model /data/local/tmp/mobilenet_v1_1.0_224_quant.tflite
04-02 09:27:17.594 3343 3343 I tflite : Initialized TensorFlow Lite runtime.
04-02 09:27:17.650 3343 3343 I tflite : EXTERNAL delegate created.
04-02 09:27:17.679 3343 3343 I tflite : NeutronDelegate delegate: 30 nodes delegated out of 31 nodes with 1 partitions.
04-02 09:27:17.680 3343 3343 I tflite : Replacing 30 out of 31 node(s) with delegate (NeutronDelegate) node, yielding 2 partitions for the whole graph.
***************************************************************************************************
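The "Replacing N out of M node(s)" lines show how much of the graph each delegate accepted. A small Python sketch, using only the three lines from the logs above, extracts those numbers for comparison. The helper name coverage and the pattern are assumptions made for this illustration.

```python
import re

# Partition lines copied from the CPU, GPU and Neutron runs above.
PARTITION_LINES = [
    "Replacing 29 out of 31 node(s) with delegate (TfLiteXNNPackDelegate) "
    "node, yielding 4 partitions for the whole graph.",
    "Replacing 31 out of 31 node(s) with delegate (TfLiteGpuDelegateV2) "
    "node, yielding 1 partitions for the whole graph.",
    "Replacing 30 out of 31 node(s) with delegate (NeutronDelegate) "
    "node, yielding 2 partitions for the whole graph.",
]

_PATTERN = re.compile(
    r"Replacing (\d+) out of (\d+) node\(s\) with delegate \((\w+)\) node, "
    r"yielding (\d+) partitions"
)


def coverage(line: str):
    """Return (delegate name, delegated nodes, total nodes, partitions)."""
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a partition line: {line!r}")
    delegated, total, name, partitions = match.groups()
    return name, int(delegated), int(total), int(partitions)


for line in PARTITION_LINES:
    name, delegated, total, partitions = coverage(line)
    print(f"{name}: {delegated}/{total} nodes, {partitions} partition(s)")
```

So the Neutron delegate did accept 30 of the 31 nodes before the output stopped, which suggests the hang happens after graph partitioning rather than during delegate loading.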

As you can see, the run hangs at the "Execution with Neutron Delegate" step. The last line logged is:

04-02 09:27:17.680 3343 3343 I tflite : Replacing 30 out of 31 node(s) with delegate (NeutronDelegate) node, yielding 2 partitions for the whole graph.

No further output appears after that.
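One way to check a captured log for this symptom automatically is to look for the "Inference timings in us:" summary that the completed CPU and GPU runs print, and report the last tflite message if it is missing. The Python sketch below assumes the logcat layout shown in this post; the function names are hypothetical.

```python
import re

# A completed benchmark run prints this summary (see the CPU and GPU logs).
SUMMARY_MARKER = "Inference timings in us:"

# Assumes the logcat layout in this post: "... I tflite : <message>".
_TFLITE_LINE = re.compile(r"\sI\s+tflite\s*:\s(.*)$")


def tflite_messages(log_text: str) -> list:
    """Return the message part of every line tagged exactly 'tflite'."""
    messages = []
    for line in log_text.splitlines():
        match = _TFLITE_LINE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


def diagnose(log_text: str):
    """Return (completed, last tflite message or None)."""
    messages = tflite_messages(log_text)
    completed = any(m.startswith(SUMMARY_MARKER) for m in messages)
    last = messages[-1] if messages else None
    return completed, last


# Tail of the Neutron run, copied from the log above.
NEUTRON_LOG = """\
04-02 09:27:17.650 3343 3343 I tflite : EXTERNAL delegate created.
04-02 09:27:17.679 3343 3343 I tflite : NeutronDelegate delegate: 30 nodes delegated out of 31 nodes with 1 partitions.
04-02 09:27:17.680 3343 3343 I tflite : Replacing 30 out of 31 node(s) with delegate (NeutronDelegate) node, yielding 2 partitions for the whole graph.
"""

completed, last = diagnose(NEUTRON_LOG)
print("completed:", completed)
print("last message:", last)
```

For the Neutron log this reports an incomplete run whose last message is the "Replacing 30 out of 31 node(s)" line.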

Currently, the only differences between my setup and the one described in AN14411 are:

1. I am using imx-android-15.0.0_1.0.0.tar.gz instead of imx-android-14.0.0_2.0.0.tar.gz.

2. At the lunch step, I run $ lunch evk_95-nxp_stable-userdebug instead of $ lunch evk_95-trunk_staging-userdebug.

Is the issue caused by either of these differences? If not, what should I adjust to make the Neutron Delegate work properly?

Thank you.

HH2
Contributor I

Hello,

At first, I followed the instructions in AN14411, "Enabling eIQ Core NPU Delegates for i.MX Android Applications" (Rev. 1.0, 23 September 2024), and selected the build target with:

$ lunch evk_95-trunk_staging-userdebug

However, when I run:

$ ./imx-make.sh -j16

the build fails after a long time with the following error:

hardware/nxp/wlan/wifi_hal/common.h:22:10: fatal error: 'wifi_hal.h' file not found

I referred to this post to solve the issue:
https://community.nxp.com/t5/Other-NXP-Products/Build-Error-for-Android-15-0-0-1-0-0-BSP-Source/td-p...

May I ask what would be the best way to adjust or resolve this issue properly?

Thank you.

JosephAtNXP
NXP TechSupport

Hello,

Replace the use of PLATFORM_VERSION with PLATFORM_SDK_VERSION.

Regards

HH2
Contributor I

Hello,

I'm not quite sure exactly which part needs to be modified.
Could you please provide a more detailed explanation? Thank you!

JosephAtNXP
NXP TechSupport

Hi,

Thank you for your interest in NXP Semiconductor products.

The lunch evk_95-trunk_staging-userdebug command selects the target to build Android for. It also sets a release configuration that flags what will and will not be built. I'd start by using the trunk_staging config.

Every Android release goes through a testing stage that should cover use of the new version, so that is not likely to be the cause.
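To make the point about the lunch command concrete, here is a small Python sketch that splits the two lunch targets from this thread into their parts. It assumes the product-release-variant form used by recent Android lunch targets. The names LunchTarget, parse_lunch_target and differing_fields are illustrative only.

```python
from typing import NamedTuple


class LunchTarget(NamedTuple):
    product: str   # board/product to build for, e.g. evk_95
    release: str   # release configuration, e.g. trunk_staging
    variant: str   # build variant, e.g. userdebug


def parse_lunch_target(target: str) -> LunchTarget:
    """Split a 'product-release-variant' lunch target into its fields."""
    parts = target.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected product-release-variant, got {target!r}")
    return LunchTarget(*parts)


def differing_fields(a: LunchTarget, b: LunchTarget) -> list:
    """Return the names of the fields in which two targets differ."""
    return [f for f in LunchTarget._fields if getattr(a, f) != getattr(b, f)]


# The target used in this thread and the one from AN14411.
used = parse_lunch_target("evk_95-nxp_stable-userdebug")
documented = parse_lunch_target("evk_95-trunk_staging-userdebug")

print(used)
print(documented)
print("differs in:", differing_fields(used, documented))
```

The two targets share the same product and variant and differ only in the release configuration, which is the part that decides what gets built.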

Regards
