imx9352 npu not working

Ethane
Contributor II

I get a kernel panic (SError) when I try to use the NPU on the i.MX9352, and I don't know how to get the NPU working on this chip. Has anyone used the NPU on an i.MX9352 successfully?

My device tree configuration:

ethosu_mem: ethosu_region@C0000000 {
  compatible = "shared-dma-pool";
  reg = <0x0 0xC0000000 0x0 0x10000000>;
  no-map;
};

ethosu {
  compatible = "arm,ethosu";
  fsl,cm33-proc = <&cm33>;
  memory-region = <&ethosu_mem>;
  power-domains = <&mlmix>;
};

The steps I ran are as follows:

root@ok-mx93:~# cd /usr/bin/ethosu/examples/
root@ok-mx93:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/labels.txt ./
root@ok-mx93:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/grace_hopper.bmp ./
root@ok-mx93:/usr/bin/ethosu/examples# vela ../../tensorflow-lite-2.11.1/examples/mobilenet_v1_1.0_224_quant.tflite

Network summary for mobilenet_v1_1.0_224_quant
Accelerator configuration Ethos_U65_256
System configuration internal-default
Memory mode internal-default
Accelerator clock 1000 MHz
Design peak SRAM bandwidth 16.00 GB/s
Design peak DRAM bandwidth 3.75 GB/s

Total SRAM used 370.91 KiB
Total DRAM used 3621.95 KiB

CPU operators = 0 (0.0%)
NPU operators = 60 (100.0%)

Average SRAM bandwidth 4.73 GB/s
Input SRAM bandwidth 11.96 MB/batch
Weight SRAM bandwidth 9.70 MB/batch
Output SRAM bandwidth 0.00 MB/batch
Total SRAM bandwidth 21.76 MB/batch
Total SRAM bandwidth per input 21.76 MB/inference (batch size 1)

Average DRAM bandwidth 2.13 GB/s
Input DRAM bandwidth 1.52 MB/batch
Weight DRAM bandwidth 3.23 MB/batch
Output DRAM bandwidth 5.06 MB/batch
Total DRAM bandwidth 9.82 MB/batch
Total DRAM bandwidth per input 9.82 MB/inference (batch size 1)

Neural network macs 572406226 MACs/batch
Network Tops/s 0.25 Tops/s

NPU cycles 3889054 cycles/batch
SRAM Access cycles 1019891 cycles/batch
DRAM Access cycles 1676662 cycles/batch
On-chip Flash Access cycles 0 cycles/batch
Off-chip Flash Access cycles 0 cycles/batch
Total cycles 4602254 cycles/batch

Batch Inference time 4.60 ms, 217.28 inferences/s (batch size 1)
root@ok-mx93:/usr/bin/ethosu/examples# ./inference_runner -n ./output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[ 301.631293] remoteproc remoteproc0: powering up imx-rproc
[ 301.638391] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242424
[ 302.179088] rproc-virtio rproc-virtio.0.auto: assigned reserved memory node vdevbuffer@a4020000
[ 302.188504] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 302.196141] rproc-virtio rproc-virtio.0.auto: registered virtio0 (type 7)
[ 302.203734] rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@a4020000
[ 302.223392] virtio_rpmsg_bus virtio1: rpmsg host is online
[ 302.225441] virtio_rpmsg_bus virtio1: creating channel rpmsg-ethosu-channel addr 0x1e
[ 302.229006] rproc-virtio rproc-virtio.1.auto: registered virtio1 (type 7)
[ 302.246805] remoteproc remoteproc0: remote processor imx-rproc is now up
Send Ping
Send version request
Send cap[ 302.257522] SError Interrupt on CPU1, code 0x00000000be000011 -- SError
[ 302.257538] CPU: 1 PID: 807 Comm: inference_runne Tainted: G WC 6.1.36 #1
[ 302.257544] Hardware name: Forlinx OK-MX93-C board (DT)
[ 302.257547] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 302.257552] pc : __memset+0x170/0x188
[ 302.257566] lr : dma_alloc_from_dev_coherent+0xc4/0x154
[ 302.257574] sp : ffff80000ad4bc60
[ 302.257576] x29: ffff80000ad4bc60 x28: ffff000005e51d80 x27: 0000000000000000
[ 302.257585] x26: ffff000004a87900 x25: 000000000000000a x24: 0000000000000000
[ 302.257591] x23: ffff000004a87928 x22: ffff00000908cec0 x21: ffff80000ad4bcf0
[ 302.257597] x20: 0000000000333fc0 x19: ffff800010000000 x18: 0000000000000000
[ 302.257603] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffeb1ffee0
[ 302.257608] x14: 0000000000000000 x13: ffff0000043e2008 x12: 0000000000000010
[ 302.257614] x11: 0000000000000400 x10: ffffffffffffffff x9 : 0000000000000000
[ 302.257619] x8 : ffff8000100006c0 x7 : 0000000000000000 x6 : 000000000000003f
[ 302.257624] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[ 302.257630] x2 : 00000000003338c0 x1 : 0000000000000000 x0 : ffff800010000000
[ 302.257638] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 302.257640] CPU: 1 PID: 807 Comm: inference_runne Tainted: G WC 6.1.36 #1
[ 302.257644] Hardware name: Forlinx OK-MX93-C board (DT)
[ 302.257646] Call trace:
[ 302.257649] dump_backtrace.part.0+0xe0/0xf0
[ 302.257658] show_stack+0x18/0x30
[ 302.257663] dump_stack_lvl+0x64/0x80
[ 302.257669] dump_stack+0x18/0x34
[ 302.257673] panic+0x180/0x338
[ 302.257677] nmi_panic+0xac/0xb0
[ 302.257682] arm64_serror_panic+0x6c/0x7c
[ 302.257686] do_serror+0x0/0x5c
[ 302.257689] do_serror+0x34/0x5c
[ 302.257693] el1h_64_error_handler+0x30/0x4c
[ 302.257698] el1h_64_error+0x64/0x68
[ 302.257702] __memset+0x170/0x188
[ 302.257707] dma_alloc_attrs+0x5c/0xe4
[ 302.257712] ethosu_buffer_create+0x74/0x2a0
[ 302.257719] ethosu_ioctl+0x1d0/0x280
[ 302.257723] __arm64_sys_ioctl+0xac/0xf0
[ 302.257729] invoke_syscall+0x48/0x114
[ 302.257735] el0_svc_common.constprop.0+0xcc/0xec
[ 302.257740] do_el0_svc+0x2c/0xd0
[ 302.257744] el0_svc+0x2c/0x84
[ 302.257749] el0t_64_sync_handler+0xf4/0x120
[ 302.257754] el0t_64_sync+0x18c/0x190
[ 302.257759] SMP: stopping secondary CPUs
[ 302.257770] Kernel Offset: disabled
[ 302.257771] CPU features: 0x30000,000400a4,6600721b
[ 302.257775] Memory Limit: none

 

Zhiming_Liu
NXP TechSupport

Hi @Ethane 

I can't reproduce this issue on the NXP i.MX93 EVK.

root@imx93evk:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/labels.txt ./
root@imx93evk:/usr/bin/ethosu/examples#  cp ../../tensorflow-lite-2.11.1/examples/grace_hopper.bmp ./
root@imx93evk:/usr/bin/ethosu/examples#  vela ../../tensorflow-lite-2.11.1/examples/mobilenet_v1_1.0_224_quant.tflite

Network summary for mobilenet_v1_1.0_224_quant
Accelerator configuration               Ethos_U65_256
System configuration                 internal-default
Memory mode                          internal-default
Accelerator clock                                1000 MHz
Design peak SRAM bandwidth                      16.00 GB/s
Design peak DRAM bandwidth                       3.75 GB/s

Total SRAM used                                370.91 KiB
Total DRAM used                               3621.95 KiB

CPU operators = 0 (0.0%)
NPU operators = 60 (100.0%)

Average SRAM bandwidth                           4.73 GB/s
Input   SRAM bandwidth                          11.96 MB/batch
Weight  SRAM bandwidth                           9.70 MB/batch
Output  SRAM bandwidth                           0.00 MB/batch
Total   SRAM bandwidth                          21.76 MB/batch
Total   SRAM bandwidth            per input     21.76 MB/inference (batch size 1)

Average DRAM bandwidth                           2.13 GB/s
Input   DRAM bandwidth                           1.52 MB/batch
Weight  DRAM bandwidth                           3.23 MB/batch
Output  DRAM bandwidth                           5.06 MB/batch
Total   DRAM bandwidth                           9.82 MB/batch
Total   DRAM bandwidth            per input      9.82 MB/inference (batch size 1)

Neural network macs                         572406226 MACs/batch
Network Tops/s                                   0.25 Tops/s

NPU cycles                                    3889054 cycles/batch
SRAM Access cycles                            1019891 cycles/batch
DRAM Access cycles                            1676662 cycles/batch
On-chip Flash Access cycles                         0 cycles/batch
Off-chip Flash Access cycles                        0 cycles/batch
Total cycles                                  4602254 cycles/batch

Batch Inference time                 4.60 ms,  217.28 inferences/s (batch size 1)

root@imx93evk:/usr/bin/ethosu/examples# uname -a
Linux imx93evk 6.1.36+g04b05c5527e9 #1 SMP PREEMPT Mon Sep  4 21:11:15 UTC 2023 aarch64 GNU/Linux
root@imx93evk:/usr/bin/ethosu/examples# ./inference_runner -n ./output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[   85.674752] remoteproc remoteproc0: powering up imx-rproc
[   85.681704] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242424
[   86.198711] rproc-virtio rproc-virtio.3.auto: assigned reserved memory node vdevbuffer@a4020000
[   86.208987] virtio_rpmsg_bus virtio0: rpmsg host is online
[   86.214955] rproc-virtio rproc-virtio.3.auto: registered virtio0 (type 7)
[   86.221865] rproc-virtio rproc-virtio.4.auto: assigned reserved memory node vdevbuffer@a4020000
[   86.235500] virtio_rpmsg_bus virtio1: rpmsg host is online
[   86.241084] virtio_rpmsg_bus virtio1: creating channel rpmsg-ethosu-channel addr 0x1e
[   86.257988] rproc-virtio rproc-virtio.4.auto: registered virtio1 (type 7)
[   86.264856] remoteproc remoteproc0: remote processor imx-rproc is now up
Send Ping
Send version request
Send capabilities request
Capabilities:
        version_status:1
        version:{ major=0, minor=0, patch=0 }
        product:{ major=6, minor=0, patch=0 }
        architecture:{ major=1, minor=0, patch=6 }
        driver:{ major=0, minor=16, patch=0 }
        macs_per_cc:8
        cmd_stream_version:0
        custom_dma:false
Create network
Create inference
Wait for inferences
Inference status: running
Wait for inference
Inference status: ok
OFM size: 1001

Detected: military uniform, confidence:70
root@imx93evk:/usr/bin/ethosu/examples#

 

Ethane
Contributor II
Is there something wrong with my device tree configuration?
Zhiming_Liu
NXP TechSupport

Your dts node is the same as the EVK's, but the EVK has 2GB of RAM. I don't know the DDR size on your board.

If your board has 1GB of DDR, you can use a smaller shared memory pool for the NPU.
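
To illustrate why the EVK pool can be a problem on a smaller board, here is a minimal C sketch that checks whether a reserved region lies inside DDR. It assumes that DDR on the i.MX93 starts at 0x80000000, which is not stated in this thread, and `region_fits` is a hypothetical helper, not part of any NXP tool. Under that assumption, 1GB of DDR ends at 0xBFFFFFFF, so the pool at 0xC0000000 from the dts above would start just past the end of RAM. That would be consistent with the SError raised in `__memset` during `dma_alloc_from_dev_coherent`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed i.MX93 DDR base address (an assumption, not stated in the thread). */
#define DDR_BASE 0x80000000ULL

/* Hypothetical helper: returns true if [base, base + size) lies entirely
 * inside a DDR range of ddr_size bytes starting at DDR_BASE. */
static bool region_fits(uint64_t base, uint64_t size, uint64_t ddr_size)
{
    uint64_t ddr_end = DDR_BASE + ddr_size; /* exclusive end of DDR */

    if (size == 0 || base < DDR_BASE)
        return false;
    /* Check base first so that ddr_end - base cannot underflow. */
    return base < ddr_end && size <= ddr_end - base;
}
```

With these assumptions, `region_fits(0xC0000000, 0x10000000, 0x40000000)` (the EVK pool on 1GB) is false, while the same pool on 2GB (`ddr_size` of 0x80000000) fits. A smaller pool moved below 0xC0000000 would pass the check on 1GB, but the exact base and size depend on the other reserved regions in your own dts.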

Ethane
Contributor II

My board has 1GB of DDR. After I made the shared memory pool for the NPU smaller, the system no longer hangs, but it reports the following error. I think it is using the EVK firmware, which is incompatible with my 1GB DDR board. What should I do about this?

 

root@ok-mx93:/usr/bin/ethosu/examples# ./inference_runner -n output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[ 58.063151] remoteproc remoteproc0: powering up imx-rproc
[ 58.070435] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242568
[ 58.080759] remoteproc remoteproc0: Registered carveout doesn't fit len request
[ 58.088171] rproc-virtio: probe of rproc-virtio.0.auto failed with error -12
[ 58.097200] remoteproc remoteproc0: Registered carveout doesn't fit len request
[ 58.105805] rproc-virtio: probe of rproc-virtio.1.auto failed with error -12
[ 58.630656] remoteproc remoteproc0: remote processor imx-rproc is now up

Zhiming_Liu
NXP TechSupport

Hi @Ethane 

You need to download the i.MX93 SDK from this page:

https://mcuxpresso.nxp.com/en/welcome

Then modify the vring base addresses in boards/mcimx93evk/demo_apps/ethosu_apps_rpmsg/board.h to match your dts. The code below is from the 2GB EVK board.

#define VDEV0_VRING_BASE (0xA4000000U)
#define VDEV1_VRING_BASE (0xA4010000U)

Then compile a new ethosu_firmware.
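
As a rough sketch of what such an edit could look like, the fragment below keeps the two EVK defines quoted above and adds compile-time checks that each vring base falls inside a reserved window. `VRING_WINDOW_BASE`, `VRING_WINDOW_END`, and `VRING_IN_WINDOW` are hypothetical names that are not part of the SDK. The window end assumes the vrings sit directly below the `vdevbuffer@a4020000` node seen in the logs. Replace all four addresses with the values from the vring nodes in your own dts.

```c
/* Values quoted in this thread for the 2GB EVK; change them to match your dts. */
#define VDEV0_VRING_BASE (0xA4000000U)
#define VDEV1_VRING_BASE (0xA4010000U)

/* Hypothetical window for the vring carveouts (not part of the SDK).
 * The end address assumes the vrings sit directly below vdevbuffer@a4020000. */
#define VRING_WINDOW_BASE (0xA4000000U)
#define VRING_WINDOW_END  (0xA4020000U) /* exclusive */

/* Hypothetical helper macro: true if addr lies inside the window. */
#define VRING_IN_WINDOW(addr) \
    ((addr) >= VRING_WINDOW_BASE && (addr) < VRING_WINDOW_END)

/* The build fails here if a vring base is left pointing outside the window. */
_Static_assert(VRING_IN_WINDOW(VDEV0_VRING_BASE),
               "VDEV0 vring base is outside the reserved window");
_Static_assert(VRING_IN_WINDOW(VDEV1_VRING_BASE),
               "VDEV1 vring base is outside the reserved window");
```

The checks only catch a mismatch between the defines and the window you enter. They cannot confirm that the window itself matches the carveouts the kernel registers, so compare both against the dts by hand.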

Ethane
Contributor II
Problem solved, thank you for your answer!