I get a system error when using the NPU on the i.MX9352, and I don't know how to get the NPU working on this chip. Has anyone used the NPU on the i.MX9352 before?
My device tree configuration:
ethosu_mem: ethosu_region@C0000000 {
	compatible = "shared-dma-pool";
	reg = <0x0 0xC0000000 0x0 0x10000000>;
	no-map;
};
ethosu {
	compatible = "arm,ethosu";
	fsl,cm33-proc = <&cm33>;
	memory-region = <&ethosu_mem>;
	power-domains = <&mlmix>;
};
The steps I ran are as follows:
root@ok-mx93:~# cd /usr/bin/ethosu/examples/
root@ok-mx93:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/labels.txt ./
root@ok-mx93:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/grace_hopper.bmp ./
root@ok-mx93:/usr/bin/ethosu/examples# vela ../../tensorflow-lite-2.11.1/examples/mobilenet_v1_1.0_224_quant.tflite
Network summary for mobilenet_v1_1.0_224_quant
Accelerator configuration Ethos_U65_256
System configuration internal-default
Memory mode internal-default
Accelerator clock 1000 MHz
Design peak SRAM bandwidth 16.00 GB/s
Design peak DRAM bandwidth 3.75 GB/s
Total SRAM used 370.91 KiB
Total DRAM used 3621.95 KiB
CPU operators = 0 (0.0%)
NPU operators = 60 (100.0%)
Average SRAM bandwidth 4.73 GB/s
Input SRAM bandwidth 11.96 MB/batch
Weight SRAM bandwidth 9.70 MB/batch
Output SRAM bandwidth 0.00 MB/batch
Total SRAM bandwidth 21.76 MB/batch
Total SRAM bandwidth per input 21.76 MB/inference (batch size 1)
Average DRAM bandwidth 2.13 GB/s
Input DRAM bandwidth 1.52 MB/batch
Weight DRAM bandwidth 3.23 MB/batch
Output DRAM bandwidth 5.06 MB/batch
Total DRAM bandwidth 9.82 MB/batch
Total DRAM bandwidth per input 9.82 MB/inference (batch size 1)
Neural network macs 572406226 MACs/batch
Network Tops/s 0.25 Tops/s
NPU cycles 3889054 cycles/batch
SRAM Access cycles 1019891 cycles/batch
DRAM Access cycles 1676662 cycles/batch
On-chip Flash Access cycles 0 cycles/batch
Off-chip Flash Access cycles 0 cycles/batch
Total cycles 4602254 cycles/batch
Batch Inference time 4.60 ms, 217.28 inferences/s (batch size 1)
root@ok-mx93:/usr/bin/ethosu/examples# ./inference_runner -n ./output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[ 301.631293] remoteproc remoteproc0: powering up imx-rproc
[ 301.638391] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242424
[ 302.179088] rproc-virtio rproc-virtio.0.auto: assigned reserved memory node vdevbuffer@a4020000
[ 302.188504] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 302.196141] rproc-virtio rproc-virtio.0.auto: registered virtio0 (type 7)
[ 302.203734] rproc-virtio rproc-virtio.1.auto: assigned reserved memory node vdevbuffer@a4020000
[ 302.223392] virtio_rpmsg_bus virtio1: rpmsg host is online
[ 302.225441] virtio_rpmsg_bus virtio1: creating channel rpmsg-ethosu-channel addr 0x1e
[ 302.229006] rproc-virtio rproc-virtio.1.auto: registered virtio1 (type 7)
[ 302.246805] remoteproc remoteproc0: remote processor imx-rproc is now up
Send Ping
Send version request
Send cap[ 302.257522] SError Interrupt on CPU1, code 0x00000000be000011 -- SError
[ 302.257538] CPU: 1 PID: 807 Comm: inference_runne Tainted: G WC 6.1.36 #1
[ 302.257544] Hardware name: Forlinx OK-MX93-C board (DT)
[ 302.257547] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 302.257552] pc : __memset+0x170/0x188
[ 302.257566] lr : dma_alloc_from_dev_coherent+0xc4/0x154
[ 302.257574] sp : ffff80000ad4bc60
[ 302.257576] x29: ffff80000ad4bc60 x28: ffff000005e51d80 x27: 0000000000000000
[ 302.257585] x26: ffff000004a87900 x25: 000000000000000a x24: 0000000000000000
[ 302.257591] x23: ffff000004a87928 x22: ffff00000908cec0 x21: ffff80000ad4bcf0
[ 302.257597] x20: 0000000000333fc0 x19: ffff800010000000 x18: 0000000000000000
[ 302.257603] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffeb1ffee0
[ 302.257608] x14: 0000000000000000 x13: ffff0000043e2008 x12: 0000000000000010
[ 302.257614] x11: 0000000000000400 x10: ffffffffffffffff x9 : 0000000000000000
[ 302.257619] x8 : ffff8000100006c0 x7 : 0000000000000000 x6 : 000000000000003f
[ 302.257624] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
[ 302.257630] x2 : 00000000003338c0 x1 : 0000000000000000 x0 : ffff800010000000
[ 302.257638] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 302.257640] CPU: 1 PID: 807 Comm: inference_runne Tainted: G WC 6.1.36 #1
[ 302.257644] Hardware name: Forlinx OK-MX93-C board (DT)
[ 302.257646] Call trace:
[ 302.257649] dump_backtrace.part.0+0xe0/0xf0
[ 302.257658] show_stack+0x18/0x30
[ 302.257663] dump_stack_lvl+0x64/0x80
[ 302.257669] dump_stack+0x18/0x34
[ 302.257673] panic+0x180/0x338
[ 302.257677] nmi_panic+0xac/0xb0
[ 302.257682] arm64_serror_panic+0x6c/0x7c
[ 302.257686] do_serror+0x0/0x5c
[ 302.257689] do_serror+0x34/0x5c
[ 302.257693] el1h_64_error_handler+0x30/0x4c
[ 302.257698] el1h_64_error+0x64/0x68
[ 302.257702] __memset+0x170/0x188
[ 302.257707] dma_alloc_attrs+0x5c/0xe4
[ 302.257712] ethosu_buffer_create+0x74/0x2a0
[ 302.257719] ethosu_ioctl+0x1d0/0x280
[ 302.257723] __arm64_sys_ioctl+0xac/0xf0
[ 302.257729] invoke_syscall+0x48/0x114
[ 302.257735] el0_svc_common.constprop.0+0xcc/0xec
[ 302.257740] do_el0_svc+0x2c/0xd0
[ 302.257744] el0_svc+0x2c/0x84
[ 302.257749] el0t_64_sync_handler+0xf4/0x120
[ 302.257754] el0t_64_sync+0x18c/0x190
[ 302.257759] SMP: stopping secondary CPUs
[ 302.257770] Kernel Offset: disabled
[ 302.257771] CPU features: 0x30000,000400a4,6600721b
[ 302.257775] Memory Limit: none
Hi @Ethane
I can't reproduce this issue on the NXP i.MX93 EVK.
root@imx93evk:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/labels.txt ./
root@imx93evk:/usr/bin/ethosu/examples# cp ../../tensorflow-lite-2.11.1/examples/grace_hopper.bmp ./
root@imx93evk:/usr/bin/ethosu/examples# vela ../../tensorflow-lite-2.11.1/examples/mobilenet_v1_1.0_224_quant.tflite
Network summary for mobilenet_v1_1.0_224_quant
Accelerator configuration Ethos_U65_256
System configuration internal-default
Memory mode internal-default
Accelerator clock 1000 MHz
Design peak SRAM bandwidth 16.00 GB/s
Design peak DRAM bandwidth 3.75 GB/s
Total SRAM used 370.91 KiB
Total DRAM used 3621.95 KiB
CPU operators = 0 (0.0%)
NPU operators = 60 (100.0%)
Average SRAM bandwidth 4.73 GB/s
Input SRAM bandwidth 11.96 MB/batch
Weight SRAM bandwidth 9.70 MB/batch
Output SRAM bandwidth 0.00 MB/batch
Total SRAM bandwidth 21.76 MB/batch
Total SRAM bandwidth per input 21.76 MB/inference (batch size 1)
Average DRAM bandwidth 2.13 GB/s
Input DRAM bandwidth 1.52 MB/batch
Weight DRAM bandwidth 3.23 MB/batch
Output DRAM bandwidth 5.06 MB/batch
Total DRAM bandwidth 9.82 MB/batch
Total DRAM bandwidth per input 9.82 MB/inference (batch size 1)
Neural network macs 572406226 MACs/batch
Network Tops/s 0.25 Tops/s
NPU cycles 3889054 cycles/batch
SRAM Access cycles 1019891 cycles/batch
DRAM Access cycles 1676662 cycles/batch
On-chip Flash Access cycles 0 cycles/batch
Off-chip Flash Access cycles 0 cycles/batch
Total cycles 4602254 cycles/batch
Batch Inference time 4.60 ms, 217.28 inferences/s (batch size 1)
root@imx93evk:/usr/bin/ethosu/examples# uname -a
Linux imx93evk 6.1.36+g04b05c5527e9 #1 SMP PREEMPT Mon Sep 4 21:11:15 UTC 2023 aarch64 GNU/Linux
root@imx93evk:/usr/bin/ethosu/examples# ./inference_runner -n ./output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[ 85.674752] remoteproc remoteproc0: powering up imx-rproc
[ 85.681704] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242424
[ 86.198711] rproc-virtio rproc-virtio.3.auto: assigned reserved memory node vdevbuffer@a4020000
[ 86.208987] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 86.214955] rproc-virtio rproc-virtio.3.auto: registered virtio0 (type 7)
[ 86.221865] rproc-virtio rproc-virtio.4.auto: assigned reserved memory node vdevbuffer@a4020000
[ 86.235500] virtio_rpmsg_bus virtio1: rpmsg host is online
[ 86.241084] virtio_rpmsg_bus virtio1: creating channel rpmsg-ethosu-channel addr 0x1e
[ 86.257988] rproc-virtio rproc-virtio.4.auto: registered virtio1 (type 7)
[ 86.264856] remoteproc remoteproc0: remote processor imx-rproc is now up
Send Ping
Send version request
Send capabilities request
Capabilities:
version_status:1
version:{ major=0, minor=0, patch=0 }
product:{ major=6, minor=0, patch=0 }
architecture:{ major=1, minor=0, patch=6 }
driver:{ major=0, minor=16, patch=0 }
macs_per_cc:8
cmd_stream_version:0
custom_dma:false
Create network
Create inference
Wait for inferences
Inference status: running
Wait for inference
Inference status: ok
OFM size: 1001
Detected: military uniform, confidence:70
root@imx93evk:/usr/bin/ethosu/examples#
Your dts node is the same as the EVK's, but the EVK has 2 GB of RAM. I don't know the DDR size on your board.
If your board has 1 GB of DDR, you can use a smaller shared memory pool for the NPU.
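As a rough illustration of why the pool matters: assuming DDR starts at 0x80000000 (as on the i.MX93), a 1 GB board has RAM only up to 0xBFFFFFFF, so the EVK pool at 0xC0000000 to 0xCFFFFFFF lies entirely outside it. That would be consistent with the SError raised while the kernel clears the newly allocated DMA buffer (the __memset in the trace). The node below is only a sketch: the base address 0xB0000000 and the 128 MiB size are assumed values, not taken from any NXP or Forlinx device tree, and they must be checked against the other reserved-memory regions on your board.

```dts
/* Illustrative only: base and size are assumptions for a 1 GB board. */
ethosu_mem: ethosu_region@B0000000 {
	compatible = "shared-dma-pool";
	/* 128 MiB at 0xB0000000, ending at 0xB7FFFFFF, below the 1 GB limit of 0xC0000000 */
	reg = <0x0 0xB0000000 0x0 0x08000000>;
	no-map;
};
```

Vela reports roughly 3.5 MiB of DRAM for this model, so a pool well below the EVK's 256 MiB should still be enough for this test.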
My board has 1 GB of DDR. After I made the shared memory pool for the NPU smaller, the system no longer hangs, but it reports the following error. I think it is using the EVK's firmware, which is incompatible with my 1 GB DDR board. What should I do about this?
root@ok-mx93:/usr/bin/ethosu/examples# ./inference_runner -n output/mobilenet_v1_1.0_224_quant_vela.tflite -i grace_hopper.bmp -l labels.txt -o output.txt
[ 58.063151] remoteproc remoteproc0: powering up imx-rproc
[ 58.070435] remoteproc remoteproc0: Booting fw image ethosu_firmware, size 242568
[ 58.080759] remoteproc remoteproc0: Registered carveout doesn't fit len request
[ 58.088171] rproc-virtio: probe of rproc-virtio.0.auto failed with error -12
[ 58.097200] remoteproc remoteproc0: Registered carveout doesn't fit len request
[ 58.105805] rproc-virtio: probe of rproc-virtio.1.auto failed with error -12
[ 58.630656] remoteproc remoteproc0: remote processor imx-rproc is now up
Hi @Ethane
You need to download the i.MX93 SDK from this page:
https://mcuxpresso.nxp.com/en/welcome
Then modify the vring base addresses in boards/mcimx93evk/demo_apps/ethosu_apps_rpmsg/board.h to match your dts. The code below is from the 2 GB EVK board.
#define VDEV0_VRING_BASE (0xA4000000U)
#define VDEV1_VRING_BASE (0xA4010000U)
Then compile a new ethosu_firmware.