RPMsg-lite produces kernel panic


4,895 Views
ycx
Contributor I

Hello,

I use an i.MX8MN to do signal acquisition and processing on the M7 side (bare metal) and then send some results to Linux on the A53 through rpmsg_tty. One tty channel transfers protobuf data of varying size and the other tty transfers buffers of constant size. This amounts to a fairly high bandwidth (~3 Mbps), so I had to increase RPMSG_BUF_SIZE on both sides (default 512 -> 4096):

M7 rpmsg_config.h : 

#define RL_BUFFER_PAYLOAD_SIZE (4080U)
#define RL_BUFFER_COUNT (8U) // buf count for each vring
 

Linux imx_rpmsg.c : 

#define RPMSG_NUM_BUFS (16) // apparently buf num of BOTH vrings (2* 8U)
#define RPMSG_BUF_SIZE (4096)
 
Linux virtio_rpmsg_bus.c : 
 
#define MAX_RPMSG_NUM_BUFS (128) // unused - recomputed from total vrings size (dts)
#define MAX_RPMSG_BUF_SIZE (4096)
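
As a cross-check of the sizes above, here is a minimal sketch of the arithmetic. It assumes a 16-byte rpmsg header in front of each payload and 4 KiB pages (the oops further down reports "4k pages"); the helper names are illustrative and do not come from either code base.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the post. */
#define RL_BUFFER_PAYLOAD_SIZE 4080U /* M7, rpmsg_config.h */
#define RL_BUFFER_COUNT        8U    /* M7, buffers per vring */
#define RPMSG_NUM_BUFS         16U   /* Linux, both vrings together */
#define RPMSG_BUF_SIZE         4096U /* Linux, bytes per buffer */

/* Assumption: each buffer starts with a 16-byte rpmsg header. */
#define RPMSG_HDR_SIZE  16U
/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096U

/* The two sides only agree if payload + header equals the Linux buffer size. */
static_assert(RL_BUFFER_PAYLOAD_SIZE + RPMSG_HDR_SIZE == RPMSG_BUF_SIZE,
              "M7 and Linux buffer sizes differ");
/* Linux counts the buffers of both vrings together. */
static_assert(2U * RL_BUFFER_COUNT == RPMSG_NUM_BUFS,
              "M7 and Linux buffer counts differ");

/* Full size of one buffer as seen by the M7 (header + payload). */
static unsigned m7_buffer_size(void)
{
    return RL_BUFFER_PAYLOAD_SIZE + RPMSG_HDR_SIZE;
}

/* Total buffer space Linux needs for both vrings, in bytes. */
static unsigned linux_total_bytes(void)
{
    return RPMSG_NUM_BUFS * RPMSG_BUF_SIZE;
}

/* The same amount expressed in pages. */
static unsigned linux_total_pages(void)
{
    return linux_total_bytes() / PAGE_SIZE_BYTES;
}
```

Under these assumptions the two sides agree (4080 + 16 = 4096), and `linux_total_pages()` gives 16 pages (64 KiB), the same number as the `count 16` in the `cma_alloc` line of the log below.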
 
 
I read the data received in Linux with a simple process that opens the tty and prints the received protobuf IDs and sizes. It runs for some time, but whenever I kill it, it triggers a kernel oops and panic:
 

[ 76.348614] remoteproc remoteproc0: powering up imx-rproc
[ 76.356616] remoteproc remoteproc0: Booting fw image proto_fw_m7.elf, size 620760
[ 76.415869] virtio_rpmsg_bus virtio0: no of_node; not parsing pinctrl DT
[ 76.422913] cma: cma_alloc(cma 00000000ffcb3fc8, count 16, align 4)
[ 76.431651] cma: cma_alloc(): returned 00000000c00d73ee
[ 76.437602] rpmsg_ns virtio0.rpmsg_ns.53.53: no of_node; not parsing pinctrl DT
[ 76.445412] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 76.445480] virtio_rpmsg_bus virtio0: creating channel rpmsg-virtual-tty-channel-1 addr 0x1
[ 76.451014] remoteproc0#vdev0buffer: registered virtio0 (type 7)
[ 76.460458] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel-1.-1.1: no of_node; not parsing pinctrl DT
[ 76.465431] remoteproc remoteproc0: remote processor imx-rproc is now up
[ 76.474888] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel-1.-1.1: new channel: 0x400 -> 0x1!
[ 76.490769] Install rpmsg tty driver!
[ 76.494733] virtio_rpmsg_bus virtio0: creating channel rpmsg-virtual-tty-channel addr 0x2
[ 76.503236] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel.-1.2: no of_node; not parsing pinctrl DT
[ 76.512593] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel.-1.2: new channel: 0x401 -> 0x2!
[ 76.521396] Install rpmsg tty driver!

... running 


[ 371.960296] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel-1.-1.1: rpmsg tty driver is removed
[ 371.969944] imx_rpmsg_tty virtio0.rpmsg-virtual-tty-channel.-1.2: rpmsg tty driver is removed
[ 371.979307] cma: cma_release(page 00000000c00d73ee)
[ 371.984998] Unable to handle kernel paging request at virtual address ffff8000117d300c
[ 371.997803] Mem abort info:
[ 372.000130] imx-rproc imx8mn-cm7: remotecore not run into wfi, force stop: 0 -60 0
[ 372.000617] ESR = 0x96000007
[ 372.008186] remoteproc remoteproc0: stopped remote processor imx-rproc
[ 372.011231] EC = 0x25: DABT (current EL), IL = 32 bits
[ 372.023487] SET = 0, FnV = 0
[ 372.026600] EA = 0, S1PTW = 0
[ 372.029752] Data abort info:
[ 372.032674] ISV = 0, ISS = 0x00000007
[ 372.036544] CM = 0, WnR = 0
[ 372.039544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000004139e000
[ 372.046312] [ffff8000117d300c] pgd=000000005ffff003, p4d=000000005ffff003, pud=000000005fffe003, pmd=0000000042c1a003, pte=0000000000000000
[ 372.058954] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 372.064543] Modules linked in: imx_rpmsg_tty
[ 372.068837] CPU: 0 PID: 1578 Comm: kworker/0:0 Not tainted 5.10.72+g3aa7ba431365 #1
[ 372.076491] Hardware name: Ka-Ro electronics TX8M-ND00 (NXP i.MX8MN) module (DT)
[ 372.083920] Workqueue: events imx_rproc_vq_work
[ 372.088461] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 372.094474] pc : rpmsg_recv_done+0x74/0x3cc
[ 372.098658] lr : rpmsg_recv_done+0x188/0x3cc
[ 372.102926] sp : ffff8000117f3c90
[ 372.106240] x29: ffff8000117f3c90 x28: ffff00000382c810
[ 372.111557] x27: ffff800010cd4e70 x26: 0000000000000000
[ 372.116871] x25: ffff000003a55600 x24: 0000000000000001
[ 372.122186] x23: ffff00001fe88900 x22: ffff000003a55178
[ 372.127500] x21: 0000000000000800 x20: ffff000003a55100
[ 372.132814] x19: ffff8000117d3000 x18: 0000000000000030
[ 372.138128] x17: 0000000000000000 x16: 0000000000000000
[ 372.143443] x15: ffff000003d9c5f0 x14: ffffffffffffffff
[ 372.148757] x13: ffff8000917f38e7 x12: ffff8000117f38ef
[ 372.154072] x11: ffff8000110293b8 x10: ffff800010fe83f8
[ 372.159386] x9 : ffff8000110209a0 x8 : 0000000000000001
[ 372.164702] x7 : 0000000000000cc0 x6 : 0000000000000000
[ 372.170016] x5 : ffff8000117f3d08 x4 : ffff8000117f3d04
[ 372.175330] x3 : ffff8000115e8080 x2 : 0000000000000000
[ 372.180644] x1 : ffff8000115e8090 x0 : ffff8000117d3000
[ 372.185962] Call trace:
[ 372.188409] rpmsg_recv_done+0x74/0x3cc
[ 372.192257] vring_interrupt+0x64/0xfc
[ 372.196014] rproc_vq_interrupt+0x3c/0x90
[ 372.200025] imx_rproc_vq_work+0x20/0x54
[ 372.203958] process_one_work+0x1bc/0x340
[ 372.207968] worker_thread+0x70/0x434
[ 372.211634] kthread+0x12c/0x140
[ 372.214866] ret_from_fork+0x10/0x30
[ 372.218449] Code: a90363f7 52800018 f90033e0 d503201f (79401a75)
[ 372.224549] ---[ end trace e2ca89eaa037a5e7 ]---
[ 372.229171] Kernel panic - not syncing: Oops: Fatal exception
[ 372.234929] SMP: stopping secondary CPUs
[ 372.239170] Kernel Offset: disabled
[ 372.242663] CPU features: 0x0040002,2000200c
[ 372.246932] Memory Limit: none

I don't understand why. Do you have any idea?
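
One possible reading of the oops above, offered as a sketch rather than a confirmed diagnosis: the faulting address ffff8000117d300c is x19 + 0xc, and the instruction word in parentheses on the `Code:` line (79401a75) decodes as a halfword load from [x19 + 12]. If x19 points at an rpmsg header with the usual layout (src, dst, reserved, len, flags), offset 12 is the `len` field. The struct below is an illustrative mirror of that layout, not the kernel's own definition, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the rpmsg header layout (assumed, see lead-in). */
struct rpmsg_hdr_layout {
    uint32_t src;      /* offset 0  */
    uint32_t dst;      /* offset 4  */
    uint32_t reserved; /* offset 8  */
    uint16_t len;      /* offset 12 */
    uint16_t flags;    /* offset 14 */
};

static_assert(sizeof(struct rpmsg_hdr_layout) == 16, "header should be 16 bytes");

/* Distance of the faulting address from a base pointer. */
static uint64_t fault_offset(uint64_t fault_addr, uint64_t base)
{
    return fault_addr - base;
}

/*
 * Decode an AArch64 "LDRH Wt, [Xn, #imm]" (unsigned offset) instruction word.
 * Returns 1 and fills the outputs if the opcode matches, 0 otherwise.
 */
static int decode_ldrh_uimm(uint32_t insn, unsigned *rt, unsigned *rn,
                            unsigned *byte_off)
{
    if ((insn & 0xFFC00000u) != 0x79400000u)
        return 0;
    *rt = insn & 0x1Fu;                       /* destination register */
    *rn = (insn >> 5) & 0x1Fu;                /* base register */
    *byte_off = ((insn >> 10) & 0xFFFu) * 2u; /* imm12 is scaled by 2 */
    return 1;
}
```

With the values from the log, `decode_ldrh_uimm(0x79401a75, ...)` yields rt = 21, rn = 19 and a byte offset of 12, which equals `offsetof(struct rpmsg_hdr_layout, len)`. The log also shows `cma_release` just before the fault and `pte=0000000000000000` for the faulting page, which would be consistent with the receive work touching a buffer that had already been released.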

Thanks a lot in advance for helping.

5 Replies

4,847 Views
ycx
Contributor I

Hi,

So I kept the buffer sizes as set and only sequenced the shutdown of both cores. The app on the Linux side now sends a stop command to the firmware, so data transfer stops before the endpoints, rpmsg contexts, etc. are removed. I think something was not being freed correctly or in the right order. I haven't seen a panic since.
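
The shutdown order described above could look roughly like this on the Linux side. This is a sketch under assumptions: the post does not give the stop command or the tty device path, so both are left to the caller, and `write_all`, `drain_until_quiet` and `stop_and_drain` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Discard incoming data until nothing has arrived for quiet_ms milliseconds.
 * Returns the number of bytes discarded, or -1 on error.
 */
static long drain_until_quiet(int fd, int quiet_ms)
{
    char scratch[4096];
    long discarded = 0;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, quiet_ms);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0)
            return discarded; /* line is quiet */

        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n < 0)
            return -1;
        if (n == 0)
            return discarded; /* peer closed */
        discarded += n;
    }
}

/* Ask the firmware to stop, then wait until the channel is idle. */
static long stop_and_drain(int fd, const char *stop_cmd, int quiet_ms)
{
    assert(stop_cmd != NULL);
    if (write_all(fd, stop_cmd, strlen(stop_cmd)) != 0)
        return -1;
    return drain_until_quiet(fd, quiet_ms);
}
```

The application would call something like `stop_and_drain(fd, "STOP\n", 200)` on each open rpmsg tty before `close(fd)` and before stopping the remote core. The command string and the 200 ms quiet period are placeholders; the firmware has to implement whatever stop command is chosen.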

Thanks
Yann


4,842 Views
Juan-Rodarte
NXP Employee

Hi,

Thanks for sharing your solution.

Best Regards,

Diego


4,881 Views
Juan-Rodarte
NXP Employee

Hello,
It's probably because you're allocating too much memory for RPMsg. You could try allocating less memory and passing the data in several parts, or passing it by another means and using RPMsg only to notify that data is ready.
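
A minimal sketch of the "several parts" idea, assuming the default 512-byte buffers and a 16-byte header, which leaves 496 bytes of payload per message. The callback type and function names are hypothetical; on the M7 side the callback would wrap whatever send call the firmware already uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport hook: sends one message, returns 0 on success. */
typedef int (*send_fn)(const uint8_t *data, size_t len, void *ctx);

/*
 * Send `len` bytes as consecutive messages of at most `max_payload` bytes.
 * Returns the number of messages sent, or -1 if the hook reported an error.
 */
static long send_in_chunks(const uint8_t *data, size_t len, size_t max_payload,
                           send_fn send, void *ctx)
{
    long chunks = 0;

    assert(data != NULL && send != NULL && max_payload > 0);
    while (len > 0) {
        size_t n = len < max_payload ? len : max_payload;

        if (send(data, n, ctx) != 0)
            return -1;
        data += n;
        len -= n;
        chunks++;
    }
    return chunks;
}

/* Example hook for experiments: only adds up the bytes it is given. */
static int count_bytes(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

With a 496-byte limit, a 4080-byte buffer goes out as nine messages (eight full ones and a final 112-byte one). The receiver then needs some framing, for example a length prefix, to know where each protobuf message or fixed-size buffer ends.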

Best regards,

Diego


4,878 Views
ycx
Contributor I

Hi Diego,

Thank you for the reply.

I made sure that the regions are defined big enough in the dts. However, I have doubts about the linux,cma definition, which tries to allocate 650MiB when I only have 512MiB of DRAM. It fails, but the kernel then manages to allocate 128MiB properly:

DDRINFO: DRAM rate 1600MTS
NOTICE: BL31: v2.4(release):1d0298da0
NOTICE: BL31: Built : 09:25:33, Apr 11 2022


U-Boot 2020.04-5.10.9-1.0.0+g1a024e0406 (Jul 29 2022 - 15:00:32 +0000)

CPU: i.MX8MNano Quad rev1.0 1500 MHz (running at 1200 MHz)
CPU: Commercial temperature grade (0C to 95C) at 37C
Reset cause: POR

DRAM: 512 MiB
CPU temperature: 37 C
MMC: mmc@30b50000: no card present
mmc@30b60000: no card present
FSL_SDHC: 0 (eMMC), FSL_SDHC: 1, FSL_SDHC: 2
Loading Environment from MMC... OK
Fail to setup video link

BuildInfo:
- ATF 1d0298d
- U-Boot 2020.04-5.10.9-1.0.0+g1a024e0406

loading FDT from mmc 0 'imx8mn-tx8m-nd00-m7.dtb'
MAC addr: 00:0c:c6:87:6b:8e
Net: eth0: ethernet@30be0000
Hit any key to stop autoboot: 0
17643528 bytes read in 651 ms (25.8 MiB/s)
## Flattened Device Tree blob at 43000000
Booting using the fdt blob at 0x43000000
Loading Device Tree to 000000005e913000, end 000000005e91ffff ... OK
serial-number: 2135699850d00200
switching usbotg interface to peripheral mode

Starting kernel ...

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[ 0.000000] Linux version 5.10.72+gc3b1248550ab (aarch64-poky-linux-gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1.20210209) #1 SMP PREEMPT Tue Aug 2 08:52:38 UTC 2022
[ 0.000000] earlycon: ec_imx6q0 at MMIO 0x0000000030860000 (options '115200')
[ 0.000000] printk: bootconsole [ec_imx6q0] enabled
[ 0.000000] OF: reserved mem: failed to allocate memory for node 'linux,cma'
[ 0.000000] Reserved memory: created DMA memory pool at 0x0000000058400000, size 1 MiB
[ 0.000000] OF: reserved mem: initialized node vdevbuffer@58400000, compatible id shared-dma-pool
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000040000000-0x000000005fffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000004047ffff] // reserved for m7
[ 0.000000] node 0: [mem 0x0000000040480000-0x0000000057ffffff] // not reserved
[ 0.000000] node 0: [mem 0x0000000058000000-0x000000005800ffff] // vring0
[ 0.000000] node 0: [mem 0x0000000058010000-0x00000000580fefff] // vring1
[ 0.000000] node 0: [mem 0x00000000580ff000-0x00000000580fffff] // rsc_table
[ 0.000000] node 0: [mem 0x0000000058100000-0x00000000583fffff] // not reserved
[ 0.000000] node 0: [mem 0x0000000058400000-0x00000000584fffff] // vdevbuffer 
[ 0.000000] node 0: [mem 0x0000000058500000-0x000000005fffffff] // not reserved
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000005fffffff]
[ 0.000000] On node 0 totalpages: 131072
[ 0.000000] DMA zone: 2048 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 131072 pages, LIFO batch:31
[ 0.000000] cma: dma_contiguous_reserve(limit 60000000)
[ 0.000000] cma: dma_contiguous_reserve: reserving 128 MiB for global area
[ 0.000000] cma: cma_declare_contiguous_nid(size 0x0000000008000000, base 0x0000000000000000, limit 0x0000000060000000 alignment 0x0000000000000000)
[ 0.000000] cma: Reserved 128 MiB at 0x0000000050000000
[ 0.000000] psci: probing for conduit method from DT.

 

Is it possible that DMA overwrites an undefined memory region because of a badly defined CMA size?
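
One way to narrow this down is to compare the address ranges printed in the boot log. The sketch below only does that arithmetic: the region names are the annotations added to the log above, the CMA range is derived from "Reserved 128 MiB at 0x0000000050000000", and the helper names are illustrative. It says nothing about where the drivers actually place their buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
    const char *name;
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* Fallback CMA area: 128 MiB starting at 0x50000000. */
static const struct region cma = { "cma", 0x50000000ULL, 0x57ffffffULL };

/* Ranges annotated as reserved in the boot log. */
static const struct region reserved[] = {
    { "m7",         0x40000000ULL, 0x4047ffffULL },
    { "vring0",     0x58000000ULL, 0x5800ffffULL },
    { "vring1",     0x58010000ULL, 0x580fefffULL },
    { "rsc_table",  0x580ff000ULL, 0x580fffffULL },
    { "vdevbuffer", 0x58400000ULL, 0x584fffffULL },
};

/* Size of a region in bytes. */
static uint64_t region_size(struct region r)
{
    return r.end - r.start + 1;
}

/* Two inclusive ranges overlap if each one starts before the other ends. */
static bool regions_overlap(struct region a, struct region b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Number of reserved regions that the CMA area overlaps. */
static int count_cma_overlaps(void)
{
    int hits = 0;

    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (regions_overlap(cma, reserved[i]))
            hits++;
    }
    return hits;
}
```

With these numbers `count_cma_overlaps()` returns 0: the 128 MiB fallback area ends at 0x57ffffff, directly below vring0 at 0x58000000, so at least on paper the ranges do not collide.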


4,873 Views
Juan-Rodarte
NXP Employee

Hi,

RPMsg is not made for large messages, so very large memory allocations can cause problems at allocation time.

Best regards,

Diego
