
NULL pointer deref in sdma_int_handler (sabreboard i.MX7D)

Question asked by Christina Quast on May 17, 2019
Latest reply on May 23, 2019 by Yuri Muhin
Hello everyone!
When the mikrobus on the Sabreboard with i.MX7D is used as a UART, and RX and TX
are connected with a cable (see https://photos.app.goo.gl/kbtempE6Us5vtatSA,
the red and white cables are interconnected), the communication either stalls (the UART interface can no longer be opened) or, sometimes, the kernel crashes with a fatal NULL pointer dereference. It is even worse on our own board, which is based on the Sabreboard. I got this kernel crash on the latest image downloaded from https://www.nxp.com/webapp/Download?colCode=L4.14.98_2.0.0_MX7D Any idea how to fix that? It looks like commit 1c8960e43cf3d8ac7493e2f7e4c086bc3749ef44 in fslc/4.9-2.3.x-imx should have fixed this issue, but it doesn't.
The bug shows up after 2-10 minutes of heavy DMA load, and it's easy to reproduce.
I debugged it far enough to see that the vd->node.next pointer contains the
LIST_POISON1 value 0x100, but I don't know yet why.
NXP technical support said that the Mikrobus is not fully supported by their BSP:
Please note that sdma is not fully supported by NXP - there are no sdma sources 
and no debugging methods, no documentation for it.
....
Please note that Linux is not the product of NXP. Customer gets this software for free, 
but NXP support on it is limited to the hardware platforms and software configurations, 
initially provided by NXP. All software modifications, especially, in the kernel and drivers 
code, are at the customer's own risk. For more information about official NXP
position for Linux support, please check NXP Linux Technology Support Policy page:
"BSPs are offered free of charge, "AS IS."
----------------------
So shortly, as "mikrobus" is not referenced in Release Notes - it is not supported.
Support may be provided only through community or NXP Professional Services.

 

However, I reproduced the bug both on our own boards with our own Yocto image (no mikrobus,
just interconnected RX & TX) and with the Mikrobus on the Sabreboard with the stock image
provided by NXP. So it is most probably a synchronization bug under heavy DMA usage:
interconnecting RX and TX leads to more data being passed to SDMA, so the bug shows up
more quickly than under normal SDMA usage conditions.

Here is the log of the commands and the kernel crash:
NXP i.MX Release Distro 4.14-sumo imx7dsabresd ttymxc0

imx7dsabresd login: root
Last login: Thu Feb 21 01:49:59 UTC 2019 on tty7
root@imx7dsabresd:~# . ./.shrc
root@imx7dsabresd:~# cat .shrc
echo 8 8 8 8 > /proc/sys/kernel/printk

burst() {
    for x in 1 2 3 4 5 6 7; do
        test -c /dev/ttymxc$x && stty -F /dev/ttymxc$x 115200 cs8 -cstopb -parenb ;
    done
    burst_mem;
    burst_ttys;
}

burst_mem() {
    dd if=/dev/mmcblk0 of=/dev/null bs=10M & # spam sd card
    dd if=/dev/mmcblk2 of=/dev/null bs=10M & # spam mmc
    dd if=/dev/urandom | md5sum & # occupy CPU
}

burst_ttys() {
    for x in 1 2 3 4 5 6 7; do
        test -c /dev/ttymxc$x && echo ttymxc$x
        test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024 & # bursting the UARTS with data,
        test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024 & # bursting the UARTS with data,
    done

    ps | grep dd
}

root@imx7dsabresd:~# burst
[1] 464
[2] 465
[3] 467
[4] 468
[5] 469
[6] 470
[7] 471
[8] 472
[9] 473
ttymxc4
[10] 474
[11] 475
ttymxc5
[12] 476
[13] 477
[14] 478
[15] 479
[16] 480
[17] 481
dd: failed to open '/dev/mmcblk2': No such file or directory
464 ttymxc0 00:00:00 dd
466 ttymxc0 00:00:00 dd
484 ttymxc0 00:00:00 dd
485 ttymxc0 00:00:00 dd
486 ttymxc0 00:00:00 dd
487 ttymxc0 00:00:00 dd
[2] Done(1) dd if=/dev/mmcblk2 of=/dev/null bs=10M
[4] Done(1) test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024
[5] Done(1) test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024
[6] Done(1) test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024
[7] Done(1) test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024
[8] Done(1) test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024
[9] Done(1) test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024
[14] Done(1) test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024
[15] Done(1) test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024
[16]- Done(1) test -c /dev/ttymxc$x && dd if=/dev/urandom of=/dev/ttymxc$x bs=1024
[17]+ Done(1) test -c /dev/ttymxc$x && dd of=/dev/null if=/dev/ttymxc$x bs=1024
0+0 records in
0+0 records out
0 bytes copied, 0.0724624 s, 0.0 kB/s
root@imx7dsabresd:~#
Unable to handle kernel NULL pointer dereference at virtual address 00000104
pgd = 80004000
[00000104] *pgd=00000000
Internal error: Oops: 817 [#1] PREEMPT SMP ARM
Modules linked in: brcmfmac brcmutil ov5640_camera_mipi_v2 mx6s_capture mxc_mipi_csi
CPU: 0 PID: 158 Comm: kworker/u4:2 Not tainted 4.14.98-imx_4.14.98_2.0.0_ga+g5d6cbea #1
Hardware name: Freescale i.MX7 Dual (Device Tree)
Workqueue: events_unbound flush_to_ldisc
task: a83b0c00 task.stack: a8656000
PC is at vchan_dma_desc_free_list+0x6c/0x90
LR is at 0xa83e849c
pc : [<8046a308>] lr : [<a83e849c>] psr: 400d0193
sp : a8657d98 ip : 00000000 fp : c148e018
r10: 00000100 r9 : a83e846c r8 : 000000d8
r7 : a83e84bc r6 : 00000100 r5 : 00000200 r4 : a8657db8
r3 : a83e84c4 r2 : 00000200 r1 : 00000002 r0 : a83e849c
Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: a8b8c06a DAC: 00000051
Process kworker/u4:2 (pid: 158, stack limit = 0xa8656210)
Stack: (0xa8657d98 to 0xa8658000)
7d80:                                                       a83e846c a840ec10
7da0: a83e84b8 a83e84e0 c149026c 00000014 a8a86400 8046d404 a8657db8 a8657db8
7dc0: a83e0be0 a840ec10 a840ec10 a00d0113 c149026c 00000014 a8a86400 804b0924
7de0: a840ec10 a00d0113 a83e0be0 00000000 a840ec10 a00d0113 c149026c 00000014
7e00: a8a86400 804ac578 00000000 a8a86400 c148e000 a8a86474 c149026c 80493fdc
7e20: a8a86400 0000001a 00000000 a8a86400 a8adc818 00000000 a8a86400 80494d34
7e40: 0000001a 00000601 00000000 80495200 c149026c a8a86400 a8adc76a 00000600
7e60: 00000601 00000000 a8adc817 a8adc818 00000000 a8a86400 c148e018 80496278
7e80: ab71e080 00000000 c148e000 00000000 a8adce18 c1490000 55555556 00000045
7ea0: 80adba20 c148e000 a8a86474 00000e00 00000e00 80157094 ffffffff a8adc018
7ec0: 00000000 00000e00 a9011500 a8adc018 804998d4 00000088 a8004200 80496424
7ee0: 00000001 00000000 00000e00 8049990c a8adc000 a83e0be4 a83e0bf4 a83e0be0
7f00: a8adc018 80499350 a83e0be4 a86e8100 a8004200 a8003300 00000000 00000000
7f20: 00000088 801440c0 80f02d00 a8004218 a86e8100 a8004200 a86e8118 80f02d00
7f40: a8004218 ffffe000 00000088 80144dbc ffffe000 80fa563c 80c1fccc 00000000
7f60: ffffe000 a8627d40 a86e7c00 00000000 a8656000 a86e8100 80144d6c a85e5ec8
7f80: a8627d5c 80149b1c a8656000 a86e7c00 801499d0 00000000 00000000 00000000
7fa0: 00000000 00000000 00000000 80107a68 00000000 00000000 00000000 00000000
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<8046a308>] (vchan_dma_desc_free_list) from [<8046d404>] (sdma_terminate_all+0x17c/0x1b0)
[<8046d404>] (sdma_terminate_all) from [<804b0924>] (imx_flush_buffer+0x38/0x19c)
[<804b0924>] (imx_flush_buffer) from [<804ac578>] (uart_flush_buffer+0x84/0x100)
[<804ac578>] (uart_flush_buffer) from [<80493fdc>] (isig+0x70/0xf4)
[<80493fdc>] (isig) from [<80494d34>] (n_tty_receive_signal_char+0x18/0x60)
[<80494d34>] (n_tty_receive_signal_char) from [<80495200>] (n_tty_receive_char_special+0x484/0xb5c)
[<80495200>] (n_tty_receive_char_special) from [<80496278>] (n_tty_receive_buf_common+0x84c/0x9e4)
[<80496278>] (n_tty_receive_buf_common) from [<80496424>] (n_tty_receive_buf2+0x14/0x1c)
[<80496424>] (n_tty_receive_buf2) from [<8049990c>] (tty_port_default_receive_buf+0x38/0x58)
[<8049990c>] (tty_port_default_receive_buf) from [<80499350>] (flush_to_ldisc+0x88/0xc8)
[<80499350>] (flush_to_ldisc) from [<801440c0>] (process_one_work+0x1d8/0x414)
[<801440c0>] (process_one_work) from [<80144dbc>] (worker_thread+0x50/0x598)
[<80144dbc>] (worker_thread) from [<80149b1c>] (kthread+0x14c/0x154)
[<80149b1c>] (kthread) from [<80107a68>] (ret_from_fork+0x14/0x2c)
Code: e59ec004 e1a0000e e59e202c e31c0040 (e58a2004)
---[ end trace b5fe98d9b3a0b4ec ]---
note: kworker/u4:2[158] exited with preempt_count 1

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: Internal error: Oops: 817 [#1] PREEMPT SMP ARM

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: Modules linked in: brcmfmac brcmutil ov5640_camera_mipi_v2 mx6s_capture mxc_mipi_csi

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: CPU: 0 PID: 158 Comm: kworker/u4:2 Not tainted 4.14.98-imx_4.14.98_2.0.0_ga+g5d6cbea #1

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: Hardware name: Freescale i.MX7 Dual (Device Tree)

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: Workqueue: events_unbound flush_to_ldisc

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: task: a83b0c00 task.stack: a8656000

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: PC is at vchan_dma_desc_free_list+0x6c/0x90

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: LR is at 0xa83e849c

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: pc : [<8046a308>] lr : [<a83e849c>] psr: 400d0193

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: sp : a8657d98 ip : 00000000 fp : c148e018

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: r10: 00000100 r9 : a83e846c r8 : 000000d8

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresdimx7dsabresd kernel: 7de0: a840ec10 a00d0113 a83e0be0 00000000 a840ec10 a00d0113 c149026c 00000014kernel: r7 : a83e84bc r6 : 00000100 r5 : 00000200 r4 : a8657db8

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresdimx7dsabresd kernel: r3 : a83e84c4 r2 : 00000200 r1 : 00000002 r0 : a83e849ckernel: 7e00: a8a86400 804ac578 00000000 a8a86400 c148e000 a8a86474 c149026c 80493fdc

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7dc0: a83e0be0 a840ec10 a840ec10 a00d0113 c149026c 00000014 a8a86400 804b0924

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7ee0: 00000001 00000000 00000e00 8049990c a8adc000 a83e0be4 a83e0bf4 a83e0be0

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7e20: a8a86400 0000001a 00000000 a8a86400 a8adc818 00000000 a8a86400 80494d34

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7e40: 0000001a 00000601 00000000 80495200 c149026c a8a86400 a8adc76a 00000600

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7da0: a83e84b8 a83e84e0 c149026c 00000014 a8a86400 8046d404 a8657db8 a8657db8

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7e60: 00000601 00000000 a8adc817 a8adc818 00000000 a8a86400 c148e018 80496278

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7e80: ab71e080 00000000 c148e000 00000000 a8adce18 c1490000 55555556 00000045

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7ea0: 80adba20 c148e000 a8a86474 00000e00 00000e00 80157094 ffffffff a8adc018

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: Control: 10c53c7d Table: a8b8c06a DAC: 00000051

Message from syslogd@imx7dsabresd at Thu Feb 21 01:50:14 2019 ...
imx7dsabresd kernel: 7ec0: 00000000 00000e00 a9011500 a8adc018 804998d4 00000088 a8004200 80496424

Message from syslogd@imx7dsabresd at Thu
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = 80004000
[00000004] *pgd=00000000
Internal error: Oops: 17 [#2] PREEMPT SMP ARM
Modules linked in: brcmfmac brcmutil ov5640_camera_mipi_v2 mx6s_capture mxc_mipi_csi
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 4.14.98-imx_4.14.98_2.0.0_ga+g5d6cbea #1
Hardware name: Freescale i.MX7 Dual (Device Tree)
task: 80f07240 task.stack: 80f00000
PC is at sdma_int_handler+0x2c/0x338
LR is at sdma_int_handler+0x28/0x338
pc : [<8046d77c>] lr : [<8046d778>] psr: 600d0193
sp : 80f01e28 ip : a60089a0 fp : a83e8010
r10: 80c21d50 r9 : 80fa5689 r8 : 00000043
r7 : 80f01ea4 r6 : 00000000 r5 : a815ac00 r4 : a83ea010
r3 : 00000000 r2 : 00010001 r1 : a83e8010 r0 : 00000000
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c53c7d Table: a8b8c06a DAC: 00000051
Process swapper/0 (pid: 0, stack limit = 0x80f00210)
Stack: (0x80f01e28 to 0x80f02000)
1e20:                   ab71a540 a83ea010 d8ac3180 00000005 d944c800 80182780
1e40: 00000005 a835b7c0 a815ac00 00000000 80f01ea4 00000043 80fa5689 80c21d50
1e60: 80c21d28 8016eb8c 80f01e7c 809d49d4 00000035 a815ac00 80f02098 a815ac00
1e80: a815ac00 80f0af84 00000000 00000001 80f01f20 a8008000 00000000 8016ec74
1ea0: 80f01f20 00000000 00000000 a815ac00 a815ac64 8016ece8 a815ac00 a815ac64
1ec0: 80f0af84 801722cc 80e82c38 00000000 00000043 8016de90 80e82c38 8016e3b8
1ee0: 80f03fe4 80f1c0d0 c080200c c0802000 80f01f20 c0803000 00000000 801014d0
1f00: 806e4100 200d0013 ffffffff 80f01f54 d92246fe 80f00000 00000000 8010bfcc
1f20: 00000000 ab721200 00000001 ab721200 d92e2f55 00000005 ab71d4e8 00000001
1f40: d92246fe 00000005 00000000 00000000 00000000 80f01f70 809f0ac4 806e4100
1f60: 200d0013 ffffffff 00000051 00000000 ab71d4e8 ffffe000 80f03d78 80f03d2c
1f80: 80e824e0 80c3b05c ab71d4e8 80f0abc0 00000000 80162e24 000000be 80fa7000
1fa0: 80f03d00 ffffffff 80fa7000 abfffb40 80e65a30 8016312c 80fa704c 80e00c5c
1fc0: ffffffff ffffffff 00000000 80e00680 00000000 80e65a30 80fa7294 80f03d18
1fe0: 80e65a2c 80f084c4 8000406a 410fc075 00000000 8000807c 00000000 00000000
[<8046d77c>] (sdma_int_handler) from [<8016eb8c>] (handle_irq_event_percpu+0x50/0x11c)
[<8016eb8c>] (handle_irq_event_percpu) from [<8016ec74>] (handle_irq_event_percpu+0x1c/0x58)
[<8016ec74>] (handle_irq_event_percpu) from [<8016ece8>] (handle_irq_event+0x38/0x5c)
[<8016ece8>] (handle_irq_event) from [<801722cc>] (handle_fasteoi_irq+0xb8/0x16c)
[<801722cc>] (handle_fasteoi_irq) from [<8016de90>] (generic_handle_irq+0x24/0x34)
[<8016de90>] (generic_handle_irq) from [<8016e3b8>] (handle_domain_irq+0x7c/0xec)
[<8016e3b8>] (handle_domain_irq) from [<801014d0>] (gic_handle_irq+0x4c/0x90)
[<801014d0>] (gic_handle_irq) from [<8010bfcc>] (irq_svc+0x6c/0xa8)
Exception stack(0x80f01f20 to 0x80f01f68)
1f20: 00000000 ab721200 00000001 ab721200 d92e2f55 00000005 ab71d4e8 00000001
1f40: d92246fe 00000005 00000000 00000000 00000000 80f01f70 809f0ac4 806e4100
1f60: 200d0013 ffffffff
[<8010bfcc>] (irq_svc) from [<806e4100>] (cpuidle_enter_state+0x13c/0x2cc)
[<806e4100>] (cpuidle_enter_state) from [<80162e24>] (do_idle+0x1b8/0x208)
[<80162e24>] (do_idle) from [<8016312c>] (cpu_startup_entry+0x18/0x1c)
[<8016312c>] (cpu_startup_entry) from [<80e00c5c>] (start_kernel+0x388/0x394)
Code: ebffcb90 e594058c ebffcb8e e59434cc (e5935004)
---[ end trace b5fe98d9b3a0b4ed ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU1: stopping
CPU: 1 PID: 466 Comm: dd Tainted: G D 4.14.98-imx_4.14.98_2.0.0_ga+g5d6cbea #1
Hardware name: Freescale i.MX7 Dual (Device Tree)
[<8010ef44>] (unwind_backtrace) from [<8010b49c>] (show_stack+0x10/0x14)
[<8010b49c>] (show_stack) from [<809d7144>] (dump_stack+0x78/0x8c)
[<809d7144>] (dump_stack) from [<8010dd98>] (handle_IPI+0x198/0x1ac)
[<8010dd98>] (handle_IPI) from [<80101510>] (gic_handle_irq+0x8c/0x90)
[<80101510>] (gic_handle_irq) from [<8010bfcc>] (irq_svc+0x6c/0xa8)
Exception stack(0xa8d05e08 to 0xa8d05e50)
5e00:                   97b85020 0182d030 00000160 a03fddb0 cc3c3e61 2413eb6a
5e20: 9fd8499f 775c5d4c 37f71a55 00000000 a8d05f00 97b85000 8ca606e5 a8d05e5c
5e40: 95ecd319 809d3828 200d0013 ffffffff
[<8010bfcc>] (irq_svc) from [<809d3828>] (arm_copy_from_user+0x7c/0x3d4)
[<809d3828>] (arm_copy_from_user) from [<803e302c>] (copyin+0x44/0x58)
[<803e302c>] (copyin) from [<803e4230>] (copy_page_from_iter+0x238/0x3ec)
[<803e4230>] (copy_page_from_iter) from [<8020ef98>] (pipe_write+0xbc/0x434)
[<8020ef98>] (pipe_write) from [<80207874>] (vfs_write+0xd0/0x120)
[<80207874>] (vfs_write) from [<80207a38>] (vfs_write+0xa4/0x168)
[<80207a38>] (vfs_write) from [<80207bfc>] (SyS_write+0x3c/0x90)
[<80207bfc>] (SyS_write) from [<80107980>] (ret_fast_syscall+0x0/0x54)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
