I have run into a kernel crash when using the imx_rpmsg_tty driver to communicate with the M7 co-processor. After the crash my application hangs forever (kill -9 does not work) and the whole system is unable to reboot.
How to reproduce:
[full trace log attached]
Call trace:
tty_buffer_flush+0x48/0x100
tty_ldisc_flush+0x3c/0x84
tty_port_close_start.part.0+0xc4/0x1d0
tty_port_close+0x40/0xcc
rpmsgtty_close+0x1c/0x30 [imx_rpmsg_tty]
tty_release+0x138/0x61c
__fput+0x78/0x230
____fput+0x10/0x20
task_work_run+0x80/0x140
do_exit+0x32c/0xa04
die+0x21c/0x260
die_kernel_fault+0x64/0x7c
__do_kernel_fault+0x11c/0x160
do_translation_fault+0x54/0xd0
do_mem_abort+0x44/0xa4
el1_abort+0x74/0xdc
el1_sync_handler+0xac/0xd0
el1_sync+0x88/0x140
dev_driver_string+0x10/0x40
rpmsg_send_offchannel_raw+0x3d8/0x4b0
virtio_rpmsg_send+0x28/0x34
rpmsg_send+0x24/0x44
rpmsgtty_write+0x60/0xe0 [imx_rpmsg_tty]
n_tty_write+0x2b0/0x45c
file_tty_write.constprop.0+0x138/0x290
tty_write+0x14/0x20
new_sync_write+0xe8/0x184
vfs_write+0x244/0x2a4
ksys_write+0x68/0xf4
__arm64_sys_write+0x20/0x2c
el0_svc_common.constprop.0+0x80/0x240
do_el0_svc+0x24/0x90
el0_svc+0x14/0x20
el0_sync_handler+0x1a4/0x1b0
el0_sync+0x180/0x1c0
Code: 9100a298 aa1803e0 94248c10 f9400280 (c8dffc13)
So I dug a bit, and it seems to me that this should be handled by the imx_rpmsg_tty driver, since it is the driver that is notified when the rpmsg driver is removed and it is also able to prevent further writes on that channel. In fact, I implemented a workaround that uses a global variable to hold the channel state in order to avoid this. It seems to work, though it does not look ideal [patch attached].
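To make the idea concrete, here is a minimal user-space sketch of that kind of guard. It is not the attached patch and not the real driver code: `struct model_port`, `model_probe`, `model_remove` and `model_write` are hypothetical names that only stand in for the driver's probe/remove callbacks and its tty write path, and the byte counter stands in for the actual call to rpmsg_send(). The `-ENODEV` return value is also just an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the driver's per-port state. */
struct model_port {
    bool channel_alive;  /* true between probe and remove */
    size_t bytes_sent;   /* stands in for data handed to rpmsg_send() */
};

/* Models the rpmsg probe callback: the channel becomes usable. */
void model_probe(struct model_port *port)
{
    assert(port != NULL);
    port->channel_alive = true;
    port->bytes_sent = 0;
}

/* Models the rpmsg remove callback: the channel must not be used anymore. */
void model_remove(struct model_port *port)
{
    assert(port != NULL);
    port->channel_alive = false;
}

/*
 * Models the tty write path: refuse to send once the channel is gone
 * instead of touching a stale endpoint.
 * NOTE: a real driver would also need locking here, because remove can
 * run concurrently with a write; a plain flag alone does not close that race.
 */
int model_write(struct model_port *port, const char *buf, size_t len)
{
    assert(port != NULL && buf != NULL);
    if (!port->channel_alive)
        return -ENODEV;
    port->bytes_sent += len;  /* placeholder for the real send */
    return (int)len;
}
```

In this model a write after `model_remove()` fails with an error instead of reaching the send path, which is the behaviour I am trying to get from the driver. Keeping the flag in per-port state rather than in a global is probably cleaner, but that is exactly the kind of thing I would like feedback on.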
It would be really nice to hear opinions from the maintainers on how to address and fix this properly.
Additional information:
Processor: IMX8MP
Board: Variscite VAR-SOM-IMX8M-PLUS
root@imx8mp-var-dart:~# uname -a
Linux imx8mp-var-dart 5.10.72+gd2cfea0c171e #1 SMP PREEMPT Thu Jul 14 16:54:09 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux