AnsweredAssumed Answered

rcu_preempt self-detected stall on CPU and usb xhci event error 37

Question asked by Jeffin Joys on Feb 9, 2020
Latest reply on Feb 17, 2020 by D.w. Luo

Dear NXP Team,

 

We are working on kernel 4.14.98 (GA BSP) on iMX8DX dual core NXP processor. We are using this device to connect to LTE using xhci interface via usb using pppd. Currently we are facing two issues

 

RCU stall warnings and hence system hangs after seeing crashes. The issue occurrence is random but exists. Sometimes we observe RCU stall warnings immediately  after device boots up and sometimes we observe after running the device for longs hours (> 12 hours).  This issue was not present when we used L4.14.62 Beta BSP.

USB Event Ring Full error. “xhci-cdns3: ERROR unknown event type 37”. We get continuous prints in our serial console making system unstable after keeping devices for long hours.

One the above issues is being produced when the device boots up using GA BSP L4.14.98.

 

We have tried applying the patch as mentioned in below link for fair.c file in kernel for RCU crash.

https://lore.kernel.org/lkml/20180426103133.GA6953@linaro.org/T/#md99127a919a18d23868e6ffe9c572a38b9587159

 

For USB event error, below are patches used. However we observe LTE ping command not working sometimes

https://patchwork.kernel.org/patch/10880073/

https://patchwork.kernel.org/patch/11165607/

With above patches, issue is getting reproduced less frequent, but still exist.

 

Can you please help us in how we can resolve the issue. How to debug and narrow down the root cause. Without resolving this issue, we are blocked.

 

Attached Logs for more information:

INFO: rcu_preempt self-detected stall on CPU

[ 1076.614300] INFO: rcu_preempt detected stalls on CPUs/tasks:

[ 1076.616581]  0-...: (1 GPs behind) idle=e42/2/0 softirq=40213/40214 fqs=1438

[ 1076.622233]  0-...: (1 GPs behind) idle=e42/2/0 softirq=40213/40214 fqs=1438

[ 1076.629359]

[ 1076.636490]

[ 1076.636493]  (t=5253 jiffies g=14886 c=14885 q=74)

[ 1076.636497] Task dump for CPU 0:

[ 1076.638075] (detected by 1, t=5253 jiffies, g=14886, c=14885, q=74)

[ 1076.639646] swapper/0       R  running task        0     0      0 0x00000002

[ 1076.660972] Call trace:

[ 1076.663426] [<ffff000008089668>] dump_backtrace+0x0/0x3c8

[ 1076.668831] [<ffff000008089a44>] show_stack+0x14/0x20

[ 1076.673886] [<ffff0000080f5cf4>] sched_show_task+0x14c/0x180

[ 1076.679549] [<ffff0000080f6c48>] dump_cpu_task+0x40/0x50

[ 1076.684868] [<ffff000008127288>] rcu_dump_cpu_stacks+0x94/0xd4

[ 1076.690705] [<ffff00000812604c>] rcu_check_callbacks+0x64c/0x958

[ 1076.696716] [<ffff00000812a1f4>] update_process_times+0x2c/0x58

[ 1076.702643] [<ffff000008139660>] tick_sched_handle.isra.4+0x30/0x48

[ 1076.708913] [<ffff0000081396b8>] tick_sched_timer+0x40/0x90

[ 1076.714490] [<ffff00000812ae70>] __hrtimer_run_queues+0xe8/0x168

[ 1076.720502] [<ffff00000812b0f0>] hrtimer_interrupt+0xa8/0x230

[ 1076.726256] [<ffff0000085f2808>] arch_timer_handler_phys+0x28/0x48

[ 1076.732441] [<ffff000008119950>] handle_percpu_devid_irq+0x80/0x138

[ 1076.738711] [<ffff0000081141bc>] generic_handle_irq+0x24/0x38

[ 1076.744462] [<ffff00000811482c>] __handle_domain_irq+0x5c/0xb8

[ 1076.750299] [<ffff000008080fe0>] gic_handle_irq+0x78/0x174

[ 1076.755789] Exception stack(0xffff000008003d90 to 0xffff000008003ed0)

[ 1076.762236] 3d80:                                   0000000000000000 ffff000008d07880

[ 1076.770073] 3da0: 0000000000000000 000000010002e1a9 000000000000002e 0000000000000002

[ 1076.777909] 3dc0: 0000000000000000 0000000040000000 ffff8000121fc248 ffff000008003c70

[ 1076.785745] 3de0: 00000000960c0650 0000000000000001 0000000000000401 ffff000009205750

[ 1076.793581] 3e00: 0000000000000000 0000000000000000 ffff00000813e220 0000000000000000

[ 1076.801418] 3e20: 0000000000000000 ffff000008b4f000 000000000000001a ffff000008d07880

[ 1076.809254] 3e40: ffff000008b4f018 ffff800012405780 ffff000008d06000 000000010002e1a8

[ 1076.817090] 3e60: ffff000008004000 ffff000008890000 0000000000000001 ffff000008003ed0

[ 1076.824926] 3e80: ffff0000080d0340 ffff000008003ed0 ffff000008081184 0000000040000145

[ 1076.832762] 3ea0: ffff8000125ab0a4 ffff8000125ab000 0000ffffffffffff ffff8000125ab028

[ 1076.840597] 3ec0: ffff000008003ed0 ffff000008081184

[ 1076.845480] [<ffff000008082a30>] el1_irq+0xb0/0x124

[ 1076.850362] [<ffff000008081184>] __do_softirq+0xa4/0x218

[ 1076.855681] [<ffff0000080d0340>] irq_exit+0xd0/0xf0

[ 1076.860561] [<ffff000008114830>] __handle_domain_irq+0x60/0xb8

[ 1076.866399] [<ffff000008080fe0>] gic_handle_irq+0x78/0x174

[ 1076.871888] Exception stack(0xffff000008c83dd0 to 0xffff000008c83f10)

[ 1076.878327] 3dc0:                                   0000000000000000 0000000000000000

[ 1076.886164] 3de0: 0000000000000001 0000000000000000 ffff000008b53108 ffff000008c83f00

[ 1076.894001] 3e00: 000080003740d000 ffff80003ff65b60 ffff000008c952c0 ffff000008c83e90

[ 1076.901836] 3e20: 00000000000008e0 0000000000000000 0000000000000001 0000000000000000

[ 1076.909673] 3e40: 00000000f6045434 0000000000000000 ffff00000813e220 0000000000000000

[ 1076.917509] 3e60: 0000000000000000 ffff000008b4f018 ffff000008c8a000 ffff000008c8a000

[ 1076.925345] 3e80: ffff000008b57a30 ffff000008c8aa60 0000000000000000 0000000000000000

[ 1076.933181] 3ea0: ffff000008c94980 0000000000000400 0000000080cf0018 ffff000008c83f10

[ 1076.941017] 3ec0: ffff000008084efc ffff000008c83f10 ffff000008084f00 0000000000000145

[ 1076.948853] 3ee0: 0000000000000000 00000000bfe9eb84 ffffffffffffffff ffff0000081397dc

[ 1076.956688] 3f00: ffff000008c83f10 ffff000008084f00

[ 1076.961570] [<ffff000008082a30>] el1_irq+0xb0/0x124

[ 1076.966454] [<ffff000008084f00>] arch_cpu_idle+0x10/0x18

[ 1076.971770] [<ffff000008106f50>] do_idle+0x120/0x1e0

[ 1076.976739] [<ffff0000081071a8>] cpu_startup_entry+0x20/0x28

[ 1076.982406] [<ffff00000886b314>] rest_init+0xcc/0xd8

[ 1076.987377] [<ffff000008af0c1c>] start_kernel+0x3d4/0x3e8

[ 1076.992774] Task dump for CPU 0:

[ 1076.995997] swapper/0       R  running task        0     0      0 0x00000022

[ 1077.003050] Call trace:

[ 1077.005497] [<ffff0000080853ac>] __switch_to+0x94/0xd8

Outcomes