
i.MX6Q rcu_preempt cpu stall

Question asked by Jordan Salm on Sep 17, 2019
Latest reply on Sep 18, 2019 by igorpadykov

We are dealing with an intermittent CPU stall that eventually hangs the system.

 

Here is the relevant backtrace:

[307362.408117] INFO: rcu_preempt detected stalls on CPUs/tasks:
[307362.412514]     (detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)
[307362.418223] All QSes seen, last rcu_preempt kthread activity 2101 (30706237-30704136), jiffies_till_next_fqs=1, root ->qsmask 0x0
[307362.428582] cfinteractive   R running      0    38      2 0x00000000
[307362.428609] Backtrace:
[307362.428661] [<8010c380>] (dump_backtrace) from [<8010c5fc>] (show_stack+0x20/0x24)
[307362.428675]  r7:80e02658 r6:80e02100 r5:00000002 r4:a8185400
[307362.428730] [<8010c5dc>] (show_stack) from [<8015ed04>] (sched_show_task+0xc4/0x118)
[307362.428761] [<8015ec40>] (sched_show_task) from [<8018bf08>] (rcu_check_callbacks+0xa64/0xa70)
[307362.428772]  r5:2a976000 r4:80d9cf80
[307362.428816] [<8018b4a4>] (rcu_check_callbacks) from [<801905c4>] (update_process_times+0x4c/0x74)
[307362.428828]  r10:ab70f4d8 r9:801a3874 r8:a8325c40 r7:0001178b r6:00000000 r5:a8185400
[307362.428875]  r4:ffffe000
[307362.428908] [<80190578>] (update_process_times) from [<801a3870>] (tick_sched_handle+0x58/0x5c)
[307362.428919]  r7:0001178b r6:61225002 r5:ab70f650 r4:80e02548
[307362.428967] [<801a3818>] (tick_sched_handle) from [<801a38e8>] (tick_sched_timer+0x74/0xc8)
[307362.428993] [<801a3874>] (tick_sched_timer) from [<80191784>] (__run_hrtimer+0x80/0x284)
[307362.429003]  r8:a8325b50 r7:ab70f3c0 r6:ab70f3f8 r5:0001178b r4:ab70f650
[307362.429063] [<80191704>] (__run_hrtimer) from [<80191d94>] (hrtimer_interrupt+0x138/0x344)
[307362.429073]  r9:00000001 r8:ab70f3c0 r7:00000000 r6:ab70f3f8 r5:0001178b r4:6122497f
[307362.429139] [<80191c5c>] (hrtimer_interrupt) from [<80110d24>] (twd_handler+0x40/0x50)
[307362.429150]  r10:80e54000 r9:a8022f00 r8:00000001 r7:00000010 r6:a808b800 r5:ab715580
[307362.429196]  r4:00000001
[307362.429231] [<80110ce4>] (twd_handler) from [<801819bc>] (handle_percpu_devid_irq+0xac/0x1d0)
[307362.429242]  r5:ab715580 r4:00000010
[307362.429278] [<80181910>] (handle_percpu_devid_irq) from [<8017d024>] (generic_handle_irq+0x3c/0x4c)
[307362.429289]  r10:a8325c40 r9:a8034000 r8:00000001 r7:00000000 r6:00000010 r5:00000000
[307362.429334]  r4:00000010 r3:80181910
[307362.429368] [<8017cfe8>] (generic_handle_irq) from [<8017d350>] (__handle_domain_irq+0x8c/0xfc)
[307362.429379]  r5:00000000 r4:80d9ac84
[307362.429413] [<8017d2c4>] (__handle_domain_irq) from [<80101560>] (gic_handle_irq+0x34/0x6c)
[307362.429424]  r10:00000004 r9:a8325d90 r8:00000001 r7:f4a00100 r6:a8325c40 r5:80e03194
[307362.429469]  r4:f4a0010c r3:a8325c40
[307362.429502] [<8010152c>] (gic_handle_irq) from [<8010d240>] (__irq_svc+0x40/0x74)
[307362.429515] Exception stack(0xa8325c40 to 0xa8325c88)
[307362.429538] 5c40: 00000003 ab72f934 00000003 00000003 80e02690 ab713200 80e0300c ab713204
[307362.429558] 5c60: 00000001 a8325d90 00000004 a8325cbc 00000003 a8325c88 801a889c 801a88cc
[307362.429571] 5c80: 20070013 ffffffff
[307362.429581]  r7:a8325c74 r6:ffffffff r5:20070013 r4:801a88cc
[307362.429635] [<801a865c>] (smp_call_function_many) from [<801a8968>] (smp_call_function+0x48/0x88)
[307362.429646]  r10:ffffffff r9:a8325d88 r8:00000000 r7:00000000 r6:00000001 r5:a8325d90
[307362.429692]  r4:801109d0
[307362.429720] [<801a8920>] (smp_call_function) from [<801a89e0>] (on_each_cpu+0x38/0x90)
[307362.429730]  r7:00000000 r6:00000001 r5:a8325d90 r4:801109d0
[307362.429782] [<801a89a8>] (on_each_cpu) from [<80110d6c>] (twd_rate_change+0x38/0x40)
[307362.429793]  r7:00000000 r6:00000002 r5:a8325d88 r4:ffffffff
[307362.429846] [<80110d34>] (twd_rate_change) from [<801533e8>] (notifier_call_chain+0x54/0x94)
[307362.429870] [<80153394>] (notifier_call_chain) from [<801538b8>] (__srcu_notifier_call_chain+0x54/0x70)
[307362.429880]  r9:a8325d88 r8:00000002 r7:00000000 r6:00000000 r5:a80b2544 r4:a80b255c
[307362.429939] [<80153864>] (__srcu_notifier_call_chain) from [<801538fc>] (srcu_notifier_call_chain+0x28/0x30)
[307362.429949]  r10:00000000 r9:80e9db84 r8:00000002 r7:80e02548 r6:a803ac00 r5:80e82cd0
[307362.429995]  r4:a80b2540
[307362.430027] [<801538d4>] (srcu_notifier_call_chain) from [<8074a3b4>] (__clk_notify+0xa0/0xa8)
[307362.430049] [<8074a314>] (__clk_notify) from [<8074a464>] (__clk_recalc_rates+0xa8/0xac)
[307362.430059]  r8:a8038b80 r7:00000001 r6:179a7b00 r5:00000002 r4:a803ac00
[307362.430113] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430123]  r7:00000001 r6:2f34f600 r5:00000002 r4:a803ac00
[307362.430171] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430181]  r7:00000001 r6:2f34f600 r5:00000002 r4:a803d680
[307362.430231] [<8074a3bc>] (__clk_recalc_rates) from [<8074cdf0>] (clk_core_set_parent+0x1bc/0x2dc)
[307362.430242]  r7:00000001 r6:00000000 r5:a803af80 r4:a803d000
[307362.430290] [<8074cc34>] (clk_core_set_parent) from [<8074cf3c>] (clk_set_parent+0x2c/0x30)
[307362.430300]  r9:00000000 r8:179a7b00 r7:80e02548 r6:000c15c0 r5:00060ae0 r4:80f02f14
[307362.430366] [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
[307362.430395] [<806d26e0>] (imx6q_set_target) from [<806c8984>] (__cpufreq_driver_target+0x184/0x2b0)
[307362.430406]  r10:00000000 r9:00000000 r8:80f02e4c r7:00000000 r6:80e02548 r5:a8587c00
[307362.430450]  r4:00000000
[307362.430481] [<806c8800>] (__cpufreq_driver_target) from [<806d1694>] (cpufreq_interactive_speedchange_task+0x264/0x354)
[307362.430492]  r10:80d9be90 r9:80e02690 r8:80e0300c r7:00000047 r6:000c15c0 r5:80d9be90
[307362.430537]  r4:ab71ee90
[307362.430574] [<806d1430>] (cpufreq_interactive_speedchange_task) from [<8015250c>] (kthread+0xfc/0x114)
[307362.430585]  r10:00000000 r9:00000000 r8:00000000 r7:806d1430 r6:00000000 r5:a830e640
[307362.430629]  r4:00000000
[307362.430661] [<80152410>] (kthread) from [<80108028>] (ret_from_fork+0x14/0x2c)
[307362.430671]  r7:00000000 r6:00000000 r5:80152410 r4:a830e640
[307362.430710] rcu_preempt kthread starved for 2101 jiffies!
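
The call chain in a trace like the one above is easier to follow with the register dumps stripped out. Below is a minimal sketch in Python, assuming the dmesg output has been saved as plain text. The names `FRAME_RE` and `call_chain` are hypothetical and only for illustration; they are not part of any kernel tooling.

```python
import re

# Matches ARM backtrace frames of the form:
#   [<801a865c>] (smp_call_function_many) from [<801a8968>] (smp_call_function+0x48/0x88)
# Register dump lines (r7:... r6:...) do not match and are skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\) from \[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def call_chain(log_text):
    """Return function names from the innermost frame outwards.

    Consecutive repeats (recursive calls such as __clk_recalc_rates)
    are collapsed into a single entry.
    """
    chain = []
    last_caller = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue
        callee, caller = match.group(1), match.group(2)
        if not chain or chain[-1] != callee:
            chain.append(callee)
        last_caller = caller
    # The outermost caller never appears as a callee, so add it at the end.
    if last_caller is not None and chain[-1] != last_caller:
        chain.append(last_caller)
    return chain


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python call_chain.py stall.log
    with open(sys.argv[1]) as handle:
        print(" <- ".join(call_chain(handle.read())))
```

Run against this trace, the part below the interrupt frame reduces to the cpufreq path: cpufreq_interactive_speedchange_task calls imx6q_set_target, which reaches clk_set_parent, then twd_rate_change, on_each_cpu and smp_call_function_many, where the timer interrupt found the task.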

 

This continues for about half an hour, after which the backtrace starts aborting due to a bad frame pointer. We are using Yocto Pyro with the 4.1.15-2.0.0 kernel.

 

Here are the only relevant kernel configuration options I can find:

 

CONFIG_PREEMPT_VOLUNTARY=y

# CONFIG_RCU_CPU_STALL_INFO is not set

 

The stack trace looks very similar, if not identical, to the one in this thread: "RCU stall/hang on imx6q using 4.9.88_2.0.0_ga"

 

Any thoughts on what to tackle first?
