We are seeing a periodic RCU stall/hang. Sometimes we can get a backtrace, but most often the system just hangs. Please advise.
RCU-relevant config options:
(CONFIG_PREEMPT_RCU=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
# RCU Debugging
# CONFIG_PROVE_RCU is not set
# CONFIG_SPARSE_RCU_POINTER is not set
# CONFIG_RCU_PERF_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set)
Backtrace 1:
(function calls)
0x8010bd48 (+0x18) show_stack
0x80149f3c (+0xb8) sched_show_task
0x80175d1c (+0x998) rcu_check_callbacks
0x80179148 (+0x3c) update_process_times
0x8018a668 (+0x50) tick_sched_handle
0x8018a6bc (+0x50) tick_sched_timer
0x8017a1b0 (+0xc8) __hrtimer_run_queues
0x8017a3e8 (+0xac) hrtimer_interrupt
0x80188904 (+0x34) tick_receive_broadcast
0x8010e15c (+0xf0) handle_IPI
0x801014a0 (+0x90) gic_handle_irq
0x8010c74c (+0x6c) __irq_svc
0x804a51d8 (+0x1c) cpuidle_enter
0x8015e684 (+0x28) call_cpuidle
0x8015e8cc (+0x140) cpu_startup_entry
0x80651c48 (+0x8c) rest_init
0x80800c54 (+0x350) start_kernel
(File/line)
/usr/src/kernel/arch/arm/kernel/traps.c:247
/usr/src/kernel/kernel/sched/core.c:5216
/usr/src/kernel/kernel/rcu/tree.c:1403
/usr/src/kernel/arch/arm/include/asm/thread_info.h:94
/usr/src/kernel/kernel/time/tick-sched.c:153
/usr/src/kernel/kernel/time/tick-sched.c:1164
/usr/src/kernel/kernel/time/hrtimer.c:1255
/usr/src/kernel/kernel/time/hrtimer.c:1356
/usr/src/kernel/kernel/time/tick-broadcast.c:252
/usr/src/kernel/arch/arm/kernel/smp.c:612
/usr/src/kernel/drivers/irqchip/irq-gic.c:382
/usr/src/kernel/arch/arm/kernel/entry-armv.S:222
/usr/src/kernel/drivers/cpuidle/cpuidle.c:270
/usr/src/kernel/kernel/sched/idle.c:120
/usr/src/kernel/kernel/sched/idle.c:185
/usr/src/kernel/init/main.c:410
/usr/src/kernel/init/main.c:665
Backtrace 2:
(function calls)
0x80653a48 (+0x58) schedule
0x80656a84 (+0x158) schedule_timeout
0x80174e68 (+0x518) rcu_gp_kthread
0x8013fc84 (+0x110) kthread
0x80107d70 (+0x14) ret_from_fork
(File/line)
/usr/src/kernel/arch/arm/include/asm/thread_info.h:94 (discriminator 1)
/usr/src/kernel/arch/arm/include/asm/thread_info.h:94
/usr/src/kernel/kernel/rcu/tree.c:2227 (discriminator 13)
/usr/src/kernel/kernel/kthread.c:211
/usr/src/kernel/arch/arm/kernel/entry-common.S:119
Here is the actual log:
May 31 18:13:06 kernel: INFO: rcu_preempt detected stalls on CPUs/tasks:
May 31 18:13:06 kernel: (detected by 0, t=21002 jiffies, g=149858, c=149857, q=596)
May 31 18:13:06 kernel: All QSes seen, last rcu_preempt kthread activity 21002 (677469-656467), jiffies_till_next_fqs=3, root ->qsmask 0x0
May 31 18:13:06 kernel: swapper/0 R running task 0 0 0 0x00000000
May 31 18:13:06 kernel: Backtrace:
May 31 18:13:06 kernel: Function entered at [<8010ba88>] from [<8010bd48>]
May 31 18:13:06 kernel: r7:4e85a000 r6:8084a540 r5:00000000 r4:80907e00
May 31 18:13:06 kernel: Function entered at [<8010bd30>] from [<80149f3c>]
May 31 18:13:06 kernel: Function entered at [<80149e84>] from [<80175d1c>]
May 31 18:13:06 kernel: r5:8090f100 r4:cf0a4540
May 31 18:13:06 kernel: Function entered at [<80175384>] from [<80179148>]
May 31 18:13:06 kernel: r10:00000000 r9:8018a66c r8:cf0a2780 r7:000000e3 r6:00000000 r5:80907e00
May 31 18:13:06 kernel: r4:ffffe000
May 31 18:13:06 kernel: Function entered at [<8017910c>] from [<8018a668>]
May 31 18:13:06 kernel: r7:000000e3 r6:95b16cfa r5:80901ed0 r4:cf0a2998
May 31 18:13:06 kernel: Function entered at [<8018a618>] from [<8018a6bc>]
May 31 18:13:06 kernel: Function entered at [<8018a66c>] from [<8017a1b0>]
May 31 18:13:06 kernel: r7:000000e3 r6:95b16a60 r5:cf0a2998 r4:cf0a2740
May 31 18:13:06 kernel: Function entered at [<8017a0e8>] from [<8017a3e8>]
May 31 18:13:06 kernel: r10:cf0a27b8 r9:cf0a27d8 r8:cf0a2754 r7:cf0a27f8 r6:ffffffff r5:00000003
May 31 18:13:06 kernel: r4:cf0a2740
May 31 18:13:06 kernel: Function entered at [<8017a33c>] from [<80188904>]
May 31 18:13:06 kernel: r10:00000000 r9:f4a01100 r8:80901ed0 r7:f4a00100 r6:00000000 r5:00000000
May 31 18:13:06 kernel: r4:80848f68
May 31 18:13:06 kernel: Function entered at [<801888d0>] from [<8010e15c>]
May 31 18:13:06 kernel: Function entered at [<8010e06c>] from [<801014a0>]
May 31 18:13:06 kernel: r6:f4a0010c r5:80919000 r4:809043e0
May 31 18:13:06 kernel: Function entered at [<80101410>] from [<8010c74c>]
May 31 18:13:06 kernel: Exception stack(0x80901ed0 to 0x80901f18)
May 31 18:13:06 kernel: 1ec0: 00000000 8099ad30 00000001 80900000
May 31 18:13:06 kernel: 1ee0: 95c85feb 000000e3 cf0a3140 00000001 95b955a8 000000e3 00000000 80901f54
May 31 18:13:06 kernel: 1f00: 80901ee0 80901f20 80188c54 804a5030 200c0013 ffffffff
May 31 18:13:06 kernel: r9:80900000 r8:95b955a8 r7:80901f04 r6:ffffffff r5:200c0013 r4:804a5030
May 31 18:13:06 kernel: Function entered at [<804a4ed8>] from [<804a51d8>]
May 31 18:13:06 kernel: r10:80904144 r9:80909b40 r8:cf0a3140 r7:8090413c r6:00000001 r5:809040ec
May 31 18:13:06 kernel: r4:ffffe000
May 31 18:13:06 kernel: Function entered at [<804a51bc>] from [<8015e684>]
May 31 18:13:06 kernel: Function entered at [<8015e65c>] from [<8015e8cc>]
May 31 18:13:06 kernel: Function entered at [<8015e78c>] from [<80651c48>]
May 31 18:13:06 kernel: r7:ffffffff
May 31 18:13:06 kernel: Function entered at [<80651bbc>] from [<80800c54>]
May 31 18:13:06 kernel: r5:80954000 r4:80954050
May 31 18:13:06 kernel: Function entered at [<80800904>] from [<1000807c>]
May 31 18:13:06 kernel: rcu_preempt kthread starved for 21002 jiffies! g149858 c149857 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x100
May 31 18:13:06 kernel: rcu_preempt W 0 7 2 0x00000000
May 31 18:13:06 kernel: Backtrace:
May 31 18:13:06 kernel: Function entered at [<80653574>] from [<80653a48>]
May 31 18:13:06 kernel: r10:8090f29e r9:00000001 r8:80902d00 r7:cf0ae440 r6:cf0ae440 r5:cc0a1ed0
May 31 18:13:06 kernel: r4:ffffe000
May 31 18:13:06 kernel: Function entered at [<806539f0>] from [<80656a84>]
May 31 18:13:06 kernel: r5:cc0a1ed0 r4:000a0456
May 31 18:13:06 kernel: Function entered at [<8065692c>] from [<80174e68>]
May 31 18:13:06 kernel: r8:8090f290 r7:80902d00 r6:8090f29c r5:8090f100 r4:00000000
May 31 18:13:06 kernel: Function entered at [<80174950>] from [<8013fc84>]
May 31 18:13:06 kernel: r7:8090f100
May 31 18:13:06 kernel: Function entered at [<8013fb74>] from [<80107d70>]
May 31 18:13:06 kernel: r8:00000000 r7:00000000 r6:00000000 r5:8013fb74 r4:cc0402c0
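The symbolized addresses in Backtrace 1 and Backtrace 2 match the "from" addresses in the raw log (for example 0x8010bd48 and 0x80149f3c). A minimal sketch of pulling those addresses out of a log so they can be handed to addr2line is shown here. The function names and the "vmlinux" path are placeholders of mine, and on a cross-compiled ARM kernel the tool would normally be the toolchain-prefixed addr2line, which I am assuming rather than taking from the thread.

```python
import re

# Matches frame lines such as:
#   "Function entered at [<8010ba88>] from [<8010bd48>]"
FRAME_RE = re.compile(
    r"Function entered at \[<([0-9a-f]+)>\] from \[<([0-9a-f]+)>\]"
)

def return_addresses(log_text):
    """Return the 'from' address of every frame line, in log order."""
    return ["0x" + m.group(2) for m in FRAME_RE.finditer(log_text)]

def addr2line_command(vmlinux, addresses, tool="addr2line"):
    """Build an addr2line command line.

    -f prints function names, -e selects the ELF file to look up.
    'tool' is a placeholder; a cross toolchain usually prefixes the name.
    """
    return [tool, "-f", "-e", vmlinux] + list(addresses)

# Three lines copied from the log above (register dump lines are ignored).
sample = """\
May 31 18:13:06 kernel: Function entered at [<8010ba88>] from [<8010bd48>]
May 31 18:13:06 kernel: r7:4e85a000 r6:8084a540 r5:00000000 r4:80907e00
May 31 18:13:06 kernel: Function entered at [<8010bd30>] from [<80149f3c>]
"""

addrs = return_addresses(sample)
print(addr2line_command("vmlinux", addrs))
```

The resulting list can be passed to subprocess.run() on a machine that has the matching unstripped vmlinux.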
Hi Igor,
Applying MX_SMP ERRATA, including 751472, did not seem to make a difference. Because the stall is very rare, I am using perf_fuzzer to reproduce a similar problem more quickly. I can still reproduce it, and the events leading up to the stall are usually:
- Unhandled arm-pmu interrupt (IRQ 24) which is subsequently disabled.
- Then, one of the cores seems to go out to lunch. It completely stops taking interrupts, and eventually there is an RCU stall.
Jun 20 10:46:49 kernel: irq 24: nobody cared (try booting with the "irqpoll" option)
Jun 20 10:46:49 kernel: CPU: 0 PID: 750 Comm: perf_fuzzer Tainted: G O 4.9.88-1.0.0+6507266728-r1-P7A9 #1
Jun 20 10:46:49 kernel: Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Jun 20 10:46:49 kernel: Backtrace:
Callstack for unhandled interrupt:
0x8010bd48 (+0x18) show_stack
0x80342d2c (+0x90) dump_stack
0x8016b030 (+0x34) __report_bad_irq
0x8016b3d8 (+0x270) note_interrupt
0x80168a60 (+0x54) handle_irq_event_percpu
0x80168aac (+0x40) handle_irq_event
0x8016bd24 (+0xc4) handle_fasteoi_irq
0x80167c2c (+0x2c) generic_handle_irq
0x801681a0 (+0x64) __handle_domain_irq
0x80101460 (+0x50) gic_handle_irq
0x8010c74c (+0x6c) __irq_svc
0x8018ee94 (+0x114) smp_call_function_single
0x8019f9b4 (+0x44) task_function_call
0x8019fa4c (+0x80) event_function_call
0x8019fb08 (+0x48) _perf_event_disable
0x8019f68c (+0x28) perf_event_for_each_child
0x801a612c (+0x74) perf_event_task_disable
0x80136fc0 (+0x314) sys_prctl
0x80107ca0 (+0x0) ret_fast_syscall
After this, CPU3 seems wedged, and is not taking ANY interrupts.
Thoughts?
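One way to confirm from a still-running core that a CPU has stopped taking interrupts is to compare two snapshots of /proc/interrupts taken a few seconds apart. The sketch below is only an illustration: the helper names are mine and the sample counts are made up, not taken from the affected board. A CPU that is legitimately idle and tickless can also show no change over a short interval, so treat an unchanged column as a hint only.

```python
def per_cpu_totals(text):
    """Sum the interrupt counts per CPU column of a /proc/interrupts snapshot."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header, e.g. ["CPU0", "CPU1", ...]
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + len(cpus)]
        # Skip rows without one numeric count per CPU (e.g. "Err:").
        if len(counts) == len(cpus) and all(c.isdigit() for c in counts):
            for i, c in enumerate(counts):
                totals[i] += int(c)
    return dict(zip(cpus, totals))

def stalled_cpus(before, after):
    """Return CPUs whose total interrupt count did not change between snapshots."""
    b, a = per_cpu_totals(before), per_cpu_totals(after)
    return [cpu for cpu in a if a[cpu] == b.get(cpu)]

# Made-up sample snapshots, for illustration only.
before = """\
           CPU0       CPU1       CPU2       CPU3
 16:       1000       2000       3000       4000  GIC  29 Edge  twd
 24:          5          0          0          0  GIC 126 Level arm-pmu
Err:          0
"""
after = """\
           CPU0       CPU1       CPU2       CPU3
 16:       1100       2100       3100       4000  GIC  29 Edge  twd
 24:          5          0          0          0  GIC 126 Level arm-pmu
Err:          0
"""

print(stalled_cpus(before, after))  # prints ['CPU3']
```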
As an additional note, I can reproduce the RCU stall more consistently when disabling cpuidle state1 on all cores (eliminating timer IPIs). This results in a somewhat consistent RCU stall, which seems to be reported elsewhere.
0x8010bd48 (+0x18) show_stack
0x80149f3c (+0xb8) sched_show_task
0x80175db8 (+0x998) rcu_check_callbacks
0x801791e8 (+0x3c) update_process_times
0x8018a708 (+0x50) tick_sched_handle
0x8018a75c (+0x50) tick_sched_timer
0x8017a250 (+0xc8) __hrtimer_run_queues
0x8017a488 (+0xac) hrtimer_interrupt
0x801886e8 (+0x1a0) tick_handle_oneshot_broadcast
0x80534664 (+0x3c) mxc_timer_interrupt
0x8016897c (+0x44) __handle_irq_event_percpu
0x80168a30 (+0x24) handle_irq_event_percpu
0x80168aac (+0x40) handle_irq_event
0x8016bd24 (+0xc4) handle_fasteoi_irq
0x80167c2c (+0x2c) generic_handle_irq
0x801681a0 (+0x64) __handle_domain_irq
0x80101460 (+0x50) gic_handle_irq
0x8010c74c (+0x6c) __irq_svc
0x804a4f58 (+0x1c) cpuidle_enter
0x8015e684 (+0x28) call_cpuidle
0x8010dde4 (+0x168) secondary_start_kernel
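For anyone who wants to repeat the "disable cpuidle state1 on all cores" experiment, a minimal sketch using the standard cpuidle sysfs files follows. It assumes the kernel exposes /sys/devices/system/cpu/cpuN/cpuidle/state1/disable and that the script runs as root; the function name and the sysfs_root parameter (there so the code can be tried against a scratch directory) are my own.

```python
import glob
import os

def set_cpuidle_state_disabled(state, disabled=True, sysfs_root="/sys"):
    """Write 1 (or 0) to cpuidle/state<N>/disable for every CPU found.

    Returns the list of files written. Needs root on a real system.
    """
    pattern = os.path.join(
        sysfs_root, "devices", "system", "cpu",
        "cpu[0-9]*", "cpuidle", f"state{state}", "disable",
    )
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write("1" if disabled else "0")
        written.append(path)
    return written

# Example (as root on the target):
#   set_cpuidle_state_disabled(1)          # disable state1 on all cores
#   set_cpuidle_state_disabled(1, False)   # re-enable it afterwards
```

The setting does not survive a reboot, so it has to be reapplied before each reproduction attempt.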
Hi Todd
Could you try to reproduce the issue on the NXP i.MX6Q Sabre SD reference board with the
Demo Image from
Best regards
igor
I don't have one readily available. However, I was able to attach a JTAG probe, and found two CPUs in WFI (normal), but one stuck here:
(Any thoughts about this?)
#0 csd_lock_wait (csd=<optimized out>) at /usr/src/kernel/kernel/smp.c:96
#1 smp_call_function_many (mask=<optimized out>, func=<optimized out>, info=0xcc2b1dbc, wait=true)
at /usr/src/kernel/kernel/smp.c:452
#2 0x8018da68 in smp_call_function (func=<optimized out>, info=<optimized out>, wait=<optimized out>)
at /usr/src/kernel/kernel/smp.c:476
#3 0x8018dad0 in on_each_cpu (func=0x8010eb64 <twd_update_frequency>, info=0xcc2b1dbc, wait=<optimized out>)
at /usr/src/kernel/kernel/smp.c:583
#4 0x8010eeb4 in twd_rate_change (nb=<optimized out>, flags=<optimized out>, data=<optimized out>)
at /usr/src/kernel/arch/arm/kernel/smp_twd.c:127
#5 0x8013f6d8 in notifier_call_chain (nl=<optimized out>, val=3, v=0x3, nr_to_call=-821240512, nr_calls=0x0)
at /usr/src/kernel/kernel/notifier.c:93
#6 0x8013fafc in __srcu_notifier_call_chain (nh=0xcc026904, val=2, v=0xcc2b1db4, nr_to_call=-1, nr_calls=0x0)
at /usr/src/kernel/kernel/notifier.c:498
#7 0x8013fb38 in srcu_notifier_call_chain (nh=<optimized out>, val=<optimized out>, v=<optimized out>)
at /usr/src/kernel/kernel/notifier.c:507
#8 0x80398ca8 in __clk_notify (core=0xcc010080, msg=3, old_rate=<optimized out>, new_rate=<optimized out>)
at /usr/src/kernel/drivers/clk/clk.c:967
#9 0x80398d60 in __clk_recalc_rates (core=0xcc010080, msg=2) at /usr/src/kernel/drivers/clk/clk.c:1076
#10 0x80398d34 in __clk_recalc_rates (core=<optimized out>, msg=2) at /usr/src/kernel/drivers/clk/clk.c:1079
#11 0x80398d34 in __clk_recalc_rates (core=<optimized out>, msg=2) at /usr/src/kernel/drivers/clk/clk.c:1079
#12 0x8039b3a8 in clk_core_set_parent (core=0xcc010480, parent=0xcc010400) at /usr/src/kernel/drivers/clk/clk.c:1842
#13 0x8039b44c in clk_core_set_parent (parent=<optimized out>, core=<optimized out>)
at /usr/src/kernel/drivers/clk/clk.c:1794
#14 clk_set_parent (clk=<optimized out>, parent=<optimized out>) at /usr/src/kernel/drivers/clk/clk.c:1874
#15 0x804a1f7c in imx6q_set_target (policy=<optimized out>, index=1)
at /usr/src/kernel/drivers/cpufreq/imx6q-cpufreq.c:171
#16 0x8049a8d4 in __target_index (index=<optimized out>, policy=<optimized out>)
at /usr/src/kernel/drivers/cpufreq/cpufreq.c:1887
#17 __cpufreq_driver_target (policy=0xcc907e00, target_freq=<optimized out>, relation=<optimized out>)
at /usr/src/kernel/drivers/cpufreq/cpufreq.c:1948
#18 0x804a1040 in cpufreq_interactive_adjust_cpu (cpu=<optimized out>, policy=<optimized out>)
at /usr/src/kernel/drivers/cpufreq/cpufreq_interactive.c:509
#19 cpufreq_interactive_speedchange_task (data=<optimized out>)
at /usr/src/kernel/drivers/cpufreq/cpufreq_interactive.c:552
#20 0x8013e424 in kthread (_create=0xcc28f300) at /usr/src/kernel/kernel/kthread.c:211
#21 0x80107d70 in ret_from_fork () at /usr/src/kernel/arch/arm/kernel/entry-common.S:118
I think we are running into a similar issue: intermittent CPU stall with a similar backtrace:
[307362.408117] INFO: rcu_preempt detected stalls on CPUs/tasks:
[307362.412514] (detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)
[307362.418223] All QSes seen, last rcu_preempt kthread activity 2101 (30706237-30704136), jiffies_till_next_fqs=1, root ->qsmask 0x0
[307362.428582] cfinteractive R running 0 38 2 0x00000000
[307362.428609] Backtrace:
[307362.428661] [<8010c380>] (dump_backtrace) from [<8010c5fc>] (show_stack+0x20/0x24)
[307362.428675] r7:80e02658 r6:80e02100 r5:00000002 r4:a8185400
[307362.428730] [<8010c5dc>] (show_stack) from [<8015ed04>] (sched_show_task+0xc4/0x118)
[307362.428761] [<8015ec40>] (sched_show_task) from [<8018bf08>] (rcu_check_callbacks+0xa64/0xa70)
[307362.428772] r5:2a976000 r4:80d9cf80
[307362.428816] [<8018b4a4>] (rcu_check_callbacks) from [<801905c4>] (update_process_times+0x4c/0x74)
[307362.428828] r10:ab70f4d8 r9:801a3874 r8:a8325c40 r7:0001178b r6:00000000 r5:a8185400
[307362.428875] r4:ffffe000
[307362.428908] [<80190578>] (update_process_times) from [<801a3870>] (tick_sched_handle+0x58/0x5c)
[307362.428919] r7:0001178b r6:61225002 r5:ab70f650 r4:80e02548
[307362.428967] [<801a3818>] (tick_sched_handle) from [<801a38e8>] (tick_sched_timer+0x74/0xc8)
[307362.428993] [<801a3874>] (tick_sched_timer) from [<80191784>] (__run_hrtimer+0x80/0x284)
[307362.429003] r8:a8325b50 r7:ab70f3c0 r6:ab70f3f8 r5:0001178b r4:ab70f650
[307362.429063] [<80191704>] (__run_hrtimer) from [<80191d94>] (hrtimer_interrupt+0x138/0x344)
[307362.429073] r9:00000001 r8:ab70f3c0 r7:00000000 r6:ab70f3f8 r5:0001178b r4:6122497f
[307362.429139] [<80191c5c>] (hrtimer_interrupt) from [<80110d24>] (twd_handler+0x40/0x50)
[307362.429150] r10:80e54000 r9:a8022f00 r8:00000001 r7:00000010 r6:a808b800 r5:ab715580
[307362.429196] r4:00000001
[307362.429231] [<80110ce4>] (twd_handler) from [<801819bc>] (handle_percpu_devid_irq+0xac/0x1d0)
[307362.429242] r5:ab715580 r4:00000010
[307362.429278] [<80181910>] (handle_percpu_devid_irq) from [<8017d024>] (generic_handle_irq+0x3c/0x4c)
[307362.429289] r10:a8325c40 r9:a8034000 r8:00000001 r7:00000000 r6:00000010 r5:00000000
[307362.429334] r4:00000010 r3:80181910
[307362.429368] [<8017cfe8>] (generic_handle_irq) from [<8017d350>] (__handle_domain_irq+0x8c/0xfc)
[307362.429379] r5:00000000 r4:80d9ac84
[307362.429413] [<8017d2c4>] (__handle_domain_irq) from [<80101560>] (gic_handle_irq+0x34/0x6c)
[307362.429424] r10:00000004 r9:a8325d90 r8:00000001 r7:f4a00100 r6:a8325c40 r5:80e03194
[307362.429469] r4:f4a0010c r3:a8325c40
[307362.429502] [<8010152c>] (gic_handle_irq) from [<8010d240>] (__irq_svc+0x40/0x74)
[307362.429515] Exception stack(0xa8325c40 to 0xa8325c88)
[307362.429538] 5c40: 00000003 ab72f934 00000003 00000003 80e02690 ab713200 80e0300c ab713204
[307362.429558] 5c60: 00000001 a8325d90 00000004 a8325cbc 00000003 a8325c88 801a889c 801a88cc
[307362.429571] 5c80: 20070013 ffffffff
[307362.429581] r7:a8325c74 r6:ffffffff r5:20070013 r4:801a88cc
[307362.429635] [<801a865c>] (smp_call_function_many) from [<801a8968>] (smp_call_function+0x48/0x88)
[307362.429646] r10:ffffffff r9:a8325d88 r8:00000000 r7:00000000 r6:00000001 r5:a8325d90
[307362.429692] r4:801109d0
[307362.429720] [<801a8920>] (smp_call_function) from [<801a89e0>] (on_each_cpu+0x38/0x90)
[307362.429730] r7:00000000 r6:00000001 r5:a8325d90 r4:801109d0
[307362.429782] [<801a89a8>] (on_each_cpu) from [<80110d6c>] (twd_rate_change+0x38/0x40)
[307362.429793] r7:00000000 r6:00000002 r5:a8325d88 r4:ffffffff
[307362.429846] [<80110d34>] (twd_rate_change) from [<801533e8>] (notifier_call_chain+0x54/0x94)
[307362.429870] [<80153394>] (notifier_call_chain) from [<801538b8>] (__srcu_notifier_call_chain+0x54/0x70)
[307362.429880] r9:a8325d88 r8:00000002 r7:00000000 r6:00000000 r5:a80b2544 r4:a80b255c
[307362.429939] [<80153864>] (__srcu_notifier_call_chain) from [<801538fc>] (srcu_notifier_call_chain+0x28/0x30)
[307362.429949] r10:00000000 r9:80e9db84 r8:00000002 r7:80e02548 r6:a803ac00 r5:80e82cd0
[307362.429995] r4:a80b2540
[307362.430027] [<801538d4>] (srcu_notifier_call_chain) from [<8074a3b4>] (__clk_notify+0xa0/0xa8)
[307362.430049] [<8074a314>] (__clk_notify) from [<8074a464>] (__clk_recalc_rates+0xa8/0xac)
[307362.430059] r8:a8038b80 r7:00000001 r6:179a7b00 r5:00000002 r4:a803ac00
[307362.430113] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430123] r7:00000001 r6:2f34f600 r5:00000002 r4:a803ac00
[307362.430171] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430181] r7:00000001 r6:2f34f600 r5:00000002 r4:a803d680
[307362.430231] [<8074a3bc>] (__clk_recalc_rates) from [<8074cdf0>] (clk_core_set_parent+0x1bc/0x2dc)
[307362.430242] r7:00000001 r6:00000000 r5:a803af80 r4:a803d000
[307362.430290] [<8074cc34>] (clk_core_set_parent) from [<8074cf3c>] (clk_set_parent+0x2c/0x30)
[307362.430300] r9:00000000 r8:179a7b00 r7:80e02548 r6:000c15c0 r5:00060ae0 r4:80f02f14
[307362.430366] [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
[307362.430395] [<806d26e0>] (imx6q_set_target) from [<806c8984>] (__cpufreq_driver_target+0x184/0x2b0)
[307362.430406] r10:00000000 r9:00000000 r8:80f02e4c r7:00000000 r6:80e02548 r5:a8587c00
[307362.430450] r4:00000000
[307362.430481] [<806c8800>] (__cpufreq_driver_target) from [<806d1694>] (cpufreq_interactive_speedchange_task+0x264/0x354)
[307362.430492] r10:80d9be90 r9:80e02690 r8:80e0300c r7:00000047 r6:000c15c0 r5:80d9be90
[307362.430537] r4:ab71ee90
[307362.430574] [<806d1430>] (cpufreq_interactive_speedchange_task) from [<8015250c>] (kthread+0xfc/0x114)
[307362.430585] r10:00000000 r9:00000000 r8:00000000 r7:806d1430 r6:00000000 r5:a830e640
[307362.430629] r4:00000000
[307362.430661] [<80152410>] (kthread) from [<80108028>] (ret_from_fork+0x14/0x2c)
[307362.430671] r7:00000000 r6:00000000 r5:80152410 r4:a830e640
[307362.430710] rcu_preempt kthread starved for 2101 jiffies!
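Both stall reports in this thread start with the same header: t= is the number of jiffies since the grace period started, g= and c= are the current and last completed grace-period numbers, and q= is an estimate of queued RCU callbacks. A small sketch for pulling those fields out and converting t= to seconds is below. HZ is not stated anywhere in the thread, so the values 1000 and 100 are my assumptions, picked only because they would put both reports near the 21 second timeout in the first poster's config.

```python
import re

STALL_RE = re.compile(
    r"detected by (?P<cpu>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>\d+), c=(?P<c>\d+), q=(?P<q>\d+)"
)

def parse_stall_header(line):
    """Return the header fields as ints, or None if the line does not match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

def jiffies_to_seconds(jiffies, hz):
    """Convert jiffies to seconds for a given HZ (HZ must come from the kernel config)."""
    return jiffies / hz

# Header lines copied from the two reports in this thread.
first = parse_stall_header(
    "(detected by 0, t=21002 jiffies, g=149858, c=149857, q=596)")
second = parse_stall_header(
    "(detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)")

print(jiffies_to_seconds(first["t"], 1000))   # assumes HZ=1000
print(jiffies_to_seconds(second["t"], 100))   # assumes HZ=100
```

In both reports g is exactly c + 1, meaning one grace period was in progress and had not completed when the stall was detected.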
Did you ever resolve this?
-> Did you ever resolve this?
It appears that applying the errata within the u-boot code did resolve the issue. Later versions of u-boot separate the SMP and non-SMP options, and it is easy to miss applying the correct errata (which must be applied in u-boot).
CONFIG_ARM_ERRATA_751472
CONFIG_ARM_ERRATA_761320
CONFIG_ARM_ERRATA_794072
CONFIG_ARM_ERRATA_854369
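A quick way to check whether a given u-boot configuration enables all four of these options is to scan the config text for them. The sketch below is only an illustration: the function name is mine, and it recognizes both "CONFIG_...=y" lines and "#define CONFIG_..." lines because I am assuming either style may be in use depending on the u-boot version. It cannot tell whether a #define sits inside a preprocessor branch that is not taken (for example the SMP versus non-SMP split mentioned above), so a manual look at the header is still needed.

```python
import re

# The four errata options listed above.
REQUIRED_ERRATA = ("751472", "761320", "794072", "854369")

def missing_errata(config_text, required=REQUIRED_ERRATA):
    """Return the errata numbers with no enabled CONFIG_ARM_ERRATA_<n> entry."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Kconfig style: CONFIG_ARM_ERRATA_751472=y
        m = re.match(r"CONFIG_ARM_ERRATA_(\d+)=y$", line)
        if m is None:
            # Header style: #define CONFIG_ARM_ERRATA_751472
            m = re.match(r"#\s*define\s+CONFIG_ARM_ERRATA_(\d+)\b", line)
        if m:
            enabled.add(m.group(1))
    return [n for n in required if n not in enabled]

# Made-up sample config text, for illustration only.
sample = """\
CONFIG_ARM_ERRATA_751472=y
# CONFIG_ARM_ERRATA_761320 is not set
#define CONFIG_ARM_ERRATA_794072
"""

print(missing_errata(sample))  # prints ['761320', '854369']
```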
Hi Todd,
Could you please share more information about the fix for this issue?
We are using 5.4.70, and I checked that my u-boot has all the ARM errata fixes, but we still see random RCU stalls with the same pattern as yours.
Could you please share more details? I would highly appreciate your help.
Thanks and Regards
Terry
Thanks, appreciate it!
Hi Jordans,
Is the issue resolved?
BR,
Mk
One reason may be that not all of the ARM errata mentioned above were implemented or correctly integrated.
Another reason may be the GPU driver; one can test without the GPU (and update to the latest NXP Linux if this is the cause).
Best regards
igor
Hi Todd
Which BSP is used in this case? Such errors usually happen due to ARM errata, so
one can check whether the #define CONFIG_ARM_ERRATA entries are present:
mx6_common.h\configs\include - uboot-imx - i.MX U-Boot
It is recommended to try NXP Linux from the Code Aurora git repositories.
Best regards
igor