i.MX6Q rcu_preempt cpu stall

jordans
Contributor I

We are seeing an intermittent RCU CPU stall that eventually hangs the system.

Here is the relevant backtrace:

[307362.408117] INFO: rcu_preempt detected stalls on CPUs/tasks:
[307362.412514]     (detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)
[307362.418223] All QSes seen, last rcu_preempt kthread activity 2101 (30706237-30704136), jiffies_till_next_fqs=1, root ->qsmask 0x0
[307362.428582] cfinteractive   R running      0    38      2 0x00000000
[307362.428609] Backtrace:
[307362.428661] [<8010c380>] (dump_backtrace) from [<8010c5fc>] (show_stack+0x20/0x24)
[307362.428675]  r7:80e02658 r6:80e02100 r5:00000002 r4:a8185400
[307362.428730] [<8010c5dc>] (show_stack) from [<8015ed04>] (sched_show_task+0xc4/0x118)
[307362.428761] [<8015ec40>] (sched_show_task) from [<8018bf08>] (rcu_check_callbacks+0xa64/0xa70)
[307362.428772]  r5:2a976000 r4:80d9cf80
[307362.428816] [<8018b4a4>] (rcu_check_callbacks) from [<801905c4>] (update_process_times+0x4c/0x74)
[307362.428828]  r10:ab70f4d8 r9:801a3874 r8:a8325c40 r7:0001178b r6:00000000 r5:a8185400
[307362.428875]  r4:ffffe000
[307362.428908] [<80190578>] (update_process_times) from [<801a3870>] (tick_sched_handle+0x58/0x5c)
[307362.428919]  r7:0001178b r6:61225002 r5:ab70f650 r4:80e02548
[307362.428967] [<801a3818>] (tick_sched_handle) from [<801a38e8>] (tick_sched_timer+0x74/0xc8)
[307362.428993] [<801a3874>] (tick_sched_timer) from [<80191784>] (__run_hrtimer+0x80/0x284)
[307362.429003]  r8:a8325b50 r7:ab70f3c0 r6:ab70f3f8 r5:0001178b r4:ab70f650
[307362.429063] [<80191704>] (__run_hrtimer) from [<80191d94>] (hrtimer_interrupt+0x138/0x344)
[307362.429073]  r9:00000001 r8:ab70f3c0 r7:00000000 r6:ab70f3f8 r5:0001178b r4:6122497f
[307362.429139] [<80191c5c>] (hrtimer_interrupt) from [<80110d24>] (twd_handler+0x40/0x50)
[307362.429150]  r10:80e54000 r9:a8022f00 r8:00000001 r7:00000010 r6:a808b800 r5:ab715580
[307362.429196]  r4:00000001
[307362.429231] [<80110ce4>] (twd_handler) from [<801819bc>] (handle_percpu_devid_irq+0xac/0x1d0)
[307362.429242]  r5:ab715580 r4:00000010
[307362.429278] [<80181910>] (handle_percpu_devid_irq) from [<8017d024>] (generic_handle_irq+0x3c/0x4c)
[307362.429289]  r10:a8325c40 r9:a8034000 r8:00000001 r7:00000000 r6:00000010 r5:00000000
[307362.429334]  r4:00000010 r3:80181910
[307362.429368] [<8017cfe8>] (generic_handle_irq) from [<8017d350>] (__handle_domain_irq+0x8c/0xfc)
[307362.429379]  r5:00000000 r4:80d9ac84
[307362.429413] [<8017d2c4>] (__handle_domain_irq) from [<80101560>] (gic_handle_irq+0x34/0x6c)
[307362.429424]  r10:00000004 r9:a8325d90 r8:00000001 r7:f4a00100 r6:a8325c40 r5:80e03194
[307362.429469]  r4:f4a0010c r3:a8325c40
[307362.429502] [<8010152c>] (gic_handle_irq) from [<8010d240>] (__irq_svc+0x40/0x74)
[307362.429515] Exception stack(0xa8325c40 to 0xa8325c88)
[307362.429538] 5c40: 00000003 ab72f934 00000003 00000003 80e02690 ab713200 80e0300c ab713204
[307362.429558] 5c60: 00000001 a8325d90 00000004 a8325cbc 00000003 a8325c88 801a889c 801a88cc
[307362.429571] 5c80: 20070013 ffffffff
[307362.429581]  r7:a8325c74 r6:ffffffff r5:20070013 r4:801a88cc
[307362.429635] [<801a865c>] (smp_call_function_many) from [<801a8968>] (smp_call_function+0x48/0x88)
[307362.429646]  r10:ffffffff r9:a8325d88 r8:00000000 r7:00000000 r6:00000001 r5:a8325d90
[307362.429692]  r4:801109d0
[307362.429720] [<801a8920>] (smp_call_function) from [<801a89e0>] (on_each_cpu+0x38/0x90)
[307362.429730]  r7:00000000 r6:00000001 r5:a8325d90 r4:801109d0
[307362.429782] [<801a89a8>] (on_each_cpu) from [<80110d6c>] (twd_rate_change+0x38/0x40)
[307362.429793]  r7:00000000 r6:00000002 r5:a8325d88 r4:ffffffff
[307362.429846] [<80110d34>] (twd_rate_change) from [<801533e8>] (notifier_call_chain+0x54/0x94)
[307362.429870] [<80153394>] (notifier_call_chain) from [<801538b8>] (__srcu_notifier_call_chain+0x54/0x70)
[307362.429880]  r9:a8325d88 r8:00000002 r7:00000000 r6:00000000 r5:a80b2544 r4:a80b255c
[307362.429939] [<80153864>] (__srcu_notifier_call_chain) from [<801538fc>] (srcu_notifier_call_chain+0x28/0x30)
[307362.429949]  r10:00000000 r9:80e9db84 r8:00000002 r7:80e02548 r6:a803ac00 r5:80e82cd0
[307362.429995]  r4:a80b2540
[307362.430027] [<801538d4>] (srcu_notifier_call_chain) from [<8074a3b4>] (__clk_notify+0xa0/0xa8)
[307362.430049] [<8074a314>] (__clk_notify) from [<8074a464>] (__clk_recalc_rates+0xa8/0xac)
[307362.430059]  r8:a8038b80 r7:00000001 r6:179a7b00 r5:00000002 r4:a803ac00
[307362.430113] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430123]  r7:00000001 r6:2f34f600 r5:00000002 r4:a803ac00
[307362.430171] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430181]  r7:00000001 r6:2f34f600 r5:00000002 r4:a803d680
[307362.430231] [<8074a3bc>] (__clk_recalc_rates) from [<8074cdf0>] (clk_core_set_parent+0x1bc/0x2dc)
[307362.430242]  r7:00000001 r6:00000000 r5:a803af80 r4:a803d000
[307362.430290] [<8074cc34>] (clk_core_set_parent) from [<8074cf3c>] (clk_set_parent+0x2c/0x30)
[307362.430300]  r9:00000000 r8:179a7b00 r7:80e02548 r6:000c15c0 r5:00060ae0 r4:80f02f14
[307362.430366] [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
[307362.430395] [<806d26e0>] (imx6q_set_target) from [<806c8984>] (__cpufreq_driver_target+0x184/0x2b0)
[307362.430406]  r10:00000000 r9:00000000 r8:80f02e4c r7:00000000 r6:80e02548 r5:a8587c00
[307362.430450]  r4:00000000
[307362.430481] [<806c8800>] (__cpufreq_driver_target) from [<806d1694>] (cpufreq_interactive_speedchange_task+0x264/0x354)
[307362.430492]  r10:80d9be90 r9:80e02690 r8:80e0300c r7:00000047 r6:000c15c0 r5:80d9be90
[307362.430537]  r4:ab71ee90
[307362.430574] [<806d1430>] (cpufreq_interactive_speedchange_task) from [<8015250c>] (kthread+0xfc/0x114)
[307362.430585]  r10:00000000 r9:00000000 r8:00000000 r7:806d1430 r6:00000000 r5:a830e640
[307362.430629]  r4:00000000
[307362.430661] [<80152410>] (kthread) from [<80108028>] (ret_from_fork+0x14/0x2c)
[307362.430671]  r7:00000000 r6:00000000 r5:80152410 r4:a830e640
[307362.430710] rcu_preempt kthread starved for 2101 jiffies!

These stall reports repeat for about half an hour, after which the backtrace starts aborting because of a bad frame pointer. We are using Yocto Pyro with kernel 4.1.15-2.0.0.
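
For triage, it can help to reduce the trace above to a bare call chain. The following is a minimal Python sketch, assuming the frame lines keep the "[<addr>] (callee) from [<addr>] (caller+offset/size)" shape seen in the log. `call_chain` is a hypothetical helper written for this post, not part of any kernel tooling.

```python
import re

# One frame line in the log looks like:
#   [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \(([\w.]+)\) from "
    r"\[<[0-9a-f]+>\] \(([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def call_chain(log_text):
    """Return function names ordered from outermost caller to innermost callee."""
    callees = []
    outermost = None
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # register dumps, exception stack words, header lines
        callees.append(match.group(1))
        outermost = match.group(2)  # caller named on the last frame line seen
    if outermost is not None:
        callees.append(outermost)
    return list(reversed(callees))


# Three lines copied from the trace above, used as a small sample.
sample = """\
[307362.430366] [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
[307362.430395] [<806d26e0>] (imx6q_set_target) from [<806c8984>] (__cpufreq_driver_target+0x184/0x2b0)
[307362.430406]  r10:00000000 r9:00000000 r8:80f02e4c r7:00000000 r6:80e02548 r5:a8587c00
"""
print(" -> ".join(call_chain(sample)))
```

Run over the full trace, this reads from ret_from_fork and kthread through cpufreq_interactive_speedchange_task, imx6q_set_target, clk_set_parent and twd_rate_change into smp_call_function_many, which is where the timer interrupt that reports the stall arrives. The __irq_svc entry only appears as a caller in the log, so this sketch does not list it.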

Here are the only relevant kernel configuration options I can find:

CONFIG_PREEMPT_VOLUNTARY=y

# CONFIG_RCU_CPU_STALL_INFO is not set
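
To look up more options than these two, a kernel .config can be scanned with a short script. This is a rough Python sketch, assuming the usual .config syntax ("CONFIG_X=value" and "# CONFIG_X is not set"). `parse_kconfig` and `report` are hypothetical helper names. CONFIG_PREEMPT_RCU, CONFIG_HZ and CONFIG_RCU_CPU_STALL_TIMEOUT are not quoted in this post; they are included only as examples of related symbols that may be worth checking.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value string, or None for '# CONFIG_X is not set'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def report(options, names):
    """Describe each requested option as set, explicitly not set, or absent."""
    lines = []
    for name in names:
        if name not in options:
            lines.append(f"{name}: absent")
        elif options[name] is None:
            lines.append(f"{name}: not set")
        else:
            lines.append(f"{name}={options[name]}")
    return lines


# The two lines quoted above; a real run would read the full .config file.
config_text = "CONFIG_PREEMPT_VOLUNTARY=y\n# CONFIG_RCU_CPU_STALL_INFO is not set\n"
wanted = [
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_RCU_CPU_STALL_INFO",
    "CONFIG_PREEMPT_RCU",            # assumption: related symbol, not in the post
    "CONFIG_HZ",                     # assumption: related symbol, not in the post
    "CONFIG_RCU_CPU_STALL_TIMEOUT",  # assumption: related symbol, not in the post
]
for line in report(parse_kconfig(config_text), wanted):
    print(line)
```

Options reported as "absent" here are simply missing from the two-line excerpt; against the complete .config of the running kernel they would show their real values.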

This seems to be a very similar/identical stack trace to an issue listed here: https://community.nxp.com/thread/505707 

Any thoughts on what to tackle first?

igorpadykov
NXP Employee

Hi Jordan

You can check whether the ARM errata fixes are enabled in U-Boot (#define CONFIG_ARM_ERRATA_*) in:

include/configs/mx6_common.h - uboot-imx - i.MX U-Boot

Also try the NXP Linux kernel from the source.codeaurora.org/external/imx/linux-imx repository, release L4.1.15_2.0.0 (Krogoth):

linux-imx - i.MX Linux kernel

Documentation

i.MX Software | NXP 
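
The errata check suggested above can be scripted. This is a minimal Python sketch, assuming the defines appear as plain "#define CONFIG_ARM_ERRATA_<number>" lines in the header. `errata_defines` is a hypothetical helper, and the sample header text is illustrative only. The scan ignores any surrounding #if/#ifdef conditions, so it shows which defines are present in the file, not which ones are active for a given board.

```python
import re

# Matches lines such as "#define CONFIG_ARM_ERRATA_743622"
ERRATA_RE = re.compile(r"^\s*#\s*define\s+(CONFIG_ARM_ERRATA_\d+)\b")


def errata_defines(header_text):
    """Return errata define names in file order, without duplicates."""
    found = []
    for line in header_text.splitlines():
        match = ERRATA_RE.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Illustrative header fragment; a real check would read the actual file, e.g.
#   errata_defines(open("include/configs/mx6_common.h").read())
sample_header = """\
#ifndef CONFIG_MX6UL
#define CONFIG_ARM_ERRATA_743622
#define CONFIG_ARM_ERRATA_751472
#endif
#define CONFIG_SYS_L2_PL310
"""
print(errata_defines(sample_header))
```

An empty list from the real header would mean that none of the errata defines are present in that file.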

Best regards
igor

jordans
Contributor I

Hi Igor,

Thanks for your reply. We are using a Nitrogen6x with Boundary's U-Boot 2017.07 branch, which does not include the errata defines. I checked arch/arm/cpu/armv7/start.S and the actual errata fixes seem to be present, so I am attempting to patch in just the errata defines and an updated defconfig.

However, when trying to build, I get an implicit declaration error in kernel-module-imx-gpu-viv-src:

gc_hal_kernel_platform_imx6q14.c:570:12: error: implicit declaration of function 'devm_reset_control_get' [-Werror=implicit-function-declaration]
      rstc = devm_reset_control_get(pdev, "gpu3d");
             ^

Is there some dependency that is introduced by these errata fixes?

This is my patch:

---
include/configs/mx6_common.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/include/configs/mx6_common.h b/include/configs/mx6_common.h
index 1a8ab4ee33..abfcb36f6f 100644
--- a/include/configs/mx6_common.h
+++ b/include/configs/mx6_common.h
@@ -8,6 +8,17 @@
#define __MX6_COMMON_H
#ifndef CONFIG_MX6UL
+#define CONFIG_ARM_ERRATA_743622
+#if (defined(CONFIG_MX6QP) || defined(CONFIG_MX6Q) || defined(CONFIG_MX6DL)) && !defined(CONFIG_MX6S)
+#define CONFIG_ARM_ERRATA_751472
+#define CONFIG_ARM_ERRATA_794072
+#define CONFIG_ARM_ERRATA_761320
+#define CONFIG_ARM_ERRATA_845369
+#endif
+
+
+
+
#ifndef CONFIG_SYS_L2CACHE_OFF
#define CONFIG_SYS_L2_PL310
#define CONFIG_SYS_PL310_BASE  L2_PL310_BASE
--
2.23.0

And this is the commit we are at for u-boot-boundary:

GitHub - boundarydevices/u-boot-imx6 at 146b49876ed5f02c8a595a7198579ec0d1a8455a 

Thanks,

Jordan

igorpadykov
NXP Employee

Hi Jordan

NXP supports only its own Linux and U-Boot releases, located at source.codeaurora.org/external/imx

repository: linux-imx - i.MX Linux kernel

Issues with the Boundary Devices Linux can be posted at https://boundarydevices.com/wiki-bd/

Best regards
igor
