We are dealing with an intermittent CPU stall which eventually locks/hangs the system.
Here is the relevant backtrace:
[307362.408117] INFO: rcu_preempt detected stalls on CPUs/tasks:
[307362.412514] (detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)
[307362.418223] All QSes seen, last rcu_preempt kthread activity 2101 (30706237-30704136), jiffies_till_next_fqs=1, root ->qsmask 0x0
[307362.428582] cfinteractive R running 0 38 2 0x00000000
[307362.428609] Backtrace:
[307362.428661] [<8010c380>] (dump_backtrace) from [<8010c5fc>] (show_stack+0x20/0x24)
[307362.428675] r7:80e02658 r6:80e02100 r5:00000002 r4:a8185400
[307362.428730] [<8010c5dc>] (show_stack) from [<8015ed04>] (sched_show_task+0xc4/0x118)
[307362.428761] [<8015ec40>] (sched_show_task) from [<8018bf08>] (rcu_check_callbacks+0xa64/0xa70)
[307362.428772] r5:2a976000 r4:80d9cf80
[307362.428816] [<8018b4a4>] (rcu_check_callbacks) from [<801905c4>] (update_process_times+0x4c/0x74)
[307362.428828] r10:ab70f4d8 r9:801a3874 r8:a8325c40 r7:0001178b r6:00000000 r5:a8185400
[307362.428875] r4:ffffe000
[307362.428908] [<80190578>] (update_process_times) from [<801a3870>] (tick_sched_handle+0x58/0x5c)
[307362.428919] r7:0001178b r6:61225002 r5:ab70f650 r4:80e02548
[307362.428967] [<801a3818>] (tick_sched_handle) from [<801a38e8>] (tick_sched_timer+0x74/0xc8)
[307362.428993] [<801a3874>] (tick_sched_timer) from [<80191784>] (__run_hrtimer+0x80/0x284)
[307362.429003] r8:a8325b50 r7:ab70f3c0 r6:ab70f3f8 r5:0001178b r4:ab70f650
[307362.429063] [<80191704>] (__run_hrtimer) from [<80191d94>] (hrtimer_interrupt+0x138/0x344)
[307362.429073] r9:00000001 r8:ab70f3c0 r7:00000000 r6:ab70f3f8 r5:0001178b r4:6122497f
[307362.429139] [<80191c5c>] (hrtimer_interrupt) from [<80110d24>] (twd_handler+0x40/0x50)
[307362.429150] r10:80e54000 r9:a8022f00 r8:00000001 r7:00000010 r6:a808b800 r5:ab715580
[307362.429196] r4:00000001
[307362.429231] [<80110ce4>] (twd_handler) from [<801819bc>] (handle_percpu_devid_irq+0xac/0x1d0)
[307362.429242] r5:ab715580 r4:00000010
[307362.429278] [<80181910>] (handle_percpu_devid_irq) from [<8017d024>] (generic_handle_irq+0x3c/0x4c)
[307362.429289] r10:a8325c40 r9:a8034000 r8:00000001 r7:00000000 r6:00000010 r5:00000000
[307362.429334] r4:00000010 r3:80181910
[307362.429368] [<8017cfe8>] (generic_handle_irq) from [<8017d350>] (__handle_domain_irq+0x8c/0xfc)
[307362.429379] r5:00000000 r4:80d9ac84
[307362.429413] [<8017d2c4>] (__handle_domain_irq) from [<80101560>] (gic_handle_irq+0x34/0x6c)
[307362.429424] r10:00000004 r9:a8325d90 r8:00000001 r7:f4a00100 r6:a8325c40 r5:80e03194
[307362.429469] r4:f4a0010c r3:a8325c40
[307362.429502] [<8010152c>] (gic_handle_irq) from [<8010d240>] (__irq_svc+0x40/0x74)
[307362.429515] Exception stack(0xa8325c40 to 0xa8325c88)
[307362.429538] 5c40: 00000003 ab72f934 00000003 00000003 80e02690 ab713200 80e0300c ab713204
[307362.429558] 5c60: 00000001 a8325d90 00000004 a8325cbc 00000003 a8325c88 801a889c 801a88cc
[307362.429571] 5c80: 20070013 ffffffff
[307362.429581] r7:a8325c74 r6:ffffffff r5:20070013 r4:801a88cc
[307362.429635] [<801a865c>] (smp_call_function_many) from [<801a8968>] (smp_call_function+0x48/0x88)
[307362.429646] r10:ffffffff r9:a8325d88 r8:00000000 r7:00000000 r6:00000001 r5:a8325d90
[307362.429692] r4:801109d0
[307362.429720] [<801a8920>] (smp_call_function) from [<801a89e0>] (on_each_cpu+0x38/0x90)
[307362.429730] r7:00000000 r6:00000001 r5:a8325d90 r4:801109d0
[307362.429782] [<801a89a8>] (on_each_cpu) from [<80110d6c>] (twd_rate_change+0x38/0x40)
[307362.429793] r7:00000000 r6:00000002 r5:a8325d88 r4:ffffffff
[307362.429846] [<80110d34>] (twd_rate_change) from [<801533e8>] (notifier_call_chain+0x54/0x94)
[307362.429870] [<80153394>] (notifier_call_chain) from [<801538b8>] (__srcu_notifier_call_chain+0x54/0x70)
[307362.429880] r9:a8325d88 r8:00000002 r7:00000000 r6:00000000 r5:a80b2544 r4:a80b255c
[307362.429939] [<80153864>] (__srcu_notifier_call_chain) from [<801538fc>] (srcu_notifier_call_chain+0x28/0x30)
[307362.429949] r10:00000000 r9:80e9db84 r8:00000002 r7:80e02548 r6:a803ac00 r5:80e82cd0
[307362.429995] r4:a80b2540
[307362.430027] [<801538d4>] (srcu_notifier_call_chain) from [<8074a3b4>] (__clk_notify+0xa0/0xa8)
[307362.430049] [<8074a314>] (__clk_notify) from [<8074a464>] (__clk_recalc_rates+0xa8/0xac)
[307362.430059] r8:a8038b80 r7:00000001 r6:179a7b00 r5:00000002 r4:a803ac00
[307362.430113] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430123] r7:00000001 r6:2f34f600 r5:00000002 r4:a803ac00
[307362.430171] [<8074a3bc>] (__clk_recalc_rates) from [<8074a438>] (__clk_recalc_rates+0x7c/0xac)
[307362.430181] r7:00000001 r6:2f34f600 r5:00000002 r4:a803d680
[307362.430231] [<8074a3bc>] (__clk_recalc_rates) from [<8074cdf0>] (clk_core_set_parent+0x1bc/0x2dc)
[307362.430242] r7:00000001 r6:00000000 r5:a803af80 r4:a803d000
[307362.430290] [<8074cc34>] (clk_core_set_parent) from [<8074cf3c>] (clk_set_parent+0x2c/0x30)
[307362.430300] r9:00000000 r8:179a7b00 r7:80e02548 r6:000c15c0 r5:00060ae0 r4:80f02f14
[307362.430366] [<8074cf10>] (clk_set_parent) from [<806d2b70>] (imx6q_set_target+0x490/0x544)
[307362.430395] [<806d26e0>] (imx6q_set_target) from [<806c8984>] (__cpufreq_driver_target+0x184/0x2b0)
[307362.430406] r10:00000000 r9:00000000 r8:80f02e4c r7:00000000 r6:80e02548 r5:a8587c00
[307362.430450] r4:00000000
[307362.430481] [<806c8800>] (__cpufreq_driver_target) from [<806d1694>] (cpufreq_interactive_speedchange_task+0x264/0x354)
[307362.430492] r10:80d9be90 r9:80e02690 r8:80e0300c r7:00000047 r6:000c15c0 r5:80d9be90
[307362.430537] r4:ab71ee90
[307362.430574] [<806d1430>] (cpufreq_interactive_speedchange_task) from [<8015250c>] (kthread+0xfc/0x114)
[307362.430585] r10:00000000 r9:00000000 r8:00000000 r7:806d1430 r6:00000000 r5:a830e640
[307362.430629] r4:00000000
[307362.430661] [<80152410>] (kthread) from [<80108028>] (ret_from_fork+0x14/0x2c)
[307362.430671] r7:00000000 r6:00000000 r5:80152410 r4:a830e640
[307362.430710] rcu_preempt kthread starved for 2101 jiffies!
This continues for about half an hour before the backtrace starts aborting due to a bad frame pointer. We are using Yocto Pyro with the 4.1.15-2.0.0 kernel.
Here are the only relevant kernel config options I can find:
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_RCU_CPU_STALL_INFO is not set
This seems to be a very similar/identical stack trace to an issue listed here: https://community.nxp.com/thread/505707
Any thoughts on what to tackle first?
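For anyone reading the numbers in the stall report: the line "last rcu_preempt kthread activity 2101 (30706237-30704136)" is a difference of two jiffies samples, and "t=2102 jiffies" is how long the grace period had been pending when CPU 1 reported it. Turning jiffies into wall-clock time needs the tick rate (CONFIG_HZ), which is not given in this post. The sketch below assumes HZ=100 purely for illustration, and the helper names are hypothetical. Under that assumption 2102 jiffies is about 21 seconds.

```c
#include <stdint.h>

/*
 * Jiffies elapsed between two samples of the counter.
 * Unsigned subtraction gives the right answer even if the
 * 32-bit counter wrapped between the two samples.
 */
uint32_t jiffies_delta(uint32_t now, uint32_t then)
{
    return (uint32_t)(now - then);
}

/*
 * Convert a jiffies count to milliseconds.
 * hz is the kernel tick rate (CONFIG_HZ); the caller must supply it,
 * because the stall report itself does not contain it.
 */
uint64_t jiffies_to_ms(uint32_t jiffies, uint32_t hz)
{
    return (uint64_t)jiffies * 1000u / hz;
}
```

Calling jiffies_delta(30706237, 30704136) reproduces the 2101 printed in the log, and jiffies_to_ms(2102, 100) gives 21020 ms if HZ really is 100. With a different CONFIG_HZ the duration scales accordingly.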
Hi Jordan,
You can check whether the ARM errata fixes are included (#define CONFIG_ARM_ERRATA_) in include/configs/mx6_common.h of uboot-imx (i.MX U-Boot),
and try the NXP Linux kernel from the source.codeaurora.org/external/imx/linux-imx repository, release L4.1.15_2.0.0 (Krogoth), together with its documentation.
Best regards
igor
Hi Igor,
Thanks for your reply. We are using a Nitrogen6x with Boundary's u-boot 2017.07 branch, which does not include the errata defines. I checked arch/arm/cpu/armv7/start.S, and the actual errata fixes seem to be present, so I am attempting simply to patch in the correct errata defines and an updated defconfig.
However, when trying to build, I get an implicit declaration error in kernel-module-imx-gpu-viv-src:
gc_hal_kernel_platform_imx6q14.c:570:12: error: implicit declaration of function 'devm_reset_control_get' [-Werror=implicit-function-declaration]
rstc = devm_reset_control_get(pdev, "gpu3d");
^
Is there some dependency that is introduced by these errata fixes?
This is my patch:
---
 include/configs/mx6_common.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/configs/mx6_common.h b/include/configs/mx6_common.h
index 1a8ab4ee33..abfcb36f6f 100644
--- a/include/configs/mx6_common.h
+++ b/include/configs/mx6_common.h
@@ -8,6 +8,17 @@
 #define __MX6_COMMON_H
 
 #ifndef CONFIG_MX6UL
+#define CONFIG_ARM_ERRATA_743622
+#if (defined(CONFIG_MX6QP) || defined(CONFIG_MX6Q) || defined(CONFIG_MX6DL)) && !defined(CONFIG_MX6S)
+#define CONFIG_ARM_ERRATA_751472
+#define CONFIG_ARM_ERRATA_794072
+#define CONFIG_ARM_ERRATA_761320
+#define CONFIG_ARM_ERRATA_845369
+#endif
+
+
+
+
 #ifndef CONFIG_SYS_L2CACHE_OFF
 #define CONFIG_SYS_L2_PL310
 #define CONFIG_SYS_PL310_BASE L2_PL310_BASE
--
2.23.0
And this is the commit we are at for u-boot-boundary:
boundarydevices/u-boot-imx6 (GitHub) at commit 146b49876ed5f02c8a595a7198579ec0d1a8455a
Thanks,
Jordan
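Which of the errata defines in the patch above actually take effect depends on which CONFIG_MX6* symbols the board config sets. As a quick sanity check, the conditional from the patch can be copied into a standalone C file and evaluated on a host compiler. This is only a sketch: defining CONFIG_MX6Q here is an assumption standing in for the real board configuration, and enabled_errata is a hypothetical helper, not something from U-Boot.

```c
#include <stddef.h>

/* Assumption for this sketch: pretend the build targets an i.MX6Q board. */
#define CONFIG_MX6Q 1

/* Conditional block copied from the patch to include/configs/mx6_common.h. */
#ifndef CONFIG_MX6UL
#define CONFIG_ARM_ERRATA_743622
#if (defined(CONFIG_MX6QP) || defined(CONFIG_MX6Q) || defined(CONFIG_MX6DL)) && !defined(CONFIG_MX6S)
#define CONFIG_ARM_ERRATA_751472
#define CONFIG_ARM_ERRATA_794072
#define CONFIG_ARM_ERRATA_761320
#define CONFIG_ARM_ERRATA_845369
#endif
#endif

/*
 * Hypothetical helper: stores the numbers of the errata macros that
 * ended up defined into names[] (at most max entries) and returns
 * how many were stored.
 */
size_t enabled_errata(const char *names[], size_t max)
{
    size_t n = 0;
#ifdef CONFIG_ARM_ERRATA_743622
    if (n < max) names[n++] = "743622";
#endif
#ifdef CONFIG_ARM_ERRATA_751472
    if (n < max) names[n++] = "751472";
#endif
#ifdef CONFIG_ARM_ERRATA_794072
    if (n < max) names[n++] = "794072";
#endif
#ifdef CONFIG_ARM_ERRATA_761320
    if (n < max) names[n++] = "761320";
#endif
#ifdef CONFIG_ARM_ERRATA_845369
    if (n < max) names[n++] = "845369";
#endif
    return n;
}
```

With CONFIG_MX6Q defined as above, all five defines are active. Swapping it for CONFIG_MX6UL leaves none, and adding CONFIG_MX6S leaves only 743622. The defines only matter if the start-up code in the tree tests the same macro names.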
Hi Jordan,
NXP supports only its own Linux/U-Boot releases, located on source.codeaurora.org/external/imx
(repository: linux-imx, the i.MX Linux kernel).
Issues with the Boundary Devices Linux can be posted at https://boundarydevices.com/wiki-bd/
Best regards
igor