i.MX8MM Kernel crashes on cpuidle

adv-johnchang
Contributor I

Hi,

We have a custom i.MX8MM board that crashes after running for some time. This issue occurs with both Linux kernel version 6.1.55 (Yocto 4.2) and Linux kernel version 5.15.71 (Yocto 4.0). The system's default CPU governor is ondemand. We've tested changing the CPU governor to performance, and the system no longer crashes.

Do you have any suggestions on how to determine if this is a hardware issue or a kernel issue? And how to solve this issue?

Here's the relevant log:

[32832.280148] audit: type=1327 audit(1751365801.520:22): proctitle=2F7573722F7362696E2F63726F6E64002D6E
[36432.304306] audit: type=1006 audit(1751369401.584:23): pid=3094 uid=0 subj=kernel old-auid=4294967295 auid=0 tty=(none) old-ses=4294967295 ses=15 res=1
[36432.317968] audit: type=1300 audit(1751369401.584:23): arch=c00000b7 syscall=64 success=yes exit=1 a0=3 a1=ffffd19f6b10 a2=1 a3=ffffa11e9020 items=0 ppid=265 pid=3094 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=15 comm="crond" exe="/usr/sbin/crond" subj=kernel key=(null)
[36432.344959] audit: type=1327 audit(1751369401.584:23): proctitle=2F7573722F7362696E2F63726F6E64002D6E
[37877.552323] rcu: INFO: rcu_preempt self-detected stall on CPU
[37877.558090] rcu:     2-...!: (1 ticks this GP) idle=d95/0/0x3 softirq=199976/199976 fqs=0
[37877.566100]  (t=5958 jiffies g=1820793 q=55)
[37877.570369] rcu: rcu_preempt kthread timer wakeup didn't happen for 5957 jiffies! g1820793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[37877.581933] rcu:     Possible timer handling issue on cpu=0 timer-softirq=157944
[37877.589067] rcu: rcu_preempt kthread starved for 5958 jiffies! g1820793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[37877.599676] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[37877.608805] rcu: RCU grace-period kthread stack dump:
[37877.613854] task:rcu_preempt     state:I stack:    0 pid:   13 ppid:     2 flags:0x00000008
[37877.622211] Call trace:
[37877.624655]  __switch_to+0x104/0x15c
[37877.628241]  __schedule+0x2b8/0x710
[37877.631733]  schedule+0x88/0x100
[37877.634962]  schedule_timeout+0x80/0xf0
[37877.638801]  rcu_gp_fqs_loop+0x118/0x2e0
[37877.642726]  rcu_gp_kthread+0x104/0x11c
[37877.646564]  kthread+0x150/0x160
[37877.649795]  ret_from_fork+0x10/0x20
[37877.653373] rcu: Stack dump where RCU GP kthread last ran:
[37877.658855] Task dump for CPU 0:
[37877.662081] task:swapper/0       state:R  running task     stack:    0 pid:    0 ppid:     0 flags:0x00000008
[37877.671999] Call trace:
[37877.674443]  __switch_to+0x104/0x15c
[37877.678021]  cpuidle_enter_state+0x25c/0x2f0
[37877.682296]  cpuidle_enter+0x38/0x50
[37877.685873]  do_idle+0x210/0x2a0
[37877.689103]  cpu_startup_entry+0x24/0x80
[37877.693027]  rest_init+0xe4/0xf4
[37877.696256]  arch_call_rest_init+0x10/0x1c
[37877.700356]  start_kernel+0x610/0x650
[37877.704020]  __primary_switched+0xbc/0xc4
[37877.708037] Task dump for CPU 0:
[37877.711264] task:swapper/0       state:R  running task     stack:    0 pid:    0 ppid:     0 flags:0x00000008
[37877.721181] Call trace:
[37877.723625]  __switch_to+0x104/0x15c
[37877.727203]  cpuidle_enter_state+0x25c/0x2f0
[37877.731475]  cpuidle_enter+0x38/0x50
[37877.735052]  do_idle+0x210/0x2a0
[37877.738281]  cpu_startup_entry+0x24/0x80
[37877.742205]  rest_init+0xe4/0xf4
[37877.745433]  arch_call_rest_init+0x10/0x1c
[37877.749531]  start_kernel+0x610/0x650
[37877.753194]  __primary_switched+0xbc/0xc4
[37877.757206] Task dump for CPU 2:
[37877.760431] task:swapper/2       state:R  running task     stack:    0 pid:    0 ppid:     1 flags:0x00000008
[37877.770349] Call trace:
[37877.772792]  dump_backtrace+0x0/0x19c
[37877.776457]  show_stack+0x18/0x70
[37877.779775]  sched_show_task+0x154/0x180
[37877.783701]  dump_cpu_task+0x44/0x58
[37877.787280]  rcu_dump_cpu_stacks+0xe8/0x12c

 

9 Replies

joanxie
NXP TechSupport

"The system's default CPU governor is ondemand. We've tested changing the CPU governor to performance, and the system no longer crashes."

> Is this a typo? When does the system crash: with ondemand or with performance?

Could you share the output of the command "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors"?

adv-johnchang
Contributor I

Hi joanxie,

The system crashed while the CPU governor was set to ondemand.

Here is the scaling_available_governors result:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative ondemand userspace powersave performance schedutil

 

joanxie
NXP TechSupport

What did you change in the kernel? The i.MX8MM default frequency is 1.2 GHz, so you couldn't boot the board with the default configuration, right? What is the output of "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies"? U-Boot sets the initial clock in

/arch/arm/mach-imx/imx8m/clock_imx8mm.c

For the kernel, the available CPU frequencies are set in the dtsi file, and the cpufreq driver is

https://github.com/nxp-imx/linux-imx/blob/lf-6.1.y/drivers/cpufreq/imx-cpufreq-dt.c

If your BSP is similar to NXP's, it seems the system crashes at 1.2 GHz. You can refer to chapter 2.5.3, CPU Frequency Scaling (CPUFREQ), of the enclosed file.

 

adv-johnchang
Contributor I

Hi joanxie,

Thanks for your information.

Our kernel has only been modified for peripheral hardware; we haven't made any changes to the CPU frequency or other CPU-related settings. Many of our devices can boot and run normally. Currently, only one device will crash after running for a day.

Here is the scaling_available_frequencies result:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1200000 1600000

joanxie
NXP TechSupport

If all of your boards have the same SW and HW and only one board has the issue, you can do an A/B test to check whether it is a chip issue. Do all of your boards give the same result for the command "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies"? And what is the frequency under ondemand and under performance? That is, after setting ondemand or performance, check with "cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq": do ondemand and performance report the same CPU frequency?
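
One way to make this comparison repeatable is to capture the same cpufreq attributes on each board and diff the results. The following is only a sketch, assuming the standard cpufreq sysfs layout; the function name cpufreq_snapshot and its optional root argument are made up for illustration and are not part of the BSP.

```shell
#!/bin/sh
# Print governor and frequency attributes for every CPU that has a cpufreq
# directory. The sysfs root is an optional argument (an assumption made for
# this sketch) so the function can also be pointed at a copied tree.
cpufreq_snapshot() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip the unexpanded glob pattern when no CPU matches.
        [ -d "$dir" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        for f in scaling_governor scaling_available_frequencies \
                 scaling_cur_freq cpuinfo_cur_freq; do
            # Report missing or unreadable attributes instead of failing.
            if [ -r "$dir/$f" ]; then
                printf '%s %s: %s\n' "$cpu" "$f" "$(cat "$dir/$f")"
            else
                printf '%s %s: (unreadable)\n' "$cpu" "$f"
            fi
        done
    done
    return 0
}
```

Running "cpufreq_snapshot > board_a.txt" on the good board and "cpufreq_snapshot > board_b.txt" on the failing one, once per governor, gives two files that can be compared with diff. Note that cpuinfo_cur_freq is normally readable only by root.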

adv-johnchang
Contributor I

Hi joanxie,

Do all of your boards give the same result for the command "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies"?

=> Yes.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1200000 1600000

 

What is the frequency when you use ondemand and performance?

=> The frequency is the same for both ondemand and performance mode.

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1600000

 

joanxie
NXP TechSupport

For optimized performance on the i.MX, you can try the following steps: set CONFIG_ARM_PSCI_CPUIDLE=n in the defconfig file to keep the CPU from entering idle states. CONFIG_ARM_PSCI_CPUIDLE depends on CONFIG_CPU_FREQ; if you can't set CONFIG_CPU_FREQ=n, I also suggest changing CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y to CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y. I think this should have the same effect as using the "echo performance" command in user space.
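
If rebuilding the kernel is not convenient, the same idea (keeping the CPU out of deeper idle states) can be tried at runtime through the cpuidle sysfs interface. This is only a sketch, assuming the kernel exposes the standard per-state "disable" attribute under each CPU's cpuidle directory; the function name set_deep_idle_disabled and its optional root argument are hypothetical, and which states exist depends on the board's device tree.

```shell
#!/bin/sh
# Write 0 or 1 to the "disable" attribute of every cpuidle state except
# state0 (normally the shallowest state), for every CPU.
# Usage: set_deep_idle_disabled <0|1> [sysfs_cpu_root]
# The optional root argument is an assumption of this sketch, used so the
# function can be exercised against a copied tree.
set_deep_idle_disabled() {
    value="$1"
    root="${2:-/sys/devices/system/cpu}"
    case "$value" in
        0|1) ;;
        *) echo "usage: set_deep_idle_disabled <0|1> [root]" >&2; return 1 ;;
    esac
    for f in "$root"/cpu[0-9]*/cpuidle/state[1-9]*/disable; do
        # Skip the unexpanded glob pattern and attributes we cannot write.
        [ -w "$f" ] || continue
        echo "$value" > "$f"
    done
    return 0
}
```

Calling "set_deep_idle_disabled 1" as root disables the deeper states until the next reboot or until "set_deep_idle_disabled 0" is run. If the board then stays stable under ondemand, that would point toward the idle path rather than frequency scaling, although it does not by itself separate a hardware fault from a software one.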

adv-johnchang
Contributor I

Hi joanxie,

Thanks for your information.

But we don't plan to change the kernel configuration just for one faulty device. We're going to have the distributor help us analyze the problem.

joanxie
NXP TechSupport

If just one board has this issue, then as I mentioned before, you can do an A/B test to confirm whether the issue is related to the chip, since the other boards are fine.
