imx8mp: reboot fails, reset/watchdog

s_arendt
Contributor III

I am running into an issue when a "reboot" is issued in Linux. From time to time (about 1 in 20 reboots) the board ends up in a strange state: Linux shuts down properly, but the kernel does not boot up completely.

I am a bit clueless about how to find the cause. My findings so far:

* A power cycle or a reset of the PMIC (via I2C) makes everything fine again.

* Depending on the kernel config and kernel version, the boot stops at different stages: anything from just "Starting kernel" to a full boot can happen (but once we are in this state, it is reproducible with the same version).

* Depending on the kernel and config, the watchdog may be triggered, and then we end up in a boot loop. Otherwise the kernel keeps printing:

[ 22.283106] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 22.289202] rcu: 1-...0: (1 ticks this GP) idle=645/1/0x4000000000000000 softirq=45/45 fqs=2626
[ 22.298082] (detected by 2, t=5254 jiffies, g=-1127, q=13)
[ 22.303657] Task dump for CPU 1:
[ 22.306885] task:swapper/0 state:R running task stack: 0 pid: 1 ppid: 0 flags:0x0000000a
[ 22.316808] Call trace:
[ 22.319251] __switch_to+0x108/0x160
[ 22.322838] ___slab_alloc+0x654/0x7b4
[ 22.326595] __slab_alloc.constprop.0+0x38/0x7c
[ 22.331130] 0xffff0000034a0000
[ 31.149558] imx2_wdt_ping

(The watchdog is still being serviced, so this never stops.)

* Booting a kernel that does come up completely does not "heal" the situation: if I boot our standard kernel afterwards, it hangs again.

* The clk dump in U-Boot is the same whether the SoC is in this state or in the normal state.

* On the NXP EVK board, every kind of reboot (Linux) or restart (U-Boot) is reported as POR by U-Boot. That seems strange to me. My board always shows WDOG, except after a power cycle (POR), which seems correct to me.
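One thing worth checking is how U-Boot decodes the reset cause in the first place. As far as I can tell, U-Boot's get_reset_cause() (arch/arm/mach-imx/cpu.c) switches on the whole SRC reset-status value, and a value with both the POR and the WDOG bit set is also reported as "POR". The host-side sketch below compares that exact-match style with listing every latched bit. The bit values are my assumption based on that function and have not been checked against the i.MX 8M Plus reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED SRC reset-status bit values, taken from the cases that
 * U-Boot's get_reset_cause() handles for i.MX8M. Verify them against
 * the i.MX 8M Plus reference manual before relying on them. */
#define SRSR_POR       0x00001u
#define SRSR_CSU       0x00004u
#define SRSR_IPP_USER  0x00008u
#define SRSR_WDOG      0x00010u
#define SRSR_WDOG3     0x00080u
#define SRSR_WDOG2     0x00100u

/* Exact-match decoding: the whole register value is compared, so a
 * combination of bits only gets a name if it is listed explicitly. */
static const char *reset_cause_exact(uint32_t srsr)
{
    switch (srsr) {
    case SRSR_POR:
    case SRSR_POR | SRSR_WDOG: /* POR and WDOG both latched: still "POR" */
        return "POR";
    case SRSR_CSU:
        return "CSU";
    case SRSR_IPP_USER:
        return "IPP USER";
    case SRSR_WDOG:
        return "WDOG";
    case SRSR_WDOG3:
        return "WDOG3";
    case SRSR_WDOG2:
        return "WDOG2";
    default:
        return "unknown reset";
    }
}

/* Bitwise decoding: write every latched flag into buf, space-separated. */
static void reset_cause_bits(uint32_t srsr, char *buf, size_t len)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } flags[] = {
        { SRSR_POR, "POR" },           { SRSR_CSU, "CSU" },
        { SRSR_IPP_USER, "IPP_USER" }, { SRSR_WDOG, "WDOG" },
        { SRSR_WDOG3, "WDOG3" },       { SRSR_WDOG2, "WDOG2" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(srsr & flags[i].bit) || used >= len)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", flags[i].name);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

For example, reset_cause_exact(0x11) returns "POR" while reset_cause_bits(0x11, ...) produces "POR WDOG", so printing the raw register value in U-Boot on both boards would show whether the WDOG bit is latched on the EVK as well.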


What could be the reason? The clocks do not seem to be the root cause. It could be misconfigured registers, memory, or power domains. Is the SoC reset done properly?
Is there anything board-specific in the kernel that could go wrong or be missing? Where is the final reset issued in the Linux kernel, or does reboot trigger the WDOG to perform the actual reset?
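Regarding the final reset in Linux: in mainline, the i.MX watchdog driver (drivers/watchdog/imx2_wdt.c) registers a restart handler, imx2_wdt_restart(). Whether reboot actually ends there depends on which restart handlers are registered and their priorities (PSCI firmware may handle it first), which I have not verified for this board. As I read the driver, the handler writes the watchdog control register (WCR) with either the internal software reset or the external WDOG_B output asserted, selected by the fsl,ext-reset-output device-tree property. The sketch below only computes that WCR value on a host; the bit positions are assumptions from my reading of the driver and should be checked against the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WCR bits as defined in drivers/watchdog/imx2_wdt.c (my reading;
 * treat as ASSUMPTIONS and check the reference manual). SRS and WDA
 * are active low: writing 0 asserts the signal, 1 leaves it alone. */
#define WCR_WDE (1u << 2) /* watchdog enable */
#define WCR_SRS (1u << 4) /* software reset signal (internal SoC reset) */
#define WCR_WDA (1u << 5) /* WDOG_B assertion (external reset output) */

/* Value the restart handler writes to WCR, as I read imx2_wdt_restart():
 * internal or external reset, never both. ext_reset corresponds to the
 * fsl,ext-reset-output device-tree property. */
static uint16_t restart_wcr_value(bool ext_reset)
{
    uint16_t wcr = WCR_WDE;

    if (ext_reset)
        wcr |= WCR_SRS; /* keep internal reset deasserted; WDA = 0 drives WDOG_B */
    else
        wcr |= WCR_WDA; /* keep WDOG_B deasserted; SRS = 0 resets the SoC */
    return wcr;
}

/* True if this WCR value asserts the external WDOG_B output. */
static bool drives_wdog_b(uint16_t wcr)
{
    return (wcr & WCR_WDA) == 0;
}

/* True if this WCR value asserts the internal software reset. */
static bool asserts_internal_reset(uint16_t wcr)
{
    return (wcr & WCR_SRS) == 0;
}
```

If WDOG_B is wired to the PMIC, the external variant could let the PMIC reset its rails (depending on how the PMIC is configured), while the internal variant resets only the SoC. Which variant this board uses is an open question here, not something established in the thread.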

I am thinking of a workaround: add this PMIC restart via I2C to U-Boot, but it must run only if no POR is detected; otherwise the board will never come up. Another option would be to include the PMIC reset in the kernel shutdown path. But these are all dirty workarounds...
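The U-Boot variant of this workaround boils down to one decision: reset the PMIC only when the reset cause is not POR, so that the boot following the PMIC reset (which should then report POR) continues normally instead of looping. A minimal sketch of that logic follows. The PMIC I2C address, register and value are hypothetical placeholders, the POR bit value is an assumption, and the I2C write is passed in as a hypothetical function-pointer hook so the logic can be tested off-target; in U-Boot it would wrap the board's actual I2C/PMIC access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL placeholders: replace with the real PMIC's 7-bit I2C
 * address and the register/value that triggers its cold reset. */
#define PMIC_I2C_ADDR     0x08
#define PMIC_RESET_REG    0x00
#define PMIC_RESET_VALUE  0x01

/* ASSUMED POR bit in the SRC reset-status register (as handled by
 * U-Boot's get_reset_cause()); verify against the reference manual. */
#define SRSR_POR 0x00001u

/* Hypothetical I2C write hook (returns 0 on success), so the decision
 * logic can be tested off-target. */
typedef int (*i2c_write_fn)(uint8_t addr, uint8_t reg, uint8_t val);

/* Request a PMIC reset only on a non-POR (warm) boot. The boot that
 * follows the PMIC reset is expected to report POR, so it takes the
 * normal path instead of resetting again. Returns true if the reset
 * was requested successfully. */
static bool maybe_reset_pmic(uint32_t srsr, i2c_write_fn write)
{
    assert(write != NULL);
    if (srsr & SRSR_POR)
        return false; /* cold boot: continue normally */
    return write(PMIC_I2C_ADDR, PMIC_RESET_REG, PMIC_RESET_VALUE) == 0;
}

/* Off-target stub that only counts calls, for exercising the logic. */
static int stub_calls;

static int stub_write(uint8_t addr, uint8_t reg, uint8_t val)
{
    (void)addr;
    (void)reg;
    (void)val;
    stub_calls++;
    return 0;
}
```

Checking the POR bit with a mask rather than comparing the whole register means a value with POR plus other bits set also counts as a cold boot, so the PMIC is not reset again.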

Any ideas? (Our board uses a different PMIC than the NXP EVK; I have tried different U-Boot versions up to 2024.07 and different kernels.)

3 Replies

Bio_TICFSL
NXP TechSupport

Hello,

If it happens 1 in 20 times, I guess this is not software, and you may want to check the DDR. Please try the DDR tools:

https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/i-MX-8-8X-Family-DDR-Tools-Release/ta-p/...
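The NXP DDR tools go well beyond this (as I understand them, they cover calibration and stress testing). Purely to illustrate what a basic data-line check looks like, here is a minimal walking-ones/walking-zeros test. It is a hypothetical helper, not part of the DDR tools; on a host it runs against ordinary memory, and on target it would have to use an uncached DDR address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walking-ones data-line test on one 32-bit word.
 * Each single-bit pattern is written and read back; a stuck or shorted
 * data line would show up as a mismatch. Returns 0 if all patterns read
 * back correctly, otherwise the first failing pattern. */
static uint32_t walking_ones(volatile uint32_t *addr)
{
    assert(addr != NULL);
    for (uint32_t pattern = 1u; pattern != 0u; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0u;
}

/* Same idea with a single 0 bit walking through an all-ones word. */
static uint32_t walking_zeros(volatile uint32_t *addr)
{
    assert(addr != NULL);
    for (uint32_t bit = 1u; bit != 0u; bit <<= 1) {
        uint32_t pattern = ~bit;
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0u;
}
```

Neither pattern set ever contains 0, so a return value of 0 unambiguously means "no mismatch found".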


Regards

s_arendt
Contributor III

I think that can be ruled out. For DDR to be the cause, the fault would have to hit exactly the same address every time (and on different boards!), given how exactly reproducible the issue is. True, the issue does not happen on every reboot, but when it does, it hangs at the same place.

I have also considered the hardware side, i.e. the power supply, but I can't see any particular stress in that phase.

Is there a problem with warm boot/reset? In other words: why does U-Boot on the NXP board report POR in all cases?

s_arendt
Contributor III

In U-Boot, in board/freescale/imx8mp_evk/spl.c, there is this function (note the comment: what is it telling me?):

/* Do not use BSS area in this phase */
void board_init_f(ulong dummy)
{
	/* ... */
}

Comparing this function with the one from other Freescale boards (like imx8mn), I find that the imx8mp version is missing the following lines:

/* Clear the BSS. */
memset(__bss_start, 0, __bss_end - __bss_start);

board_init_r(NULL, 0);


I am referring to mainline U-Boot 2023.04 and 2024.04.
Hoping for the best, I added these lines to my spl.c. My impression is that the number of failing reboots went down, but I can still observe the problem. Does anyone know why these lines are missing? Is this intentional?
After all, the problem occurs on warm starts, so what is different there? RAM might not be cleared, and registers might still hold old values...
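For reference, "clearing the BSS" just means zeroing the memory between the linker symbols __bss_start and __bss_end, so that C globals without an initializer really start at zero. After a power cycle DRAM content is undefined anyway, but after a warm reset RAM may still hold values from the previous run, which is one of the warm-start differences asked about above. The host-side sketch below uses a plain array in place of the linker symbols (an assumption made only so it runs anywhere) and simulates leftover data. I have not checked whether U-Boot's generic start-up code already clears BSS once board_init_f() returns on imx8mp, which might explain why the lines are absent there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: stand-ins for the linker-provided __bss_start/__bss_end.
 * Here they bound a plain array so the sketch runs on a host. */
static unsigned char fake_bss[64];
static unsigned char *const bss_start = fake_bss;
static unsigned char *const bss_end = fake_bss + sizeof(fake_bss);

/* Zero the BSS range, as the quoted memset() does before board_init_r(). */
static void clear_bss(void)
{
    assert(bss_end >= bss_start);
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
}

/* Simulate RAM that kept old contents across a warm reset. */
static void simulate_warm_reset_leftovers(void)
{
    memset(bss_start, 0xA5, (size_t)(bss_end - bss_start));
}

/* True if every BSS byte is zero, i.e. zero-initialised globals are valid. */
static int bss_is_clear(void)
{
    for (const unsigned char *p = bss_start; p < bss_end; p++)
        if (*p != 0)
            return 0;
    return 1;
}
```

Calling simulate_warm_reset_leftovers() makes bss_is_clear() return 0 until clear_bss() runs, which is the situation code would see if it relied on zeroed globals without the clearing step.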
