
iMX6 / v3.0.35 / system hang-up problem

Question asked by Vladimir Zapolskiy on Nov 9, 2013
Latest reply on Mar 28, 2018 by Fan Al

This information is mainly intended for users of the i.MX6 Freescale BSP with kernel version v3.0.X who use high-resolution timers and local timers, both of which are enabled by default.

 

In very rare cases (after weeks or months of continuous running), a user may encounter a system hang-up, and the kernel log may contain messages like:

 

    INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 2 3} (detected by 0, t=8476 jiffies)

    INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 2} (detected by 3, t=6703 jiffies)

 

When this happens, the system becomes unresponsive. However, after some time (tens of minutes or hours) it may recover. Sometimes the system time may also be reset or show invalid ticks.

 

The main indication of the problem is that GPT timer interrupts are no longer received. From the GPT register values it is possible to see that the next planned interrupt was programmed in the past. Occasionally the system may restore itself after a long interval (several minutes or hours) if the GPT timer is rearmed from a local timer.
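To make the "set in the past" condition concrete, here is a minimal sketch for interpreting a pair of raw register readings (counter and output compare). It assumes the GPT counter is a free-running 32-bit up-counter that raises its interrupt only when it equals the compare value. Under that assumption, a compare value already behind the counter can only match after the counter wraps around. The helper names and the clock rate parameter are hypothetical and do not come from the BSP.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for interpreting raw GPT readings.
 * Assumption: the GPT counter is a free-running 32-bit up-counter and the
 * compare interrupt fires only when counter == compare.
 */

/* Nonzero if the compare value lies behind the counter, i.e. the planned
 * event is "in the past". The signed difference handles counter wraparound
 * correctly as long as the two values are less than 2^31 ticks apart. */
static int gpt_event_in_past(uint32_t counter, uint32_t compare)
{
    return (int32_t)(compare - counter) < 0;
}

/* Ticks until the counter next equals the compare value. For an event in
 * the past this is close to a full 2^32 wrap of the counter. */
static uint32_t gpt_ticks_until_match(uint32_t counter, uint32_t compare)
{
    return compare - counter; /* unsigned arithmetic wraps modulo 2^32 */
}

/* Convert ticks to seconds for a given (assumed) GPT input clock rate. */
static double gpt_ticks_to_seconds(uint32_t ticks, uint32_t rate_hz)
{
    return (double)ticks / (double)rate_hz;
}
```

For example, with the counter at 1000 and the compare value at 500, `gpt_event_in_past()` reports the event as past, and `gpt_ticks_until_match()` gives 2^32 - 500 ticks; dividing by the actual GPT clock rate of the board gives the corresponding wait.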

 

Fortunately, I managed to write a test that reproduces the issue much faster, usually within 4-8 hours of running. The test is attached.
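The attached stall.c is not reproduced here. As a rough illustration of the symptom it targets, a minimal userspace probe could repeatedly request short sleeps and record the worst wake-up overshoot: if timer interrupts stop arriving, a 1 ms sleep would overshoot by seconds or more. This is a hypothetical sketch, not the original test; running one instance per CPU (for example with taskset) to spread the load is an assumption.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from timestamp a to timestamp b. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/*
 * Hypothetical probe: sleep `iterations` times for `sleep_ns` each and return
 * the worst observed overshoot in nanoseconds, or -1 on error (including a
 * sleep interrupted by a signal). A stalled timer interrupt would show up as
 * a very large overshoot.
 */
static int64_t max_sleep_overshoot_ns(int iterations, long sleep_ns)
{
    int64_t worst = 0;

    for (int i = 0; i < iterations; i++) {
        struct timespec req = { sleep_ns / 1000000000L, sleep_ns % 1000000000L };
        struct timespec before, after;

        if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
            return -1;
        /* Relative sleep; clock_nanosleep returns 0 on success. */
        if (clock_nanosleep(CLOCK_MONOTONIC, 0, &req, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
            return -1;

        int64_t overshoot = ts_diff_ns(&before, &after) - sleep_ns;
        if (overshoot > worst)
            worst = overshoot;
    }
    return worst;
}
```

A long-running variant would call `max_sleep_overshoot_ns()` in a loop and log whenever the result exceeds some threshold of your choosing.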

 

The test helped me get closer to the root cause: the core that manages the tick broadcast device abstraction does not rearm the GPT in time. My conclusion is that the i.MX6-specific arch_idle() implementation in the Freescale BSP relies on tick broadcast mechanisms/features that are missing or not stable enough in v3.0.35. This makes it impossible to reliably take over some of the ARM core (local) timer interrupts by relying on the i.MX6 GPT plus the existing tick broadcast mechanism. I analyzed the kernel's tick broadcast subsystem but found no obvious bugs, and fairly extensive backporting of tick broadcast patches from mainline did not solve the problem either. However, on the newer v3.8.13 kernel for i.MX6, where arch_idle() calls clockevents_notify(), the problem is no longer reproduced, at least with this test.
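The rearming problem can be illustrated with a toy userspace model (not kernel code, and not the BSP's actual logic). Idle CPUs park their next deadline on a shared one-shot broadcast device, which is programmed with the earliest pending deadline. The sketch assumes that programming an expiry time that is not in the future is refused and reported to the caller, who must then rescan instead of leaving the device armed with a stale value. If that case were silently ignored, the device would wait for a match that never comes, which resembles the symptom described earlier. All names here are hypothetical.

```c
#include <stdint.h>

#define MODEL_CPUS 4
#define NO_EVENT UINT64_MAX

/* Toy state: per-CPU deadlines parked on the broadcast device, plus the
 * single expiry time the one-shot device is programmed with. */
struct bc_model {
    uint64_t cpu_deadline[MODEL_CPUS]; /* NO_EVENT = CPU not waiting */
    uint64_t dev_expires;              /* NO_EVENT = device unarmed */
    int woken[MODEL_CPUS];             /* set when the model wakes a CPU */
};

static void bc_init(struct bc_model *m)
{
    for (int i = 0; i < MODEL_CPUS; i++) {
        m->cpu_deadline[i] = NO_EVENT;
        m->woken[i] = 0;
    }
    m->dev_expires = NO_EVENT;
}

/* Refuse expiry times that are not in the future, so the caller learns
 * that the event is already due. */
static int bc_program(struct bc_model *m, uint64_t expires, uint64_t now)
{
    if (expires <= now)
        return -1;
    m->dev_expires = expires;
    return 0;
}

/* Broadcast interrupt handler; `latency` models the time that passes
 * before the device is reprogrammed. */
static void bc_handle(struct bc_model *m, uint64_t now, uint64_t latency)
{
    for (;;) {
        uint64_t next = NO_EVENT;

        /* Wake every CPU whose deadline has passed; track the earliest
         * remaining deadline. */
        for (int i = 0; i < MODEL_CPUS; i++) {
            if (m->cpu_deadline[i] != NO_EVENT && m->cpu_deadline[i] <= now) {
                m->woken[i] = 1;
                m->cpu_deadline[i] = NO_EVENT;
            } else if (m->cpu_deadline[i] < next) {
                next = m->cpu_deadline[i];
            }
        }
        if (next == NO_EVENT) {
            m->dev_expires = NO_EVENT;
            return;
        }
        now += latency;
        if (bc_program(m, next, now) == 0)
            return;
        /* The next event expired meanwhile: rescan rather than keep a
         * device armed in the past. */
    }
}

/* Two CPUs parked with deadlines d1 < d2; the broadcast fires at d1.
 * Returns the resulting device expiry (NO_EVENT if both were woken). */
static uint64_t bc_demo(uint64_t d1, uint64_t d2, uint64_t latency)
{
    struct bc_model m;

    bc_init(&m);
    m.cpu_deadline[1] = d1;
    m.cpu_deadline[2] = d2;
    m.dev_expires = d1;
    bc_handle(&m, d1, latency);
    return m.dev_expires;
}
```

With deadlines 100 and 105 and a reprogramming latency of 10, the second deadline has already passed when the device would be rearmed, so the model wakes that CPU directly and leaves the device unarmed. With deadlines 100 and 200, the device is simply rearmed for 200.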

Original Attachment has been moved to: stall.c.zip
