iMX6 / v3.0.35 / system hang-up problem


vladimirzapolsk
Contributor II

This information is mainly intended for users of the Freescale iMX6 BSP with kernel version v3.0.X who use the high resolution timer and local timers; both are enabled by default.

 

In very rare cases (after weeks or months of continuous running) a user may encounter a system hang-up, and the kernel log may contain notifications like:

 

    INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 2 3} (detected by 0, t=8476 jiffies)

    INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 2} (detected by 3, t=6703 jiffies)

 

When this happens, the system becomes unresponsive. After some time (tens of minutes or hours) it may thaw again, and sometimes the system time is reset or its ticks are no longer valid.

 

The main indication of the problem is that GPT timer interrupts are no longer received. From the GPT register values it is possible to figure out that the next planned interrupt was set in the past. Occasionally the system recovers by itself after a long interval (several minutes or hours), if the GPT timer is rearmed from a local timer.

 

Fortunately I managed to write a test which reproduces the issue at a faster rate, usually within 4-8 hours of running. The test is attached.

 

The test allowed me to get closer to the root cause. The core that manages the tick broadcast device abstraction doesn't rearm the GPT in time. My conclusion is that the iMX6-specific arch_idle() implementation in the Freescale BSP relies on tick broadcast mechanisms/features that are missing or not stable enough in v3.0.35, which makes it impossible to reliably save some interrupt calls from the ARM core timers by relying on the i.MX6 GPT plus the present tick broadcast mechanism. I analyzed the kernel's tick broadcast subsystem but didn't find any obvious bugs, and quite extensive backporting of tick broadcast patches from mainline didn't solve the problem either. However, on the newer kernel version v3.8.13 for iMX6, with clockevents_notify() present in arch_idle(), the problem is no longer reproduced, at least with this test.

Original Attachment has been moved to: stall.c.zip

16 Replies

lixiaohui
Contributor II

In the imx6 Linux kernel 3.0.35, regarding the system hanging for a while, I guess the root cause is in:

arch/arm/mach-mx6/clock.c, the body of the WAIT macro

reg |= V2_TSTAT_ROV;

This statement is wrong; it should be changed to

 reg &= V2_TSTAT_ROV;

imx6_gpt.jpg

So when a GPT timer interrupt arrives, more than one bit in reg is set to "1". With the statement reg(2) = reg(1) | V2_TSTAT_ROV;, the number of "1" bits in reg(2) may end up being 0, or may be greater than 1.
So we cannot accept a GPT interrupt with the wrong statement reg |= V2_TSTAT_ROV;
What do you think of my explanation of the root cause of this system hang-up problem?
Is my idea right? Please correct me if my explanation is wrong; I welcome it.

kurkinalexandr
Contributor III

Hi. Have you tried this fix for the issue? I am going to try it within the next couple of days.

chrisrutherford
Contributor II

Hi.  I am also seeing this problem on 3.10.53-1.1.0_ga.  Is there a patch for this kernel version?

[105604.309961] INFO: rcu_preempt self-detected stall on CPU { 0}  (t=2961 jiffies g=3166121 c=3166120 q=2)
[117725.769902] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=4320 jiffies, g=3528425, c=3528424, q=3)
[117725.781144] INFO: Stall ended before state dump start

Best regards,

franciscoperez
Contributor I

Hi, I found valuable information in this thread! I just want to add my 2 cents.

After trying to apply the suggested patch to the 3.0.35 kernel we had some issues booting (we are running Android). Looking at the patch itself, it seems we could get the same effect by passing the "enable_wait_mode=off" parameter (sent from u-boot to the kernel on the command line). We went that route, and all the CPU stall/hang issues, the clock stopping for minutes, etc. seem to have gone away...

Look for other threads about the effects of disabling wait mode with that parameter to see if it could work for you. In our case power consumption was not an issue, as ours is not a mobile solution.

vladimirzapolsk
Contributor II

The quick fix/workaround is quite straightforward: don't use the i.MX6 GPT (or, the other way round, use the GPT but don't use local timers). Experimentally it doesn't visibly increase the board's power consumption (though the local timer interrupt rate is obviously quite high). Also, disabling the high resolution timer seems to avoid the problem or reduce its frequency, most probably because of the changed tick/schedule state machine in the kernel.

In detail, commit b3743135efb in the Freescale 3.0.X iMX6 BSP seems to be error-prone under heavy system load if local timers are enabled. I've attached a patch which resolves the problem without any additional changes to the kernel configuration; the change implies that local timers are running instead of the GPT. The change is required to guarantee system stability if the local timer and high resolution timer configuration is left untouched.

Any comments are appreciated.

paulgeurts
Contributor I

Hello Vladimir,

I am experiencing this problem as well on 3.10.17_1.0.0_ga. Do you know of any fix for this kernel?

Chris1z
Contributor III

Hi Vladimir,

Which kernel branch / BSP release was this patch made for exactly?  I am running 4.2.2-1.1.0-ga Android release and it already seems to have this code removed. 

Thanks,

Chris

vladimirzapolsk
Contributor II

Hi Chris,

as I mentioned, the problem is known to be reproducible on Linux v3.0.35 (L3.0.35_4.0.0_130424 in particular) / iMX6Q/D. If you have this version of the kernel, then it's good to know that this code was removed; otherwise I'd recommend running the attached test for several hours to see whether the problem exists on your setup. It seems that more recent Linux kernel versions don't have this problem due to changes in the timer broadcast state machine, and calls to clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_*, &cpu) from arch_idle() on iMX6Q/D are safe enough.

With best wishes,

Vladimir

jasonjiang
Contributor II

I was able to reproduce it very easily (within one minute) using the attached sample code. It does nothing but a simple flag animation in Qt/QML. While the flag was waving, I kept clicking the mouse for several seconds; the whole screen then froze for several seconds before resuming.

I used dmesg to check the kernel log, but didn't see any new messages after I ran the example.

My question is where I can get the latest Linux kernel v3.8.13 for iMX6?

Thanks a lot.

Jason

fabio_estevam
NXP Employee

Hi Jason,

If you plan to use a more recent kernel you can try 3.10.17 from FSL, available at git.freescale.com, or you can use 3.13.3 from kernel.org.

Regards,

Fabio Estevam

simonyeh
Contributor I

Hi, Fabio Estevam

I am using FSL Community BSP 1.6.2 with kernel 3.10.17-1.0.0 and have the same issue, as shown below.

INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=325, c=324, q=59)

INFO: Stall ended before state dump start

How can I fix it?

Thanks .

paul_lee
Contributor II

hello vladimirzapolskiy

Are there any fixes for the 3.10 kernel?

We are facing the same issue.

saurabh206
Senior Contributor III

Hi Fabio,

I am facing a similar issue. The BSP is Yocto with 3.10.17-1.0.0_GA.

[  939.391052] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=35792, c=35791, q=105)

[  939.402018] INFO: Stall ended before state dump start

How can I fix the rcu_preempt stalls?

jasonjiang
Contributor II

Fabio,

Thank you for the reply.

I got 3.10.17 from git.freescale.com and built the kernel (uImage). But when I downloaded the uboot source from the same website, I just don't know how to build it.

I need to set up the following:

     export PATH=/opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/:/home/linuxdev/bin

     export ARCH=arm

     export CROSS_COMPILE=arm-fsl-linux-gnueabi-

But I couldn't find a config file like mx6qsabresd_config, hence I cannot run "make mx6qsabresd_config".


After I build uboot, do I need to build a rootfs for version 3.10.17? I couldn't find the source code. I was trying to follow the suggestion from Ed Sutter (https://community.freescale.com/thread/315214), which mentioned building a rootfs.


Could you help?

Thanks,

Jason

fabio_estevam
NXP Employee

Jason,

Please start a new thread about this.

Regards,

Fabio Estevam

jasonjiang
Contributor II

Ok. I started a new thread here: How to build Linux 3.10.17 for iMX6?

Thanks,

Jason
