IMX6Q system hang-up problem / linux kernel(3.0.35)

raoxd · ‎10-21-2014

Hi,

We use L3.0.35_4.1.0_130816 on an i.MX6Q to build U-Boot, the kernel and the rootfs. When we run our application (a Qt app), the system hangs after running for 3 days. This has happened several times.

In the kernel log you can find messages like:

     [ 73.247613] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
     [   77.257610] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
     [ 253.567611] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
     [ 257.577611] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
     [ 433.887610] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=42066 jiffies)
     [ 437.897611] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=42066 jiffies)
     [ 614.207612] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60098 jiffies)
     [ 618.217610] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60098 jiffies)
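The `t=... jiffies` field and the log timestamps can be cross-checked to estimate the tick rate and when the stall began. The following is a minimal sketch in Python, not part of the original post; the function names `parse_stalls` and `estimate_hz` are illustrative, and the regular expression assumes the message format shown above.

```python
import re

# Matches stall reports in the format quoted in this thread, e.g.
# [ 73.247613] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO:\s+(?P<flavor>rcu_\w+)\s+detected stalls?"
    r".*?t=(?P<jiffies>\d+) jiffies"
)

SAMPLE_LOG = """
[ 73.247613] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
[   77.257610] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
[ 253.567611] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
[ 614.207612] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60098 jiffies)
"""


def parse_stalls(log_text):
    """Return (timestamp_seconds, flavor, stall_jiffies) for each stall line."""
    stalls = []
    for line in log_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            stalls.append((float(m.group("ts")), m.group("flavor"), int(m.group("jiffies"))))
    return stalls


def estimate_hz(stalls, flavor):
    """Estimate jiffies per second from the first and last report of one RCU flavor."""
    rows = [s for s in stalls if s[1] == flavor]
    if len(rows) < 2:
        return None
    (t0, _, j0), (t1, _, j1) = rows[0], rows[-1]
    return (j1 - j0) / (t1 - t0)


stalls = parse_stalls(SAMPLE_LOG)
hz = estimate_hz(stalls, "rcu_preempt_state")
first_ts, _, first_jiffies = stalls[0]
# Stall start = time of the first report minus the stall duration it reports.
print("estimated HZ: %.1f" % hz)
print("stall began about %.1f s after boot" % (first_ts - first_jiffies / hz))
```

On the lines quoted above this suggests a tick rate of about 100 jiffies per second, so `t=6002` would correspond to roughly 60 seconds of stall on CPU 0.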

In the community I found someone who encountered the same problem. Following the thread "Re: Re: freescale android kernel have scheduling problem !!!", I took the steps below to patch my kernel.

1. Modify time.c

"rcu_preempt_state detected stalls" could be fixed with the following patch, please try:

--- linux-3.0.35/arch/arm/plat-mxc/time.c.orig	2013-12-11 09:25:29.518910709 +1300
+++ linux-3.0.35/arch/arm/plat-mxc/time.c	2013-12-11 09:26:12.958912361 +1300
@@ -165,7 +165,8 @@
 	__raw_writel(tcmp, timer_base + V2_TCMP);
 
-	return 0;
+	return evt < 0x7fffffff &&
+		(int)(tcmp - __raw_readl(timer_base + V2_TCN)) < 0 ? ETIME : 0;
 }
 
 #ifdef DEBUG

2. Apply the patches 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch and 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch
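The check added to time.c in step 1 relies on 32-bit wraparound arithmetic: after writing the compare value, it re-reads the counter and treats a negative signed difference as "the event was already missed". The sketch below models only that boolean decision in Python, under the assumption that the counter and compare registers are 32 bits wide; `to_int32` and `event_already_passed` are illustrative names, not kernel functions. Kernel code conventionally reports this condition with a negative errno such as -ETIME, so the sign in the patch as posted may be worth checking against your source tree.

```python
MASK32 = 0xFFFFFFFF


def to_int32(x):
    """Reinterpret the low 32 bits of x as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def event_already_passed(tcn_at_program, evt, tcn_after_write):
    """Model the patched check.

    tcn_at_program: counter value read when computing the compare value
    evt:            requested delta in timer ticks
    tcn_after_write: counter value re-read after the compare register was written
    """
    tcmp = (tcn_at_program + evt) & MASK32  # value written to the compare register
    # Only short deltas are checked; a negative signed difference means the
    # counter has already moved past the compare value.
    return evt < 0x7FFFFFFF and to_int32(tcmp - tcn_after_write) < 0


# Counter still before the compare value: the event will fire normally.
print(event_already_passed(1000, 10, 1005))
# Counter already past the compare value: the event was missed.
print(event_already_passed(1000, 10, 1020))
# Compare value wrapped past 2**32 but is still ahead of the counter.
print(event_already_passed(0xFFFFFFF0, 0x20, 0xFFFFFFF8))
```

Without such a check, a missed compare match would only fire after the 32-bit counter wraps all the way around, which is one plausible way for a tick to go missing for a long time.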

But after making these changes, the kernel still hangs as before. I also tried to update the GPU libraries to p13, but found that the libraries (libEGL, libGAL) in gpu-viv-bin-mx6q-3.0.35-4.1.0 do not work with p13. Can p13 not be used on Linux?

Is there anything else I can try to fix this problem on Linux?

federicoloda · ‎12-10-2016

Hi, we are using Linux 3.0.35-4.1.0 + Xenomai 2.6.4 on an ARM i.MX6 and we are facing a very similar problem:
after a couple of hours or some days the system freezes.

We checked cat /proc/timer_list and found that the timer seems to have missed an event:

Timer List Version: v0.6
HRTIMER_MAX_CLOCK_BASES: 3
now at 105225868014163 nsecs

cpu: 0
clock 0:
.base:       80d05940
.index:      0
.resolution: 1 nsecs
.get_time:   ktime_get
.offset:     0 nsecs
active timers:
#0: <80d05e60>, tick_sched_timer, S:01
# expires at 104075730000000-104075730000000 nsecs [in -1150138014163 to -1150138014163 nsecs]
#1: <8e771a80>, hrtimer_wakeup, S:01
# expires at 104075736565408-104075736615408 nsecs [in -1150131448755 to -1150131398755 nsecs]
#2: <8f001a80>, hrtimer_wakeup, S:01
# expires at 104075764916408-104075764984514 nsecs [in -1150103097755 to -1150103029649 nsecs]
#3: <8fae9a80>, hrtimer_wakeup, S:01
# expires at 104077575167741-104077580167738 nsecs [in -1148292846422 to -1148287846425 nsecs]

......

Tick Device: mode:     1
Per CPU device: 0
Clock Event Device: mxc_timer1
max_delta_ns:   4294967295
min_delta_ns:   85000
mult:           1
shift:          0
mode:           3
next_event:     104075730000000 nsecs
set_next_event: xnarch_next_htick_shot
set_mode:       xnarch_switch_htick_mode
event_handler: hrtimer_interrupt
retries:        0

We suppose the problem is that the programmed next event is already behind the current time:

now at 105225868014163 nsecs

next_event: 104075730000000 nsecs
We also noticed that all periodic tasks freeze.

We also checked, and all the patches for the time.c timer problem are already in this kernel release.

Could this problem be related to this thread? Does anyone have a solution?
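To quantify how stale the programmed event is, the two values can be pulled out of the /proc/timer_list dump and subtracted. This is a small Python sketch, not something from the original post; `overdue_ns` is an illustrative name, and it simply takes the first `now at` and `next_event` lines it finds, which is an assumption that fits the single-CPU excerpt quoted above.

```python
import re

# Excerpt of the /proc/timer_list output quoted in this post.
SAMPLE = """
now at 105225868014163 nsecs
Clock Event Device: mxc_timer1
max_delta_ns:   4294967295
min_delta_ns:   85000
next_event:     104075730000000 nsecs
"""


def overdue_ns(timer_list_text):
    """Return how far (in ns) the first next_event lies behind 'now'.

    A positive result means the event should already have fired.
    """
    now = int(re.search(r"now at (\d+) nsecs", timer_list_text).group(1))
    nxt = int(re.search(r"next_event:\s+(\d+) nsecs", timer_list_text).group(1))
    return now - nxt


late = overdue_ns(SAMPLE)
max_delta = int(re.search(r"max_delta_ns:\s+(\d+)", SAMPLE).group(1))
print("next_event is overdue by %.1f s" % (late / 1e9))
print("max_delta_ns allows at most %.2f s between events" % (max_delta / 1e9))
```

For the values above the event is overdue by about 1150 seconds, which matches the negative `[in -1150138014163 ...]` offset shown for `tick_sched_timer` and is far larger than the device's `max_delta_ns` of roughly 4.3 seconds.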

sei-umemura · ‎03-15-2015

Hi rao and igorpadykov

Our Linux kernel version:

Timesys LinuxLink 3.0.35-ts-armv71

* based on [ L3.0.35_4.0.0 ]

* we changed the kernel setting CONFIG_RCU_CPU_STALL_TIMEOUT from 30 sec to 10 sec.

CPU: i.MX6Q

I am developing an image-processing device using Linux and the i.MX6Q.

My system sometimes hangs, showing the phenomenon described in this thread.

As for frequency: when 15 boards run continuously for two days, the hang occurs on 1-2 boards.

I checked a few things concerning individual differences between the hardware units.

When my system hangs, it may or may not output the RCU stall log.

The logs follow.

a)

[Thu Mar 05 08:14:11.496 2015] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 2 3} (detected by 0, t=1002 jiffies)

b)

Mar 5 06:16:09 (none) user.err kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 3} (detected by 0, t=1002 jiffies)

c)

Mar 5 10:32:47 (none) user.err kernel: INFO: rcu_preempt_state detected stall on CPU 3 (t=1001 jiffies)

Mar 5 10:32:47 (none) user.err kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 3} (detected by 2, t=1002 jiffies)

In the application we are developing, the main thread creates nine child threads

and monitors the activity of the child threads.

When the system hangs, either the main thread or a child thread may hang.

If the main thread hangs, the watchdog timer triggers a power-on reset,

so I cannot capture a useful log.

When I traced a hung child thread, it had not woken up after its sleep function call.

I pass 1 sec or 100 ms to the sleep function, but the thread wakes up 10 sec later.

The program sets a global stop flag

when there has been no reply for more than 10 seconds.

It seems to me that the late wakeup is linked to the change of this global variable.
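The late wakeups described above can be measured directly by timing each sleep against a monotonic clock. The following is a rough Python sketch of that idea, not the poster's actual application code; `oversleep`, `is_late` and the 0.5 s tolerance are hypothetical choices for illustration.

```python
import time


def oversleep(requested_s, sleep=time.sleep, clock=time.monotonic):
    """Sleep for requested_s and return how much longer than requested it took.

    The sleep and clock functions are injectable so the logic can be
    exercised without really waiting.
    """
    start = clock()
    sleep(requested_s)
    return clock() - start - requested_s


def is_late(oversleep_s, tolerance_s=0.5):
    """Flag a wakeup as late when it overshoots by more than tolerance_s."""
    return oversleep_s > tolerance_s


# Simulate the reported case: a 100 ms sleep that returns 10 s later.
fake_times = iter([0.0, 10.0])
simulated = oversleep(0.1, sleep=lambda s: None, clock=lambda: next(fake_times))
print("simulated overshoot: %.1f s, late: %s" % (simulated, is_late(simulated)))

# A real short sleep on a healthy system should overshoot only slightly.
real = oversleep(0.01)
print("real overshoot: %.4f s, late: %s" % (real, is_late(real)))
```

Logging the overshoot from each child thread would show whether all threads wake late at the same moment, which would point at a missed timer event rather than at one blocked thread.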

Questions:

1) Has your system ever hung without outputting the RCU stall log?

2) Could you tell me what you think causes this problem?

PeterChan · ‎11-13-2014

Hi Rao xd,

Do you have any simple way to reproduce the "rcu_preempt_state detected stalls on CPUs/tasks" error on L3.0.35_4.1.0?

Hi Bateman Cai,

Unfortunately, Android 4.2.2 is based on L3.0.35_4.0.0.

raoxd · ‎11-14-2014

hi PeterChan:

We can reproduce the problem by running a USB stress test. Attached is our test tool, which runs on the i.MX6.

We see the "rcu_preempt_state detected stalls on CPUs/tasks" error after running these tests overnight.

Our test steps are:

1. Attach a USB flash drive to the i.MX6 USB interface and mount it, for example: mount /dev/sdb1 /mnt

2. Run the USB test:

./FileSysTest 2 /mnt 100000 3 &

./FileSysTest 2 /root 100000 3 500M &

We found that if CPU iowait reaches 90% or higher, the error is easy to reproduce (iowait can be seen by running "top").
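The iowait figure that top reports can also be computed from two samples of the aggregate `cpu` line in /proc/stat, which is handy for logging it during an overnight run. This is a minimal Python sketch, not part of the attached test tool; `cpu_times` and `iowait_percent` are illustrative names, and the sample counter values are made up.

```python
def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two /proc/stat samples.

    Field order is user, nice, system, idle, iowait, irq, softirq, steal;
    later guest fields are skipped because guest time is already counted in user.
    """
    before = cpu_times(before_text)[:8]
    after = cpu_times(after_text)[:8]
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


# Made-up samples: 5 user, 5 system and 90 iowait ticks elapsed in between.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 25 0 12 200 12 0 0 0 0 0\n"
AFTER = "cpu  105 0 55 800 140 0 0 0 0 0\ncpu0 26 0 13 200 35 0 0 0 0 0\n"
print("iowait: %.1f%%" % iowait_percent(BEFORE, AFTER))
```

On a target board the two samples would be read from /proc/stat a second or so apart.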

PeterChan · ‎11-21-2014

Hi rao xd,

I see that your tool is using the i.MX6 usb host to read / write usb removable drive. In L3.0.35_4.1.0, there is a software bug that can make the usb host stop working. In order to ensure the hang-up is not caused by the usb host driver, would you please apply this usb memory patch for L3.0.35_4.1.0 and test this problem again?

Thanks,

Peter

raoxd · ‎01-07-2015

Hi PeterChan,

I updated the system to L3.0.101_4.1.1_141016 and ran the test again. The problem still persists.

When I force the i.MX6Q to run on a single core, the problem disappears.

PeterChan · ‎01-14-2015

Hi rao xd,

Unfortunately, I am not able to reproduce this issue on i.MX6Q in L3.0.101_4.1.1_141016 by running the test "./FileSysTest 2 /mnt/ 100000 3 500M &". Actually, the fixes for the "rcu_preempt_state detected stall on CPU" problem are:

1.

--- linux-3.0.35/arch/arm/plat-mxc/time.c.orig	2013-12-11 09:25:29.518910709 +1300
+++ linux-3.0.35/arch/arm/plat-mxc/time.c	2013-12-11 09:26:12.958912361 +1300
@@ -165,7 +165,8 @@
 	__raw_writel(tcmp, timer_base + V2_TCMP);
 
-	return 0;
+	return evt < 0x7fffffff &&
+		(int)(tcmp - __raw_readl(timer_base + V2_TCN)) < 0 ? ETIME : 0;
 }
 
 #ifdef DEBUG

2.

patch 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch and 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch

and they are all merged into L3.0.101_4.1.1_141016.

Regards,

Peter

asim_zaidi · ‎02-09-2015

Hi raoxd

Can you kindly provide an update on your testing? Do you still observe the issue with the patches provided by PeterChan?

Thanks

Asim

raoxd · ‎02-09-2015

I have updated to 3.0.101 and run it on the Freescale SDB board, but the problem still persists.

The test includes the following, all run at the same time:

1. eMMC stress test

2. network stress test

3. memtest

We found that the problem appears easily when CPU iowait is high (on Linux, run top and check that the x.x%wa value is high).

jobs · ‎02-18-2022

This is my problem, and it comes up when traffic pressure increases.

MPC85XX T2080 E6500 RCU detects CPU X stall - NXP Community (nxp.com)

asim_zaidi · ‎02-12-2015

Hi

Can you kindly provide more details on the steps to reproduce the issue? We are not observing this issue in our testing with 3.0.101.

Can you also try enabling the Performance governor?

Thanks

Asim

raoxd · ‎02-12-2015

Attached is the test tool.

1. memtest

Run memTest --? for help.

We run: ./memTest 100M 1000000

2. FileSysTest

Attach a USB flash drive to the i.MX6 USB interface and mount it, for example: mount /dev/sdb1 /mnt

Run the USB test:

./FileSysTest 2 /mnt 100000 3 &

./FileSysTest 2 /root 100000 3 500M &

3. NetStress test

Use two devices for the test: one as the client, the other as the server (NetCntTest and test_NetCnt_stress.sh for Linux; netCntTest.exe and test_netCnt.bat for Windows).

Unzip NetCnt.rar.

Client:

./NetCntTest deviceType ScrIp DstIp DstPort packetlen

deviceType: 0 for client, 1 for server

For example: ./NetCntTest 0 172.17.179.4 172.17.0.61 20001 10000

(20001 is the destination port)

Server:

./NetCntTest deviceType ScrIp DstIp ScrPort packetlen

For example:

./NetCntTest 1 172.17.0.61 20001 10000

(20001 is the local port)

Steps 1, 2 and 3 run at the same time.

PS: I still have PCIe enabled on the SDB, connected to a network card.

ko-hey · ‎03-16-2015

Dear raoxd

Let me check on your progress.

My customer has the same issue and error messages.

The message that my customer received is below.

INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 2 3} (detected by 0, t=1002 jiffies)

Were you able to resolve the issue in the end?

If yes, please share what you did to resolve it.

Best Regards,

Ko-hey

raoxd · ‎03-16-2015

No, the problem still persists.

I have provided the test tools and am waiting for a reply from Freescale.

ko-hey · ‎03-16-2015

Hi rao xd

I was expecting that it had already been resolved…

Did you contact Freescale's FAE some other way?

Ko-hey

jianzhu · ‎01-09-2017

Hi Ko-hey

How did you solve it? We have hit a similar problem, but we get no log output on the serial port. The system hangs, and we cannot log in over the serial port or the network. Our kernel version is 3.0.35_4.1.0 (Yocto 1.5 Dora).

BR

Jerry

jay0725jay · ‎11-03-2014

Hi,PeterChan :

1. I also hit the same problem, even with the patches 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch, 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch and 0009-ARM-imx-return-zero-in-case-next-event-gets-a-large-.patch applied.

2. If I build the kernel with "# LOCAL_TIMERS is not set", the problem seems to be solved, but if I build it with "LOCAL_TIMERS=y" the system hangs again after a few days.

3. However, with "LOCAL_TIMERS is not set", the data our app exchanges with the i.MX6Q over USB is delayed or lost.
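When comparing builds like this, it helps to confirm which setting a given kernel .config actually contains. Below is a small Python sketch for that check, not something from the original post; `config_state` is an illustrative name, and it assumes the standard .config syntax in which the option appears as CONFIG_LOCAL_TIMERS.

```python
def config_state(config_text, symbol):
    """Return the value assigned to a Kconfig symbol, or None if it is unset or absent."""
    name = symbol if symbol.startswith("CONFIG_") else "CONFIG_" + symbol
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
        if line == "# %s is not set" % name:
            return None
    return None


# Two made-up .config fragments matching the two builds described above.
WITHOUT_LOCAL_TIMERS = "CONFIG_SMP=y\n# CONFIG_LOCAL_TIMERS is not set\n"
WITH_LOCAL_TIMERS = "CONFIG_SMP=y\nCONFIG_LOCAL_TIMERS=y\n"

print(config_state(WITHOUT_LOCAL_TIMERS, "LOCAL_TIMERS"))
print(config_state(WITH_LOCAL_TIMERS, "LOCAL_TIMERS"))
```

The same function can be pointed at the .config in the kernel build directory to record the setting alongside each test run.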

By the way, my kernel is 3.0.35_4.0.0, with Android 4.2.2 on the i.MX6.

Is there anything else I can try to fix this problem on Linux? Thanks!

igorpadykov · ‎10-21-2014

Hi rao

Please try the attached patches.

There are GPU fixes in the latest upgrade, 3.10.17-1.0.1, and

L3.10.17 1.0.2 is planned by the end of October.

Freescale OpenEmbedded/Yocto Layers discussion list

Best regards

igor

raoxd · ‎10-23-2014

Hi igorpadykov

The patches you provided do not apply to Linux 3.0.35_4.1.0. When I apply them, patch reports failures. The patches that failed are listed below.

0002-ENGR00316180-iMX6x-Support-IRAM-page-table-when-DDR-.patch

0035-ENGR00334447-imx6qdl-Fix-random-failures-caused-by-d.patch

Can you provide patches that are suitable for Linux 3.0.35_4.1.0? Or can you tell me which Linux version you used? Thanks.

igorpadykov · ‎10-23-2014

The patch will be released next week on the official i.MX6 site.
