Hi,
We use L3.0.35_4.1.0_130816 on an i.MX6Q to build U-Boot, the kernel and the rootfs. When we run our application (a Qt app), the system hangs after about 3 days of running. This has happened several times.
In the kernel log you can find notifications like:
[ 73.247613] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
[ 77.257610] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
[ 253.567611] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
[ 257.577611] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
[ 433.887610] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=42066 jiffies)
[ 437.897611] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=42066 jiffies)
[ 614.207612] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60098 jiffies)
[ 618.217610] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=60098 jiffies)
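To compare stall reports like the ones above across boards, it can help to pull the fields out mechanically. The following is a minimal Python sketch, assuming the log keeps exactly the "detected stalls on CPUs/tasks" layout quoted in this post; the names `parse_stalls`, `estimate_hz` and `SAMPLE` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches the "detected stalls on CPUs/tasks" lines as quoted in this thread.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO:\s+(?P<flavor>rcu_\w+)\s+detected stalls "
    r"on CPUs/tasks:\s*\{(?P<cpus>[^}]*)\}\s*"
    r"\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies\)"
)


def parse_stalls(log_text):
    """Return one dict per RCU stall report found in log_text."""
    reports = []
    for m in STALL_RE.finditer(log_text):
        reports.append({
            "timestamp": float(m.group("ts")),       # seconds since boot
            "flavor": m.group("flavor"),             # e.g. rcu_preempt_state
            "stalled_cpus": [int(c) for c in m.group("cpus").split()],
            "detected_by": int(m.group("by")),
            "jiffies": int(m.group("t")),            # stall age so far
        })
    return reports


def estimate_hz(reports, flavor):
    """Estimate jiffies per second from the first two reports of one flavor.

    Assumption: both reports belong to the same ongoing stall, so the
    jiffies counter and the log timestamp advance together.
    """
    same = [r for r in reports if r["flavor"] == flavor]
    if len(same) < 2:
        return None
    a, b = same[0], same[1]
    return (b["jiffies"] - a["jiffies"]) / (b["timestamp"] - a["timestamp"])


# Two lines copied from the log above.
SAMPLE = """\
[ 73.247613] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=6002 jiffies)
[ 253.567611] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 0} (detected by 3, t=24034 jiffies)
"""
```

Calling `estimate_hz(parse_stalls(SAMPLE), "rcu_preempt_state")` on these two lines gives a value very close to 100, which would suggest that t=6002 jiffies corresponds to roughly 60 seconds of stall on CPU 0. That is an inference from the timestamps only; the kernel's HZ setting is not stated anywhere in this thread.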
In the community I found someone who encountered the same problem. Following the thread "Re: Re: freescale android kernel have scheduling problem !!!", I took the steps below to patch my kernel.
1. Modify time.c
"rcu_preempt_state detected stalls" could be fixed with the following patch, please try:
--- linux-3.0.35/arch/arm/plat-mxc/time.c.orig	2013-12-11 09:25:29.518910709 +1300
+++ linux-3.0.35/arch/arm/plat-mxc/time.c	2013-12-11 09:26:12.958912361 +1300
@@ -165,7 +165,8 @@
 	__raw_writel(tcmp, timer_base + V2_TCMP);
-	return 0;
+	return evt < 0x7fffffff &&
+		(int)(tcmp - __raw_readl(timer_base + V2_TCN)) < 0 ? ETIME : 0;
 }
 #ifdef DEBUG
2. Apply the patches 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch and 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch
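For readers who want to see what the return expression in the time.c patch from step 1 checks, here is a small Python model of the 32-bit arithmetic. It is only a sketch: it assumes that the compare value is computed as the current counter plus `evt` (that line is not shown in the patch excerpt), and the function names are invented for this illustration.

```python
MASK32 = 0xFFFFFFFF


def to_int32(value):
    """Reinterpret the low 32 bits of value as signed, like a C (int) cast."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def compare_already_passed(tcmp, tcn):
    """Model of (int)(tcmp - TCN) < 0: the counter is already past tcmp."""
    return to_int32(tcmp - tcn) < 0


def set_next_event_result(evt, tcn_before, tcn_after):
    """Model of the patched return expression.

    evt        -- requested delta in timer ticks
    tcn_before -- counter value used to compute the compare value
                  (assumption: tcmp = counter + evt, not shown in the excerpt)
    tcn_after  -- counter value read back after writing the compare register
    Returns "ETIME" when the event was already missed, else 0.
    """
    tcmp = (tcn_before + evt) & MASK32
    if evt < 0x7FFFFFFF and compare_already_passed(tcmp, tcn_after):
        return "ETIME"
    return 0
```

With `evt=10` and a counter that wraps from `0xFFFFFFF8`, the model returns 0 while the counter is still at `0xFFFFFFFA` and "ETIME" once it has reached 5, so the signed difference handles the wraparound. For a very large delta such as `evt=0x90000000` the signed difference would look negative right away, and the `evt < 0x7fffffff` guard is what keeps the result at 0 in that case.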
But after making these changes, the kernel still hangs as before. I also tried to update the GPU libraries to p13, but I found that the libraries (libEGL, libGAL) in gpu-viv-bin-mx6q-3.0.35-4.1.0 do not work with p13. Can p13 not be used on Linux?
Is there anything else I can try to fix this problem on Linux?
Hi, we are using Linux 3.0.35-4.1.0 + Xenomai 2.6.4 on an ARM i.MX6 and we are facing a very similar problem:
after a couple of hours or some days the system freezes.
We checked cat /proc/timer_list and found that the timer seems to have missed an event:
Timer List Version: v0.6
HRTIMER_MAX_CLOCK_BASES: 3
now at 105225868014163 nsecs
cpu: 0
clock 0:
.base: 80d05940
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <80d05e60>, tick_sched_timer, S:01
# expires at 104075730000000-104075730000000 nsecs [in -1150138014163 to -1150138014163 nsecs]
#1: <8e771a80>, hrtimer_wakeup, S:01
# expires at 104075736565408-104075736615408 nsecs [in -1150131448755 to -1150131398755 nsecs]
#2: <8f001a80>, hrtimer_wakeup, S:01
# expires at 104075764916408-104075764984514 nsecs [in -1150103097755 to -1150103029649 nsecs]
#3: <8fae9a80>, hrtimer_wakeup, S:01
# expires at 104077575167741-104077580167738 nsecs [in -1148292846422 to -1148287846425 nsecs]
......
Tick Device: mode: 1
Per CPU device: 0
Clock Event Device: mxc_timer1
max_delta_ns: 4294967295
min_delta_ns: 85000
mult: 1
shift: 0
mode: 3
next_event: 104075730000000 nsecs
set_next_event: xnarch_next_htick_shot
set_mode: xnarch_switch_htick_mode
event_handler: hrtimer_interrupt
retries: 0
We suppose the problem is that the next_event value is already behind the current time ("now"):
now at 105225868014163 nsecs
next_event: 104075730000000 nsecs
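The comparison between "now" and next_event can be automated when checking many boards. Below is a minimal Python sketch that scans a /proc/timer_list dump for entries lying in the past. It assumes the text layout shown in this post; `overdue_report` and `SAMPLE` are hypothetical names used only here.

```python
import re

NOW_RE = re.compile(r"^now at (\d+) nsecs", re.MULTILINE)
NEXT_EVENT_RE = re.compile(r"^\s*next_event:\s+(\d+) nsecs", re.MULTILINE)
EXPIRES_RE = re.compile(r"expires at (\d+)-(\d+) nsecs")


def overdue_report(timer_list_text):
    """Summarize timers and clock events that should already have fired."""
    now_match = NOW_RE.search(timer_list_text)
    if now_match is None:
        raise ValueError("no 'now at ... nsecs' line found")
    now = int(now_match.group(1))

    next_events = [int(v) for v in NEXT_EVENT_RE.findall(timer_list_text)]
    # Each "expires at A-B" line carries two values; the first is used here.
    first_expiries = [int(a) for a, _b in EXPIRES_RE.findall(timer_list_text)]

    return {
        "now_ns": now,
        # How far each programmed clock event lies in the past, in ns.
        "overdue_next_events_ns": [now - v for v in next_events if v < now],
        # Number of queued hrtimers whose expiry is already behind "now".
        "overdue_timers": sum(1 for v in first_expiries if v < now),
    }


# Lines copied from the dump above.
SAMPLE = """\
now at 105225868014163 nsecs
 #0: <80d05e60>, tick_sched_timer, S:01
 # expires at 104075730000000-104075730000000 nsecs [in -1150138014163 to -1150138014163 nsecs]
 next_event: 104075730000000 nsecs
"""
```

For the values quoted above, `overdue_report(SAMPLE)` reports the clock event as 1150138014163 ns overdue, that is about 1150 seconds, which matches the negative "in ..." offset that the kernel prints for tick_sched_timer.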
We also noticed that all periodic tasks freeze.
We also checked that all the timer patches discussed in this thread are already included in this kernel release.
Can this problem be related to this thread? Does anyone have a solution?
Hi rao and igorpadykov,
Our Linux kernel version:
Timesys LinuxLink 3.0.35-ts-armv71
* based on [ L3.0.35_4.0.0 ]
* we changed the kernel setting CONFIG_RCU_CPU_STALL_TIMEOUT from 30 sec to 10 sec.
CPU: i.MX6Q
I am developing an image-processing apparatus using Linux and the i.MX6Q.
My system sometimes hangs, showing the phenomenon described here.
As for the frequency: when 15 boards run continuously for two days, it occurs on 1-2 boards.
I checked a few things regarding individual differences between the hardware units.
When my system hangs, it may or may not output the log about the detected RCU stall.
The logs follow.
a)
[Thu Mar 05 08:14:11.496 2015] INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 2 3} (detected by 0, t=1002 jiffies)
b)
Mar 5 06:16:09 (none) user.err kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1 3} (detected by 0, t=1002 jiffies)
c)
Mar 5 10:32:47 (none) user.err kernel: INFO: rcu_preempt_state detected stall on CPU 3 (t=1001 jiffies)
Mar 5 10:32:47 (none) user.err kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 3} (detected by 2, t=1002 jiffies)
In the application we develop, the main thread spawns nine child threads,
and the main thread monitors the activity of the child threads.
When the system hangs, either the main thread or a child thread may hang.
If the main thread hangs, the watchdog timer triggers a power-on reset,
so I cannot acquire a useful log.
I traced a child thread during a hang: the thread did not wake up after the sleep function call.
I set 1 sec or 100 ms for the sleep function, but the thread woke up 10 sec later.
I implemented code that sets a stop flag in a global variable
when there is no reply for more than 10 seconds.
I suspect that the late wakeup and the change of the global variable are linked.
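The late wakeup described here can be measured directly by timing a sleep call against a monotonic clock. The following Python sketch shows one way to do that; it is not the application's actual watchdog code, and the names `measure_sleep_overshoot` and `is_late` as well as the tolerance are assumptions made for this example.

```python
import time


def measure_sleep_overshoot(requested_s, sleep=time.sleep, clock=time.monotonic):
    """Sleep for requested_s seconds and return how much longer it took.

    sleep and clock are injectable so the function can be exercised
    without really waiting.
    """
    start = clock()
    sleep(requested_s)
    return clock() - start - requested_s


def is_late(overshoot_s, tolerance_s):
    """True if the wakeup came later than the chosen tolerance."""
    return overshoot_s > tolerance_s
```

A monitoring thread could call `measure_sleep_overshoot(0.1)` in a loop and log every case where `is_late(overshoot, 1.0)` is true, together with a timestamp, so that late wakeups can later be lined up with the RCU stall messages in the kernel log.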
Questions:
1) Has your system ever hung without outputting the log about the detected RCU stall?
2) Could you tell me what you think is the cause of this problem?
Hi Rao xd,
Do you have any simple way to reproduce the "rcu_preempt_state detected stalls on CPUs/tasks" error on L3.0.35_4.1.0?
Hi Bateman Cai,
Unfortunately, Android 4.2.2 is based on L3.0.35_4.0.0.
Hi PeterChan,
We can reproduce the problem by running a USB stress test. Attached is our test tool, which runs on the i.MX6.
We see the "rcu_preempt_state detected stalls on CPUs/tasks" error when we run these tests overnight.
Our test steps are:
1. Attach a USB disk to the i.MX6 USB interface and mount it, for example: mount /dev/sdb1 /mnt
2. Run the USB test:
./FileSysTest 2 /mnt 100000 3 &
./FileSysTest 2 /root 100000 3 500M &
We found that if the CPU I/O wait reaches 90% or higher, the error is easy to reproduce (the I/O wait can be seen by running "top").
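The I/O-wait percentage that top shows can also be logged from a script during the stress run, by sampling the aggregate "cpu" line of /proc/stat twice. This is a minimal Python sketch of that calculation, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the sample numbers are invented for illustration.

```python
def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            if len(values) < 5:
                raise ValueError("cpu line has no iowait field")
            return values
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Share of time spent in iowait between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    # Only the first eight fields are summed; later fields (guest time)
    # are already accounted for in user time.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_proc_stat(path="/proc/stat"):
    """Read one snapshot; call twice with a pause in between."""
    with open(path) as f:
        return f.read()


# Made-up snapshots, not measurements from this thread.
BEFORE = "cpu 100 0 50 800 50 0 0 0\n"
AFTER = "cpu 110 0 60 810 950 0 0 0\n"
```

With the two made-up snapshots, `iowait_percent(BEFORE, AFTER)` is about 96.8%, which would be in the range where the error is reported to reproduce easily. On a real board, two `read_proc_stat()` calls a second apart would supply the inputs.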
Hi rao xd,
I see that your tool uses the i.MX6 USB host to read from and write to a removable USB drive. In L3.0.35_4.1.0, there is a software bug that can make the USB host stop working. To make sure the hang is not caused by the USB host driver, would you please apply this USB memory patch for L3.0.35_4.1.0 and test this problem again?
Thanks,
Peter
Hi PeterChan,
I updated the system to L3.0.101_4.1.1_141016 and ran the test again. The problem still persists.
When I force the i.MX6Q to run on a single core, the problem disappears.
Hi rao xd,
Unfortunately, I am not able to reproduce this issue on the i.MX6Q with L3.0.101_4.1.1_141016 by running the test "./FileSysTest 2 /mnt/ 100000 3 500M &". Actually, the fixes for "rcu_preempt_state detected stall on CPU" are:
1.
--- linux-3.0.35/arch/arm/plat-mxc/time.c.orig	2013-12-11 09:25:29.518910709 +1300
+++ linux-3.0.35/arch/arm/plat-mxc/time.c	2013-12-11 09:26:12.958912361 +1300
@@ -165,7 +165,8 @@
 	__raw_writel(tcmp, timer_base + V2_TCMP);
-	return 0;
+	return evt < 0x7fffffff &&
+		(int)(tcmp - __raw_readl(timer_base + V2_TCN)) < 0 ? ETIME : 0;
 }
 #ifdef DEBUG
2.
the patches 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch and 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch,
and they are all merged into L3.0.101_4.1.1_141016.
Regards,
Peter
I have updated to 3.0.101 and run it on the Freescale SDB board, but the problem still persists.
The test includes the following (all run at the same time):
1. eMMC stress test
2. network stress test
3. memtest
We found that the problem appears easily when the CPU I/O wait is high (on Linux, run top and check whether x.x%wa is high).
This is my problem, and it comes up when traffic pressure increases.
Hi
Can you kindly provide more details on the steps to reproduce the issue? We are not observing this issue in our testing with 3.0.101.
Can you also try enabling the performance governor?
Thanks
Asim
Attached is the test tool.
1. memTest
Run memTest --? for help.
We run ./memTest 100M 1000000
2. FileSysTest
Attach a USB disk to the i.MX6 USB interface and mount it, for example: mount /dev/sdb1 /mnt
Run the USB test:
./FileSysTest 2 /mnt 100000 3 &
./FileSysTest 2 /root 100000 3 500M &
3. NetStress test
Use two devices for the test, one as client and the other as service (NetCntTest and test_NetCnt_stress.sh for Linux; netCntTest.exe and test_netCnt.bat for Windows).
Unzip NetCnt.rar.
Client:
./NetCntTest deviceType ScrIp DstIp DstPort packetlen
deviceType: 0 for client, 1 for service
For example: ./NetCntTest 0 172.17.179.4 172.17.0.61 20001 10000
(20001 is the destination port)
Service:
./NetCntTest deviceType ScrIp DstIp ScrPort packetlen
For example:
./NetCntTest 1 172.17.0.61 20001 10000
(20001 is the local port)
Steps 1, 2 and 3 run at the same time.
PS: PCIe is still enabled on the SDB and is connected to a network card.
Dear raoxd,
Please let me confirm your progress.
My customer has the same issue and error messages.
The message that my customer received is below.
INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 2 3} (detected by 0, t=1002 jiffies)
Were you finally able to resolve the issue?
If yes, please share what you did to resolve it.
Best Regards,
Ko-hey
No, the problem still persists.
I have provided the test tools and am waiting for a reply from Freescale.
Hi rao xd,
I was expecting that it had already been resolved...
Did you contact Freescale's FAE in some other way?
Ko-hey
Hi Ko-hey,
How did you solve it? We have met a similar problem, but we have no log output on the serial port. The system hangs and we cannot log in via the serial port or the network. Our kernel version is 3.0.35_4.1.0 (Yocto 1.5 Dora).
BR
Jerry
Hi PeterChan,
1. I also hit the same problem with the patches 0027-ENGR00306276-iMX6-Add-workaround-for-ARM-errata-7613.patch, 0026-ENGR00306257-1027-fix-system-hang-up-issue-caused-by.patch, 0025-ENGR00295714-GPT-Status-register-bits-are-cleared-in.patch and 0009-ARM-imx-return-zero-in-case-next-event-gets-a-large-.patch applied.
2. If I build the kernel with "# LOCAL_TIMERS is not set", the problem seems to be solved, but with "LOCAL_TIMERS=y" the system hangs again after a few days.
3. With "LOCAL_TIMERS is not set", however, the data our app exchanges with the i.MX6Q over USB is delayed or lost.
By the way, my kernel is 3.0.35_4.0.0, Android 4.2.2, i.MX6.
Is there anything I can try to fix this problem in Linux? Thanks!
Hi rao,
please try the attached patches.
There are GPU fixes in the latest release 3.10.17-1.0.1, and
L3.10.17 1.0.2 is planned for the end of Oct.
Freescale OpenEmbedded/Yocto Layers discussion list
Best regards
igor
Hi igorpadykov,
The patches you provided do not apply to Linux 3.0.35_4.1.0; when I apply them, patch reports a failure. The patches that failed are:
0002-ENGR00316180-iMX6x-Support-IRAM-page-table-when-DDR-.patch
0035-ENGR00334447-imx6qdl-Fix-random-failures-caused-by-d.patch
Can you provide patches suitable for Linux 3.0.35_4.1.0? Or can you tell me which Linux version you used? Thanks.
The patch will be released next week on the official i.MX6 site.