T2081 rcu_preempt self-detected stall while flush TLB


8,653 Views
panjiading
Contributor I

Hi all, 

   I am using the T2081, running Linux 3.12. Our product is a baseband station.

   Sometimes while the program is running, the Linux kernel prints an RCU warning like the one below:

INFO: rcu_preempt self-detected stall on CPU { 2 }

[c0000001f5206f60] [c000000000815090] .dump_stack+0x9c/0xf0
[c0000001f5206fe0] [c0000000000b0de0] .rcu_check_callbacks+0x40c/0x9a4
[c0000001f5207110] [c0000000000442d0] .update_process_times+0x50/0x94
[c0000001f52071a0] [c00000000009410c] .tick_sched_handle.isra.18+0x3c/0x50
[c0000001f5207210] [c00000000009417c] .tick_sched_timer+0x5c/0x98
[c0000001f52072b0] [c00000000005f790] .__run_hrtimer.isra.32+0x188/0x1e4
[c0000001f5207340] [c0000000000601f4] .hrtimer_interrupt+0x178/0x3f0
[c0000001f5207460] [c000000000010d60] .timer_interrupt+0x1f4/0x218
[c0000001f5207510] [c00000000001a054] exc_0x900_common+0x104/0x108
--- Exception: 901 at .smp_call_function_many+0x35c/0x3ec
LR = .smp_call_function_many+0x31c/0x3ec
[c0000001f5207800] [c00000000009ba48] .smp_call_function_many+0x2f8/0x3ec (unreliable)
[c0000001f52078e0] [c000000000025270] .__flush_tlb_page+0x17c/0x214
[c0000001f52079a0] [c000000000022dc4] .ptep_set_access_flags+0x90/0x14c
[c0000001f5207a40] [c0000000000eb1c8] .do_wp_page+0x52c/0xc20
[c0000001f5207b20] [c0000000000ee3c0] .handle_mm_fault+0x73c/0xbac
[c0000001f5207c00] [c000000000021fc4] .do_page_fault+0x340/0x748
[c0000001f5207e30] [c00000000001b1d4] storage_fault_common+0x20/0x44
CPU: 2 PID: 2601 Comm: tmf.exe Tainted: G O 3.12LINUX_V1.02.0 #1

(For details please see the attachment: "Linux RCU STALL AND HUANG.txt")

When the warning happens, Linux hangs; it may be that too many CPUs are in an RCU stall.

This happens at random; we have not yet found a way to reproduce it reliably.

How can I solve this problem? Has anyone seen it before?

I am looking forward to hearing from you.

Best regards,

Thanks!

5 Replies

5,870 Views
jobs
Contributor III

I've been pleading here for months. People here are so cold; they never talk to anyone. So let it go.


7,416 Views
panjiading
Contributor I

Has anyone else faced this problem?

Thanks!


7,415 Views
yipingwang
NXP TechSupport

Hello pan jiading,

The Real Time (RT) feature available in the operating system aims at creating an environment that meets time-critical processing requirements.

Please check whether RT is enabled in the Linux kernel configuration:

Kernel Configure Tree View Options:
Kernel options --->
         Preemption Model (Fully Preemptible Kernel (RT)) --->
                 (X) Fully Preemptible Kernel (RT)
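
As a quick sanity check on the running target, the preemption options the kernel was built with can be read back from the kernel itself. This is only a sketch and assumes the kernel was built with CONFIG_IKCONFIG_PROC so that /proc/config.gz exists; otherwise inspect the .config used to build the image.

# List the preemption-related options of the running kernel
zcat /proc/config.gz | grep '^CONFIG_PREEMPT'
# A fully preemptible (RT) kernel of the 3.12 era should report CONFIG_PREEMPT_RT_FULL=y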

The RT-enabled kernel has an RT throttling feature, which is enabled by default. It creates a blockout window of 5 ms for RT tasks in every 1-second window; during that window only non-RT tasks can execute. Network traffic runs in softirq context, which is an RT-priority thread, so when heavy traffic is being run the throttling gets activated. RCU locks are held by the softirq threads, and since those threads are blocked for 5 ms, RCU might time out.
In case of heavy traffic, first disable throttling with
"echo -1 > /proc/sys/kernel/sched_rt_runtime_us" and then proceed with testing.


Have a great day,
TIC


7,416 Views
pedro_valladare
Contributor I

Hi,

I have a very similar backtrace on a T4240 PowerPC system. I posted my question to that group and have had no reply yet. I have already echoed -1 to /proc/sys/kernel/sched_rt_runtime_us and it did not help. Any other suggestions?

This happens 100% of the time when we connect a USB Type-C cable from our box to a server that is running VMware ESXi 6.7.0 and passthrough has not been enabled for the USB buses.

root@(none):~# INFO: rcu_sched detected stalls on CPUs/tasks:
(detected by 7, t=5253 jiffies, g=4242, c=4241, q=8771)
All QSes seen, last rcu_sched kthread activity 5250 (4294922997-4294917747), jiffies_till_next_fqs=1, root ->qsmask 0x0
run R running task 0 3087 3011 0x00000004
Call Trace:
[c00000005d6f2a70] [c0000000000858d8] .sched_show_task+0xd8/0x164 (unreliable)
[c00000005d6f2af0] [c0000000000c0670] .rcu_check_callbacks+0x8ec/0x8f4
[c00000005d6f2c40] [c0000000000c6cec] .update_process_times+0x54/0x94
[c00000005d6f2cc0] [c0000000000dc6d4] .tick_sched_handle.isra.17+0x5c/0x7c
[c00000005d6f2d50] [c0000000000dc758] .tick_sched_timer+0x64/0xcc
[c00000005d6f2df0] [c0000000000c7ff8] .__run_hrtimer+0xc0/0x2ec
[c00000005d6f2e90] [c0000000000c85a8] .hrtimer_interrupt+0x13c/0x2f8
[c00000005d6f2fa0] [c00000000001217c] .__timer_interrupt+0xa0/0x200
[c00000005d6f3040] [c00000000001261c] .timer_interrupt+0x90/0xc4
[c00000005d6f30c0] [c00000000001c054] exc_0x900_common+0x104/0x108
--- interrupt: 901 at .smp_call_function_many+0x324/0x3b8
LR = .smp_call_function_many+0x2e0/0x3b8
[c00000005d6f33b0] [c0000000000e2f48] .smp_call_function_many+0x2c0/0x3b8 (unreliable)
[c00000005d6f3470] [c00000000002ed48] .__flush_tlb_page+0x140/0x19c
[c00000005d6f3530] [c00000000002c64c] .ptep_set_access_flags+0xb8/0x160
[c00000005d6f35d0] [c00000000018dbb0] .do_wp_page+0x1ac/0x758
[c00000005d6f36d0] [c00000000019078c] .handle_mm_fault+0xab8/0x1198
[c00000005d6f37f0] [c00000000002b874] .do_page_fault+0x398/0x6cc
[c00000005d6f38c0] [c00000000001d1d4] storage_fault_common+0x20/0x44
--- interrupt: 301 at .handle_rt_signal64+0x164/0x440
LR = .handle_rt_signal64+0x9c/0x440
[c00000005d6f3c70] [c00000000000a974] .do_signal+0x17c/0x220
[c00000005d6f3db0] [c00000000000ab98] .do_notify_resume+0x84/0x94
[c00000005d6f3e30] [c000000000000c4c] .ret_from_except_lite+0x78/0x7c
rcu_sched kthread starved for 5289 jiffies!


7,416 Views
panjiading
Contributor I

Hi Yiping, 

    Thank you very much for your response!

    I will test it with your suggestion.

    But this problem happens randomly, roughly once every one to two weeks.

    How can I confirm that the problem is solved?

    Do you have a way to make this RCU problem occur reliably for testing?

Best regards!

Thanks!
