
T1042 machine check

Question asked by Dominique LETTY on Aug 10, 2018
Latest reply on Oct 19, 2018 by Dominique LETTY

Hello,

 

I'm working on a T1042 design under Linux (3.12.19). The T1042 device is connected to several PCIe devices.

About once a day, my application crashes with the following traces:

 

Machine check in kernel mode.
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
PREEMPT SMP NR_CPUS=24
Modules linked in:
CPU: 0 PID: 1402 Comm: h_im_ti Not tainted 3.12.19-IC #6
task: c0000000792e4b00 ti: c00000006c11c000 task.ti: c00000006c11c000
NIP: 00003fff7b2c6aa8 LR: 00003fff7b2c6a6c CTR: 00003fff7b0be7e0
REGS: c00000006c11fea0 TRAP: 0000 Not tainted (3.12.19-IC)
MSR: 000000008002d000 <CE,EE,PR,ME> CR: 44004822 XER: 20000000
SOFTE: 1

GPR00: 0000000019470092 00003fff42c0e2f0 00003fff7b37fd30 0000000019470092
GPR04: 0000000000000000 0000000000000012 0000000019470098 0000000000000006
GPR08: ffffc000bd3f1c88 000000000000002a 00000000009c7950 0000000019470092
GPR12: 00003fff42c0e378 00003fff42c16910 0000000000000000 0000000000000000
GPR16: 0000000000000003 00003fff42410000 00003fff60000a68 00003fff7088c010
GPR20: 00003fff7b373f80 00003fff7bb29268 0000000000800000 00003fff7ba010e8
GPR24: 0000000000001000 0000000000000000 0000000000000000 00003fff6fae85c0
GPR28: 00003fff7b9fb008 00003fff42c0f618 00003fff42c16910 00003fff42c0e2f0
NIP [00003fff7b2c6aa8] 0x3fff7b2c6aa8
LR [00003fff7b2c6a6c] 0x3fff7b2c6a6c
Call Trace:
---[ end trace bc5cd3841dd26689 ]---

[sched_delayed] sched: RT throttling activated
Machine check in kernel mode.
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#2]
PREEMPT SMP NR_CPUS=24
Modules linked in:
CPU: 0 PID: 1359 Comm: l_rx2 Tainted: G D 3.12.19-IC #6
task: c000000079542b00 ti: c00000006c058000 task.ti: c00000006c058000
NIP: 000000006109c418 LR: 000000006109f7d0 CTR: 000000006109bd48
REGS: c00000006c05bea0 TRAP: 0000 Tainted: G D (3.12.19-IC)
MSR: 000000008002d000 <CE,EE,PR,ME> CR: 28004442 XER: 00000000
SOFTE: 1

GPR00: 000000001947cc0c 00003fff57ffe170 0000000063320df8 0000000000000000
GPR04: 000000001947cc0c 00003fff57ffe698 00003fff57ffe600 00003fff57ffe608
GPR08: 00003fff57ffe698 0000000000000001 00000000648876b8 000000001947cc0c
GPR12: 0000000000000016 00003fff58006910 0000000000000000 0000000000000000
GPR16: 0000000000000003 00003fff57800000 00000000003d89a8 00003fff716c0290
GPR20: 00003fff7b67e6a8 00003fff7bb29268 0000000000800000 00003fff7ba010e8
GPR24: 0000000000001000 0000000000000000 0000000000000000 00003fff702e8520
GPR28: 00003fff7b9fb008 00003fff57fff618 0000000063334330 00003fff57ffe170
NIP [000000006109c418] 0x6109c418
LR [000000006109f7d0] 0x6109f7d0
Call Trace:

 

In the following document:

"Understanding Interrupt Sources and Error Handling
Procedures for the QorIQ P4080 Multicore Processor"

 

Ted Peters details the machine check error:

►Guarded load instruction
• Set along with MCSR[LD] if error is on a guarded load
• Set if the error is an L2 or CoreNet error (not an L1 data cache error)
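
For reference, the MCSR=a000 value from the traces can be decoded by hand. The sketch below is only an illustration: the bit masks are an assumption taken from the e500mc MCSR definitions in the Linux kernel header arch/powerpc/include/asm/reg_booke.h, the descriptions are approximate, and both should be checked against the core reference manual for the T1042. The names MCSR_BITS and decode_mcsr are hypothetical helpers, not part of any kernel or vendor tool.

```python
# Minimal MCSR decoder sketch.
# ASSUMPTION: masks follow the Linux kernel's e500mc MCSR_* definitions
# (arch/powerpc/include/asm/reg_booke.h); verify against the core manual.
MCSR_BITS = [
    (0x80000000, "MCP", "Machine check input pin"),
    (0x40000000, "ICPERR", "Instruction cache parity error"),
    (0x20000000, "DCPERR", "Data cache parity error"),
    (0x08000000, "L2MMU_MHIT", "Hit on multiple TLB entries"),
    (0x00100000, "NMI", "Non-maskable interrupt"),
    (0x00080000, "MAV", "MCAR address valid"),
    (0x00040000, "MEA", "MCAR holds an effective address"),
    (0x00010000, "IF", "Instruction fetch error report"),
    (0x00008000, "LD", "Load error report"),
    (0x00004000, "ST", "Store error report"),
    (0x00002000, "LDG", "Guarded load error report"),
    (0x00000002, "TLBSYNC", "Simultaneous tlbsync operations"),
    (0x00000001, "BSL2_ERR", "Level 2 cache error"),
]


def decode_mcsr(value):
    """Return (name, description) for every known bit set in value."""
    return [(name, desc) for mask, name, desc in MCSR_BITS if value & mask]


if __name__ == "__main__":
    # Value reported in both oops traces.
    for name, desc in decode_mcsr(0xA000):
        print(f"{name}: {desc}")
```

Under these assumed masks, 0xa000 has only the LD and LDG bits set, which agrees with the "Load Error Report" and "Guarded Load Error Report" lines printed by the kernel. MAV is not set, so the value alone does not indicate that a faulting address was captured.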

 

Is there a method to debug a machine check on the T1042, or to find the root cause of this machine check?

 

Thank you.

 

Dominique.
