T1042 machine check

dominiqueletty
Contributor I

Hello,

I'm working on a T1042 design under Linux (3.12.19). The T1042 is connected to several PCIe devices.

About once a day, my application crashes with the following traces:

Machine check in kernel mode.
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
PREEMPT SMP NR_CPUS=24
Modules linked in:
CPU: 0 PID: 1402 Comm: h_im_ti Not tainted 3.12.19-IC #6
task: c0000000792e4b00 ti: c00000006c11c000 task.ti: c00000006c11c000
NIP: 00003fff7b2c6aa8 LR: 00003fff7b2c6a6c CTR: 00003fff7b0be7e0
REGS: c00000006c11fea0 TRAP: 0000 Not tainted (3.12.19-IC)
MSR: 000000008002d000 <CE,EE,PR,ME> CR: 44004822 XER: 20000000
SOFTE: 1

GPR00: 0000000019470092 00003fff42c0e2f0 00003fff7b37fd30 0000000019470092
GPR04: 0000000000000000 0000000000000012 0000000019470098 0000000000000006
GPR08: ffffc000bd3f1c88 000000000000002a 00000000009c7950 0000000019470092
GPR12: 00003fff42c0e378 00003fff42c16910 0000000000000000 0000000000000000
GPR16: 0000000000000003 00003fff42410000 00003fff60000a68 00003fff7088c010
GPR20: 00003fff7b373f80 00003fff7bb29268 0000000000800000 00003fff7ba010e8
GPR24: 0000000000001000 0000000000000000 0000000000000000 00003fff6fae85c0
GPR28: 00003fff7b9fb008 00003fff42c0f618 00003fff42c16910 00003fff42c0e2f0
NIP [00003fff7b2c6aa8] 0x3fff7b2c6aa8
LR [00003fff7b2c6a6c] 0x3fff7b2c6a6c
Call Trace:
---[ end trace bc5cd3841dd26689 ]---

[sched_delayed] sched: RT throttling activated
Machine check in kernel mode.
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#2]
PREEMPT SMP NR_CPUS=24
Modules linked in:
CPU: 0 PID: 1359 Comm: l_rx2 Tainted: G D 3.12.19-IC #6
task: c000000079542b00 ti: c00000006c058000 task.ti: c00000006c058000
NIP: 000000006109c418 LR: 000000006109f7d0 CTR: 000000006109bd48
REGS: c00000006c05bea0 TRAP: 0000 Tainted: G D (3.12.19-IC)
MSR: 000000008002d000 <CE,EE,PR,ME> CR: 28004442 XER: 00000000
SOFTE: 1

GPR00: 000000001947cc0c 00003fff57ffe170 0000000063320df8 0000000000000000
GPR04: 000000001947cc0c 00003fff57ffe698 00003fff57ffe600 00003fff57ffe608
GPR08: 00003fff57ffe698 0000000000000001 00000000648876b8 000000001947cc0c
GPR12: 0000000000000016 00003fff58006910 0000000000000000 0000000000000000
GPR16: 0000000000000003 00003fff57800000 00000000003d89a8 00003fff716c0290
GPR20: 00003fff7b67e6a8 00003fff7bb29268 0000000000800000 00003fff7ba010e8
GPR24: 0000000000001000 0000000000000000 0000000000000000 00003fff702e8520
GPR28: 00003fff7b9fb008 00003fff57fff618 0000000063334330 00003fff57ffe170
NIP [000000006109c418] 0x6109c418
LR [000000006109f7d0] 0x6109f7d0
Call Trace:

In the following document:

"Understanding Interrupt Sources and Error Handling
Procedures for the QorIQ P4080 Multicore Processor"

Ted Peters details the machine check error:

► Guarded load instruction
• Set along with MCSR[LD] if the error is on a guarded load
• Set if the error is an L2 or CoreNet error (not for L1 data cache errors)
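The MCSR value printed in both traces can be decoded bit by bit. The Python sketch below assumes the e500mc bit masks from the Linux header arch/powerpc/include/asm/reg_booke.h, which also cover e5500-class cores such as the T1042's. It is a partial table (please check the e5500 core reference manual for the full layout), and the names MCSR_BITS, decode_mcsr and unlisted_bits are hypothetical. If those masks are right, 0xa000 is exactly LD (0x8000) plus LDG (0x2000), and MAV is clear, meaning the core did not latch a valid error address in MCAR.

```python
from typing import List

# Partial list of MCSR bits for e500mc-class cores (e5500 included).
# ASSUMPTION: masks follow the e500mc section of the Linux header
# arch/powerpc/include/asm/reg_booke.h; check them against the core
# reference manual before relying on them.
MCSR_BITS = [
    (0x80000000, "MCP", "machine check input pin"),
    (0x40000000, "ICPERR", "instruction cache parity error"),
    (0x20000000, "DCPERR", "data cache parity error"),
    (0x00100000, "NMI", "non-maskable interrupt"),
    (0x00080000, "MAV", "MCAR address valid"),
    (0x00040000, "MEA", "MCAR holds an effective address"),
    (0x00010000, "IF", "instruction fetch error report"),
    (0x00008000, "LD", "load error report"),
    (0x00004000, "ST", "store error report"),
    (0x00002000, "LDG", "guarded load error report"),
]


def decode_mcsr(mcsr: int) -> List[str]:
    """Return the names of the listed MCSR bits that are set, highest bit first."""
    return [name for mask, name, _ in MCSR_BITS if mcsr & mask]


def unlisted_bits(mcsr: int) -> int:
    """Return any set bits that this partial table does not describe."""
    known = 0
    for mask, _, _ in MCSR_BITS:
        known |= mask
    return mcsr & ~known


if __name__ == "__main__":
    mcsr = 0xA000  # value printed in both traces ("MCSR=a000")
    for mask, name, desc in MCSR_BITS:
        if mcsr & mask:
            print(f"{name:6} {desc}")
    if unlisted_bits(mcsr):
        print(f"unlisted bits: 0x{unlisted_bits(mcsr):x}")
    if not (mcsr & 0x00080000):  # MAV
        print("MAV clear: MCAR does not hold a valid error address")
```

Any bits reported by unlisted_bits are a hint to look up the full register description rather than trust this table.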

Is there a way to debug machine checks on the T1042, or to find the root cause of this one?

Thank you.

Dominique.

yipingwang
NXP TechSupport

Hello Dominique LETTY,

The machine check interrupt may occur because the target address is invalid or the target is not responding.


NIP (Next Instruction Pointer), generically the program counter, indicates where the kernel oopsed. You should be able to use objdump -S to find out which instruction is at NIP.

The Link Register (LR) holds the return address of the current function, so it identifies the caller of the function that contains the instruction at NIP.

Please refer to System.map in your Linux kernel build environment to find the function containing the LR address. It seems that NIP and LR point to invalid instruction addresses.
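Finding "the function containing an address" amounts to finding the nearest code symbol at or below that address. System.map and the output of nm share the line format "address type name", so one small helper covers both. The Python sketch below is an illustration only, and parse_symbols and lookup are hypothetical names. One caveat: in both traces the MSR line includes PR (problem state), so the interrupted code was running in user mode. NIP/LR values such as 0x3fff7b2c6aa8 are then most likely user-space addresses that will not appear in the kernel's System.map. They would instead be looked up in the nm output of the application or of a shared library, with the library's load base (for example from /proc/<pid>/maps) subtracted first.

```python
import bisect
from typing import List, Optional, Tuple


def parse_symbols(text: str) -> List[Tuple[int, str]]:
    """Parse System.map / `nm` lines ("<hex addr> <type> <name>") into a
    sorted list of (address, name) for code (text) symbols only."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[1] not in ("t", "T"):
            continue  # skip blank lines, undefined symbols and data symbols
        try:
            symbols.append((int(parts[0], 16), parts[2]))
        except ValueError:
            continue  # first field is not a hex address
    symbols.sort()
    return symbols


def lookup(symbols: List[Tuple[int, str]], addr: int) -> Optional[str]:
    """Return "name+0xoffset" for the nearest symbol at or below addr.
    Note: nothing here checks that addr is still inside that function."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


if __name__ == "__main__":
    # Illustrative symbol lines, not taken from a real kernel.
    sample = """\
c000000000001000 T _stext
c000000000002000 t early_setup
c000000000003000 D some_table
"""
    syms = parse_symbols(sample)
    print(lookup(syms, 0xC000000000002010))  # early_setup+0x10
```

For a real lookup, pass the contents of System.map (or of `nm` run on the binary) to parse_symbols and the NIP or LR value to lookup. objdump -S on the same binary then shows the instruction at that offset.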

Could you please provide the complete kernel call trace?

I suspect this problem is caused by unstable hardware.

Could you please download the SDK 2.0 pre-built image ISO for E5500?

Please boot your target board with the kernel image provided in this ISO, then run your application to check whether the problem can still be reproduced.


Have a great day,
TIC

dominiqueletty
Contributor I

Hello Yiping Wang,

Sorry for the late answer. I think I have found the problem: it was caused by an alignment error in a PCIe DMA transfer from an external Ethernet controller to CPU memory.
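The thread does not say which alignment rule was violated or what the Ethernet controller requires, so the following is only a generic sketch. It checks that a DMA buffer's bus address (and, for controllers that require it, its length) meets a power-of-two alignment before the buffer is handed to the device. DMA_ALIGN = 64 and the function names are hypothetical placeholders. The real constraint comes from the controller's documentation. In a Linux driver the buffer would normally come from the kernel DMA API (for example dma_alloc_coherent or dma_map_single), and the same mask test applies to the returned dma_addr_t.

```python
from typing import List

# HYPOTHETICAL requirement: the real alignment rule must come from the
# Ethernet controller's documentation (buffer/descriptor alignment).
DMA_ALIGN = 64


def _check_pow2(align: int) -> None:
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")


def is_aligned(value: int, align: int) -> bool:
    """True if value is a multiple of align (align must be a power of two)."""
    _check_pow2(align)
    return (value & (align - 1)) == 0


def align_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align (a power of two)."""
    _check_pow2(align)
    return (value + align - 1) & ~(align - 1)


def check_dma_buffer(bus_addr: int, length: int, align: int = DMA_ALIGN) -> List[str]:
    """Return a list of alignment problems for a DMA buffer (empty if none).
    The length check only matters for controllers that require it."""
    problems = []
    if not is_aligned(bus_addr, align):
        problems.append(f"start 0x{bus_addr:x} is not {align}-byte aligned")
    if not is_aligned(length, align):
        problems.append(f"length {length} is not a multiple of {align}")
    return problems


if __name__ == "__main__":
    print(check_dma_buffer(0x1002, 1500))
    print(hex(align_up(0x1002, DMA_ALIGN)), align_up(1500, DMA_ALIGN))
```

Rounding the start address and size up with align_up when allocating, rather than only checking afterwards, is one way to keep every buffer inside the controller's rules.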

Thanks for your help.

Regards,

Dominique.
