T1042 PCI access error after reboot

moqu
Contributor I

Hi,

We have a board that uses the T1042 CPU and shows the issue below.

The CPU connects to four PCI devices, and the PCI1 device has the problem.

After power-on, access to PCI1 is OK, checked with lspci or mm (a memory access tool).

Reading and writing the PCI device address space with the mm tool works.

After a reboot, however, the other three PCI devices are still OK, but accessing the same address on the PCI1 device fails as shown below:

TPS24 login: root
Password:
Last login: Tue Dec  3 09:11:04 UTC 1929 on console
root@TPS24:~# mm --rdl 0xc01010000
0x0000000C01010000:  E9504117                              
root@TPS24:~# mm --rdl 0xc01010000
0x0000000C01010000:  E9504117                              
root@TPS24:~# mm --rdl 0xc01010000
[   34.432161] Machine check in kernel mode.
[   34.444155] Caused by (from MCSR=a000): Load Error Report
[   34.460302] Guarded Load Error Report
[   34.471245] Oops: Machine check, sig: 7 [#1]
[   34.484005] SMP NR_CPUS=4 CoreNet Generic
[   34.495990] Modules linked in: linx_eth_cm linx
[   34.509550] CPU: 2 PID: 2505 Comm: mm Not tainted 4.1.21-rt13-WR8.0.0.25_standard+ #"V01.00.03"
[   34.535588] task: c0000000f2536b50 ti: c00000007f5f0000 task.ti: c00000007f5f0000
[   34.557980] NIP: 000000001000103c LR: 0000000010000ff0 CTR: 00003fff9794e620
[   34.579072] REGS: c00000007f5f3ea0 TRAP: 0000   Not tainted  (4.1.21-rt13-WR8.0.0.25_standard+)
[   34.605106] MSR: 000000008002d000 <CE,EE,PR,ME>  CR: 48000442  XER: 00000000
[   34.626224] SOFTE: 1
GPR00: 0000000010000ff0 00003fffc7dd0b10 000000001001a400 0000000000000015
GPR04: 0000000000000000 0000000000000003 00003fff979c9012 000000007fffffff
GPR08: 0000000000000000 00003fff979ca000 0000000000000000 0000000000000000
GPR12: 0000000042000244 00003fff979cb750 0000000010128668 0000000028222482
GPR16: 000000001011217c 00000000100f1560 0000000000000004 0000000000000000
GPR20: 000000001012c468 0000000010255ef0 000000001013af90 0000000010001e38
GPR24: 0000000010001e30 0000000010001e28 00003fff979ca000 0000000000000001
GPR28: 0000000000000000 0000000000000004 0000000000000004 0000000000000000
[   34.793500] NIP [000000001000103c] 0x1000103c
[   34.806523] LR [0000000010000ff0] 0x10000ff0
[   34.819284] Call Trace:
[   34.826581] ---[ end trace 4a926fcc90f4c778 ]---
[   34.840383]
[   36.818151] Kernel panic - not syncing: Fatal exception
[   36.833781] Rebooting in 1 seconds..

What could be causing this, and how can we find the root cause?

yipingwang
NXP TechSupport

Hello mo qu,

Please build the EDAC driver into the Linux kernel. The goal of the EDAC kernel module is to detect and report errors that occur within a computer system running Linux.

Device Drivers --->
    <*> EDAC (Error Detection And Correction) reporting --->
        <*> Main Memory EDAC (Error Detection And Correction) reporting
        <*> Freescale MPC83xx / MPC85xx

Please set the following options in the Linux kernel configuration file:

CONFIG_EDAC_MM_EDAC=y

CONFIG_EDAC_MPC85XX=y
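
To confirm that the two options above actually ended up built in, the kernel configuration text can be checked with a few lines of script. This is a minimal sketch: the helper names missing_options and read_config are hypothetical, and reading /proc/config.gz only works if the running kernel exposes it (that depends on CONFIG_IKCONFIG_PROC); otherwise point it at the .config used for the build.

```python
import gzip
from pathlib import Path

# Options named in this reply; "=y" means built into the kernel.
REQUIRED = ("CONFIG_EDAC_MM_EDAC", "CONFIG_EDAC_MPC85XX")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


def read_config(path="/proc/config.gz"):
    """Read a kernel config file, transparently handling a .gz suffix."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as fh:
            return fh.read()
    return p.read_text()


# Self-contained demonstration on an inline sample instead of a real file.
sample = "CONFIG_EDAC=y\nCONFIG_EDAC_MM_EDAC=y\n# CONFIG_EDAC_MPC85XX is not set\n"
print(missing_options(sample))
```

On the target, `missing_options(read_config())` returns an empty list when both options are built in. An option set to `=m` is reported as missing, since the advice here is to build the driver into the kernel.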

You should see kernel boot messages similar to the following:

EDAC PCI2: Giving out device to module 'MPC85xx_edac' controller 'mpc85xx_pci_err': DEV 'ffe202000.pcie' (INTERRUPT)
MPC85xx_edac acquired irq 16 for PCI Err
MPC85xx_edac PCI err registered
Testing edac driver is start.
PCIE error(s) detected
PCIE ERR_DR register: 0x00020000
PCIE ERR_CAP_STAT register: 0x80000001
PCIE ERR_CAP_R0 register: 0x00000800
PCIE ERR_CAP_R1 register: 0x00000000
PCIE ERR_CAP_R2 register: 0x00000000
PCIE ERR_CAP_R3 register: 0x00000000

In addition, please enable the PCIe Advanced Error Reporting (AER) function.

Set CONFIG_PCIEAER=y in the kernel configuration file to enable AER, rebuild the Linux kernel, and check whether you get an error report similar to the following.

Example error report:
pcieport 0000:00:00.0: AER: Corrected error received: id=0100
e1000e 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0100(Receiver ID)
e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/00002000
e1000e 0000:01:00.0: [ 6] Bad TLP
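
The status/mask pair in a report like the one above can be decoded bit by bit. The following is a minimal sketch, assuming the line format shown in the example and the Correctable Error Status bit positions from the PCIe AER capability. The bit names follow the specification rather than the exact strings a given kernel prints, and decode_correctable is a hypothetical helper. It applies only to reports with severity=Corrected.

```python
import re

# Correctable Error Status bit positions (PCIe AER capability).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def bit_names(value):
    """Name every set bit in value, falling back to 'bit N' if unknown."""
    return [
        CORRECTABLE_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if (value >> bit) & 1
    ]


def decode_correctable(line):
    """Decode the status and mask fields of one AER log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return {"reported": bit_names(status), "masked": bit_names(mask)}


line = "e1000e 0000:01:00.0: device [8086:10d3] error status/mask=00000040/00002000"
print(decode_correctable(line))
```

For the example line, status 0x00000040 is bit 6, which agrees with the "[ 6] Bad TLP" line in the report, and mask 0x00002000 is bit 13.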

Thanks,

Yiping
