Trying to understand PCI master abort

Question asked by Ian Stedman on Sep 13, 2017

Hi,

I'm working on a problem with the MPC8548E processor. We've had some seemingly random CPU lockups, which we traced to the HID1[RFXE] bit being set and the machine check interrupt being taken. With HID1[RFXE] cleared, the card(s) work, but we are trying to understand what caused the fault.

We are running VxWorks, and one of its PCI utilities reports the following:

vxbPciConfigTopoShow pciID

[0,0,0] type=PROCESSOR
 status=0x20b0 ( CAP 66MHZ FBTB DEVSEL=0 MSTR_ABORT_RCV )
 command=0x0006 ( MEM_ENABLE MASTER_ENABLE )
 bar0 in 32-bit mem space @ 0x40000000
 bar1 in prefetchable 32-bit mem space @ 0x00000000
 bar2 in 64-bit mem space @ 0x00000000
 bar4 in 64-bit mem space @ 0x00000000
[0,17,0] type=PERIPHERAL
 status=0x0200 ( DEVSEL=1 )
 command=0x0002 ( MEM_ENABLE )
 bar0 in 32-bit mem space @ 0x64000000
[0,18,0] type=UNKNOWN (0x80) BRIDGE
 status=0x0200 ( DEVSEL=1 )
 command=0x0007 ( IO_ENABLE MEM_ENABLE MASTER_ENABLE )
 bar0 in I/O space @ 0x68000000
 bar1 in 32-bit mem space @ 0x64100000


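The MSTR_ABORT_RCV bit concerns me. It indicates that a transaction mastered by the CPU's PCI controller was terminated with a master abort, i.e. no target responded to it, and in this case it appears to have been a read. I suspect this sometimes fails catastrophically, which is why, with HID1[RFXE] set, the fault is captured as a machine check.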
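For reference, the status values in the output above can be decoded against the standard PCI status register layout (configuration offset 0x06). The sketch below is a minimal decoder, assuming the MPC8548E host controller's own status register follows that standard layout; the macro and function names are hypothetical, and the flag labels other than those shown by vxbPciConfigTopoShow are my own.

```c
#include <stdint.h>
#include <stdio.h>

/* Standard PCI status register (config offset 0x06) bit definitions. */
#define PCI_STAT_CAP_LIST       0x0010u /* bit 4: capabilities list present */
#define PCI_STAT_66MHZ          0x0020u /* bit 5: 66 MHz capable */
#define PCI_STAT_FAST_B2B       0x0080u /* bit 7: fast back-to-back capable */
#define PCI_STAT_MDPERR         0x0100u /* bit 8: master data parity error */
#define PCI_STAT_DEVSEL_MASK    0x0600u /* bits 10:9: DEVSEL# timing */
#define PCI_STAT_SIG_TGT_ABORT  0x0800u /* bit 11: signaled target abort */
#define PCI_STAT_RCV_TGT_ABORT  0x1000u /* bit 12: received target abort */
#define PCI_STAT_RCV_MSTR_ABORT 0x2000u /* bit 13: received master abort */
#define PCI_STAT_SIG_SERR       0x4000u /* bit 14: signaled system error */
#define PCI_STAT_DET_PERR       0x8000u /* bit 15: detected parity error */

/* All error-reporting bits in the status register. */
#define PCI_STAT_ERROR_BITS (PCI_STAT_MDPERR | PCI_STAT_SIG_TGT_ABORT | \
                             PCI_STAT_RCV_TGT_ABORT | PCI_STAT_RCV_MSTR_ABORT | \
                             PCI_STAT_SIG_SERR | PCI_STAT_DET_PERR)

/* DEVSEL# timing field: 0 = fast, 1 = medium, 2 = slow. */
unsigned pci_devsel_timing(uint16_t status)
{
    return (status & PCI_STAT_DEVSEL_MASK) >> 9;
}

/* Keep only the error-reporting bits of a status value. */
uint16_t pci_status_errors(uint16_t status)
{
    return (uint16_t)(status & PCI_STAT_ERROR_BITS);
}

/* Print a status value in roughly the vxbPciConfigTopoShow style. */
void pci_print_status(uint16_t status)
{
    printf("status=0x%04x (", status);
    if (status & PCI_STAT_CAP_LIST)       printf(" CAP");
    if (status & PCI_STAT_66MHZ)          printf(" 66MHZ");
    if (status & PCI_STAT_FAST_B2B)       printf(" FBTB");
    printf(" DEVSEL=%u", pci_devsel_timing(status));
    if (status & PCI_STAT_MDPERR)         printf(" MDPERR");
    if (status & PCI_STAT_SIG_TGT_ABORT)  printf(" TGT_ABORT_SIG");
    if (status & PCI_STAT_RCV_TGT_ABORT)  printf(" TGT_ABORT_RCV");
    if (status & PCI_STAT_RCV_MSTR_ABORT) printf(" MSTR_ABORT_RCV");
    if (status & PCI_STAT_SIG_SERR)       printf(" SERR_SIG");
    if (status & PCI_STAT_DET_PERR)       printf(" PERR_DET");
    printf(" )\n");
}
```

Calling pci_print_status(0x20b0) reproduces the [0,0,0] line above, and pci_status_errors(0x20b0) leaves only bit 13 (received master abort) set, so no parity or target-abort errors are being reported alongside it.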

Delving into the PCI registers, I captured the ERR_ATTRIB register, which read 0x001FA001. The decode does not make complete sense, because the Error Source field holds a reserved value. The PCI command field (0xA) indicates a configuration read.
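The 0xA command value can be cross-checked against the PCI bus command encoding from the PCI Local Bus Specification, where 0xA is Configuration Read. Under that specification, a configuration read to a device number that nothing claims normally ends in a master abort, which would be consistent with MSTR_ABORT_RCV being set. The sketch below assumes, purely from the value quoted above, that the command field sits in bits 15:12 of ERR_ATTRIB; that position and the helper names are hypothetical and should be checked against the MPC8548E reference manual.

```c
#include <stdint.h>

/* PCI bus command encodings (C/BE[3:0]# during the address phase). */
static const char *const pci_bus_cmd_names[16] = {
    "Interrupt Acknowledge", "Special Cycle", "I/O Read", "I/O Write",
    "Reserved", "Reserved", "Memory Read", "Memory Write",
    "Reserved", "Reserved", "Configuration Read", "Configuration Write",
    "Memory Read Multiple", "Dual Address Cycle", "Memory Read Line",
    "Memory Write and Invalidate"
};

/* ASSUMPTION: ERR_ATTRIB holds the PCI command in bits 15:12.
 * Inferred only from 0x001FA001 decoding to command 0xA; verify in the RM. */
#define ERR_ATTRIB_CMD_SHIFT 12u
#define ERR_ATTRIB_CMD_MASK  0xFu

/* Extract the (assumed) PCI command field from an ERR_ATTRIB value. */
unsigned err_attrib_cmd(uint32_t err_attrib)
{
    return (unsigned)((err_attrib >> ERR_ATTRIB_CMD_SHIFT) & ERR_ATTRIB_CMD_MASK);
}

/* Map a 4-bit PCI bus command to its name from the PCI specification. */
const char *pci_bus_cmd_name(unsigned cmd)
{
    return pci_bus_cmd_names[cmd & 0xFu];
}

/* Configuration Read (0xA) and Configuration Write (0xB). */
int pci_cmd_is_config(unsigned cmd)
{
    return cmd == 0xAu || cmd == 0xBu;
}
```

With the captured value, err_attrib_cmd(0x001FA001) gives 0xA and pci_bus_cmd_name(0xA) gives "Configuration Read"; the remaining bits (including the Error Source field) are deliberately not decoded here because their layout is not confirmed.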

If I clear the Master Abort bit in the CPU's status register and check the status again one second later, the bit is set again.
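When clearing the bit, note that the error bits of a standard PCI status register are write-1-to-clear: writing 0x2000 clears only the received-master-abort flag, while writing back the whole value that was read clears every error flag that happened to be set. The sketch below models that behaviour in plain C, assuming the MPC8548E controller follows the standard PCI semantics; pci_status_after_write is a hypothetical helper, and on the target the actual write would go through the BSP's configuration-space write routine.

```c
#include <stdint.h>

/* RW1C error bits of the standard PCI status register: bits 8 and 11-15. */
#define PCI_STATUS_RW1C_MASK 0xF900u

/* Model the register value after a write: RW1C bits written as 1 are cleared,
 * bits written as 0 are unchanged, and read-only bits ignore the write. */
uint16_t pci_status_after_write(uint16_t current, uint16_t written)
{
    return (uint16_t)(current & ~(written & PCI_STATUS_RW1C_MASK));
}
```

For example, pci_status_after_write(0x20b0, 0x2000) yields 0x00b0: only the master abort flag clears, and the read-only capability bits are untouched. If the flag reappears within a second after a clear like this, some agent is still issuing transactions that no target claims.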

Enough rambling: have I interpreted the ERR_ATTRIB register correctly, and has anyone else had issues with the PCI interface on the MPC8548E or a similar part?
