PCIe errors cause CPU to crash

emilviding
Contributor I

Hi,

I have a new question that's related to a PCIe Master Abort question I asked a while ago (unanswered).

We have a situation where a PCIe endpoint resets unexpectedly. When our P4080 (acting as Root Complex) then tries to read from that endpoint, we get Completion Timeouts and Acknowledge Timeouts (not sure why we get both; isn't the CTO enough?).
If we stop accessing the endpoint at that point, the CPU continues to execute, but if we keep accessing it, we get a machine check and the CPU crashes.

It looks as if the CPU/PCIe controller can only tolerate a few errors.
Is that the case, or what are we actually seeing?

Are we expected to do something else?

Regards,

Emil Viding

ufedor
NXP Employee

The Completion Timeout (CTO) is a PCIe uncorrectable non-fatal error.

Non-fatal errors make a particular transaction unreliable, but the link itself remains fully functional. This gives the related hardware or software an opportunity to recover from the error without resetting the components on the link and disturbing other transactions in progress. Nevertheless, no data is returned (CoreNet marks the transaction result with a "Bad Data" label), so the core cannot complete the related load instruction.
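Whether an uncorrectable error is reported as fatal or non-fatal is controlled by the Uncorrectable Error Severity register of the PCIe Advanced Error Reporting (AER) capability; Completion Timeout (status bit 14) is non-fatal by default. The sketch below is not part of the original answer: it assumes the Root Complex exposes the standard AER capability and that its Uncorrectable Error Status and Severity registers have already been read into two 32-bit values. The `aer_split` type and the `aer_classify()`/`aer_nonfatal_cto()` helpers are hypothetical; the bit positions follow the PCI Express Base Specification.

```c
#include <stdint.h>

/* Bit positions in the AER Uncorrectable Error Status and Severity
 * registers (PCI Express Base Specification). */
#define AER_UNC_DLP          (1u << 4)  /* Data Link Protocol Error */
#define AER_UNC_SURPRISE_DN  (1u << 5)  /* Surprise Down Error */
#define AER_UNC_POISON_TLP   (1u << 12) /* Poisoned TLP */
#define AER_UNC_COMP_TIMEOUT (1u << 14) /* Completion Timeout (CTO) */
#define AER_UNC_COMP_ABORT   (1u << 15) /* Completer Abort */
#define AER_UNC_UNX_COMP     (1u << 16) /* Unexpected Completion */
#define AER_UNC_MALF_TLP     (1u << 18) /* Malformed TLP */
#define AER_UNC_UNSUP_REQ    (1u << 20) /* Unsupported Request */

/* Hypothetical result type: logged errors split by reported severity. */
struct aer_split {
    uint32_t fatal;    /* status bits whose severity bit is set */
    uint32_t nonfatal; /* status bits whose severity bit is clear */
};

/* status:   value read from the Uncorrectable Error Status register
 * severity: value read from the Uncorrectable Error Severity register
 * A set severity bit means "report as fatal"; a clear bit means non-fatal. */
static struct aer_split aer_classify(uint32_t status, uint32_t severity)
{
    struct aer_split s;
    s.fatal = status & severity;
    s.nonfatal = status & ~severity;
    return s;
}

/* Non-zero if a Completion Timeout was logged as a non-fatal error. */
static int aer_nonfatal_cto(uint32_t status, uint32_t severity)
{
    return (aer_classify(status, severity).nonfatal & AER_UNC_COMP_TIMEOUT) != 0;
}
```

For example, a status of `AER_UNC_COMP_TIMEOUT | AER_UNC_MALF_TLP` with a severity value in which Malformed TLP is set and Completion Timeout is clear (as in the specification defaults) yields `fatal == AER_UNC_MALF_TLP` and `nonfatal == AER_UNC_COMP_TIMEOUT`.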

Because the load cannot complete, the core takes a Synchronous Error Report Machine Check Interrupt:

- The Machine Check Save/Restore Register 0 (MCSRR0) contains the address of this load instruction.

- The MCSR[LD] bit is set to indicate that a load caused the machine check.

Further error handling is application-dependent.
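One possible application-level approach is for the machine-check handler to recognize that MCSRR0 points at a load known to target PCIe space and to resume at an error path instead of re-executing the failing load. The sketch below models only that decision in plain, host-runnable C; the `mcheck_ctx` and `fixup_entry` types, the fixup table, and `handle_pcie_load_mcheck()` are hypothetical, and the `MCSR_LD` mask is an assumption taken from the Linux e500mc register definitions, so confirm it in the e500mc core reference manual. A real handler would run in the machine-check vector and return with `rfmci`. As far as I know, the Linux Freescale PCI code (`fsl_pci_mcheck_exception()` in `arch/powerpc/sysdev/fsl_pci.c`) takes a related approach, emulating the load instead; check your kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: MCSR[LD] mask as defined for e500mc in the Linux kernel
 * (arch/powerpc/include/asm/reg_booke.h); verify against the e500mc
 * core reference manual before relying on it. */
#define MCSR_LD 0x00008000u

/* Hypothetical snapshot of the registers a machine-check handler reads. */
struct mcheck_ctx {
    uint32_t mcsr;   /* Machine Check Syndrome Register */
    uint32_t mcsrr0; /* address of the load that could not complete */
};

/* Hypothetical fixup entry: a load that may target PCIe space and the
 * address of the error path to resume at if it machine-checks. */
struct fixup_entry {
    uint32_t insn;
    uint32_t fixup;
};

/* Returns true and redirects ctx->mcsrr0 to the fixup address if the
 * machine check was caused by a registered PCIe load; returns false
 * (leaving ctx untouched) if the error should be treated as fatal. */
static bool handle_pcie_load_mcheck(struct mcheck_ctx *ctx,
                                    const struct fixup_entry *table,
                                    size_t n)
{
    assert(ctx != NULL && (table != NULL || n == 0));

    if (!(ctx->mcsr & MCSR_LD))
        return false; /* not a load error: not handled here */

    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == ctx->mcsrr0) {
            /* On hardware, rfmci would now resume at the error path
             * instead of re-executing the failing load. */
            ctx->mcsrr0 = table[i].fixup;
            return true;
        }
    }
    return false;
}
```

For example, with a table entry `{ 0x10001000, 0x10001040 }` and a context of `{ MCSR_LD, 0x10001000 }`, the handler returns true and sets `mcsrr0` to `0x10001040`; a machine check at any other address, or one without MCSR[LD] set, returns false and should be treated as unrecoverable.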

Additional details can be provided through a Technical Case:

https://community.nxp.com/thread/381898
