PCIe errors cause CPU to crash


emilviding
Contributor I

Hi,

I have a new question that's related to a PCIe Master Abort question I asked a while ago (unanswered).

We have a situation where a PCIe endpoint resets unexpectedly, and when our P4080 (the RC) tries to read from that endpoint we get Completion Timeouts and Acknowledge Timeouts (not sure why; isn't the CTO enough?).
If we stop the accesses at that point, the CPU can continue to execute, but if we don't, we get a machine check and the CPU crashes.

It's as if the CPU/PCIe controller can only handle a few errors.
Is that the case, or what are we seeing?

Are we expected to do something else?

Regards,

Emil Viding


ufedor
NXP Employee

The Completion Timeout (CTO) is a PCIe uncorrectable non-fatal error.

Non-fatal errors make a particular transaction unreliable, but the link is otherwise fully functional. This gives the related hardware or software an opportunity to recover from the error without resetting the components on the link or disturbing other transactions in progress. Nevertheless, no data is returned (CoreNet marks the transaction result as "Bad Data"), so the core cannot complete the corresponding load instruction.

In this case, the core takes a Synchronous Error Report Machine Check Interrupt:

- Machine Check Save/Restore Register 0 (MCSRR0) contains the address of this load instruction

- The MCSR[LD] bit is set to reflect this situation

Further error handling is application-dependent.
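
As one illustration of what such application-dependent handling might look like, here is a minimal sketch in C of a recovery policy: if MCSR[LD] is set and the failed load targeted a PCIe window, hand the load an all-ones value (the conventional result of a failed PCI read) and advance MCSRR0 past the load so execution can resume. Everything here is an assumption for illustration: `struct mc_frame`, `struct pcie_window` and `recover_pcie_load` are hypothetical names, the `MCSR_LD` mask value should be checked against the core reference manual, and the caller is assumed to have already worked out the load's target address and destination register. It models the decision on plain data and does not touch real registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit mask for MCSR[LD]; verify against the core reference manual. */
#define MCSR_LD 0x00008000u

/* Hypothetical snapshot of the state saved on machine check entry. */
struct mc_frame {
    uint32_t mcsr;     /* Machine Check Syndrome Register value */
    uint64_t mcsrr0;   /* address of the load that could not complete */
    uint64_t gpr[32];  /* saved general-purpose registers */
};

/* Hypothetical description of one PCIe outbound address window. */
struct pcie_window {
    uint64_t base;
    uint64_t size;
};

/* True if addr lies inside the window. */
static bool in_window(const struct pcie_window *w, uint64_t addr)
{
    return addr >= w->base && addr - w->base < w->size;
}

/*
 * Try to recover from a machine check caused by a failed PCIe load.
 * fault_addr is the data address the load targeted and dest_reg is the
 * load's destination register; both are supplied by the caller.
 * Returns true if the frame was patched so that execution can resume
 * after the load, false if the event is not one this policy handles.
 */
static bool recover_pcie_load(struct mc_frame *f,
                              const struct pcie_window *w,
                              uint64_t fault_addr,
                              unsigned dest_reg)
{
    if (!(f->mcsr & MCSR_LD))
        return false;               /* not a load error */
    if (!in_window(w, fault_addr))
        return false;               /* not a PCIe access */
    if (dest_reg >= 32)
        return false;               /* invalid register number */

    f->gpr[dest_reg] = ~0ull;       /* give the load an all-ones result */
    f->mcsrr0 += 4;                 /* skip the 4-byte load instruction */
    return true;
}
```

A driver using a policy like this would treat an all-ones read as "device not responding" and stop accessing the endpoint until the link has been retrained, rather than continuing to issue reads that will each time out.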

Additional detail can be provided in a Technical Case:

https://community.nxp.com/thread/381898
