T1042 e5500: PCI target abort freezes CPU core

Question asked by Antoine Durand on Nov 12, 2018
Latest reply on Nov 19, 2018 by Antoine Durand

Hi,

On a T1042 target running Linux, I talk to a PCI endpoint (EP). This device is a bridge to a legacy, old-school bus. It may generate a PCI target abort in reaction to a bus error on the underlying bus, and my application has to handle this. In my context, all PCI transactions to this device are non-posted.

When a target abort occurs, the core on which the PCI device driver was running freezes on the load instruction (the load that ended in the PCI target abort).

I have found many discussions about what look like similar issues on P2020, MPC85xx, etc., and some Linux kernel patches that try to handle them.

Discussions:

PCIe errors causes CPU to crash

Freescale P2020 CPU Freeze over PCIe abort signal

And this (never accepted) patch, which did not help me:

powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500 - Patchwork

It follows this one:

[2/2,V8] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx - Patchwork

Does anybody know whether the MPC85xx erratum mentioned in this last link still applies to the e5500?

What I can see is that the fsl PCI EDAC driver's interrupt handler runs when the "PEX pcie RC logic" detects the target abort. So the behavior seems correct from the PCI device EP up to the T1042 SoC (through the PCIe switch and the PCIe/PCI bridge).

I don't think any machine check exception is taken at all, because I don't see anything on the console, even after adding printk() calls to the relevant functions in arch/powerpc/kernel/traps.c.

I also read 0 for all cores in the "machine check" entry of /proc/interrupts.
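
For anyone who wants to repeat this check, it can be scripted roughly as follows. This is only a minimal sketch: it assumes the powerpc layout of /proc/interrupts, where the machine check row is labelled `MCE:`, and the function name `machine_check_counts` is made up for this post.

```python
def machine_check_counts(interrupts_text):
    """Return the per-CPU machine check counts found in /proc/interrupts text.

    Assumption: the row of interest is labelled 'MCE', as on powerpc kernels.
    Returns an empty list if no such row exists.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    # The header row lists one column per online CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() == "MCE":
            # The first ncpus tokens are counters; the rest is the description.
            return [int(tok) for tok in rest.split()[:ncpus]]
    return []


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        print(machine_check_counts(f.read()))
```

A list of zeros here matches what I see by hand: no core has counted a machine check.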

 

But sometimes I get a "bad kernel stack pointer" message on the console, which is weird because it must come from exception handling.

Does anyone know whether this is a known issue?

Is there a related erratum that applies to the whole e500 family, including the e5500?

The problem occurs with, for example, linux-4.1.8 and linux-4.19.1, with or without the fsl mpc85xx EDAC and AER drivers active.

Other symptoms:

The CPU is reported as stalled by the other cores on the console. No more jiffies are counted for the frozen core in /proc/stat.
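
The /proc/stat symptom can be checked with a small script instead of by eye. The sketch below compares two snapshots and lists the cores whose tick counters did not move at all. The helper names `cpu_ticks` and `frozen_cpus` and the 2-second interval are arbitrary choices for illustration.

```python
import time


def cpu_ticks(stat_text):
    """Map 'cpuN' to the sum of all tick counters on that row of /proc/stat."""
    ticks = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU rows look like 'cpu0 ...'; the aggregate 'cpu' row is skipped.
        if fields and fields[0].startswith("cpu") and fields[0][3:].isdigit():
            ticks[fields[0]] = sum(int(v) for v in fields[1:])
    return ticks


def frozen_cpus(before_text, after_text):
    """Return the CPUs whose tick total is identical in both snapshots."""
    before = cpu_ticks(before_text)
    after = cpu_ticks(after_text)
    return sorted(c for c in before if after.get(c) == before[c])


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(2)  # arbitrary sampling interval
    with open("/proc/stat") as f:
        second = f.read()
    print(frozen_cpus(first, second))
```

Even an idle core normally accumulates idle ticks, so a core that shows no change at all over a few seconds is a good candidate for the frozen one.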

 

What I would expect is that a dedicated exception handler aborts the load instruction, so that the core resumes and the application can deal with the problem.
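
To make that expectation concrete: such a handler would have to recognise the faulting load, put a dummy value in its destination register, and resume at the next instruction. Below is a user-space Python model of that single step, not kernel code. The function names are hypothetical, the opcode tables cover only a few integer loads (no update forms, no 64-bit `ld`), and returning all-ones for the failed read is my assumption about what a sensible result would be.

```python
# Primary opcodes of a few D-form integer loads: lwz, lbz, lhz, lha.
D_FORM_LOADS = {32, 34, 40, 42}
# Extended opcodes (under primary opcode 31) of a few X-form loads:
# lwzx, lbzx, lhzx, lhax, lwbrx, lhbrx.
X_FORM_LOADS = {23, 87, 279, 343, 534, 790}


def load_target_register(insn):
    """Return the destination GPR number if insn is a recognised load, else None."""
    primary = (insn >> 26) & 0x3F  # top 6 bits: primary opcode
    rd = (insn >> 21) & 0x1F       # next 5 bits: destination register
    if primary in D_FORM_LOADS:
        return rd
    if primary == 31 and ((insn >> 1) & 0x3FF) in X_FORM_LOADS:
        return rd
    return None


def skip_failed_load(nip, gprs, insn):
    """Model the fixup: fill rD with all-ones and step past the load.

    Returns the new instruction pointer, or None if insn is not a
    recognised load (in which case gprs is left untouched).
    """
    rd = load_target_register(insn)
    if rd is None:
        return None
    gprs[rd] = 0xFFFFFFFFFFFFFFFF  # assumed dummy value for the failed read
    return nip + 4                 # PowerPC instructions are 4 bytes long


if __name__ == "__main__":
    regs = [0] * 32
    # 0x80640000 encodes 'lwz r3,0(r4)'.
    print(hex(skip_failed_load(0x1000, regs, 0x80640000)), hex(regs[3]))
```

With something like this in place, the driver could test the value it read back and report the bus error to the application instead of the core hanging.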

 

Regards.

Thanks