On a T1042 target (Linux), I talk to a PCI endpoint (EP). This device is a bridge to a legacy, old-school bus. It may generate a PCI target abort in reaction to a bus error on the underlying bus. This has to be handled in my application. In my context, none of the PCI transactions to this device are posted.
When it occurs, the core on which the PCI device driver was running freezes on the load instruction (the load that ended in the PCI target abort).
I've found many discussions about what look like similar issues on P2020, MPC85xx, etc.
And some Linux kernel patches trying to handle that:
And this (never accepted) patch (it did not help me):
Following this one:
Does anybody know if the MPC85xx erratum mentioned in this last link is still applicable to the e5500?
What I can see is that the fsl PCI EDAC driver interrupt handler runs when the "PEX PCIe RC logic" detects the target abort. So the behavior seems correct from the PCI device EP up to the T1042 SoC (through the PCIe switch and the PCIe/PCI bridge).
I don't think any machine check exception is taken at all, because I don't see anything in the console, even after spreading printk() calls in the relevant functions in arch/powerpc/kernel/traps.c.
And I read 0 for all cores in the "machine check" entry of /proc/interrupts.
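A minimal sketch of that check in Python, assuming the machine check row in /proc/interrupts is labelled `MCE` (adjust the label if your kernel prints it differently). The function name is only illustrative:

```python
def machine_check_counts(interrupts_text, label="MCE"):
    """Return {cpu_name: count} for the machine check row of /proc/interrupts.

    Assumption: the first line is the CPU header ("CPU0 CPU1 ...") and the
    machine check row starts with `label` followed by a colon.
    Returns an empty dict if the row is not found.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()
    for line in lines[1:]:
        name, sep, rest = line.partition(":")
        if sep and name.strip() == label:
            # Only the first len(cpus) fields are counters; the rest is the description.
            counts = rest.split()[:len(cpus)]
            return {cpu: int(count) for cpu, count in zip(cpus, counts)}
    return {}
```

Feeding it the content of /proc/interrupts before and after triggering the target abort shows whether any core took a machine check in between.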
But sometimes I get a "bad kernel stack pointer" message in the console, which is weird because it must come from exception handling.
Does anyone know whether this is a known issue?
Is there any related erratum that applies to the whole e500 family, including the e5500?
The problem occurs with linux-4.1.8 and linux-4.19.1 (for example), with or without the fsl mpc85xx EDAC and AER drivers active.
The CPU is reported as stalled by the other cores in the console. No more jiffies are counted for the frozen core in /proc/stat.
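The /proc/stat observation can be automated by comparing two snapshots taken a few seconds apart. This is a rough Python sketch, not a definitive tool: the helper names are made up for illustration, and it assumes that a healthy core always accumulates jiffies in at least one column (idle included) between the two snapshots.

```python
def cpu_totals(stat_text):
    """Sum all jiffies columns for each per-core "cpuN" line of /proc/stat."""
    totals = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            totals[fields[0]] = sum(int(value) for value in fields[1:])
    return totals


def stalled_cpus(before_text, after_text):
    """Return the sorted names of cores whose jiffies total did not change."""
    before = cpu_totals(before_text)
    after = cpu_totals(after_text)
    return sorted(name for name, total in after.items()
                  if before.get(name) == total)
```

Reading /proc/stat twice from a task running on a core that is still alive and passing both texts to `stalled_cpus()` should list the frozen core.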
What I would expect is that a dedicated exception handler aborts the load instruction, so that the core resumes and the application can deal with the problem.