Freescale P2020 CPU Freeze over PCIe abort signal

Liberty · ‎10-26-2010

This is the head of a thread I started on the Linux mailing lists. The thread can be found at: http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg47595.html

-------------------- snip --------------------

This should probably go to Freescale support, as it feels like a hardware issue, yet the end result is a very frozen Linux kernel, so I am posting here first...

I have a programmable FPGA PCIe device connected to a Freescale P2020 PCIe port. As part of the bring-up tests, we are testing two fault scenarios:

1. The FPGA totally ignores the PCIe transaction.
2. The FPGA returns a transaction abort.

Both are plausible PCIe behaviors, and their expected outcome is documented in the PCIe spec. The first should be terminated by the transaction requester's timeout mechanism and raise an error; the second should abort the transaction and raise an error. On the P2020, if I do either of those, the CPU is left hung on the transaction.

-------------------- snap --------------------

It all boils down to two PCIe registers:

-------------------- snip --------------------

Disabling PEX_OTB_CPL_TOR, PEX_CONF_RTY_TOR, or both yields the same behavior. The kernel freezes on the load instruction while the underlying hardware retries the PCIe transaction to infinity and beyond.

-------------------- snap --------------------

-- Liberty
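The expected outcome of the two fault scenarios in the opening post can be modeled in a few lines of C. This is only a toy model of the behavior the post expects from a requester, not P2020 code; the enum names, `resolve_read`, and the microsecond timeout parameter are all hypothetical.

```c
#include <stdint.h>

/* What the completer (here, the FPGA) did with the read request. */
enum completion { CPL_SUCCESS, CPL_ABORT, CPL_NONE };

/* What the requester is expected to report to software. */
enum read_outcome { READ_OK, READ_ERR_ABORT, READ_ERR_TIMEOUT, READ_PENDING };

/*
 * Hypothetical model: an abort ends the transaction with an error at once;
 * a missing completion ends it with an error once the timeout expires.
 * In neither case does the request stay outstanding forever.
 */
static enum read_outcome resolve_read(enum completion cpl,
                                      uint32_t elapsed_us,
                                      uint32_t timeout_us)
{
    switch (cpl) {
    case CPL_SUCCESS:
        return READ_OK;
    case CPL_ABORT:
        return READ_ERR_ABORT;
    case CPL_NONE:
    default:
        return elapsed_us >= timeout_us ? READ_ERR_TIMEOUT : READ_PENDING;
    }
}
```

The hang reported in this thread corresponds to the request staying in the pending state indefinitely, with the CPU load never completing.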

drdo_org · ‎02-24-2011

We have designed an SBC with an MPC8548 whose 4x PCIe lanes are connected to an IDT PCIe switch. The processor's POR shows the correct configuration: 4x at 2.5 Gbps, RC mode. But when I read the PCIe LTSSM register, it shows the lane is in Detect.Quiet. When I probed the actual signals on the processor side, the Tx differential voltage of one lane (lane 0) shows 700 mV pp and it is reaching the Rx; all the other lanes show 0 mV. Similarly, on the receiver side's Tx, only one lane shows a differential voltage, 112 mV pp; all the other lanes show 0 mV. What are the expected voltage levels? The negotiated link width shows 1x. Is that due to the same reason? What could be the reason for the Detect.Quiet state?

morel_hunter1 · ‎03-13-2011

There are errata on several of the MPC85xx family parts related to PEX, including a link down situation. Get the errata from Freescale for the part you are using. Also keep in mind that the PEX block is probably the same in many of these parts, but the errata may not be listed for your particular device.

I’ve encountered the PEX (PCIe) link-down problem on the MPC8536 when the root complex departs and then returns, much like a hot swap of a PEX card. The errata offer a workaround, which may solve your problem.

Errata title: PCI Express LTSSM may fail to properly train with a link partner following HRESET

http://cache.freescale.com/files/32bit/doc/errata/MPC8548ECE.pdf?fsrch=1&sr=18

drdo_org · ‎03-16-2011

Hi,

As you suggested, we looked at the LTSSM register, and we have also sorted out the switch configuration. Now the link is up, but at 2x, although we have configured it for 4x.

1. The processor enumerates all 8 ports inside the switch (89ES32NT8AG2), but beyond that it does not detect the devices connected to the ports.

2. Whenever I assert RESET, the link goes down. If I switch the board off and on again, the link comes up. What could be the reason?

Lakshmi Srinivasan

fredsky · ‎12-01-2010

Well, good luck.

I have the same problem on an MPC8314 platform, also with an FPGA connected through PCIe. When we were debugging our FPGA code, we saw that if a device doesn't reply to a request, the CPU hangs completely (non-recoverable machine check). That behaviour is totally unacceptable, but Freescale never really answered our request. So far we have been lucky, because our FPGA code seems to be as robust as the other devices connected over PCIe, yet we still fear that something could go wrong and lock up PCIe.

BTW, the correct behaviour, as described in the PCIe spec, would be to return all 1's on the CPU bus on error, with the PCIe core properly handling timeout and abort.

Best regards,

Fred.
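The all-ones convention mentioned above can be checked in software once the hardware terminates the read instead of hanging. A minimal C sketch, assuming a 32-bit register read: the names `PCIE_ALL_ONES` and `pcie_read_suspect` are hypothetical, and because all-ones can also be legitimate register content, a real driver would confirm against the controller's error status rather than rely on this check alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value conventionally returned to the CPU for a failed PCIe read. */
#define PCIE_ALL_ONES UINT32_C(0xFFFFFFFF)

/* Hypothetical helper: true if a 32-bit read looks like a failed completion. */
static bool pcie_read_suspect(uint32_t value)
{
    return value == PCIE_ALL_ONES;
}
```

This only helps on hardware that actually completes the failed read; it does nothing for the freeze described in this thread, where the load never returns.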

Liberty · ‎02-27-2011

The issue has been corroborated by someone who chose to address me via my LinkedIn account (of all places)...

---------------------------- snip ----------------------------

Unfortunately, my system is based not on Linux but on a proprietary RTOS. I have a USB IP on an FPGA board which is connected to the P2020 through the PCIe interface. What I am seeing is that after some time my processor freezes on an 'lwz' instruction while trying to read a USB controller register (which is mapped through PCIe). If I skip this 'lwz' instruction and jump to the next one, my processor steps cleanly until I hit another register read operation. It is worth mentioning that writing the register does not get stuck (although I guess the data is never actually written to the intended register). This situation persists until I reset the P2020 and the FPGA board. I don't have the ability to trace PCIe bus signals to confirm whether it is a PCIe abort scenario or something else. I am getting a similar situation on an MPC8544 board, which has an identical PCIe subsystem to the P2020.

---------------------------- snap ----------------------------

What I would really like to have now (by priority) is:

Freescale's acknowledgment of the issue.
Are there any other pitfalls in the vicinity of this issue that I should be aware of?
Is there any workaround?
In which chip version is this issue solved, or when is it scheduled to be solved?

– Liberty

Liberty · ‎10-26-2010

Sorry for the poor formatting (where have all the newlines gone?)

It will be easier viewing the original thread here:

http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg47595.html