Hi
I have a problem with the PCIe controller of the LS1023A.
We connect the PCIe interface of an FPGA to the LS1023A (x1, Gen1).
After the PCIe bus has been running for a while, the AER driver reports "Completion Timeout" for any PCI memory read access to the endpoint device:
[ 1200.937459] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 1200.945559] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 1200.957413] pcieport 0001:00:00.0: device [1957:808a] error status/mask=00004000/00400000
[ 1200.965759] pcieport 0001:00:00.0: [14] Completion Timeout (First)
[ 1200.972619] pcieport 0001:00:00.0: AER: Device recovery failed
[ 1200.981337] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 1200.989444] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 1201.001271] pcieport 0001:00:00.0: device [1957:808a] error status/mask=00004000/00400000
[ 1201.009616] pcieport 0001:00:00.0: [14] Completion Timeout (First)
[ 1201.020172] pcieport 0001:00:00.0: AER: Device recovery failed
[ 1201.053647] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 1201.061737] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 1201.073592] pcieport 0001:00:00.0: device [1957:808a] error status/mask=00004000/00400000
[ 1201.081937] pcieport 0001:00:00.0: [14] Completion Timeout (First)
[ 1201.088819] pcieport 0001:00:00.0: AER: Device recovery failed
[ 1201.097411] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 1201.105471] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 1201.117328] pcieport 0001:00:00.0: device [1957:808a] error status/mask=00004000/00400000
[ 1201.125681] pcieport 0001:00:00.0: [14] Completion Timeout (First)
[ 1201.132564] pcieport 0001:00:00.0: AER: Device recovery failed
[ 1201.140808] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 1201.170951] drv_fpga_read signal_pending error!
[ 1201.256622] pcieport 0001:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 1201.268363] pcieport 0001:00:00.0: device [1957:808a] error status/mask=00004000/00400000
thanks
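For readers following the log above: the "error status/mask=00004000/00400000" line can be decoded by hand against the bit layout of the AER Uncorrectable Error Status register. The sketch below is only an illustration; the helper names (decode_bits, parse_status_mask) are made up for this example, and the bit table lists only the commonly defined bits from the PCIe Base Specification.

```python
# Bit positions of the AER Uncorrectable Error Status/Mask registers
# (subset of the bits defined in the PCIe Base Specification).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_bits(value, names):
    """Return the names of all bits set in a 32-bit register value."""
    return [
        names.get(bit, f"bit {bit} (not in table)")
        for bit in range(32)
        if value & (1 << bit)
    ]


def parse_status_mask(log_line):
    """Extract (status, mask) as integers from an AER 'status/mask=' log line."""
    field = log_line.rsplit("=", 1)[1]
    status, mask = field.strip().split("/")
    return int(status, 16), int(mask, 16)


if __name__ == "__main__":
    line = ("pcieport 0001:00:00.0: device [1957:808a] "
            "error status/mask=00004000/00400000")
    status, mask = parse_status_mask(line)
    print("reported:", decode_bits(status, UNCORRECTABLE_BITS))
    print("masked:  ", decode_bits(mask, UNCORRECTABLE_BITS))
```

The status word has only bit 14 set, which agrees with the "[14] Completion Timeout" line printed by the kernel; the mask word has only bit 22 set.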
Hello zz zw,
Please refer to erratum A-008822.
Description: By default, when the PCI Express controller experiences an erroneous completion from an external completer for its outbound non-posted request, it always sends an OKAY response to the device's internal AXI slave system interface. This is desirable for outbound configuration transactions to prevent an unnecessary error response from propagating through the higher-level system hierarchy, because erroneous completion is a commonly expected behavior during PCI Express bus scan.
However, such default system error response behavior cannot be used for other types of outbound non-posted requests. For example, the outbound memory read transaction requires an actual ERROR response when experiencing erroneous completion from an external completer, like UR completion or completion timeout.
Impact: The device's higher-level system hierarchy cannot detect the error condition when the PCI Express controller experiences an erroneous completion from the external completer for its outbound non-posted request. This is not the case for configuration transactions.
Workaround: Write 0000_9401h to offset 8D0h of the PCI Express controller's configuration space during the pre-boot initialization (PBI) process.
Fix plan: No plans to fix
Please refer to the patch "[2/2] pci/layerscape: change the default error response behavior" on Patchwork.
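One way to check whether the workaround value is actually in place on a running Linux system is to read the register back. The following is a rough sketch, not a verified procedure: it assumes that offset 8D0h of the root port is reachable through the sysfs "config" file of the port (which needs root privileges for offsets above 64 bytes), and the function names are hypothetical.

```python
import struct

ABSERR_OFFSET = 0x8D0          # offset named in the A-008822 workaround
ABSERR_EXPECTED = 0x00009401   # value named in the A-008822 workaround


def read_config_dword(config_path, offset):
    """Read one little-endian 32-bit value from a PCI config space file."""
    with open(config_path, "rb") as f:
        f.seek(offset)
        raw = f.read(4)
    if len(raw) != 4:
        # Typical when not running as root or when extended config
        # space is not exposed for this device.
        raise OSError(f"short read at offset {offset:#x}: got {len(raw)} bytes")
    return struct.unpack("<I", raw)[0]


def workaround_applied(config_path):
    """Return True if offset 8D0h holds the value from the erratum workaround."""
    return read_config_dword(config_path, ABSERR_OFFSET) == ABSERR_EXPECTED
```

For the root port in the log above, the path would presumably be /sys/bus/pci/devices/0001:00:00.0/config, so the call would be workaround_applied("/sys/bus/pci/devices/0001:00:00.0/config"). A result of False only says the register does not hold 0000_9401h; it does not by itself explain the completion timeouts.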
Have a great day,
TIC
The root cause was that I forgot to connect two PCIe clock lines in my schematic.
Can this patch fix the issue?
Patch: "[2/2] pci/layerscape: change the default error response behavior" (Patchwork)
My LS1088A card has encountered a similar issue after running for a while:
[ 2551.253634] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Transmitter ID)
[ 2551.253638] pcieport 0000:00:00.0: device [1957:80c0] error status/mask=00001081/00006000
[ 2551.253640] pcieport 0000:00:00.0: [ 0] Receiver Error
[ 2551.253643] pcieport 0000:00:00.0: [ 7] Bad DLLP
[ 2551.253646] pcieport 0000:00:00.0: [12] Replay Timer Timeout
[ 2551.253650] pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
[ 2551.253659] pcieport 0000:00:00.0: can't find device of ID0000
So I am interested in any further test results on the LS1023A after Linux was patched.
My LSDK already includes this patch, yet I still see the same issue, so I wonder whether this patch can fix this kind of issue at all.
Thanks
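As a side note on the LS1088A log: its "error status/mask=00001081/00006000" line belongs to the AER Correctable Error Status register, which uses a different bit layout from the uncorrectable one. A minimal decoding sketch follows; the function name is made up for illustration, and the bit table follows the PCIe Base Specification.

```python
# Bit positions of the AER Correctable Error Status/Mask registers
# (PCIe Base Specification).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of all bits set in a correctable status or mask word."""
    return [
        CORRECTABLE_BITS.get(bit, f"bit {bit} (not in table)")
        for bit in range(32)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    status, mask = 0x00001081, 0x00006000  # values from the LS1088A log
    print("reported:", decode_correctable(status))
    print("masked:  ", decode_correctable(mask))
```

The status word decodes to bits 0, 7 and 12, matching the Receiver Error, Bad DLLP and Replay Timer Timeout lines in the log. These are corrected Physical Layer errors, whereas the LS1023A log shows uncorrected Transaction Layer completion timeouts.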