1) Can we prevent the CPU from entering an inconsistent state when a PCI Express completion timeout occurs? As we understand it, the CPU should report the PCIe error and continue running. We need the host CPU to keep running so that we can debug further and find the root cause of the problem at the framer device.
2) We can't determine the root cause of this issue. Is it a software bug or a hardware bug? Could you please give us some insight into resolving the PCIe issue above?
@yipingwang After debugging and a live session with the framer device vendor, we found that the vendor SDK issued invalid reads and writes to PCIe registers that the vendor does not support. It was a coding bug in the vendor SDK.
Thanks for the valuable support.
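For anyone hitting a similar problem, one way to catch this class of bug early is to check every register access against the list of offsets the device actually supports before it reaches the bus. The following is only a minimal sketch: the `reg_range` type, the `reg_access_allowed` helper, and the range values are all hypothetical and are not taken from the vendor SDK. The real ranges would have to come from the vendor's register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register window: [start, end) byte offsets the device supports. */
struct reg_range {
    uint32_t start;
    uint32_t end;
};

/* Placeholder values for illustration only; use the vendor's register map. */
static const struct reg_range supported_ranges[] = {
    { 0x0000u, 0x0100u },
    { 0x1000u, 0x1800u },
};

/* Return true if a len-byte access at offset lies fully inside one supported range. */
static bool reg_access_allowed(uint32_t offset, uint32_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < sizeof(supported_ranges) / sizeof(supported_ranges[0]); i++) {
        const struct reg_range *r = &supported_ranges[i];
        /* The subtraction cannot underflow because offset < r->end is checked first. */
        if (offset >= r->start && offset < r->end && len <= r->end - offset)
            return true;
    }
    return false;
}
```

A wrapper around the SDK's read/write routines could call `reg_access_allowed()` first and log the offending offset instead of issuing the access, which makes it easier to locate the caller than waiting for a Completer Abort on the link.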
Steps to fix:
Go into the BIOS.
Click the Advanced Menu.
Afterward, choose Slot Settings.
There will be a setting named PCI SERR# Generation.
Change it from 'Enable' to 'Disable'.
Save and Exit.
Now reboot the system and check if the error persists.
Regards,
Rachel Gomez
We have analyzed your log file. It frequently showed that a completion with CA (Completer Abort) status was detected while reading the configuration space of the endpoint through PEX_CONFIG_ADDR/PEX_CONFIG_DATA. This condition occurs when the completer receives a request that it cannot complete because the request violates the completer's programming model.
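To make the access path concrete, the sketch below builds the word that would be written to PEX_CONFIG_ADDR before reading PEX_CONFIG_DATA. It assumes the indirect-access layout commonly used for these controllers (enable bit, bus, device/function, register offset, with the extended offset bits placed above the bus field). The helper name `pex_config_addr` is hypothetical, and the exact bit layout is an assumption that should be checked against the reference manual for your SoC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed enable bit of the indirect configuration address word. */
#define CFG_ADDR_ENABLE 0x80000000u

/*
 * Build the address word for an indirect configuration access.
 * Assumed layout: [31] enable, [27:24] offset bits 11:8, [23:16] bus,
 * [15:11] device, [10:8] function, [7:2] offset bits 7:2.
 */
static uint32_t pex_config_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t offset)
{
    assert(dev < 32 && fn < 8 && offset < 4096);
    uint32_t devfn = ((uint32_t)dev << 3) | fn;
    /* Low offset bits are dword aligned; extended bits move up by 16. */
    uint32_t reg = ((uint32_t)offset & 0xFCu) | (((uint32_t)offset & 0xF00u) << 16);
    return CFG_ADDR_ENABLE | ((uint32_t)bus << 16) | (devfn << 8) | reg;
}
```

With this layout, a read of offset 0x104 on bus 1, device 0, function 0 would use the address word `0x81010004`. If the endpoint does not implement the targeted register, the read returned through PEX_CONFIG_DATA is where a CA completion would surface.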
We need some more information from the customer to analyze the issue further.
1) Who is the link partner in the customer's setup?
2) Please provide the PCIe dump before the crash.
3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.
4) What tests were running on the setup? At which point did the error occur?
5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.
Please find our inline replies to your queries:
1) Who is the link partner in the customer's setup?
The link partner is an endpoint device from Microsemi.
2) Please provide the PCIe dump before the crash.
We can't take a PCIe dump before the crash because we do not know when the dump will occur. If possible, we can take a PCIe dump just after booting the processor.
3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.
No machine check was observed in this case. Only the PCIe dump appeared, and after some time the processor hung.
4) What tests were running on the setup? At which point did the error occur?
We were configuring our endpoint device. The error occurs randomly while accessing the device.
5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.
This setting needs to be disabled in U-Boot, or we have to modify the kernel to disable the CA completion error dynamically when the PCIe register dump is detected.
2) We can't take a PCIe dump before the crash because we do not know when the dump will occur. If possible, we can take a PCIe dump just after booting the processor.
[NXP] Take a dump just after loading the kernel. We would like to see the PCIe configurations when things are working.
5) This setting needs to be disabled in U-Boot, or we have to modify the kernel to disable the CA completion error dynamically when the PCIe register dump is detected.
[NXP] It does not matter whether you do it in U-Boot or in the kernel. Make sure you modify it before running your test on the endpoint.
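Either way, the change amounts to a read-modify-write of the PCI Express error disable register. The sketch below shows that pattern only. The mask value `PEX_ERR_DISR_CA_MASK` and the helper name are hypothetical placeholders, and it assumes that setting a bit disables detection of the corresponding error; take the real register offset and the CA bit position from the reference manual for your SoC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL placeholder: replace with the CA bit from the reference manual. */
#define PEX_ERR_DISR_CA_MASK 0x00000001u

/*
 * Set the given bits in the error disable register and return the new value.
 * Assumption: a set bit disables detection of that error.
 * On real hardware, use the platform's MMIO accessors (for example, the
 * big-endian I/O helpers in the kernel) instead of a plain pointer access.
 */
static uint32_t pex_disable_error(volatile uint32_t *err_disr, uint32_t mask)
{
    assert(err_disr != NULL);
    uint32_t val = *err_disr;   /* read the current disable bits */
    val |= mask;                /* keep existing bits, add the new one */
    *err_disr = val;            /* write back */
    return val;
}
```

Because the existing bits are preserved, the same helper can be called from U-Boot or from the kernel's PCIe setup path without disturbing other error settings, as long as it runs before the endpoint test starts.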