P1022: PCIEs error detected, console throws continuous error

Solved

hemwant
Contributor IV
We have a custom P1022 board with an EP (endpoint) device connected over the PCIe interface. After the board boots and the EP device's software stack starts running, the console prints a continuous stream of PCIe error dumps.
I am attaching the console logs to this thread.
Our questions are:

1) Can we keep the CPU from going into an inconsistent state when a PCI Express completion time-out occurs? As we understand it, the CPU should report the PCIe error and continue running. We need the host CPU to stay up so that we can debug further and find the root cause of the problem at the framer device.

2) We cannot determine the root cause of this issue. Is it a software bug or a hardware bug? Could you give us some insight into resolving this PCIe issue?

RachelGomez123
Contributor I

Steps to fix-

Go into the BIOS.
Click the Advanced Menu.
Afterward, choose Slot Settings.
There will be a setting named PCI SERR# Generation. 
Change it from 'Enable' to 'Disable'. 
Save and Exit.
Now reboot the system and check if the error persists.

 

Regards,

Rachel Gomez

yipingwang
NXP TechSupport
 
Can you share the logs?
We would also like the PCIe register dumps, both the configuration space and the memory-mapped registers.
Set HID1[RFXE]=0 to disable machine check generation by the e500 core. Details are available in the P1022 Reference Manual.
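As a rough illustration of the HID1[RFXE] suggestion, the sketch below computes the HID1 value with RFXE cleared. The mask 0x00020000 is an assumption taken from the HID1_RFXE definition in the Linux kernel headers, and hid1_clear_rfxe is a hypothetical helper name; check the bit position against the P1022 Reference Manual. On the target, HID1 is a supervisor-level SPR, so the actual read and write would be done with mfspr/mtspr from U-Boot or kernel code, which is not shown here.

```c
#include <stdint.h>

/* Assumed mask for HID1[RFXE] (read fault exception enable). It matches
 * HID1_RFXE in the Linux kernel headers, but confirm it against the
 * P1022 / e500 reference manual before relying on it. */
#define HID1_RFXE_MASK 0x00020000u

/* Return the given HID1 value with RFXE cleared; all other bits are
 * left unchanged. This only computes the new value, it does not touch
 * the SPR itself. */
static inline uint32_t hid1_clear_rfxe(uint32_t hid1)
{
    return hid1 & ~HID1_RFXE_MASK;
}
```

Keeping the computation separate from the SPR access makes it easy to log the old and new HID1 values before writing the register back.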
hemwant
Contributor IV

 

@yipingwang I am attaching the error logs. Also, we are running the board in a chassis, so we cannot capture PCIe error dumps in that setup because the processor is in a hung state.

yipingwang
NXP TechSupport

We have analyzed your log file. It frequently shows that a completion with Completer Abort (CA) status was detected while reading the endpoint's configuration space through PEX_CONFIG_ADDR/PEX_CONFIG_DATA. This condition occurs when the completer receives a request that it cannot complete because the request violates the completer's programming model.
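To make the failing access concrete, here is a minimal sketch of how an indirect configuration read is addressed and how the completion status is interpreted. The status encodings come from the PCI Express specification. The PEX_CONFIG_ADDR field layout is an assumption based on the Linux indirect PCI helper for Freescale controllers, and the function names are hypothetical; verify the layout in the P1022 Reference Manual.

```c
#include <stdint.h>

/* Completion status codes defined by the PCI Express specification. */
enum cpl_status {
    CPL_SC  = 0x0, /* Successful Completion */
    CPL_UR  = 0x1, /* Unsupported Request */
    CPL_CRS = 0x2, /* Configuration Request Retry Status */
    CPL_CA  = 0x4  /* Completer Abort */
};

/* Map a 3-bit completion status field to a readable name. */
static const char *cpl_status_name(unsigned status)
{
    switch (status & 0x7) {
    case CPL_SC:  return "Successful Completion";
    case CPL_UR:  return "Unsupported Request";
    case CPL_CRS: return "Configuration Request Retry Status";
    case CPL_CA:  return "Completer Abort";
    default:      return "Reserved";
    }
}

/* Build the value written to PEX_CONFIG_ADDR before PEX_CONFIG_DATA is
 * read or written. Assumed layout: enable in bit 31, extended register
 * number in bits 27:24, bus in 23:16, device/function in 15:8, and the
 * dword-aligned register number in bits 7:2. */
static uint32_t pex_config_addr(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint16_t reg)
{
    uint32_t devfn = ((uint32_t)(dev & 0x1f) << 3) | (uint32_t)(fn & 0x7);

    return 0x80000000u
         | ((uint32_t)(reg & 0xf00) << 16)
         | ((uint32_t)bus << 16)
         | (devfn << 8)
         | (uint32_t)(reg & 0xfc);
}
```

With this layout, a read of offset 0x104 on bus 1, device 0, function 0 would be addressed as 0x81010004. A CA completion for such a read means the endpoint itself rejected the request.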

 

We need some more information from the customer to analyze the issue further.

1) Who is the link partner in the customer's setup?

2) Please provide the PCIe dump before the crash.

3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.

4) What tests were running on the setup? At which point did the error occur?

5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.

hemwant
Contributor IV

Please find my replies to your questions inline:

1) Who is the link partner in the customer's setup?

The link partner is an endpoint device from Microsemi.

2) Please provide the PCIe dump before the crash.

We cannot take a PCIe dump before the crash because we do not know when the error dump will appear. If it helps, we can take a PCIe dump just after the processor boots.

3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.

No machine check was observed in this case. Only the PCIe error dump appeared, and after some time the processor went into a hung state.

4) What tests were running on the setup? At which point did the error occur?

We were configuring our endpoint device. The error occurs randomly while accessing the device.

5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.

This setting needs to be disabled in U-Boot, or else we have to modify the kernel to disable the CA completion error dynamically when the PCIe register dump is detected.

yipingwang
NXP TechSupport

2) We cannot take a PCIe dump before the crash because we do not know when the error dump will appear. If it helps, we can take a PCIe dump just after the processor boots.
[NXP] Take a dump just after loading the kernel. We would like to see the PCIe configuration while things are working.

5) This setting needs to be disabled in U-Boot, or else we have to modify the kernel to disable the CA completion error dynamically when the PCIe register dump is detected.
[NXP] It does not matter whether you do it in U-Boot or in the kernel. Make sure you change it before running your test on the endpoint.
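A minimal sketch of the read-modify-write that point 5 describes, assuming the error disable register sits at offset 0xE10 in the PCIe controller's memory-mapped block (as in the Linux ccsr_pci layout). The helper name is hypothetical, and the bit mask for the CA completion error is deliberately left to the caller because its position must be taken from the P1022 Reference Manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed offset of the PCI Express error disable register within the
 * controller's register block; confirm in the P1022 Reference Manual. */
#define PEX_ERR_DISR_OFFSET 0xE10u

/* Set the given bits in the error disable register and return the value
 * read back. pex_regs points at the start of the controller's register
 * block. disable_mask is supplied by the caller; no bit position is
 * assumed here. Kernel or U-Boot code would normally use the platform's
 * big-endian I/O accessors instead of a plain volatile pointer. */
static uint32_t pex_err_disable(volatile uint32_t *pex_regs,
                                uint32_t disable_mask)
{
    volatile uint32_t *disr =
        pex_regs + PEX_ERR_DISR_OFFSET / sizeof(uint32_t);
    uint32_t val = *disr;

    val |= disable_mask; /* preserve bits that are already set */
    *disr = val;
    return *disr;
}
```

Calling this once, early in U-Boot or in kernel PCIe setup and before the endpoint test starts, matches the advice above that the location does not matter as long as the change is in place first.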

hemwant
Contributor IV

@yipingwang After debugging and a live session with the framer device vendor, we found that the vendor SDK issued invalid reads and writes to PCIe registers that the vendor's device does not support. It was a coding bug in the vendor SDK.

 

Thanks for the valuable support.