P1022: PCIe errors detected, console throws continuous errors

hemwant
Contributor IV
We have a custom P1022 board with an endpoint (EP) device connected over PCI Express. After booting and starting the EP device's software stack, the console continuously prints PCIe error dumps.
I have attached the console logs to this thread.
Our questions:

1) Can we keep the CPU from going into an inconsistent state on a PCI Express completion timeout? As we understand it, the CPU should report the PCIe error and continue running. We need the host CPU to keep running so we can debug further and find the root cause of the problem in the framer device.

2) We cannot determine the root cause of this issue. Is it a software bug or a hardware bug? Could you give some insight into resolving this PCIe issue?

RachelGomez123
Contributor I

Steps to fix:

Go into the BIOS.
Open the Advanced menu.
Choose Slot Settings.
Find the setting named PCI SERR# Generation.
Change it from 'Enable' to 'Disable'.
Save and exit.
Reboot the system and check whether the error persists.


Regards,

Rachel Gomez

yipingwang
NXP TechSupport
Can you share the logs?
We also need the PCIe register dumps, both configuration space and memory-mapped registers.
Set HID1[RFXE] = 0 to disable machine check generation by the e500 core. Details are in the P1022 Reference Manual.
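A minimal sketch of clearing HID1[RFXE] from supervisor code (U-Boot or the kernel). It assumes HID1 is SPR 1009 on the e500 core and that RFXE is the 0x00020000 mask used for e500 in the U-Boot and Linux headers; confirm both in the e500 core reference manual before relying on them. The inline assembly only builds for PowerPC, so the bit manipulation lives in a separate portable helper, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: RFXE (read fault exception enable) mask for e500 HID1;
 * verify against the e500 core reference manual. */
#define HID1_RFXE 0x00020000u

/* Portable helper: return the HID1 value with RFXE cleared so that read
 * faults no longer raise a machine check. */
static uint32_t hid1_without_rfxe(uint32_t hid1)
{
    return hid1 & ~HID1_RFXE;
}

#if defined(__powerpc__)
/* e500 only, supervisor mode. ASSUMPTION: HID1 is SPR 1009. */
static void e500_disable_read_fault_mcheck(void)
{
    uint32_t hid1;

    __asm__ volatile("mfspr %0, 1009" : "=r"(hid1));     /* read HID1 */
    hid1 = hid1_without_rfxe(hid1);
    __asm__ volatile("mtspr 1009, %0\n\tisync"           /* write back, then */
                     : : "r"(hid1) : "memory");          /* synchronize */
}
#endif
```

Clearing RFXE only stops the core from converting the failed read into a machine check; the PCIe controller still records the error, which is what makes a register dump possible afterwards.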
hemwant
Contributor IV


@yipingwang I have attached the error logs. Also, the board runs inside a chassis, so we cannot capture PCIe register dumps once the processor hangs.

yipingwang
NXP TechSupport

We have analyzed your log file. It repeatedly shows that a completion with CA (Completer Abort) status was detected while reading the endpoint's configuration space through PEX_CONFIG_ADDR/PEX_CONFIG_DATA. This happens when the completer receives a request that it cannot complete because the request violates the completer's programming model.
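To make the PEX_CONFIG_ADDR/PEX_CONFIG_DATA path concrete, here is a sketch of how the address word is typically composed before PEX_CONFIG_DATA is read. It is modeled on the Linux indirect PCI configuration accessor used for Freescale host bridges (arch/powerpc/sysdev/indirect_pci.c); the field layout in the comment is an assumption to check against the P1022 Reference Manual, and pex_config_addr() is a hypothetical name. Decoding the address from a failing access shows which endpoint register returned the CA completion.

```c
#include <assert.h>
#include <stdint.h>

/* Build the value written to PEX_CONFIG_ADDR (layout assumed, see RM):
 *   bit 31      enable
 *   bits 27:24  extended register number (offset bits 11:8)
 *   bits 23:16  bus number
 *   bits 15:11  device number
 *   bits 10:8   function number
 *   bits 7:2    register number (dword aligned)
 */
static uint32_t pex_config_addr(unsigned bus, unsigned dev,
                                unsigned fn, unsigned offset)
{
    assert(bus < 256 && dev < 32 && fn < 8 && offset < 0x1000);

    uint32_t devfn = ((uint32_t)dev << 3) | fn;

    return 0x80000000u                          /* enable */
         | (((uint32_t)offset & 0xf00u) << 16)  /* extended register */
         | ((uint32_t)bus << 16)
         | (devfn << 8)
         | ((uint32_t)offset & 0xfcu);          /* dword-aligned register */
}
```

For example, bus 1, device 0, function 0, offset 0x100 (the start of PCIe extended configuration space) encodes as 0x81010000.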


We need some more information from the customer to analyze the issue further.

1) Who is the link partner in the customer's setup?

2) Please provide the PCIe dump before the crash.

3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.

4) What tests were running on the setup? At which point did the error occur?

5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.
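For item 5, the change amounts to a read-modify-write of the PCI Express error disable register. In this sketch the register pointer and the CA-disable mask are parameters, because the exact offset and bit position must come from the P1022 Reference Manual. It also assumes, as the register's name suggests, that a set bit disables detection of that error class. The function names are hypothetical, and on real hardware the register is big-endian memory-mapped I/O, so kernel code would use accessors such as in_be32()/out_be32() rather than a plain pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Set disable_mask in the error disable register, leaving other bits as
 * they are. ASSUMPTION: 1 = detection of that error is disabled. */
static void pex_err_disable(volatile uint32_t *err_dr, uint32_t disable_mask)
{
    assert(err_dr);
    uint32_t v = *err_dr;   /* read current disable bits */
    v |= disable_mask;      /* disable the selected error class */
    *err_dr = v;            /* write back */
}

/* Re-enable detection once the register dump has been taken. */
static void pex_err_enable(volatile uint32_t *err_dr, uint32_t disable_mask)
{
    assert(err_dr);
    *err_dr = *err_dr & ~disable_mask;
}
```

Calling pex_err_disable() once during bring-up, before the endpoint is accessed, keeps the CA completion from being reported while leaving the controller's other status registers readable for a dump.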

hemwant
Contributor IV

Please find my inline replies to your questions:

1) Who is the link partner in the customer's setup?

The link partner is an endpoint device from Microsemi.

2) Please provide the PCIe dump before the crash.

We cannot take a PCIe dump before the crash because we do not know when the error will occur. If it helps, we can take a PCIe dump just after booting the processor.

3) As per the customer's earlier response, they observed the machine check. However, we do not see the machine check in the log. Please share the complete log.

No machine check was observed in this case; only the PCIe error dump appeared, and after some time the processor hung.

4) What tests were running on the setup? At which point did the error occur?

We were configuring our endpoint device; the error occurs randomly while accessing the device.

5) To take the PCIe register dump after the crash, try disabling the PCI Express CA completion in the PCI Express error disable register.

Should this setting be disabled in U-Boot, or do we have to modify the kernel to disable the CA completion error dynamically when a PCIe register dump is detected?

yipingwang
NXP TechSupport

2) We cannot take a PCIe dump before the crash because we do not know when the error will occur. If it helps, we can take a PCIe dump just after booting the processor.
[NXP] Take a dump just after loading the kernel. We would like to see the PCIe configuration while things are still working.

5) Should this setting be disabled in U-Boot, or do we have to modify the kernel to disable the CA completion error dynamically when a PCIe register dump is detected?
[NXP] It does not matter whether you do it in U-Boot or in the kernel. Make sure you change it before running your test on the endpoint.
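For taking a configuration dump right after the kernel loads, Linux exposes each PCI function's configuration space as a sysfs "config" file, and lspci -xxxx prints the full 4096-byte extended space. The sketch below does the same from C; the device path in the comment is a hypothetical bus/device/function, and reading past the first 64 bytes normally requires root. It covers configuration space only, not the controller's memory-mapped registers.

```c
#include <assert.h>
#include <stdio.h>

/* Copy bytes from in to out as rows of 16 hex bytes, each row prefixed
 * by its offset. Returns the number of bytes dumped. */
static size_t dump_config(FILE *in, FILE *out)
{
    unsigned char buf[16];
    size_t total = 0, n;

    assert(in && out);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        fprintf(out, "%03zx:", total);
        for (size_t i = 0; i < n; i++)
            fprintf(out, " %02x", buf[i]);
        fputc('\n', out);
        total += n;
    }
    return total;
}

/* Dump one function's configuration space to stdout, e.g. (hypothetical
 * BDF) "/sys/bus/pci/devices/0000:01:00.0/config".
 * Returns the byte count, or -1 if the file cannot be opened. */
static long dump_device(const char *config_path)
{
    FILE *f = fopen(config_path, "rb");

    if (f == NULL) {
        perror(config_path);
        return -1;
    }
    size_t n = dump_config(f, stdout);
    fclose(f);
    return (long)n;
}
```

Saving one dump while the link is healthy and another after the error (with the CA error disabled as discussed above) gives a direct before/after comparison.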

hemwant
Contributor IV

@yipingwang After debugging, including a live session with the framer device vendor, we found that the vendor SDK issued invalid reads/writes to PCIe registers that the device does not support. It was a coding bug in the vendor SDK.


Thanks for your valuable support.