P1022: Console hangs after PCIE error dump


hemwant
Contributor IV

We have a custom P1022-based board. The processor gets stuck after printing a PCIe error dump, and nothing is accessible via the console or a telnet session. The log of the dump is as follows:

 

Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineFecCounters: TPN-PM:: ADDING index 11 intf_id 1
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineFecCounters: TPN-PM :: fec corr+9*uncorr[1][11] : 4
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyMapOtnIntfIdToSlice: PM: intf type 27 intf id 2
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyMapOtnIntfIdToSlice: PM:otn line 1-7 9
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineOtnParams: TPN-PM :: **SAMPLE PM-BIP errors 1451, intf id :2
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineOtnParams: TPN-PM :: **SAMPLE PM-BEI errors 0, intf id :2
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_es[684] : 1
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_ses[684] : 0
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_bbe[684] : 1451
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_fees[684] : 0
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_feses[684] : 0
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: TPN-PM :: odu_pm_bbe[684] : 0
Oct 29 17:11:25 p1022ds local5.crit TPN[2206]: CalcOtnErrorParamsGeneric: odu_pm_bip_errors: 1451
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineOtnParams: TPN-PM :: **SAMPLE SM-BIP errors 1732, intf id :2
Oct 29 17:11:25 p1022ds local5.info TPN[2206]: PmHyPhyGetLineOtnParams: TPN-PM :: **SAMPLE SM-BEI errors 1733, intf id :2
Oct 29 17:11:25 p1022ds local5.err TPN[2206]: PmHyPhyGetLineFecCounters: PM:no dev init / wrong intf id 2
Card type value 1 4
The AlarmStaus is 1 of bit 12
The AlarmStaus is 1 of bit 14
The AlarmStaus is 1 of bit 8
The AlarmStaus is 1 of bit 10
The AlarmStaus is 0 of bit 15
The AlarmStaus is 1 of bit 11
The AlarmStaus is 0 of bit 14
The AlarmStaus is 1 of bit 7
The AlarmStaus is 1 of bit 8
The AlarmStaus is 0 of bit 9
The PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0100
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0100
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0100
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0100
PCIe error(s) detected
PCIe ERR_DR register: 0x80800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: AER: Multiple Uncorrected (Non-Fatal) error received: id=0100
pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: device [1957:0110] error status/mask=00004000/00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
PCIe error(s) detected
PCIe ERR_DR register: 0x00800000
PCIe ERR_CAP_STAT register: 0x00000041
PCIe ERR_CAP_R0 register: 0x00000800
PCIe ERR_CAP_R1 register: 0x00000000
PCIe ERR_CAP_R2 register: 0x00000000
PCIe ERR_CAP_R3 register: 0x00000000
pcieport 0000:00:00.0: [14] Completion

 

3 Replies

yipingwang
NXP TechSupport

PEXx_PEX_ERR_DR[PCT] is set.

PCI Express completion time-out. A completion time-out condition was detected for a non-posted, outbound PCI Express transaction. An error response is sent back to the requestor. Note that the completion time-out counter starts only once the non-posted request has been sent to the link partner.
1: A completion time-out on the PCI Express link was detected. Note that a completion time-out error is a fatal error. If a completion time-out error is detected, the system has become unstable. A hot reset is recommended to restore stability of the system.

In general, a completion time-out occurs when the downstream endpoint (EP) device fails to send back the expected data for an outbound memory read (MRd). This normally happens only when the EP is completely dead: if the outbound MRd had merely hit an unconfigured memory region of the EP, the EP should respond with Unsupported Request (UR) instead of staying silent.
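
To make the register values in the dump easier to read, here is a minimal Python sketch that decodes them. It is an illustration only, not NXP code. The PCT mask (0x00800000) follows from this reply together with the logged ERR_DR value. The ME mask (0x80000000) is an assumption inferred from the single dump that reads 0x80800000, which sits next to the "Multiple Uncorrected" AER line. Bit 14 of the AER uncorrectable error status is Completion Timeout, which matches the "[14] Completion" line and the status value 00004000 in the log. The function names are hypothetical.

```python
# Known flags in the 32-bit ERR_DR value, written as they appear in the log.
ERR_DR_BITS = {
    0x80000000: "ME",   # multiple errors (assumption, inferred from the log)
    0x00800000: "PCT",  # completion time-out (per the reply above)
}

# PCIe AER Uncorrectable Error Status bits; only the one seen in the log is named.
AER_UNCORRECTABLE_BITS = {
    14: "Completion Timeout",
}


def decode_err_dr(value):
    """Return the names of known ERR_DR flags set in value.

    Any set bits that are not in ERR_DR_BITS are reported as one
    'unknown bits' entry so nothing is silently dropped.
    """
    names = [name for mask, name in ERR_DR_BITS.items() if value & mask]
    known = 0
    for mask in ERR_DR_BITS:
        known |= mask
    rest = value & ~known & 0xFFFFFFFF
    if rest:
        names.append(f"unknown bits 0x{rest:08x}")
    return names


def decode_aer_status(value):
    """Return a label for every bit set in an AER uncorrectable status value."""
    return [
        AER_UNCORRECTABLE_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    for err_dr in (0x00800000, 0x80800000):
        print(f"ERR_DR 0x{err_dr:08x}: {decode_err_dr(err_dr)}")
    print(f"AER status 0x00004000: {decode_aer_status(0x00004000)}")
```

Both decoders point at the same condition: the root port is reporting a completion time-out, which is consistent with the endpoint not answering an outbound read.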

hemwant
Contributor IV

@yipingwang 

Can we prevent the CPU from going into an inconsistent state when a PCI Express completion time-out occurs?

As we understand it, the CPU should report the PCIe error and continue running.

We need the host CPU to stay in a running state so that we can debug further and find the root cause of the problem at the framer device.

hemwant
Contributor IV

@yipingwang please respond to the queries raised in my previous post.

Can we prevent the CPU from going into an inconsistent state when a PCI Express completion time-out occurs?

As we understand it, the CPU should report the PCIe error and continue running.

We need the host CPU to stay in a running state so that we can debug further and find the root cause of the problem at the framer device.
