LX2160A: Async SError when accessing 64bit PCIe BAR

seali
Contributor II

Hi,

I hit an asynchronous SError when accessing a PCIe BAR space, which is very strange. Can you please help me figure out the reason? Thanks.

~# dmesg | grep pcie
[ 0.000000] Kernel command line: console=ttyAMA0,115200 earlycon=pl011,mmio32,0x21c0000 default_hugepagesz=1024m hugepagesz=1024m hugepages=2 pci=pcie_bus_perf root=PARTUUID=30303030-01 rw rootwait
[ 2.961922] layerscape-pcie 3600000.pcie: host bridge /soc/pcie@3600000 ranges:
[ 2.969235] layerscape-pcie 3600000.pcie: MEM 0x9400000000..0x97ffffffff -> 0xa400000000
[ 2.977495] layerscape-pcie 3600000.pcie: MEM 0x9040000000..0x90ffffffff -> 0x40000000
[ 2.985578] layerscape-pcie 3600000.pcie: IO 0x9010000000..0x901000ffff -> 0x00000000
[ 2.993740] layerscape-pcie 3600000.pcie: PCI host bridge to bus 0000:00
[ 3.065854] layerscape-pcie 3800000.pcie: host bridge /soc/pcie@3800000 ranges:
[ 3.073165] layerscape-pcie 3800000.pcie: MEM 0xa400000000..0xa7ffffffff -> 0xa400000000
[ 3.081426] layerscape-pcie 3800000.pcie: MEM 0xa040000000..0xa0ffffffff -> 0x40000000
[ 3.089508] layerscape-pcie 3800000.pcie: IO 0xa010000000..0xa01000ffff -> 0x00000000
[ 3.097663] layerscape-pcie 3800000.pcie: PCI host bridge to bus 0001:00
[ 6.588138] pcieport 0000:00:00.0: Adding to iommu group 1
[ 6.593728] pcieport 0000:00:00.0: PME: Signaling with IRQ 25
[ 6.599604] pcieport 0000:00:00.0: AER: enabled with IRQ 25
[ 6.605262] pcieport 0001:00:00.0: Adding to iommu group 2
[ 6.610845] pcieport 0001:00:00.0: PME: Signaling with IRQ 26
[ 6.617072] pcieport 0001:00:00.0: AER: enabled with IRQ 26
[ 6.622732] pcieport 0001:01:00.0: Adding to iommu group 2
[ 6.635190] pcieport 0001:02:00.0: Adding to iommu group 2
[ 6.647483] pcieport 0001:02:01.0: Adding to iommu group 2
[ 6.668651] pcieport 0001:02:02.0: Adding to iommu group 2
[ 6.682252] pcieport 0001:02:03.0: Adding to iommu group 2
[ 6.701919] pcieport 0001:02:04.0: Adding to iommu group 2
[ 6.715881] pcieport 0001:02:05.0: Adding to iommu group 2
~# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:01:00.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:01:00.1 Memory controller: PMC-Sierra Inc. Device 8532
0001:02:00.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:01.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:02.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:03.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:04.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:05.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:05:00.0 Non-VGA unclassified device: Cisco Systems Inc Device 026e
0001:07:00.0 Ethernet controller: Marvell Technology Group Ltd. Device c819
~# lspci -s 07:00.0 -vvv
0001:07:00.0 Ethernet controller: Marvell Technology Group Ltd. Device c819
Subsystem: Marvell Technology Group Ltd. Device 11ab
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 0
Region 0: Memory at a404800000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at a404000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #7, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <256ns, L1 unlimited
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Segmentation fault

~# busybox devmem 0xa400000050
[ 88.226197] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: 0001:00:00.0
[ 88.234641] SError Interrupt on CPU14, code 0xbf000002 -- SError
[ 88.234642] CPU: 14 PID: 732 Comm: busybox Tainted: GF O 5.4.47 #4
[ 88.234643] Hardware name: SolidRun LX2160A Twins (DT)
[ 88.234644] pstate: 60000085 (nZCv daIf -PAN -UAO)
[ 88.234644] pc : el0_irq_naked+0x4/0x54
[ 88.234645] lr : 0x40d8b0
[ 88.234646] sp : ffff800010afbec0
[ 88.234646] x29: ffff800010afbff0 x28: ffff0022e6d03140
[ 88.234648] x27: 0000000000000000 x26: 0000000000000000
[ 88.234650] x25: 0000000000000000 x24: 0000000000000000
[ 88.234651] x23: 0000000060000000 x22: 0000ffff83b24fac
[ 88.234652] x21: 00000000ffffffff x20: ffff5e7ed3ca6000
[ 88.234654] x19: 0000000000000000 x18: 0000000000000000
[ 88.234655] x17: 0000000000000000 x16: 0000000000000000
[ 88.234657] x15: 0000000000000000 x14: 0000000000000000
[ 88.234658] x13: 0000000000000000 x12: 0000000000000000
[ 88.234659] x11: 0000000000000000 x10: 0000000000000000
[ 88.234661] x9 : 0000000000000000 x8 : 0000000000000000
[ 88.234662] x7 : 0000000000000000 x6 : 0000000000000000
[ 88.234664] x5 : 0000000000000000 x4 : 0000000000000000
[ 88.234665] x3 : 0000000000000000 x2 : 0000000000000000
[ 88.234666] x1 : 0000000000000000 x0 : 0000000000000000
[ 88.234668] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 88.234669] CPU: 14 PID: 732 Comm: busybox Tainted: GF O 5.4.47 #4
[ 88.234670] Hardware name: SolidRun LX2160A Twins (DT)
[ 88.234671] Call trace:
[ 88.234671] dump_backtrace+0x0/0x150
[ 88.234672] show_stack+0x14/0x20
[ 88.234672] dump_stack+0xbc/0x100
[ 88.234673] panic+0x16c/0x37c
[ 88.234674] __stack_chk_fail+0x0/0x18
[ 88.234674] arm64_serror_panic+0x74/0x88
[ 88.234675] do_serror+0x70/0x138
[ 88.234675] el1_error+0x84/0xf8
[ 88.234676] el0_irq_naked+0x4/0x54
[ 88.234677] SMP: stopping secondary CPUs
[ 88.234678] Kernel Offset: 0x21a418a00000 from 0xffff800010000000
[ 88.234678] PHYS_OFFSET: 0xffff881e80000000
[ 88.234679] CPU features: 0x0002,21806008
[ 88.234679] Memory Limit: none
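As a first sanity check on the failing access, the address passed to `devmem` can be matched against the BARs of 0001:07:00.0 and the `pcie@3800000` MEM windows printed above. This is a minimal sketch that hard-codes those values from the logs; `locate`, `BARS` and `WINDOWS` are hypothetical names, not part of any existing tool.

```python
# Hypothetical helper: find which BAR / host-bridge window contains an address.
# All values below are transcribed from the dmesg and lspci output above.
from typing import List, Optional, Tuple

Range = Tuple[str, int, int]  # (label, start address, size in bytes)

BARS: List[Range] = [
    ("0001:07:00.0 BAR0", 0xA404800000, 1 << 20),   # [size=1M]
    ("0001:07:00.0 BAR2", 0xA400000000, 64 << 20),  # [size=64M]
    ("0001:07:00.0 BAR4", 0xA404000000, 8 << 20),   # [size=8M]
]

WINDOWS: List[Range] = [
    ("pcie@3800000 MEM 64-bit", 0xA400000000, 0xA7FFFFFFFF - 0xA400000000 + 1),
    ("pcie@3800000 MEM 32-bit", 0xA040000000, 0xA0FFFFFFFF - 0xA040000000 + 1),
]


def locate(addr: int, ranges: List[Range]) -> Optional[Tuple[str, int]]:
    """Return (label, offset into range) for the first range holding addr, else None."""
    for label, start, size in ranges:
        if start <= addr < start + size:
            return label, addr - start
    return None


if __name__ == "__main__":
    addr = 0xA400000050  # the address given to `busybox devmem`
    print(locate(addr, BARS))
    print(locate(addr, WINDOWS))
```

For 0xa400000050 both lookups hit (BAR 2 and the 64-bit window, each at offset 0x50), so the access itself targets a validly assigned BAR. A check like this cannot show whether a second device also decodes the same range.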

yipingwang
NXP TechSupport

Is this only reproduced on a customized board with your own driver? I can't find any evidence from the logs alone.

seali
Contributor II

I'm using the LX2160A-CEX7 from SolidRun, with some PCIe devices installed on it.

yipingwang
NXP TechSupport

I discussed this issue with the AE team.

They believe the AER error is caused by a hardware problem.

seali
Contributor II

Thank you! We finally found that the cause was two devices being assigned the same BAR address. After enlarging the PCIe cfg space, the issue went away.
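A duplicate assignment like this can be spotted by comparing the `Region N: Memory at ... [size=...]` lines that `lspci -v` prints for each endpoint. The following is a minimal sketch under a few assumptions: only endpoint BARs are fed in (bridge "memory behind bridge" windows legitimately contain the BARs below them), all addresses come from the same `lspci` run, and `parse_regions` / `find_overlaps` are hypothetical names rather than existing tools.

```python
import re
from typing import List, Tuple

Region = Tuple[str, int, int]  # (label, start, end exclusive)

# Matches e.g. "Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]"
REGION_RE = re.compile(r"Region (\d+): Memory at ([0-9a-fA-F]+) .*?\[size=(\d+)([KMG]?)\]")
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_regions(device: str, text: str) -> List[Region]:
    """Extract every memory BAR from lspci -v text as (label, start, end)."""
    regions = []
    for m in REGION_RE.finditer(text):
        start = int(m.group(2), 16)
        size = int(m.group(3)) * UNITS[m.group(4)]
        regions.append((f"{device} BAR{m.group(1)}", start, start + size))
    return regions


def find_overlaps(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return label pairs whose [start, end) ranges intersect."""
    ordered = sorted(regions, key=lambda r: r[1])
    clashes = []
    for i, (label_a, _, end_a) in enumerate(ordered):
        for label_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later range can overlap label_a
            clashes.append((label_a, label_b))
    return clashes


if __name__ == "__main__":
    nic = """Region 0: Memory at a404800000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at a404000000 (64-bit, prefetchable) [size=8M]"""
    # Hypothetical second endpoint, invented only to illustrate a duplicate BAR
    other = "Region 0: Memory at a400000000 (64-bit, prefetchable) [size=1M]"
    regions = parse_regions("0001:07:00.0", nic) + parse_regions("other", other)
    for a, b in find_overlaps(regions):
        print(f"overlap: {a} <-> {b}")
```

With the three real BARs of 0001:07:00.0 alone, no overlap is reported; adding the invented `other` device produces one clash with BAR 2.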
