Hi,
I get an asynchronous SError when accessing a PCIe BAR space, which is very strange. Can you please help figure out the reason? Thanks.
~# dmesg | grep pcie
[ 0.000000] Kernel command line: console=ttyAMA0,115200 earlycon=pl011,mmio32,0x21c0000 default_hugepagesz=1024m hugepagesz=1024m hugepages=2 pci=pcie_bus_perf root=PARTUUID=30303030-01 rw rootwait
[ 2.961922] layerscape-pcie 3600000.pcie: host bridge /soc/pcie@3600000 ranges:
[ 2.969235] layerscape-pcie 3600000.pcie: MEM 0x9400000000..0x97ffffffff -> 0xa400000000
[ 2.977495] layerscape-pcie 3600000.pcie: MEM 0x9040000000..0x90ffffffff -> 0x40000000
[ 2.985578] layerscape-pcie 3600000.pcie: IO 0x9010000000..0x901000ffff -> 0x00000000
[ 2.993740] layerscape-pcie 3600000.pcie: PCI host bridge to bus 0000:00
[ 3.065854] layerscape-pcie 3800000.pcie: host bridge /soc/pcie@3800000 ranges:
[ 3.073165] layerscape-pcie 3800000.pcie: MEM 0xa400000000..0xa7ffffffff -> 0xa400000000
[ 3.081426] layerscape-pcie 3800000.pcie: MEM 0xa040000000..0xa0ffffffff -> 0x40000000
[ 3.089508] layerscape-pcie 3800000.pcie: IO 0xa010000000..0xa01000ffff -> 0x00000000
[ 3.097663] layerscape-pcie 3800000.pcie: PCI host bridge to bus 0001:00
[ 6.588138] pcieport 0000:00:00.0: Adding to iommu group 1
[ 6.593728] pcieport 0000:00:00.0: PME: Signaling with IRQ 25
[ 6.599604] pcieport 0000:00:00.0: AER: enabled with IRQ 25
[ 6.605262] pcieport 0001:00:00.0: Adding to iommu group 2
[ 6.610845] pcieport 0001:00:00.0: PME: Signaling with IRQ 26
[ 6.617072] pcieport 0001:00:00.0: AER: enabled with IRQ 26
[ 6.622732] pcieport 0001:01:00.0: Adding to iommu group 2
[ 6.635190] pcieport 0001:02:00.0: Adding to iommu group 2
[ 6.647483] pcieport 0001:02:01.0: Adding to iommu group 2
[ 6.668651] pcieport 0001:02:02.0: Adding to iommu group 2
[ 6.682252] pcieport 0001:02:03.0: Adding to iommu group 2
[ 6.701919] pcieport 0001:02:04.0: Adding to iommu group 2
[ 6.715881] pcieport 0001:02:05.0: Adding to iommu group 2
~# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:01:00.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:01:00.1 Memory controller: PMC-Sierra Inc. Device 8532
0001:02:00.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:01.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:02.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:03.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:04.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:02:05.0 PCI bridge: PMC-Sierra Inc. Device 8532
0001:05:00.0 Non-VGA unclassified device: Cisco Systems Inc Device 026e
0001:07:00.0 Ethernet controller: Marvell Technology Group Ltd. Device c819
~# lspci -s 07:00.0 -vvv
0001:07:00.0 Ethernet controller: Marvell Technology Group Ltd. Device c819
Subsystem: Marvell Technology Group Ltd. Device 11ab
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 0
Region 0: Memory at a404800000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]
Region 4: Memory at a404000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #7, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <256ns, L1 unlimited
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Segmentation fault
~# busybox devmem 0xa400000050
[ 88.226197] pcieport 0001:00:00.0: AER: Uncorrected (Non-Fatal) error received: 0001:00:00.0
[ 88.234641] SError Interrupt on CPU14, code 0xbf000002 -- SError
[ 88.234642] CPU: 14 PID: 732 Comm: busybox Tainted: GF O 5.4.47 #4
[ 88.234643] Hardware name: SolidRun LX2160A Twins (DT)
[ 88.234644] pstate: 60000085 (nZCv daIf -PAN -UAO)
[ 88.234644] pc : el0_irq_naked+0x4/0x54
[ 88.234645] lr : 0x40d8b0
[ 88.234646] sp : ffff800010afbec0
[ 88.234646] x29: ffff800010afbff0 x28: ffff0022e6d03140
[ 88.234648] x27: 0000000000000000 x26: 0000000000000000
[ 88.234650] x25: 0000000000000000 x24: 0000000000000000
[ 88.234651] x23: 0000000060000000 x22: 0000ffff83b24fac
[ 88.234652] x21: 00000000ffffffff x20: ffff5e7ed3ca6000
[ 88.234654] x19: 0000000000000000 x18: 0000000000000000
[ 88.234655] x17: 0000000000000000 x16: 0000000000000000
[ 88.234657] x15: 0000000000000000 x14: 0000000000000000
[ 88.234658] x13: 0000000000000000 x12: 0000000000000000
[ 88.234659] x11: 0000000000000000 x10: 0000000000000000
[ 88.234661] x9 : 0000000000000000 x8 : 0000000000000000
[ 88.234662] x7 : 0000000000000000 x6 : 0000000000000000
[ 88.234664] x5 : 0000000000000000 x4 : 0000000000000000
[ 88.234665] x3 : 0000000000000000 x2 : 0000000000000000
[ 88.234666] x1 : 0000000000000000 x0 : 0000000000000000
[ 88.234668] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 88.234669] CPU: 14 PID: 732 Comm: busybox Tainted: GF O 5.4.47 #4
[ 88.234670] Hardware name: SolidRun LX2160A Twins (DT)
[ 88.234671] Call trace:
[ 88.234671] dump_backtrace+0x0/0x150
[ 88.234672] show_stack+0x14/0x20
[ 88.234672] dump_stack+0xbc/0x100
[ 88.234673] panic+0x16c/0x37c
[ 88.234674] __stack_chk_fail+0x0/0x18
[ 88.234674] arm64_serror_panic+0x74/0x88
[ 88.234675] do_serror+0x70/0x138
[ 88.234675] el1_error+0x84/0xf8
[ 88.234676] el0_irq_naked+0x4/0x54
[ 88.234677] SMP: stopping secondary CPUs
[ 88.234678] Kernel Offset: 0x21a418a00000 from 0xffff800010000000
[ 88.234678] PHYS_OFFSET: 0xffff881e80000000
[ 88.234679] CPU features: 0x0002,21806008
[ 88.234679] Memory Limit: none
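For readers of the panic above: the `code 0xbf000002` value printed with the SError line is the exception syndrome (ESR). A minimal sketch for splitting it into fields follows, assuming the standard Arm ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with ISS bit 24 being IDS for an SError). The helper name `decode_esr` is hypothetical and used only for illustration.

```python
def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into its architectural fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class (0x2f = SError interrupt)
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits 24:0
        "ids": (esr >> 24) & 0x1,   # SError only: 1 = implementation-defined syndrome
    }


if __name__ == "__main__":
    fields = decode_esr(0xBF000002)  # value taken from the panic log
    for name, value in fields.items():
        print(f"{name} = {value:#x}")
```

With IDS set, the remaining syndrome bits are defined by the CPU implementation rather than by the architecture, so this value alone probably does not identify which access failed.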
Is this reproduced only on a customized board with your own driver? I can't find the cause from the logs alone.
I use the LX2160A-CEX7 from SolidRun, with some PCIe devices installed on it.
I discussed this issue with the AE team.
They believe the AER error is caused by a hardware problem.
Thank you! We finally found that two devices were assigned the same BAR address. After enlarging the PCIe cfg space, the issue was gone.
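For anyone hitting something similar, a check for overlapping BAR assignments can be sketched roughly as below. It parses `Region N: Memory at ...` lines in the format shown in the `lspci -vvv` output above and reports any pair of ranges that intersect. The function names `parse_bars` and `find_overlaps` are hypothetical, and the second device in `SAMPLE` (`0001:ff:00.0`) is invented to show a conflict; it is not taken from the logs in this thread.

```python
import re
from itertools import combinations

# Device header, e.g. "0001:07:00.0 Ethernet controller: ..."
DEVICE_RE = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# BAR line, e.g. "Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]"
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) .*\[size=(\d+)([KMG]?)\]"
)
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# The first device's regions are copied from the thread; the second device
# is HYPOTHETICAL and exists only to demonstrate an overlap.
SAMPLE = """\
0001:07:00.0 Ethernet controller: Marvell Technology Group Ltd. Device c819
    Region 0: Memory at a404800000 (64-bit, prefetchable) [size=1M]
    Region 2: Memory at a400000000 (64-bit, prefetchable) [size=64M]
    Region 4: Memory at a404000000 (64-bit, prefetchable) [size=8M]
0001:ff:00.0 Hypothetical device
    Region 0: Memory at a400000000 (64-bit, prefetchable) [size=1M]
"""


def parse_bars(text):
    """Return a list of (device, bar_index, first_addr, last_addr) tuples."""
    bars = []
    device = None
    for line in text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            device = header.group(1)
            continue
        region = REGION_RE.search(line)
        if region and device:
            start = int(region.group(2), 16)
            size = int(region.group(3)) * UNITS[region.group(4)]
            bars.append((device, int(region.group(1)), start, start + size - 1))
    return bars


def find_overlaps(bars):
    """Return every pair of BARs whose address ranges intersect."""
    return [
        (a, b)
        for a, b in combinations(bars, 2)
        if a[2] <= b[3] and b[2] <= a[3]
    ]


if __name__ == "__main__":
    for a, b in find_overlaps(parse_bars(SAMPLE)):
        print(f"{a[0]} BAR{a[1]} [{a[2]:#x}-{a[3]:#x}] overlaps "
              f"{b[0]} BAR{b[1]} [{b[2]:#x}-{b[3]:#x}]")
```

On a live system, the text of `lspci -vvv` could be passed to `parse_bars` in place of `SAMPLE`. This only looks at memory BARs of endpoints and functions; bridge windows and I/O regions are not checked.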