Hello,
We have a simple PCI driver that calls pci_iomap() and then ioread32().
It works fine on LSDK-20.12, but on LSDK-21.08 we get the following panic (it occurs when the driver calls ioread32()).
Any idea how to debug this? What could cause it?
[ 296.208513] xdma:xdma_mod_init: Xilinx XDMA Reference Driver xdma v2020.1.8
[ 296.215515] xdma:xdma_mod_init: desc_blen_max: 0xfffffff/268435455, timeout: h2c 10 c2h 10 sec.
[ 296.224377] xdma:xdma_device_open: xdma device 0001:01:00.0, 0x000000003e674e35.
[ 296.231947] xdma:map_single_bar: BAR0 at 0x9400000000 mapped at 0x00000000540a24ba, length=65536(/65536)
[ 296.241559] SError Interrupt on CPU12, code 0xbf000002 -- SError
[ 296.241560] CPU: 12 PID: 756 Comm: modprobe Not tainted 5.10.35 #15
[ 296.241561] Hardware name: SolidRun LX2160A Clearfog CX (DT)
[ 296.241562] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[ 296.241563] pc : xdma_device_open+0x890/0xca0 [xdma]
[ 296.241564] lr : xdma_device_open+0x408/0xca0 [xdma]
[ 296.241565] sp : ffff8000113537c0
[ 296.241566] x29: ffff8000113537c0 x28: 0000009400000000
[ 296.241569] x27: ffffdc9f4c87a000 x26: 0000000000000000
[ 296.241571] x25: ffffdc9f0c38a000 x24: ffff0a920205a3b0
[ 296.241573] x23: 0000000000000000 x22: 0000000000010000
[ 296.241576] x21: ffff0a920205a000 x20: ffff0a920ba49048
[ 296.241578] x19: ffff0a920ba49000 x18: ffffffffffffffff
[ 296.241580] x17: 00000000000000c0 x16: ffffdc9f4b46422c
[ 296.241582] x15: ffffdc9f4c87a188 x14: 0000000000000325
[ 296.241584] x13: ffff800011353490 x12: 00000000ffffffea
[ 296.241586] x11: ffffdc9f4c90db50 x10: ffffdc9f4c8f5b10
[ 296.241589] x9 : ffffdc9f4c8f5b68 x8 : 0000000000017fe8
[ 296.241591] x7 : c0000000ffffefff x6 : ffff0a94fe35d860
[ 296.241593] x5 : ffff0a94fe35d860 x4 : 0000000000000000
[ 296.241595] x3 : ffff0a94fe36d930 x2 : dadd71bdec942000
[ 296.241597] x1 : 0000000000000000 x0 : ffff800012c70000
[ 296.241599] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 296.241600] CPU: 12 PID: 756 Comm: modprobe Not tainted 5.10.35 #15
[ 296.241601] Hardware name: SolidRun LX2160A Clearfog CX (DT)
[ 296.241602] Call trace:
[ 296.241603] dump_backtrace+0x0/0x1e8
[ 296.241604] show_stack+0x18/0x28
[ 296.241605] dump_stack+0xd8/0x134
[ 296.241606] panic+0x180/0x3ac
[ 296.241607] add_taint+0x0/0xb0
[ 296.241607] arm64_serror_panic+0x78/0x88
[ 296.241608] do_serror+0x38/0x98
[ 296.241609] el1_error+0x84/0x104
[ 296.241610] xdma_device_open+0x890/0xca0 [xdma]
[ 296.241611] probe_one+0x90/0x290 [xdma]
[ 296.241612] local_pci_probe+0x40/0xb0
[ 296.241613] pci_device_probe+0x130/0x1c8
[ 296.241614] really_probe+0x2ac/0x500
[ 296.241615] driver_probe_device+0xfc/0x168
[ 296.241616] device_driver_attach+0x74/0x80
[ 296.241617] __driver_attach+0xb8/0x168
[ 296.241617] bus_for_each_dev+0x7c/0xd8
[ 296.241618] driver_attach+0x24/0x30
[ 296.241619] bus_add_driver+0x184/0x250
[ 296.241620] driver_register+0x64/0x120
[ 296.241621] __pci_register_driver+0x44/0x50
[ 296.241622] xdma_mod_init+0x98/0xa8 [xdma]
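As a starting point for debugging, the syndrome value in the "SError Interrupt on CPU12, code 0xbf000002" line can be split into its architectural ESR_ELx fields. The following is a minimal sketch in Python; the helper name `decode_esr` is made up for illustration, and it only decodes the generic AArch64 fields (EC, IL, ISS). When the IDS bit is set, the remaining ISS bits are implementation defined, so their meaning has to be looked up in the Technical Reference Manual of the CPU core (Cortex-A72 on the LX2160A).

```python
# Hypothetical helper: split an AArch64 ESR_ELx value into its generic fields.
ESR_EC_SERROR = 0x2F  # exception class "SError interrupt"


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F   # bits [31:26]: exception class
    il = (esr >> 25) & 0x1    # bit 25: instruction length flag
    iss = esr & 0x1FFFFFF     # bits [24:0]: instruction specific syndrome
    fields = {"ec": ec, "il": il, "iss": iss, "is_serror": ec == ESR_EC_SERROR}
    if fields["is_serror"]:
        # For SError, ISS bit 24 (IDS) set means the low bits are
        # implementation defined rather than architecturally encoded.
        fields["ids"] = (iss >> 24) & 0x1
        fields["iss_low"] = iss & 0xFFFFFF
    return fields


info = decode_esr(0xBF000002)  # value taken from the panic log above
print({k: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for k, v in info.items()})
```

For the value in this log, the exception class is 0x2f (SError) and IDS is set, so the low syndrome value 0x2 is specific to the core and is not interpreted by the sketch.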
# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 980
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
0001:01:00.0 Serial controller: Xilinx Corporation Device 8038 <====== our device
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 8d80 (rev 20)
# lspci -v (just our device):
0001:01:00.0 Serial controller: Xilinx Corporation Device 8038 (prog-if 01 [16450])
Subsystem: Xilinx Corporation Device 0007
Flags: fast devsel, IRQ 121, IOMMU group 1
Memory at 9400000000 (64-bit, prefetchable) [size=64K]
Memory at 9400010000 (64-bit, prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [1c0] Secondary PCI Express
Kernel modules: xdma
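One sanity check that can be automated is whether the BAR the driver reports mapping matches what lspci shows for the device. The sketch below is only an illustration: the function names are hypothetical, and the regular expressions assume the exact line formats that appear in this post.

```python
import re

# Lines copied from the logs in this post.
driver_line = ("xdma:map_single_bar: BAR0 at 0x9400000000 mapped at "
               "0x00000000540a24ba, length=65536(/65536)")
lspci_line = "Memory at 9400000000 (64-bit, prefetchable) [size=64K]"

SIZE_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def bar_from_driver_log(line):
    """Return (bus address, length) from the xdma map_single_bar message."""
    m = re.search(r"BAR\d+ at 0x([0-9a-fA-F]+) mapped at \S+ length=(\d+)", line)
    if m is None:
        raise ValueError("unexpected driver log format")
    return int(m.group(1), 16), int(m.group(2))


def bar_from_lspci(line):
    """Return (bus address, length) from an lspci 'Memory at ...' line."""
    m = re.search(r"Memory at ([0-9a-fA-F]+) .*\[size=(\d+)([KMG]?)\]", line)
    if m is None:
        raise ValueError("unexpected lspci format")
    return int(m.group(1), 16), int(m.group(2)) * SIZE_UNITS[m.group(3)]


drv_bar = bar_from_driver_log(driver_line)
pci_bar = bar_from_lspci(lspci_line)
print("driver:", hex(drv_bar[0]), drv_bar[1])
print("lspci: ", hex(pci_bar[0]), pci_bar[1])
print("match: ", drv_bar == pci_bar)
```

Here both sources agree on address 0x9400000000 and a 64 KiB length, which suggests the mapping request itself is consistent with the BAR assignment.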
We are using SERDES 14_2_2:
Model: SolidRun LX2160ACEX7 COM express type 7 based board
Board: LX2160ACE Rev2.0-CEX7, SD
SERDES1 Reference: Clock1 = 161.13MHz Clock2 = 100MHz
SERDES2 Reference: Clock1 = 100MHz Clock2 = 100MHz
SERDES3 Reference: Clock1 = 100MHz Clock2 = 100Hz
DRAM: 15.9 GiB
DDR 15.9 GiB (DDR4, 64-bit, CL=22, ECC on)
dev_get_priv: null device
dev_get_priv: null device
Using SERDES1 Protocol: 14 (0xe)
Using SERDES2 Protocol: 2 (0x2)
Using SERDES3 Protocol: 2 (0x2)
PCIe1: pcie@3400000 disabled
PCIe2: pcie@3500000 Root Complex: x4 gen3
PCIe3: pcie@3600000 Root Complex: x8 gen3
PCIe4: pcie@3700000 disabled
PCIe5: pcie@3800000 Root Complex: no link
PCIe6: pcie@3900000 disabled
# lspci -vvv
0001:01:00.0 Serial controller: Xilinx Corporation Device 8038 (prog-if 01 [16450])
Subsystem: Xilinx Corporation Device 0007
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 121
IOMMU group: 1
Region 0: Memory at 9400000000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at 9400010000 (64-bit, prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 256 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel modules: xdma
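The "Control:" line of the lspci -vvv output encodes the PCI command register as name+/name- flags, which is worth checking before any MMIO read. A small sketch, assuming the flag format shown above (the helper name `parse_flags` is hypothetical):

```python
def parse_flags(line: str) -> dict:
    """Turn an lspci 'Name: Flag+ Flag- ...' line into {flag: bool}."""
    _, _, rest = line.partition(":")
    flags = {}
    for token in rest.split():
        # Only tokens ending in '+' or '-' are on/off flags.
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


# Line copied from the lspci -vvv output in this post.
control = ("Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- "
           "ParErr- Stepping- SERR- FastB2B- DisINTx-")

flags = parse_flags(control)
print("memory decoding enabled:", flags["Mem"])
print("bus mastering enabled:  ", flags["BusMaster"])
```

In this capture memory decoding is enabled (Mem+) while bus mastering is not (BusMaster-). That reflects the device state at the time lspci was run; whether it is related to the SError is not established by this output alone.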