We have a custom board that uses the LS1046A connected to a Switchtec PM40036 PCIe switch, which lets us connect 20 PCIe devices. Because we needed features that are only available in version 6 kernels, we have upgraded to kernel 6.1.
This has exposed an issue that I am struggling to understand. We get an error during boot that calls dump_stack(), and after some searching we found that it is caused by enabling legacy_interrupts. Since 5.13 (https://lore.kernel.org/linux-pci/5b3f3544-b7a5-c338-d53a-c6d7ff3ac8e0@nvidia.com/T/) the Arm64 architecture no longer supports this feature. In our case, the error has the side effect of rendering the first PCIe port processed (the one that generated the error) inoperable.
While I understand what is happening, I don't understand why it is this way. The documentation clearly shows that the LS1046A supports MSI interrupts, but the Linux drivers don't seem to be using them.
Is there a patch I need to apply to the mainline Linux kernel to make this work?
When I look at /proc/interrupts, I see the following:
root@localhost:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
9: 0 0 0 0 GICv2 25 Level vgic
11: 267302 353977 340076 422853 GICv2 30 Level arch_timer
12: 0 0 0 0 GICv2 27 Level kvm guest vtimer
14: 0 0 0 0 GICv2 98 Level gpio-cascade
15: 0 0 0 0 GICv2 99 Level gpio-cascade
16: 3 0 0 0 GICv2 100 Level gpio-cascade
17: 0 0 0 0 GICv2 166 Level gpio-cascade
18: 32023 0 0 0 GICv2 88 Level 2180000.i2c
19: 31997 0 0 0 GICv2 89 Level 2190000.i2c
20: 76999 0 0 0 GICv2 90 Level 21a0000.i2c
21: 259 0 0 0 GICv2 91 Level 21b0000.i2c
22: 28997 0 0 0 GICv2 75 Level fsl-ifc
23: 0 0 0 0 GICv2 138 Level arm-pmu
24: 0 0 0 0 GICv2 139 Level arm-pmu
25: 0 0 0 0 GICv2 127 Level arm-pmu
26: 0 0 0 0 GICv2 129 Level arm-pmu
27: 32960 0 0 0 GICv2 86 Level ttyS0
28: 5 0 0 0 GICv2 131 Level 1550000.spi
29: 3 0 0 0 mpc8xxx-gpio 15 Edge gt911
30: 3915558 0 0 0 GICv2 96 Level 2100000.spi
31: 0 0 0 0 GICv2 174 Level arm-smmu global fault, arm-smmu-context-fault, arm-smmu-context-fault, arm-smmu-context-fault, arm-smmu-context-fault, arm-smmu-context-fault
32: 0 0 0 0 GICv2 175 Level arm-smmu global fault
33: 0 0 0 0 GICv2 185 Level qDMA error
34: 0 0 0 0 GICv2 71 Level qDMA queue
35: 0 0 0 0 GICv2 92 Level xhci-hcd:usb1
36: 1011 0 0 0 GICv2 93 Level dwc3
38: 41544 0 0 0 GICv2 142 Level xhci-hcd:usb3, xhci-hcd:usb5, xhci-hcd:usb7, xhci-hcd:usb9, xhci-hcd:usb11, xhci-hcd:usb13, xhci-hcd:usb15, xhci-hcd:usb17, xhci-hcd:usb19, 5
IPI0: 3248 3790 3688 3986 Rescheduling interrupts
IPI1: 25151 98589 77808 55638 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 Timer broadcast interrupts
IPI5: 0 0 0 0 IRQ work interrupts
IPI6: 0 0 0 0 CPU wake-up interrupts
Err: 0
root@localhost:~#
All the xhci-hcd controllers are actually PCIe devices. I expected to see them listed as PCI-MSI, not assigned to GICv2 142. This is where I have reached the limit of my understanding of the problem. If the LS1046A supports MSI, why are these being configured as legacy interrupts?
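To make the check above repeatable, here is a minimal Python sketch that scans text in the /proc/interrupts layout and reports which IRQs carrying a given driver name are not backed by an MSI interrupt chip. The function names, the "MSI" substring test on the chip column, and the sample line for IRQ 40 are my own assumptions for illustration, not something taken from the board. An on-SoC (non-PCIe) xHCI controller would also match the driver name, so the result still needs a manual look.

```python
import re

# Trimmed sample in the /proc/interrupts layout. IRQ 40 is a hypothetical
# line, included only to show what an MSI-backed entry could look like.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 36:       1011          0          0          0     GICv2  93 Level     dwc3
 38:      41544          0          0          0     GICv2 142 Level     xhci-hcd:usb3, xhci-hcd:usb5
 40:         12          0          0          0   PCI-MSI 524288 Edge   xhci-hcd:usb21
IPI0:      3248       3790       3688       3986       Rescheduling interrupts
"""


def parse_interrupts(text):
    """Return {irq_number: (chip_name, rest_of_line)} for numbered IRQ lines."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: one column per CPU
    table = {}
    for line in lines[1:]:
        m = re.match(r"\s*(\d+):(.*)", line)
        if not m:
            continue  # skips IPI*, Err and other non-numbered rows
        fields = m.group(2).split()
        rest = fields[ncpu:]  # drop the per-CPU counters
        if not rest:
            continue
        table[int(m.group(1))] = (rest[0], " ".join(rest[1:]))
    return table


def non_msi_irqs(text, driver="xhci-hcd"):
    """List IRQs whose action names mention `driver` but whose chip is not MSI."""
    return [
        irq
        for irq, (chip, desc) in parse_interrupts(text).items()
        if driver in desc and "MSI" not in chip
    ]


print(non_msi_irqs(SAMPLE))
```

On the target, the same function could be fed the live file with `non_msi_irqs(open("/proc/interrupts").read())`.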
Is there a special driver or configuration option that I need to include but have not?
How do I get the PCIe host controller to support MSI?
Any advice appreciated.
I suggest using the following kernel version instead of the mainline Linux kernel 6.1.
MSI works well in lf-6.1.55-2.2.0; please refer to the attached logs for details.
https://github.com/nxp-qoriq/linux/tree/lf-6.1.55-2.2.0
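After switching kernels, one way to confirm per device that MSI is actually in use is to look for entries under each PCI device's msi_irqs directory in sysfs. The sketch below is a rough illustration only: the function name is mine, and it assumes the usual /sys/bus/pci/devices/<address>/msi_irqs layout, where each entry is named after an allocated MSI or MSI-X IRQ number. A device with no such entries is either using legacy INTx or has no interrupt allocated.

```python
import os


def msi_irqs_by_device(sysfs_root="/sys/bus/pci/devices"):
    """Return {pci_address: [msi_irq_numbers]}; an empty list means no MSI/MSI-X."""
    report = {}
    for dev in sorted(os.listdir(sysfs_root)):
        msi_dir = os.path.join(sysfs_root, dev, "msi_irqs")
        if os.path.isdir(msi_dir):
            # Each entry in msi_irqs is named after one allocated IRQ number.
            report[dev] = sorted(int(name) for name in os.listdir(msi_dir))
        else:
            report[dev] = []
    return report


# Example use on the target (not run here):
#   for dev, irqs in msi_irqs_by_device().items():
#       print(dev, irqs if irqs else "no MSI/MSI-X allocated")
```

The sysfs root is a parameter so the function can be pointed at a copied or mocked directory tree when checking it off-target.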