Dmesg output is below. It shows an AER failure on IRQ 482. Is there a fix? Does AER work on the P4080?
Jerry
[ 2.136605] irq 482: nobody cared (try booting with the "irqpoll" option)
[ 2.136611] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-24-powerpc-e5006
[ 2.136614] Call Trace:
[ 2.136624] [eed0bcd0] [c000806c] show_stack+0xfc/0x1c0 (unreliable)
[ 2.136632] [eed0bd20] [c08046d8] dump_stack+0x78/0xa0
[ 2.136639] [eed0bd30] [c00ac960] __report_bad_irq+0x40/0x100
[ 2.136644] [eed0bd50] [c00acf68] note_interrupt+0x238/0x290
[ 2.136649] [eed0bd80] [c00aa104] handle_irq_event_percpu+0x154/0x270
[ 2.136654] [eed0bdd0] [c00aa26c] handle_irq_event+0x4c/0x80
[ 2.136659] [eed0bde0] [c00ad70c] handle_level_irq+0xcc/0x160
[ 2.136664] [eed0bdf0] [c00a9488] generic_handle_irq+0x48/0x70
[ 2.136671] [eed0be00] [c0024d5c] fsl_error_int_handler+0xec/0x100
[ 2.136675] [eed0be20] [c00aa028] handle_irq_event_percpu+0x78/0x270
[ 2.136680] [eed0be70] [c00aa26c] handle_irq_event+0x4c/0x80
[ 2.136685] [eed0be80] [c00ae11c] handle_fasteoi_irq+0xdc/0x1a0
[ 2.136690] [eed0be90] [c00052fc] __do_irq+0x5c/0x150
[ 2.136695] [eed0bea0] [c00054e0] do_IRQ+0xf0/0x110
[ 2.136701] [eed0bec0] [c0010bbc] ret_from_except+0x0/0x18
[ 2.136710] --- Exception: 501 at __do_softirq+0xc4/0x2b0
[ 2.136710] LR = __do_softirq+0x24/0x2b0
[ 2.136716] [eed0bfe0] [c0055d44] irq_exit+0xb4/0xf0
[ 2.136720] [eed0bff0] [c000e890] call_do_irq+0x24/0x3c
[ 2.136725] [c0b7fe90] [c0005488] do_IRQ+0x98/0x110
[ 2.136730] [c0b7feb0] [c0010bbc] ret_from_except+0x0/0x18
[ 2.136736] --- Exception: 501 at arch_cpu_idle+0x30/0x80
[ 2.136736] LR = arch_cpu_idle+0x30/0x80
[ 2.136743] [c0b7ff70] [c00b5b38] rcu_idle_enter+0xb8/0x100 (unreliable)
[ 2.136748] [c0b7ff80] [c00a9300] cpu_startup_entry+0x160/0x260
[ 2.136756] [c0b7ffc0] [c0a927ec] start_kernel+0x33c/0x350
[ 2.136761] [c0b7fff0] [c00003fc] skpinv+0x2e8/0x324
[ 2.136763] handlers:
[ 2.136769] [<c03ea8a0>] aer_irq
[ 2.136773] [<c03eba20>] pcie_pme_irq
[ 2.136775] Disabling IRQ #482
On some legacy platforms with a legacy PCI controller (e.g. some non-DPAA platforms),
the hardware does not support the Fatal error type for AER; it supports only Non-Fatal errors.
Generally, DPAA platforms with the new PCIe controller support both Fatal and
Non-Fatal errors.
Yiping:
We are running "a rev3 P4080 SoC with rev3 e500mc cores".
It is definitely DPAA and definitely NOT legacy.
Jerry
Are you using SDK 1.5?
Please check the kernel configuration:
Bus options --->
[*] PCI Express support
[*] Root Port Advanced Error Reporting support
<*> PCIe AER error injector support
Then follow the test steps below.
2.1 At the U-Boot prompt, add pcie_ports=native to bootargs:
=> setenv othbootargs pcie_ports=native
2.2 Reboot the board with the kernel, then confirm the config option and the command line:
# zcat /proc/config.gz|grep -i CONFIG_PCIEAER_INJECT
# cat /proc/cmdline
root=/dev/ram rw console=ttyS0,115200 pcie_ports=native
2.3 Check whether the inject device node has been created.
# ls /dev/aer_inject
The test device node /dev/aer_inject should exist.
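The two checks in steps 2.2 and 2.3 can be wrapped in a small script. This is only a sketch: the function names has_native_pcie_ports and report_aer_prereqs are made up for illustration and are not part of aer-inject or the SDK.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of aer-inject.

# Succeeds if the given kernel command line contains pcie_ports=native.
has_native_pcie_ports() {
    case " $1 " in
        *" pcie_ports=native "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints one status line per prerequisite.
# $1 = kernel command line, $2 = path of the injector device node
report_aer_prereqs() {
    if has_native_pcie_ports "$1"; then
        echo "cmdline: pcie_ports=native present"
    else
        echo "cmdline: pcie_ports=native MISSING"
    fi
    if [ -e "$2" ]; then
        echo "node: $2 present"
    else
        echo "node: $2 MISSING (is CONFIG_PCIEAER_INJECT enabled?)"
    fi
}

# On the board, check the running kernel.
report_aer_prereqs "$(cat /proc/cmdline 2>/dev/null || true)" /dev/aer_inject
```

If either line reports MISSING, go back to the kernel configuration or the U-Boot bootargs before continuing.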
3. Download the aer-inject test program from "http://www.kernel.org/pub/linux/utils/pci/aer-inject/".
Cross-compile it on the PC (server):
$ tar -xf aer-inject-0.1.tar.gz
$ cd aer-inject-0.1
$ source /opt/fsl/1.1/environment-setup-ppce500mc-fsl-linux
$ make
A binary file named "aer-inject" is created in the current folder.
4. On the board:
Get the PCIe bus, device, and function numbers. This step prepares for the next one.
# lspci -vvv
01:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)
        Capabilities: [100] Advanced Error Reporting
Here "01:00.1" means bus 1, device 0, function 1.
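As an illustration of how the lspci address maps onto bus, device, and function, here is a minimal sketch. The helper name bdf_to_aer_line is hypothetical. lspci prints the fields in hexadecimal, and the sketch prints them in decimal; whether aer-inject expects decimal for larger values is an assumption to check against its README.

```shell
#!/bin/sh
# Hypothetical helper: convert an lspci address ([domain:]bus:dev.fn, hex)
# into a "BUS n DEV n FN n" line. Output values are decimal (assumption).
# Plain POSIX sh, so the variables below are global rather than local.
bdf_to_aer_line() {
    bdf=$1
    fn=${bdf##*.}       # text after the last dot
    rest=${bdf%.*}      # [domain:]bus:dev
    dev=${rest##*:}     # text after the last colon
    rest=${rest%:*}     # [domain:]bus
    bus=${rest##*:}     # drops the optional domain
    printf 'BUS %d DEV %d FN %d\n' "0x$bus" "0x$dev" "0x$fn"
}

bdf_to_aer_line 01:00.1        # prints: BUS 1 DEV 0 FN 1
bdf_to_aer_line 0000:01:00.0   # prints: BUS 1 DEV 0 FN 0
```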
5. Write the test config file in the aer-inject folder.
$ mkdir test
$ cd test
$ cat aer1
AER
BUS 1 DEV 0 FN 0
UNCOR_STATUS {ERROR_NUM}
HEADER_LOG 0 1 2 3
Note: {ERROR_NUM} should be one of
TRAIN,DLP,POISON_TLP,FCP,COMP_TIME,COMP_ABORT,UNX_COMP,RX_OVER,MALF_TLP,ECRC,UNS
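To avoid hand-editing one file per error type, the config above can be generated. This is a sketch only: make_aer_config is a hypothetical helper, and it simply reproduces the four-line layout shown above with the chosen error name substituted for {ERROR_NUM}.

```shell
#!/bin/sh
# Hypothetical helper: print an aer-inject config for one uncorrectable error.
# $1 = bus, $2 = device, $3 = function, $4 = error name (e.g. POISON_TLP)
make_aer_config() {
    printf 'AER\n'
    printf 'BUS %s DEV %s FN %s\n' "$1" "$2" "$3"
    printf 'UNCOR_STATUS %s\n' "$4"
    printf 'HEADER_LOG 0 1 2 3\n'
}

# Example: config for a poisoned-TLP error on bus 1, device 0, function 0.
make_aer_config 1 0 0 POISON_TLP
```

Redirect the output to a file, for example `make_aer_config 1 0 0 POISON_TLP > aer1`, and repeat with other error names for the remaining files.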
6. Transfer the files aer-inject, aer1, aer2 and aer3 to the board.
7. Run aer-inject on the board with one of the config files:
root@p5020ds:~# ./aer-inject aer3
pcieport 0000:00:00.0: AER: Uncorrected (Fatal) error received: id=0100
e1000e 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Unaccessible, id=0100(Unregistered Agent ID)
e1000e 0000:01:00.0: broadcast error_detected message
root@p5020ds:~# pcieport 0000:00:00.0: Root Port link has been reset
e1000e 0000:01:00.0: broadcast slot_reset message
e1000e 0000:01:00.0: Disabling ASPM L1
e1000e 0000:01:00.0: enabling device (0000 -> 0002)
e1000e 0000:01:00.0: restoring config space at offset 0x6 (was 0x1, writing 0x1001)
e1000e 0000:01:00.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0020000)
e1000e 0000:01:00.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0000000)
e1000e 0000:01:00.0: restoring config space at offset 0x3 (was 0x10, writing 0x8)
e1000e 0000:01:00.0: broadcast resume message
e1000e 0000:01:00.0: AER driver successfully recovered
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
8. Double-check that the PCIe device still works. Take the PCIe NIC as an example:
# ifconfig eth0 192.168.1.2
# ping 192.168.1.1
Have a great day,
Yiping Wang