
NETDEV WATCHDOG: transmit queue timed out

Question asked by Arvind Prasanna on Sep 27, 2017
Latest reply on Oct 19, 2017 by Arvind Prasanna

I have an NXP LS2088ARDB running kernel 4.1.35-rt41, built through the Yocto Project. I plugged in an Intel 82580 4x1GE card and loaded its kernel module, igb, and I can see all four interfaces in ifconfig. In this case, the four interfaces are enP2p1s0f0, enP2p1s0f1, enP2p1s0f2 and enP2p1s0f3.

 

All four interfaces can be brought up without errors:

# ifconfig enP2p1s0f0 up
# ifconfig enP2p1s0f1 up
# ifconfig enP2p1s0f2 up
# ifconfig enP2p1s0f3 up

 

/proc/interrupts information:

216: 0 0 0 0 0 0 0 0 ITS-MSI 268959744 Edge enP2p1s0f0
217: 499 0 0 0 0 0 0 0 ITS-MSI 268959745 Edge enP2p1s0f0-TxRx-0
218: 0 0 0 0 0 0 0 0 ITS-MSI 268961792 Edge enP2p1s0f1
219: 499 0 0 0 0 0 0 0 ITS-MSI 268961793 Edge enP2p1s0f1-TxRx-0
220: 0 0 0 0 0 0 0 0 ITS-MSI 268963840 Edge enP2p1s0f2
221: 498 0 0 0 0 0 0 0 ITS-MSI 268963841 Edge enP2p1s0f2-TxRx-0
222: 0 0 0 0 0 0 0 0 ITS-MSI 268965888 Edge enP2p1s0f3
223: 0 0 0 0 0 0 0 0 ITS-MSI 268965889 Edge enP2p1s0f3-TxRx-0

 

MSI-X is enabled, but the fourth interface, enP2p1s0f3, has received no interrupts on its TxRx-0 vector, while the first three interfaces each show about 499.
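One way to spot a silent queue vector without reading the table by eye is to scan /proc/interrupts for TxRx vectors whose per-CPU counters are all zero. The following is a minimal Python sketch, not part of the original setup: the function name silent_queue_vectors is hypothetical, and the sample text is copied from the listing in this post.

```python
# Sample copied from the /proc/interrupts listing in this post.
SAMPLE = """\
216: 0 0 0 0 0 0 0 0 ITS-MSI 268959744 Edge enP2p1s0f0
217: 499 0 0 0 0 0 0 0 ITS-MSI 268959745 Edge enP2p1s0f0-TxRx-0
218: 0 0 0 0 0 0 0 0 ITS-MSI 268961792 Edge enP2p1s0f1
219: 499 0 0 0 0 0 0 0 ITS-MSI 268961793 Edge enP2p1s0f1-TxRx-0
220: 0 0 0 0 0 0 0 0 ITS-MSI 268963840 Edge enP2p1s0f2
221: 498 0 0 0 0 0 0 0 ITS-MSI 268963841 Edge enP2p1s0f2-TxRx-0
222: 0 0 0 0 0 0 0 0 ITS-MSI 268965888 Edge enP2p1s0f3
223: 0 0 0 0 0 0 0 0 ITS-MSI 268965889 Edge enP2p1s0f3-TxRx-0
"""


def silent_queue_vectors(text):
    """Return the names of *-TxRx-* vectors whose per-CPU counts are all zero.

    Hypothetical helper: assumes the /proc/interrupts layout shown above
    (IRQ number, per-CPU counters, controller, hwirq, trigger, name).
    """
    silent = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header row and blank lines.
        if not fields or not fields[0].endswith(":"):
            continue
        name = fields[-1]
        if "-TxRx-" not in name:
            continue
        # The per-CPU counters are the leading run of integers after the IRQ
        # number; stop at the first non-numeric field (the controller name).
        counts = []
        for token in fields[1:]:
            if not token.isdigit():
                break
            counts.append(int(token))
        if counts and sum(counts) == 0:
            silent.append(name)
    return silent


print(silent_queue_vectors(SAMPLE))
```

On the board itself, the same function could be given the contents of /proc/interrupts instead of the sample text. The per-function vectors without a TxRx suffix (for example enP2p1s0f0) are ignored here because they read zero on the working ports as well.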

 

If an Ethernet cable is plugged in and dhclient is run on enP2p1s0f3, a watchdog timeout trace appears in dmesg:

[ 164.468846] NETDEV WATCHDOG: enP2p1s0f3 (igb): transmit queue 0 timed out
[ 164.468848] Modules linked in: igb(O)
[ 164.468856] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.1.35-rt41 #1
[ 164.468858] Hardware name: Freescale Layerscape 2088a RDB Board (DT)
[ 164.468861] Call trace:
[ 164.471301] [<ffff8000000898e4>] dump_backtrace+0x0/0x11c
[ 164.471306] [<ffff800000089a14>] show_stack+0x14/0x1c
[ 164.471310] [<ffff80000084b92c>] dump_stack+0x90/0xb0
[ 164.471314] [<ffff8000000b1160>] warn_slowpath_common+0x98/0xd0
[ 164.471317] [<ffff8000000b11e8>] warn_slowpath_fmt+0x50/0x58
[ 164.471320] [<ffff80000072323c>] dev_watchdog+0x268/0x274
[ 164.471324] [<ffff8000000fd40c>] call_timer_fn.isra.29+0x28/0x94
[ 164.471327] [<ffff8000000fd638>] run_timer_softirq+0x1c0/0x230
[ 164.471331] [<ffff8000000b4d68>] __do_softirq+0x108/0x244
[ 164.471334] [<ffff8000000b5550>] irq_exit+0x90/0xf4
[ 164.471339] [<ffff8000000ee3e0>] __handle_domain_irq+0x60/0xb0
[ 164.471342] [<ffff8000000824fc>] gic_handle_irq+0x84/0xe4
[ 164.471344] Exception stack(0xffff800000c33d70 to 0xffff800000c33ea0)
[ 164.471348] 3d60: 322458f8 00000026 00000000 00010000
[ 164.471351] 3d80: 00c33ed0 ffff8000 0062d0a8 ffff8000 60000145 00000000 00000000 00000000
[ 164.471355] 3da0: 322458f8 00000026 28000000 0004768f 00049984 00000000 28000000 00000000
[ 164.471358] 3dc0: 0000198c 00000000 00000018 00000000 88000000 0003be92 cccccccd cccccccc
[ 164.471362] 3de0: 00000000 00000000 00084800 ffff8000 00001000 00000000 00000000 00000000
[ 164.471365] 3e00: 34d5d91d 00000000 0cf0f740 ffff8083 f16b89c0 ffff8082 00000000 00008000
[ 164.471368] 3e20: 0f001168 ffff8083 00000022 00000000 cc3917e0 ffff7c01 322458f8 00000026
[ 164.471371] 3e40: 7a9c9600 ffff8000 00000000 00000000 00000000 00000000 318fcb98 00000026
[ 164.471374] 3e60: 00c987c0 ffff8000 00c18648 ffff8000 00868000 ffff8000 00c30000 ffff8000
[ 164.471378] 3e80: 00c37000 ffff8000 00c33ed0 ffff8000 0062d0a0 ffff8000 00c33ed0 ffff8000
[ 164.471381] [<ffff800000085700>] el1_irq+0x80/0x100
[ 164.471386] [<ffff80000062d1ac>] cpuidle_enter+0x18/0x20
[ 164.471389] [<ffff8000000e5380>] cpu_startup_entry+0x184/0x264
[ 164.471392] [<ffff800000848fa0>] rest_init+0x74/0x7c
[ 164.471397] [<ffff800000ba194c>] start_kernel+0x37c/0x390
[ 164.471399] ---[ end trace 924bff9a4e56157a ]---
[ 170.459306] igb 0002:01:00.3 enP2p1s0f3: igb: enP2p1s0f3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX


dhclient works as expected on the first three ports.
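To confirm that only the fourth port trips the watchdog, the dmesg output can be scanned for the NETDEV WATCHDOG message. This is a rough Python sketch under the assumption that the message has the exact wording seen in the trace in this post; the function name watchdog_timeouts is hypothetical.

```python
import re

# Two lines copied from the dmesg output in this post.
SAMPLE = """\
[ 164.468846] NETDEV WATCHDOG: enP2p1s0f3 (igb): transmit queue 0 timed out
[ 170.459306] igb 0002:01:00.3 enP2p1s0f3: igb: enP2p1s0f3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
"""

# Assumed message format: "NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)


def watchdog_timeouts(dmesg_text):
    """Return (interface, driver, queue) for each watchdog timeout found."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in WATCHDOG_RE.finditer(dmesg_text)
    ]


print(watchdog_timeouts(SAMPLE))
```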

 

lspci output of the failing interface:

# lspci -s 0002:01:00.3 -vvv
0002:01:00.3 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
Subsystem: Intel Corporation Ethernet Server Adapter I340-T4
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin D routed to IRQ 209
Region 0: Memory at 3046200000 (32-bit, non-prefetchable) [size=512K]
Region 3: Memory at 304628c000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #4, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <8us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number (hidden for privacy)
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: igb
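For comparing the failing function against the three working ones, the MSI and MSI-X enable bits can be pulled out of the lspci -vvv text. A minimal Python sketch, assuming the capability lines are worded as in the output in this post; the function name msi_state is hypothetical.

```python
import re

# Capability lines copied from the lspci output in this post.
SAMPLE = """\
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
Capabilities: [a0] Express (v2) Endpoint, MSI 00
"""


def msi_state(lspci_text):
    """Map 'MSI' and 'MSI-X' to True/False depending on the Enable+/- flag."""
    state = {}
    for cap in ("MSI", "MSI-X"):
        # lspci prints "Enable+" when the capability is enabled, "Enable-" otherwise.
        match = re.search(r"\b" + re.escape(cap) + r": Enable([+-])", lspci_text)
        if match:
            state[cap] = match.group(1) == "+"
    return state


print(msi_state(SAMPLE))
```

Running this against the output for 0002:01:00.0 through 0002:01:00.3 would show whether the fourth function is configured any differently from the others at the PCI level.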


This is the lspci tree:

# lspci -t
-+-[0003:00]---00.0-[01]----00.0
 +-[0002:00]---00.0-[01]--+-00.0
 |                        +-00.1
 |                        +-00.2
 |                        \-00.3
 +-[0001:00]---00.0-[01]--
 \-[0000:00]---00.0-[01]--


Has anybody faced a similar situation before? I would appreciate help in this matter!

 

Thanks,

Arvind.
