Transmit queue timed out on eth0 causing netdev watchdog


14,589 Views
charlesung
Contributor III

On our imx6q board running kernel 3.14.52-1.1.0_ga (commit 5f6f0a5), we have encountered a case where the transmit queue on eth0 times out. I searched the web and found a very similar thread on the Boundary forum, but it is more than 3 years old and the patch there was made against a much older kernel. So I want to know: is there something wrong in the fec driver that is causing this? If so, is there a patch available that fixes the problem?

 

 

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /yocto/iteris-2.0/build/tmp/work-shared/ccu6/kernel-source/net/sched/sch_generic.c:264 dev_watchdog+0x260/0x26c()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
Modules linked in: loadfpga(O) virtual_fb(O) mxc_v4l2_capture ipu_bg_overlay_sdc ipu_still ipu_prp_enc ipu_csi_enc tvp5147(O) v4l2_int_device ipu_fg_overlay_sdc evbug mxc_dcic galcore(O)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.14.52-1.1.0_ga+g5f6f0a5 #1
[<800158e4>] (unwind_backtrace) from [<800123d4>] (show_stack+0x10/0x14)
[<800123d4>] (show_stack) from [<806eed4c>] (dump_stack+0x7c/0xbc)
[<806eed4c>] (dump_stack) from [<8002f660>] (warn_slowpath_common+0x6c/0x88)
[<8002f660>] (warn_slowpath_common) from [<8002f6ac>] (warn_slowpath_fmt+0x30/0x40)
[<8002f6ac>] (warn_slowpath_fmt) from [<805944bc>] (dev_watchdog+0x260/0x26c)
[<805944bc>] (dev_watchdog) from [<80039848>] (call_timer_fn.isra.8+0x24/0x84)
[<80039848>] (call_timer_fn.isra.8) from [<80039a10>] (run_timer_softirq+0x168/0x1ec)
[<80039a10>] (run_timer_softirq) from [<8003368c>] (__do_softirq+0x140/0x244)
[<8003368c>] (__do_softirq) from [<80033a6c>] (irq_exit+0xb8/0xf4)
[<80033a6c>] (irq_exit) from [<8000f990>] (handle_IRQ+0x44/0x90)
[<8000f990>] (handle_IRQ) from [<8000856c>] (gic_handle_irq+0x2c/0x5c)
[<8000856c>] (gic_handle_irq) from [<80012ec0>] (__irq_svc+0x40/0x70)
Exception stack(0x80a11f20 to 0x80a11f68)
1f20: 80a11f68 3b9aca00 7d4a167b 000269c2 bf7250d0 80a1e7c8 7d49a533 000269c2
1f40: 00000000 00000000 80a10000 00000000 00000017 80a11f68 00000009 804b4808
1f60: 000f0013 ffffffff
[<80012ec0>] (__irq_svc) from [<804b4808>] (cpuidle_enter_state+0x50/0xe4)
[<804b4808>] (cpuidle_enter_state) from [<804b4950>] (cpuidle_idle_call+0xb4/0x150)
[<804b4950>] (cpuidle_idle_call) from [<8000fce0>] (arch_cpu_idle+0x8/0x44)
[<8000fce0>] (arch_cpu_idle) from [<8006ab10>] (cpu_startup_entry+0x100/0x14c)
[<8006ab10>] (cpu_startup_entry) from [<809bfb2c>] (start_kernel+0x350/0x35c)
---[ end trace bbaf3f4e344cdb53 ]---

 

I have attached the full log, which includes a dump of the TX ring buffer.

 

Thanks,

Charles

Original Attachment has been moved to: net_timeout_issue.log.zip

12 Replies

6,221 Views
dipak290485
Contributor I

Hello All,

We are using the i.MX8QP with kernel version 5.4. The patch mentioned in https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/imx-processors/85833/1/0001-backport-eth... seems to be applied already in kernel 5.4.

However, we are facing a similar issue. Below is a log snippet for reference. I have attached the full logs as well.

[20947.362770] NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
[20947.369128] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:479 dev_watchdog+0x31c/0x328
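
When comparing reports like the ones in this thread, it can help to pull the watchdog lines out of a saved kernel log and look at how often they occur. The following is only a minimal sketch in Python: the names WatchdogEvent and find_watchdog_events are made up for illustration, and the pattern is based solely on the two log formats quoted in this thread (with and without a dmesg timestamp).

```python
import re
from typing import List, NamedTuple, Optional


class WatchdogEvent(NamedTuple):
    """One 'NETDEV WATCHDOG' line (hypothetical helper type)."""
    timestamp: Optional[float]  # seconds since boot, None if the line has no prefix
    interface: str
    driver: str
    queue: int


# Matches lines such as:
#   [20947.362770] NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
#   NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s+)?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_events(log_text: str) -> List[WatchdogEvent]:
    """Return every watchdog timeout found in the given log text, in order."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            ts = match["ts"]
            events.append(
                WatchdogEvent(
                    timestamp=float(ts) if ts is not None else None,
                    interface=match["iface"],
                    driver=match["driver"],
                    queue=int(match["queue"]),
                )
            )
    return events
```

Feeding it the output of dmesg saved to a file gives a list whose timestamps can be subtracted to estimate the interval between stalls.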

Could you please share the solution if this issue is solved?

 

Regards


10,939 Views
penghaopenghao
Contributor II
In general, useful information might include:
- was this preceded by any interface reconfiguration or link changes?
- extended network stats (ethtool -S)
- MDIO register dump (mii-tool -vv) (if the interface has an MDIO PHY)

Having seen this error many times with different causes, I wrote a short summary for the support team here, which (with some references removed) may be generally useful:

---
The watchdog will fire if all these conditions are met:
1. The interface is up
2. A TX queue is stopped (normally because it is full)
3. No packets have been added to the queue in the last 5 seconds
4. The driver has not told the kernel that the device is unable to transmit now (e.g. link is down).

Conditions 2 and 3 together normally mean that the TX queue has been stopped for 5 seconds and therefore that few packets (not necessarily none at all) have been completed in that time. The time taken for individual packets to be completed is *not* considered.

This can happen due to:
a. Driver bug causing conditions 2 and 4 to be true during reconfiguration
b. MAC blocked by a pause frame flood
c. IRQ handling is delayed by a long time (can happen due to excessive serial logging)
d. Firmware bug causes driver to see link as up when it's not
e. Hardware fault (always a possibility)
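
The four conditions above can be written down as a single boolean check. This is a simplified model in Python for reasoning about the rule, not the kernel's implementation: TxQueueState and watchdog_would_fire are hypothetical names, and the 5-second default is taken from the summary (the real timeout is chosen by each driver, so it may differ for fec).

```python
from dataclasses import dataclass


@dataclass
class TxQueueState:
    """Hypothetical snapshot of an interface and one of its TX queues."""
    interface_up: bool            # condition 1
    queue_stopped: bool           # condition 2
    seconds_since_last_tx: float  # condition 3: time since a packet was last queued
    carrier_ok: bool              # condition 4: False once the driver reports link down


def watchdog_would_fire(state: TxQueueState, timeout_s: float = 5.0) -> bool:
    """Return True only when all four conditions from the summary hold."""
    return (
        state.interface_up
        and state.queue_stopped
        and state.seconds_since_last_tx > timeout_s
        and state.carrier_ok
    )


# A queue that has been stopped for 6.2 s while the link is reported up.
stalled = TxQueueState(True, True, 6.2, True)
# The same stall, but the driver has told the kernel the link is down.
link_down = TxQueueState(True, True, 6.2, False)

print(watchdog_would_fire(stalled))    # True
print(watchdog_would_fire(link_down))  # False
```

The second case illustrates why cause (d) matters: if the link is really down but the driver still reports it as up, condition 4 stays true and the watchdog fires.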

10,940 Views
yueda
Contributor I

I have a similar problem on the MX6UL-EVK, kernel 4.1.15.

To fix it, I copied the file "drivers/net/ethernet/freescale/fec_main.c" from the FSL community kernel source.

(GitHub - Freescale/linux-fslc: Linux kernel source tree)

I hope this will help.


10,940 Views
jimbrooke
Contributor II

I'm having the same problem with an MX6ul-evk, as well as our custom imx6ul board.

I don't see any differences in the linux-fslc fec_main.c that would contribute to the problem.  Ueda, did this actually fix your problem long-term?  What commit did you grab fec_main.c at?  I'm using the linux-imx.git tree, with the 4.1.15_2.0.0_ga release tag.

Carlos, is there any more information from NXP?  Is there any work-around that can be used? Is there a known cause of these transmit timeouts?   Even a quicker recovery from this problem would be a great help.  

As it stands now, we get transmit stalls a few times a day, and they seem to always occur during heavy TX loads.  It takes about 3 minutes for the kernel/hardware to recover and continue transmission (after clients reconnect, etc).  I'm not sure yet why it takes so long to recover.
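
For stalls like these, one way to narrow down the cause is to save the output of ethtool -S before and after a stall and see which counters moved. The following Python sketch only parses and compares text that has already been captured; parse_ethtool_stats and changed_counters are hypothetical helper names, and the counter names in the sample are invented, since the real names depend on the driver.

```python
from typing import Dict


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Parse 'name: value' lines from saved `ethtool -S` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the header line and anything without an integer value.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the delta for every counter present in both snapshots that changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Invented sample snapshots; real counter names are driver-specific.
before_text = "NIC statistics:\n     tx_packets: 100\n     tx_errors_example: 0\n"
after_text = "NIC statistics:\n     tx_packets: 100\n     tx_errors_example: 3\n"

delta = changed_counters(parse_ethtool_stats(before_text), parse_ethtool_stats(after_text))
print(delta)  # {'tx_errors_example': 3}
```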

Thanks,


10,455 Views
Honey
Contributor I

Hi jimbrooke,

Regarding this eth0 TX queue timeout issue, did you ever find a valid solution or workaround?

We are using the i.MX6ULL to develop a camera that continuously transfers image data to a host, and this issue occurs at random.

 


10,940 Views
yueda
Contributor I

I'm using the attached patch file for Yocto.


10,940 Views
jimbrooke
Contributor II

This patch doesn't seem to solve the problem at all.  I'm still seeing the same rate of faults.

I have both our custom board and the IMX6UL-EVK running the linux-fslc.git tree (https://github.com/Freescale/linux-fslc.git) at the 4.1-2.0.x-imx branch.

10,940 Views
yueda
Contributor I

Please get the latest linux-fslc commit. I used:

branch : 4.1-2.0.x-imx

commit : 80e3b3c3c85a3a8b70ef6403bc806901628c7446

 

I think the easy way is to update your "meta-freescale" repo. I used:

commit : 3335d0902ee31411e09e795583569b2f611adb0d

Thank you.


10,940 Views
Carlos_Musich
NXP Employee

Hi Charles,

This problem actually looks like a specific use case of the errata. The patch you mention improves the behavior but does not correct it. Even if you increase the buffer, the problem will eventually show up.

This should have been fixed in IMX7 processors. I have checked errata and there is nothing related to this issue.

Sorry for the inconvenience.


Regards,
Carlos



10,940 Views
huyongfa
Contributor II

Hi Carlos:

I now have the same problem: eth0 TX ring dump.

Could you point out the ERR ID in the MX6QDL errata that matches this problem?

Many thanks,

hu yongfa


10,939 Views
charlesung
Contributor III

Thanks for looking into this. By the way, do you mind giving me the errata number for this?


10,939 Views
Carlos_Musich
NXP Employee

Hi Charles,

You can go to www.nxp.com/imx6q > Documentation Tab > Errata

The direct link is: http://www.nxp.com/files/32bit/doc/errata/IMX6DQCE.pdf?fasp=1&WT_TYPE=Errata&WT_VENDOR=FREESCALE&WT_...

Regards,

Carlos
