On our imx6q board running kernel 3.14.52-1.1.0_ga (commit 5f6f0a5), we have encountered the transmit queue timing out on eth0. I searched the web and found a thread on the boundary forum describing a very similar problem, but it is more than 3 years old and its patch was made against a much older kernel. So I would like to know: is there something wrong in the fec driver that is causing this? If so, is there a patch available that fixes this problem?
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /yocto/iteris-2.0/build/tmp/work-shared/ccu6/kernel-source/net/sched/sch_generic.c:264 dev_watchdog+0x260/0x26c()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
Modules linked in: loadfpga(O) virtual_fb(O) mxc_v4l2_capture ipu_bg_overlay_sdc ipu_still ipu_prp_enc ipu_csi_enc tvp5147(O) v4l2_int_device ipu_fg_overlay_sdc evbug mxc_dcic galcore(O)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.14.52-1.1.0_ga+g5f6f0a5 #1
[<800158e4>] (unwind_backtrace) from [<800123d4>] (show_stack+0x10/0x14)
[<800123d4>] (show_stack) from [<806eed4c>] (dump_stack+0x7c/0xbc)
[<806eed4c>] (dump_stack) from [<8002f660>] (warn_slowpath_common+0x6c/0x88)
[<8002f660>] (warn_slowpath_common) from [<8002f6ac>] (warn_slowpath_fmt+0x30/0x40)
[<8002f6ac>] (warn_slowpath_fmt) from [<805944bc>] (dev_watchdog+0x260/0x26c)
[<805944bc>] (dev_watchdog) from [<80039848>] (call_timer_fn.isra.8+0x24/0x84)
[<80039848>] (call_timer_fn.isra.8) from [<80039a10>] (run_timer_softirq+0x168/0x1ec)
[<80039a10>] (run_timer_softirq) from [<8003368c>] (__do_softirq+0x140/0x244)
[<8003368c>] (__do_softirq) from [<80033a6c>] (irq_exit+0xb8/0xf4)
[<80033a6c>] (irq_exit) from [<8000f990>] (handle_IRQ+0x44/0x90)
[<8000f990>] (handle_IRQ) from [<8000856c>] (gic_handle_irq+0x2c/0x5c)
[<8000856c>] (gic_handle_irq) from [<80012ec0>] (__irq_svc+0x40/0x70)
Exception stack(0x80a11f20 to 0x80a11f68)
1f20: 80a11f68 3b9aca00 7d4a167b 000269c2 bf7250d0 80a1e7c8 7d49a533 000269c2
1f40: 00000000 00000000 80a10000 00000000 00000017 80a11f68 00000009 804b4808
1f60: 000f0013 ffffffff
[<80012ec0>] (__irq_svc) from [<804b4808>] (cpuidle_enter_state+0x50/0xe4)
[<804b4808>] (cpuidle_enter_state) from [<804b4950>] (cpuidle_idle_call+0xb4/0x150)
[<804b4950>] (cpuidle_idle_call) from [<8000fce0>] (arch_cpu_idle+0x8/0x44)
[<8000fce0>] (arch_cpu_idle) from [<8006ab10>] (cpu_startup_entry+0x100/0x14c)
[<8006ab10>] (cpu_startup_entry) from [<809bfb2c>] (start_kernel+0x350/0x35c)
---[ end trace bbaf3f4e344cdb53 ]---
I have attached the full log, which includes the dump of the tx ring buffer.
Thanks,
Charles
Original Attachment has been moved to: net_timeout_issue.log.zip
Hello All,
We are using the i.MX8QP with kernel version 5.4. The patch mentioned in https://community.nxp.com/pwmxy87654/attachments/pwmxy87654/imx-processors/85833/1/0001-backport-eth... seems to be applied already in kernel 5.4.
But we are facing a similar issue. Below is a log snippet for reference. I have also attached the logs.
[20947.362770] NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
[20947.369128] WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:479 dev_watchdog+0x31c/0x328
Could you please share the solution if this issue is solved?
Regards
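For anyone collecting these events from a long log, here is a minimal Python sketch that pulls the watchdog lines out of kernel log text. The regular expression is based only on the two message formats quoted in this thread (with and without the bracketed timestamp prefix), and the function name find_watchdog_timeouts is made up for illustration.

```python
import re

# Matches the "NETDEV WATCHDOG" line as quoted in this thread.
# The leading "[seconds.micros]" timestamp is optional because the
# 3.14 log above has none, while the 5.4 log does.
WATCHDOG_RE = re.compile(
    r"(?:\[\s*(?P<ts>\d+\.\d+)\]\s*)?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(log_text):
    """Return one dict per watchdog timeout line found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue
        events.append({
            # None when the log line carries no timestamp prefix
            "timestamp": float(m.group("ts")) if m.group("ts") else None,
            "iface": m.group("iface"),
            "driver": m.group("driver"),
            "queue": int(m.group("queue")),
        })
    return events
```

Feeding it the saved output of dmesg gives a list that can be counted or lined up against traffic load, which may help when describing how often the stall occurs.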
In general, useful information might include:
- was this preceded by any interface reconfiguration or link changes?
- extended network stats (ethtool -S)
- MDIO register dump (mii-tool -vv) (if the interface has an MDIO PHY)

Having seen this error many times with different causes, I wrote a short summary for the support team here, which (with some references removed) may be generally useful:

---
The watchdog will fire if all these conditions are met:
1. The interface is up
2. A TX queue is stopped (normally because it is full)
3. No packets have been added to the queue in the last 5 seconds
4. The driver has not told the kernel that the device is unable to transmit now (e.g. link is down).

Conditions 2 and 3 together normally mean that the TX queue has been stopped for 5 seconds and therefore that few packets (not necessarily none at all) have been completed in that time. The time taken for individual packets to be completed is *not* considered.

This can happen due to:
a. Driver bug causing conditions 2 and 4 to be true during reconfiguration
b. MAC blocked by a pause frame flood
c. IRQ handling is delayed by a long time (can happen due to excessive serial logging)
d. Firmware bug causes driver to see link as up when it's not
e. Hardware fault (always a possibility)
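The four conditions above can be written down as a small predicate. This is only an illustrative Python model of the summary, not the kernel's code: the class, field, and function names are invented here, and the 5-second figure is taken from the summary and kept as a parameter because the actual timeout may differ per driver.

```python
from dataclasses import dataclass

# The 5-second window from the summary; kept as a parameter below
# because this is an assumption, not a value read from any driver.
WATCHDOG_TIMEOUT_S = 5.0


@dataclass
class TxQueueState:
    interface_up: bool      # condition 1
    queue_stopped: bool     # condition 2
    last_enqueue_s: float   # when a packet was last added to the queue
    can_transmit: bool      # condition 4: False once the driver has told
                            # the kernel it cannot transmit (e.g. link down)


def watchdog_would_fire(state, now_s, timeout_s=WATCHDOG_TIMEOUT_S):
    """Return True only when all four conditions from the summary hold."""
    # condition 3: nothing has been queued within the timeout window
    idle_too_long = (now_s - state.last_enqueue_s) > timeout_s
    return (state.interface_up
            and state.queue_stopped
            and idle_too_long
            and state.can_transmit)


# A stopped queue with the link reported up fires after the window...
stuck = TxQueueState(True, True, last_enqueue_s=0.0, can_transmit=True)
print(watchdog_would_fire(stuck, now_s=6.0))
# ...but not if the driver has reported that it cannot transmit.
link_down = TxQueueState(True, True, last_enqueue_s=0.0, can_transmit=False)
print(watchdog_would_fire(link_down, now_s=6.0))
```

Note that the model only looks at when the queue was last fed, not at how long any single packet took to complete, which matches the remark in the summary.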
I have a similar problem on the MX6UL-EVK with kernel 4.1.15.
To fix it, I copied the file "drivers/net/ethernet/freescale/fec_main.c" from the fsl community kernel source
(GitHub - Freescale/linux-fslc: Linux kernel source tree).
I hope this will help.
I'm having the same problem with an MX6ul-evk, as well as our custom imx6ul board.
I don't see any differences in the linux-fslc fec_main.c that would contribute to the problem. Ueda, did this actually fix your problem long-term? At what commit did you grab fec_main.c? I'm using the linux-imx.git tree, with the 4.1.15_2.0.0_ga release tag.
Carlos, is there any more information from NXP? Is there any work-around that can be used? Is there a known cause of these transmit timeouts? Even a quicker recovery from this problem would be a great help.
As it stands now, we get transmit stalls a few times a day, and they seem to always occur during heavy TX loads. It takes about 3 minutes for the kernel/hardware to recover and continue transmission (after clients reconnect, etc). I'm not sure yet why it takes so long to recover.
Thanks,
Hi jimbrooke,
About this eth0 tx transfer queue timeout issue, did you get any valid solution or workaround in the end?
We are now using the i.MX6ULL to develop a camera that constantly transfers image data to a host, and this issue occurs at random.
This patch doesn't seem to solve the problem at all. I'm still seeing the same rate of faults.
I have both our custom board and the IMX6UL-EVK running the linux-fslc.git tree (https://github.com/Freescale/linux-fslc.git) at the 4.1-2.0.x-imx branch.
Please get the latest linux-fslc commit. I used:
branch : 4.1-2.0.x-imx
commit : 80e3b3c3c85a3a8b70ef6403bc806901628c7446
I think the easy way is to update your "meta-freescale" repo. I used:
commit : 3335d0902ee31411e09e795583569b2f611adb0d
Thank you.
Hi Charles,
this problem actually looks like a specific use case of the errata. The patch you mention improves the behavior but does not correct it; even if you increase the buffer, the problem will eventually show up.
This should have been fixed in IMX7 processors. I have checked the errata and there is nothing related to this issue.
Sorry for the inconvenience.
Regards,
Carlos
Hi Carlos,
I now have the same problem: eth0 tx ring dump.
Could you point out the erratum ID in the MX6QDL errata that matches this problem?
Many thanks,
hu yongfa
Thanks for looking into this. By the way, do you mind giving me the errata number for this?
Hi Charles,
you can go to www.nxp.com/imx6q > Documentation Tab > Errata
The direct link is: http://www.nxp.com/files/32bit/doc/errata/IMX6DQCE.pdf?fasp=1&WT_TYPE=Errata&WT_VENDOR=FREESCALE&WT_...
Regards,
Carlos