i.MX Linux FEC Driver Drops IPV6 Multicasts & Promiscuous Setting

TomE · ‎07-16-2015

This applies at least to the i.MX28 and i.MX53. It may apply to the i.MX6 chips as well.

Our products have to use IPV6. IPV6 Neighbour Discovery relies on Ethernet Multicasting. Multicasting has been "standard" on Ethernet for over 20 years, so it should work on these chips.
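To make the dependency concrete, here is a minimal sketch of the addresses Neighbour Discovery needs the MAC to accept. It derives the solicited-node group (ff02::1:ffXX:XXXX) from a unicast address and then maps the group to its Ethernet destination (33:33 followed by the low 32 bits of the group). The function names are made up for illustration and are not from any driver.

```c
#include <stdint.h>
#include <string.h>

/* Solicited-node multicast group for a unicast IPv6 address:
 * ff02::1:ffXX:XXXX, where XX:XXXX are the low 24 bits of the
 * unicast address. (Hypothetical helper name.) */
void solicited_node_group(const uint8_t unicast[16], uint8_t group[16])
{
    static const uint8_t prefix[13] = {
        0xff, 0x02,                 /* ff02 */
        0, 0, 0, 0, 0, 0, 0, 0, 0,  /* bytes 2..10 are zero */
        0x01,                       /* ...:0001 */
        0xff                        /* ff, then the low 24 bits */
    };
    memcpy(group, prefix, sizeof(prefix));
    memcpy(group + 13, unicast + 13, 3);
}

/* Ethernet destination for an IPv6 multicast group:
 * 33:33 followed by the low 32 bits of the group address.
 * (Hypothetical helper name.) */
void ipv6_group_to_mac(const uint8_t group[16], uint8_t mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(mac + 2, group + 12, 4);
}
```

For a link-local address such as fe80::3ab1:9eff:fe10:6 this gives the group ff02::1:ff10:6 and the MAC 33:33:ff:10:00:06. If the FEC's multicast filter has no bit set for that MAC, Neighbour Solicitations for the address never reach the stack.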

Few of the Freescale Linux FEC drivers prior to Kernel 3.11 seem to handle receiving Multicasts properly. They don't work properly in "Promiscuous" mode either.

This has only recently been fixed on the "mainline", but I'm using Freescale's "imx_2.6.35_maintain" branch, and there are no patches fixing that.

That means that IPV6 (which dates from 15 years ago) doesn't work reliably, if at all. IPV4 Multicasting (21 years ago) has the same problem, and so does promiscuous bridging/monitoring.

Here's what happens:

When the link comes up (initially, or after the Ethernet cable is unplugged and plugged in again) the code calls "fec_restart()". That wipes out the Multicast filters. As the interface comes up, calls to "set_multicast_list()" are made to fill in the filters again. After that has finished, the MII polling code notices the link has come up and calls "fec_enet_adjust_link()", which calls "fec_restart()" and wipes the filters out again. So it now can't receive multicasts. If you're really lucky with exactly the right timing in the /etc/init.d scripts, it might come up in the right order.

When it is in the bad state you can type "ifconfig eth0 up". That puts the multicast filters back without restarting the FEC, and it then works until the next time the Ethernet cable is unplugged. If you "ifconfig eth0 down" you then have to bring it "up" twice to get multicast reception working again.
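The ordering problem can be sketched as a toy model in plain C. Nothing here is kernel code: the struct and the model_* functions are invented for illustration, and the two fields only stand in for the hash registers that the real "fec_restart()" zeroes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the driver state; none of this is real kernel code. */
struct fec_model {
    uint32_t hash_high;  /* stands in for FEC_GRP_HASH_TABLE_HIGH */
    uint32_t hash_low;   /* stands in for FEC_GRP_HASH_TABLE_LOW */
    uint64_t wanted;     /* filter bits the network stack has asked for */
};

/* Models set_multicast_list(): program the registers from the wanted set. */
void model_set_multicast_list(struct fec_model *m)
{
    m->hash_high = (uint32_t)(m->wanted >> 32);
    m->hash_low = (uint32_t)m->wanted;
}

/* Models the 2.6.35 fec_restart(): the hash registers are zeroed and
 * nothing puts them back. */
void model_fec_restart(struct fec_model *m)
{
    m->hash_high = 0;
    m->hash_low = 0;
}

/* True if at least one multicast filter bit is still programmed. */
bool model_can_receive_multicast(const struct fec_model *m)
{
    return (m->hash_high | m->hash_low) != 0;
}

/* The link-up sequence described above. */
void model_link_up(struct fec_model *m)
{
    model_fec_restart(m);         /* interface brought up */
    model_set_multicast_list(m);  /* stack fills in the filter */
    model_fec_restart(m);         /* MII poll -> fec_enet_adjust_link() */
}
```

After model_link_up() the filter is empty even though "wanted" is not. A further model_set_multicast_list() call, which is what the second "ifconfig eth0 up" amounts to, restores it until the next restart.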

Here's the 2.6.35 "fec.c" and (pretty much identical in this function) 3.10 "fec_main.c":

http://lxr.free-electrons.com/source/drivers/net/fec.c?v=2.6.35

http://lxr.free-electrons.com/source/drivers/net/ethernet/freescale/fec_main.c?v=3.10

/* This function is called to start or restart the FEC during a link
 * change.  This only happens when switching between half and full
 * duplex.
 */
static void
fec_restart(struct net_device *dev, int duplex)
{
 ...

        /* Reset all multicast. */
        writel(0, fep->hwp + FEC_GRP_HASH_TABLE_HIGH);
        writel(0, fep->hwp + FEC_GRP_HASH_TABLE_LOW);

First of all, the comment on that function is wrong. It may have been called for that originally, but by this version it is also called on link timeout and link state change.
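For context on what those two registers hold, here is a hedged sketch of how a multicast MAC is turned into one of the 64 filter bits. It follows the mainline "set_multicast_list()" as best I recall it (a bitwise CRC-32 over the MAC, with the top 6 bits of the result selecting the bit), so check it against your own tree before relying on it. The function names are hypothetical.

```c
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u  /* reflected CRC-32 polynomial */
#define HASH_BITS  6            /* 64 filter bits: HIGH and LOW, 32 each */

/* Return the filter bit index (0..63) for a 6-byte multicast MAC.
 * Assumption: same CRC and bit selection as the mainline driver. */
unsigned int fec_hash_index(const uint8_t mac[6])
{
    uint32_t crc = 0xFFFFFFFFu;

    for (int i = 0; i < 6; i++) {
        uint8_t data = mac[i];

        /* Feed the byte in least-significant bit first. */
        for (int bit = 0; bit < 8; bit++, data >>= 1)
            crc = (crc >> 1) ^ (((crc ^ data) & 1u) ? CRC32_POLY : 0u);
    }

    /* Only the upper HASH_BITS bits of the CRC are used. */
    return (crc >> (32 - HASH_BITS)) & 0x3Fu;
}

/* Set the matching bit in a software copy of the two hash registers. */
void fec_hash_set(const uint8_t mac[6], uint32_t *high, uint32_t *low)
{
    unsigned int hash = fec_hash_index(mac);

    if (hash > 31)
        *high |= 1u << (hash - 32);
    else
        *low |= 1u << hash;
}
```

Each subscribed group sets one bit, so writing zero to both registers drops every group at once. The hardware keeps no other record of the subscriptions, which is why the restart has to be followed by a rebuild from the kernel's multicast list.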

Here's the fixed 3.14 version, still with the bad comment:

/* This function is called to start or restart the FEC during a link
 * change.  This only happens when switching between half and full
 * duplex.
 */
static void
fec_restart(struct net_device *ndev, int duplex)
{
...
        /* Setup multicast filter. */
        set_multicast_list(ndev);

The driver isn't all that "stable" as there have been at least 102 patches to this source code file alone since then. One of them (in 3.17 I think) at least fixed the comment:

/*
 * This function is called to start or restart the FEC during a link
 * change, transmit timeout, or to reconfigure the FEC.  The network
 * packet processing for this device must be stopped before this call.
 */
static void
fec_restart(struct net_device *ndev)
{

This problem was documented and patched here:

http://kernel.opensuse.org/cgit/kernel/commit/?id=772e42b07fbdc650206746e00cb2914e362594a3

That fix was buggy, as complained about here (wiping out the Promiscuous setting for bridges and packet monitoring):

http://permalink.gmane.org/gmane.linux.network/307446

Then patched again here:

http://kernel.opensuse.org/cgit/kernel/commit/?id=84fe61821e4ebab6322eeae3f3c27f77f0031978
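The Promiscuous problem is the same kind of bug as the Multicast one: state that lives only in a hardware register is lost when the restart rewrites that register. The sketch below is a toy model of one way that can happen. It is not taken from either patch, I have not checked that this is exactly what the first fix did, and the bit values and names are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RCR_DEFAULT 0x04u  /* placeholder for the normal receive settings */
#define MODEL_RCR_PROM    0x08u  /* promiscuous bit; the position is an assumption */

struct fec_rx_model {
    bool want_promisc;  /* what the stack asked for (IFF_PROMISC) */
    uint32_t rcr;       /* stands in for the receive control register */
};

/* Re-derive the promiscuous bit from the software flag. */
void model_apply_rx_mode(struct fec_rx_model *m)
{
    if (m->want_promisc)
        m->rcr |= MODEL_RCR_PROM;
    else
        m->rcr &= ~MODEL_RCR_PROM;
}

/* Restart that applies the rx mode and then rewrites the register
 * from defaults: the promiscuous bit is lost. */
void model_restart_lossy(struct fec_rx_model *m)
{
    model_apply_rx_mode(m);
    m->rcr = MODEL_RCR_DEFAULT;
}

/* Restart that rewrites the register first and re-applies the
 * rx mode last: the promiscuous bit survives. */
void model_restart_preserving(struct fec_rx_model *m)
{
    m->rcr = MODEL_RCR_DEFAULT;
    model_apply_rx_mode(m);
}
```

The point of the model is only the ordering: whatever the restart rebuilds from scratch has to be followed by a step that re-derives the receive mode from the interface flags, otherwise bridges and packet monitors silently stop seeing traffic after a link change.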

Update. I've found a patch, but only on Freescale's "imx_3.0.35_4.1.0" branch. This one fixes the "Multicast" problem without fixing the "Promiscuous" problem:

Author: Fugang Duan <B38611@freescale.com> 2013-11-26 17:18:36
Committer: Jason Liu <r64343@freescale.com> 2014-01-24 13:33:26
Parent: 76f9b385fe7d4602f3157d309f4ba793ac13bb36 (ENGR00291667-02 net:fec_ptp: fix the potential issue for storing timestamp)
Branch: remotes/origin/remote/freescale/imx_3.0.35_4.1.0

    ENGR00291667-01 net:fec: reinit multicast address when fec restart

    Ptp multicast packet receive does not work after Ethernet link is lost
    for a short time and then reconnected again. Because fec call restart()
    to reset all multicast when cable hotplug.
    (cherry picked from commit adfa64f0c2bf35f8b902ae5700f97e7e11ae1794)

Tom

stefano_cappa_k · ‎12-18-2018

Hi!

I know that this thread is very old, but I only want to ask whether this is still a problem on the i.MX6 with kernel 4.9.xxxx, for instance on Yocto sumo.

How can I know if IPv6 Multicast DNS is working properly on my i.MX6 ULL?

I'm having some problems with mDNS over IPv6, so I'm asking here to try to rule out all possible alternatives and search in the right place to fix my issue.

Thank you.

TomE · ‎12-18-2018

The problem I reported happened if the Ethernet cable was unplugged and then plugged back in again. The Ethernet code that handled "Link Down" didn't handle the multicast filters properly, so the filters that were meant to be receiving IPV6 Multicasts got cleared.

These fixes went into the mainline at 3.10 and 3.14. It is unlikely the bugs have been reinstated between there and 4.9.

If you have this bug, then behaviour after a cable disconnect and reconnect should be different to behaviour after booting with the cable connected.

There are plenty of other things that can go wrong with mDNS. Have it try to register something and see if you can see it with "avahi-browse" or some other mDNS checking tool.
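If no mDNS tool is available on the target, a small listener can show whether multicast frames for ff02::fb reach user space at all. This is a minimal sketch using the standard sockets API. The function name is made up, error handling is deliberately coarse, and port 5353 may already be held by an mDNS daemon on your system.

```c
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket joined to ff02::fb (mDNS) on the named interface.
 * Returns the socket, or -1 on any failure. (Hypothetical helper.) */
int open_mdns_listener(const char *ifname)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;  /* no such interface */

    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Best effort: allow sharing the port with another listener. */
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in6 addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;
    addr.sin6_port = htons(5353);
    addr.sin6_addr = in6addr_any;

    struct ipv6_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.ipv6mr_interface = ifindex;

    /* Bind, then ask the kernel to join the group on this interface.
     * The join is what makes the stack call into the driver's
     * multicast list handling. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        inet_pton(AF_INET6, "ff02::fb", &mreq.ipv6mr_multiaddr) != 1 ||
        setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Call it with the interface name (for example "eth0"), loop on recv() and count packets while another machine on the LAN sends mDNS queries. Run it once after a clean boot and once after unplugging and replugging the cable. If packets arrive in the first case but not the second, the multicast filter is the likely suspect.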

Tom

stefano_cappa_k · ‎12-20-2018

Hi TomE and gusarambula!

I fixed my previous error and now I'm having EXACTLY the issue reported by Tom Evans about cable unplugging/plugging.

If I boot my device with eth cable plugged in I can do this:

~ # ping6 ff02::fb
PING ff02::fb (ff02::fb): 56 data bytes
64 bytes from fe80::3ab1:9eff:fe10:6: seq=0 ttl=64 time=0.930 ms
64 bytes from fe80::18af:6ac0:1aee:2838: seq=0 ttl=64 time=2.253 ms (DUP!)
and so on...

but when I unplug the Ethernet cable and plug it in again, the result is:

~ # ping6 ff02::fb
PING ff02::fb (ff02::fb): 56 data bytes
ping6: sendto: Network is unreachable

The only solution is a reboot.

I'm using Yocto sumo with kernel 4.9.88.

Do you have any suggestions for how to patch kernel 4.9.88?

Thank you.

stefano_cappa_k · ‎01-09-2019

Fixed. My problem wasn't the FEC driver, but the kernel configuration.

fabio_estevam · ‎01-16-2019

What exactly was the problem? It would be nice if you could report the details at the netdev mailing list.

stefano_cappa_k · ‎01-16-2019

I already discussed this on the netdev mailing list, but the problem turned out to be a different one, and I posted the solution here:

https://community.nxp.com/message/1099348?commentID=1099348#comment-1099348

fabio_estevam · ‎01-16-2019

The thread on the netdev list remains open, which is why I suggest you post the solution there.

Also, you could send a patch adding the defconfig options to imx_v6_v7_defconfig.

stefano_cappa_k · ‎12-19-2018

Using an older version of mDNSResponder I get a similar error:

mDNS_RegisterInterface: Error! Tried to register a NetworkInterfaceInfo fe80::3ab1:9eff:fe10:6 with invalid mask 0000:0000:0000:0000:0000:0000:0000:0000

stefano_cappa_k · ‎12-19-2018

OK, so the problem is different, but it may be related in some way.

With DNS-SD and only an IPv6 address assigned (no IPv4) I'm able to register a service without issues, but I cannot send mDNS packets on the LAN (I checked with Wireshark). With IPv4 everything is OK.

With IPv6 I only see ICMPv6 packets. All the mDNS packets are missing.

If I run Apple's official mDNSResponder (latest version) I see an error while it attaches to the IPv6 interface (the mask is wrong):

~ # mdnsd -debug
mDNSResponder (Engineering Build) (Dec 10 2018 13:10:41) starting
setsockopt - SO_RECV_ANYIF: Protocol not available
mDNS_RegisterInterface: Error! Tried to register a NetworkInterfaceInfo fe80::3ab1:9eff:fe10:6 with invalid mask ::

lategoodbye · ‎07-18-2015

May I ask why you need to use the 2.6.35 downstream kernel for the i.MX28 instead of the Mainline kernel?

Regarding your problem:

https://github.com/qca/qca7000/blob/master/qca-linux-2.6.35.3-imx.patch

This is a patch series for an MX28 board from Qualcomm. Please take a look at patch #3. This should fix your problem.

TomE · ‎07-18-2015

> May I ask why you need to use the 2.6.35 downstream kernel for the i.MX28 instead of the Mainline kernel?

Sorry, I wasn't clear. We're using the i.MX53, and I originally wrote the post with that in the Subject. Then I realized the problem applied to far more i.MX chips than the one we're using, so I expanded the subject to cover all the affected ones.

Why aren't we using the Mainline on the i.MX53? Because the i.MX53 isn't supported in the Mainline, and I'm guessing will never be. Maybe the i.MX28 is luckier in this regard.

When I say "supported" I don't mean the CPU Core, memory interfaces and the common stuff like the Ethernet [1], USB and UARTs. They work, obviously. But if all our product required was a CPU Core we wouldn't have selected these chips, where the "M" in "i.MX" stands for Multimedia.

The i.MX5x (as the i.MX51) has been around since 2009 and we started with the i.MX53 in 2012, when it was already a bit old for a new product (I have i.MX53 documents dated 2010). The mainline Linux at that time (circa 3.4) didn't even support LVDS output, and had nothing able to support simple video input. It is now about 5 years into the product cycle and the mainline STILL doesn't support the video/VPU/GPU/ICS/IPU hardware. All the current effort looks to be on getting the video input and output hardware in the i.MX6 chips working (and they've been out for years), but some of the patch sets I'm seeing there are i.MX6 specific and don't support the i.MX53.

As another complication, the mainline doesn't support the i.MX53 NAND controller. It doesn't play nice with 2k page chips and needs patching. So does the JFFS2 driver to run on that controller. Freescale patched these years ago in its branch, but they're not in the mainline, and our supplier dropped their support for these patches in 3.4, so we're stuck.

Compared with that, Freescale have always released old kernels with over 1000 Freescale-specific patches applied to make their hardware work. They then reapplied those same patches to each new kernel while they were tracking the releases. Just clone "linux-2.6-imx.git - Freescale i.MX Linux Tree" and do "gitk --all" to see how they've been managing it.

They moved to 3.0 for the i.MX6, but stayed with 2.6.35 for the i.MX53, and haven't updated that tree in years. Why aren't they frantically tracking the mainline? Because with over 1000 patches to apply that would take an enormous amount of work, as various people keep completely changing the way the kernel does things, and moving and renaming source files (from "fec.c" to "fec_main.c" and so on), and that makes reapplying patches hugely difficult. It wouldn't help their existing customers all that much. If the embedded product you're building works with a 2010 version of Linux, then why try to change it in the field with all the risks that entails?

Note 1: Not a good example, being the subject of this thread.

Tom

gusarambula · ‎08-12-2015

Hello Tom Evans,

When the i.MX5 chip was developed, Linux 3.0.35, which supports Multicasts and Promiscuous mode, had not yet been released, so the 2.6.35 kernel, which does not support Multicasts and Promiscuous mode, was used as the code baseline. Linux 3.0.35 had been released by the time the i.MX6 chip was developed.

You can cherry-pick the patch and changes from version 3.0.35 into your 2.6.35 kernel for the i.MX28/i.MX5, but at this time Freescale does not have this on the roadmap.

i.MX2x

i.MX50

i.MX51

i.MX53

i.MX6_All

Linux