i.MX287 - network problem - no packets transmitted (sporadic)

christophk
Contributor II

Hello,

we're using a custom board with the i.MX287, based on the i.MX28 EVK, running Debian Linux (jessie 8.2) with a kernel based on mainline Linux 3.18.0.

Hardware changes

On the i.MX28 EVK it is possible to shut down the Ethernet PHYs (to save energy) with a MOSFET; on our custom board the PHYs are directly connected to the 3.3 V power supply.

Also we're using different plugs for the ethernet connection (the ones used on the evk are pretty expensive).

Linux Network configuration

  • eth0 is configured via NetworkManager to use DHCP (IPv4) and is connected to our local network (the DHCP server runs on Windows Server 2012).
  • eth1 uses a static network configuration via /etc/network/interfaces and is connected to a laser device which also has a static IPv4 address.

Error Description

There are sporadic problems on both network interfaces.

When the system starts up, it *sometimes* fails to get an IP address on eth0.

Also on eth1 the system *sometimes* fails to communicate with the laser device.

When the error occurs and I try to ping the DHCP server (on eth0) or the laser device (on eth1), the resulting message is "Destination Host Unreachable".

The errors on the interfaces are independent of each other, so sometimes the error occurs on eth0 only, sometimes it occurs on eth1 only.

When the interfaces work properly after bootup, they continue to work properly and the problem doesn't appear until the system is rebooted.

All the descriptions below refer to eth0, I haven't done any test on eth1 because that's more difficult (because of the device connected).

Workaround: Re-plug network cable

Usually, every time the network cable is removed and plugged in again, the problem disappears (for both interfaces).

Wireshark: No packets from eth0

I've attached a hub between the eth0 interface and our main switch to look at the network traffic with Wireshark. When the problem occurs, no packets at all are transmitted from the network interface. After re-plugging the cable, Wireshark shows the normal DHCP traffic (and eth0 gets an IPv4 address).
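
One way to cross-check the capture from the board side is to watch the kernel's own transmit counter. The following is a minimal sketch, assuming the standard Linux sysfs statistics file /sys/class/net/<iface>/statistics/tx_packets; the function names and the `base` parameter are illustrative and not from the original post. If the counter keeps rising while the capture shows nothing, the frames are probably being lost below the driver (for example in the PHY) rather than never being queued.

```python
import time
from pathlib import Path


def read_tx_packets(iface, base="/sys/class/net"):
    """Return the kernel's transmitted-packet counter for one interface."""
    counter = Path(base) / iface / "statistics" / "tx_packets"
    return int(counter.read_text().strip())


def tx_is_advancing(iface, wait_s=5.0, base="/sys/class/net"):
    """Sample the counter twice and report whether it increased in between."""
    before = read_tx_packets(iface, base)
    time.sleep(wait_s)
    after = read_tx_packets(iface, base)
    return after > before
```

On the board this could be called as `tx_is_advancing("eth0")` while a ping or the DHCP client is running.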

Switch vs. Hub

Apparently the behaviour also depends on the device at the other end of the cable. With a relatively new Netgear GS108 switch, the problem occurs in about 1 of 5 boots. With the older Netgear DS108 hub, the problem occurred much more often, in about 4 of 5 boots. Also, re-plugging the cable didn't always solve the problem and I had to restart the system (and re-plug the cable).

Checking proper phy reset with an Oscilloscope

We suspected that the PHY might be reset by turning on the supply voltage (with the MOSFET that was removed on our custom board). I checked the voltage on the GPIO that was originally connected to the MOSFET, but there's no change during bootup (it's always high).

Then I checked the reset pin (PHY pin no. 15). It gets properly reset (pulled from 3.3 V down to 0 V) during bootup for a little more than 100 ms.

Checking with the original evk board

I haven't been able to check with the original EVK yet because we shipped it to an external software developer. I've already ordered it back and I'll probably be able to check on Friday whether the problem also happens with the i.MX28 EVK hardware.

Please help.

Thanks, Christoph

1 Solution
lategoodbye
Senior Contributor I

Unfortunately I linked to an old version of this patch, but the second version still hasn't been accepted:

'[PATCH 0/3] net: fec: Reset ethernet PHY whenever the enet_out clock' - MARC

FYI the patch has been created against Kernel 4.4.

Here is our minimal version against Kernel 4.2.5:

net: fec: Reset ethernet PHY whenever the enet_out clock is being ena… · I2SE/linux@852d235 · GitHub
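
The idea named in the patch title can be illustrated with a toy model. This is only a sketch under an assumption: that the PHY needs a fresh reset after its reference clock (enet_out) has been stopped and re-enabled. The event names and the helper are hypothetical and are not taken from the patch itself. The timings come from the scope description elsewhere in this thread, except the clock re-enable time, which is made up.

```python
def phy_reset_after_last_clock_enable(events):
    """events: list of (time_ms, name), name being one of
    "clk_on", "clk_off", "reset_start", "reset_end".
    Return True if a reset finished after the most recent clock enable."""
    last_clk_on = None
    last_reset_end = None
    for t, name in sorted(events):
        if name == "clk_on":
            last_clk_on = t
        elif name == "reset_end":
            last_reset_end = t
    if last_clk_on is None:
        return False  # the clock never ran at all
    return last_reset_end is not None and last_reset_end > last_clk_on


observed = [
    (0, "clk_on"),           # assumed: clock running from power-up
    (10000, "reset_start"),  # reset about 10 s after power-on
    (10110, "reset_end"),    # reset lasts about 110 ms
    (10130, "clk_off"),      # clock stops about 20 ms after the reset
    (12000, "clk_on"),       # hypothetical re-enable when the interface comes up
]
# Same trace, plus a second reset issued when the clock is re-enabled.
patched = observed + [(12000, "reset_start"), (12100, "reset_end")]
```

With these lists the helper returns False for `observed` and True for `patched`, which is the difference the patch title describes.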


7 Replies
christophk
Contributor II

Hey people,

I really need some help on this, any ideas?

lategoodbye
Senior Contributor I

Hi Christoph,

is it similar to this problem?

'[PATCH] net: fec: fix enet_out clock handling' - MARC

christophk
Contributor II

Hi Stefan,

The patch does not apply to kernel versions 3.18.0 or 3.18.25. I tried to apply it by hand, but the differences are too big.

For what kernel version is it?

Cheers

Christoph

christophk
Contributor II

I applied the patch about three weeks ago and have not seen the problem since then.

Works :-)

Thanks!

christophk
Contributor II

Hi Stefan,

thanks for the hint.

We've seen this UP/DOWN behaviour on our board some time ago, although it doesn't happen right now.

The PHYs' clock inputs are connected to enet_clk (B4.16). I don't know exactly how the PHYs are reset (how do I find out?). As I said, it's a Debian system with a mainline Linux kernel 3.18.0.

I've already checked the reset sequence with an oscilloscope:

2015-12-14_epc-eclk-glitch.png

The purple signal is enet_clk (B4.16) going to the PHY clkin (pin 5); the cyan/blue signal is the ~rst of the PHY (pin 15), connected to B4.13 on the CPU. The reset happens about 10 s after supplying power to the board. As you can see in the upper part of the scope image, the clock is switched off about 20 ms after the end of the reset sequence. Reset duration is about 110 ms.

Does the switching off of the clock signal indicate the bug you hinted at? Or is this a different situation?

Cheers

Christoph
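
One possible way to find out how the PHY reset is configured, as a minimal sketch: mainline kernels of this era describe the FEC reset line in the device tree through the phy-reset-gpios and phy-reset-duration (milliseconds) properties, and the running system exposes the tree under /proc/device-tree. The helper names below are hypothetical, and whether this board's device tree actually sets these properties is an assumption.

```python
import struct
from pathlib import Path


def decode_cells(raw):
    """Decode a device tree property made of big-endian 32-bit cells."""
    count = len(raw) // 4
    return list(struct.unpack(">%dI" % count, raw[:count * 4]))


def find_phy_reset_props(base="/proc/device-tree"):
    """Map every phy-reset-* property file found under base to its cells."""
    found = {}
    for name in ("phy-reset-gpios", "phy-reset-duration"):
        for path in sorted(Path(base).rglob(name)):
            found[str(path.relative_to(base))] = decode_cells(path.read_bytes())
    return found
```

Calling `find_phy_reset_props()` on the board lists the properties per Ethernet node. For a GPIO specifier the first cell is typically a phandle to the GPIO controller, followed by the pin number and flags.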

lategoodbye
Senior Contributor I

The switching off of the clock is not the bug, but it's related to this commit (which causes a regression):

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/net/ethernet/fre...

As you can see, this commit is very old (~ Linux 3.14) and cannot be reverted easily.