i.MX6 FEC stops generating receive interrupts


8,700 Views
jacconbastiaans
Contributor III

We are seeing strange ethernet behavior on the i.MX6.

In our networking setup, we connect a Linux PC and a SabreSD i.MX6 board using a switch (see the following picture)

         +----------+
         | Ethernet |
         |  switch  |
         +----------+
           |      |
     +-----+      +----+
     |                 |
+----------+    +-------------+
| Linux PC |    | i.MX6 board |
+----------+    +-------------+

Both network links are 1Gbit full duplex.

The SabreSD board runs mainline Linux 3.10.9 with the PREEMPT_RT patches and the FEC driver from 3.14.
On the board we have started the iperf server (network stress test tool) in UDP mode (command used: "iperf -s -u").

The Linux PC runs Ubuntu 12. On this PC we:

  - ping the i.MX6 board every second

  - run the iperf client in UDP mode (command used: "while [ 1 ]; do iperf -c <IP address of SabreSD board> -u -b 100m -t 30 -l 256; sleep 1; done")

After a while (most of the time a couple of minutes), we see that the i.MX6 board stops replying to the pings from the Linux PC. Closer inspection shows that the FEC in the i.MX6 doesn’t generate receive frame interrupts anymore. The attached screenshots show that the RXF interrupt is enabled. We know that the FEC is still receiving Ethernet frames (because we see the event counters related to frame reception increasing), but no receive frame interrupts are being generated (only MII interrupts).

If you look at the RDAR register in the screenshots, you see that it has the value 0, meaning that the FEC cannot write the received frame to main memory because there are no free receive descriptors available.

When we ping the Linux PC from the SabreSD board, we see the receive interrupts on the i.MX6 start occurring again. After studying the FEC driver we see that the driver also empties the receive buffer when handling a transmit interrupt. So it seems that the FEC starts generating receive interrupts again once free receive descriptors are available again. Our question now is: how can the FEC get into a state where it stops generating receive interrupts, and how can we prevent this?

ethernet_bug1.jpg

ethernet_bug2.jpg

19 Replies

4,201 Views
zhongxing_zhao
Contributor I

Hi all!

Recently I hit the same problem on an i.MX6Q running Linux v4.1.6: the board cannot receive any packets but can still send ARP packets to the network. After reading the replies, I applied the patch from https://lkml.org/lkml/2016/11/17/945

After that, the hang is gone, but another problem remains. The receiver uses UDP to receive packets from a sender at a rate of about one packet every 4 ms, and sometimes the receiver gets no packets from the sender for a period of one or two seconds, or even longer.

So with the patch from https://lkml.org/lkml/2016/11/17/945 applied, the G-net no longer hangs, but the value of IEEE_rx_macerr keeps increasing. I therefore suspect that the patch works around the problem without addressing its root cause.

menu.saveimg.savepath20190325162408.jpg

Any suggestions will be appreciated.

0 Kudos

4,201 Views
andreasstarzer
Contributor I

Dear *!

Same problem here ... (i.MX6q with 3.10.y Kernel + RT patch)

Did you find any solution to this problem?

I did some testing, and on my side clearing the RDAR register did not help.

I made a test driver which starts an observer timer after napi_complete/enable-interrupt to detect such hangs after 10 ms.

In case of error, I start the whole receive handling, which reads the RX ring and clears the RDAR.

This basically recovers from the error state, but sometimes it needs to be called about 100 times (~1 s) until the interrupts start working again.

Currently I'm thinking of a chip bug regarding small-packet-receive-flood which permanently fills the rx ring ...

Did we find a new FEC-MAC chip errata? Or is there a problem in the read/clear sequence of the driver?

Current kernels do not have major changes in the driver read sequence (mainline: ~4.9.y / freescale imx 4.1.y).

Is another workaround possible?

It would be nice to prevent the error, instead of recovering from it!

Please Help!

Best Regards

4,201 Views
fabio_estevam
NXP Employee

Hi Andreas,

Could you try the latest mainline kernel with this patch applied?

https://lkml.org/lkml/2016/11/17/945 

0 Kudos

1,956 Views
庞磊
Contributor I
Hi, I am encountering the same problem with kernel 4.1.15. What can be changed to fix it?
0 Kudos

4,201 Views
mustafakiyar
Contributor I

I have the same trouble with the ENET controller. In my case the TX and MII interrupts are alive, but RX is somehow dead. I suspected an RX overrun, and your observations about the RDAR register indicate that this is a possibility. At first sight I thought that the RX overrun handling mechanism is problematic.

Jaccon Bastiaansen

Thanks in advance.

0 Kudos

4,201 Views
sedat_altun
Contributor III

Hi Mr. Mustafa,

I have the same problem with the FEC on i.MX6Q with kernel 3.10.53 from the imx git.

Did you find any solution to this problem? Any help is appreciated.

My problem is:

Under heavy traffic, after a while the FEC stops receiving frames. Transmit is working.

During the hang the RDAR register is 0, which means there are no free RX ring descriptors for the FEC to put incoming frames in. When I manually change the value from 0 to 1, receive restarts.

Best regards

0 Kudos

4,201 Views
jacconbastiaans
Contributor III

Hello Mustafa, Fabio,

I tried the patches from Russell and unfortunately, I still see the same problems.

The only difference in the test setup is the kernel version. In my first post I mentioned the use of the 3.10.9 kernel with the PREEMPT_RT patches. Now I used the kernel from Russell (3.15.0-rc1+) without PREEMPT_RT patches (also because these patches are not available for the 3.15 kernel).

The two debugger screenshots show the FEC registers when the FEC is in the error mode. You see the IEEE_R_FRAME_OK event counter increasing which indicates that frames are received. But the only interrupt being generated is the MII interrupt. I put a breakpoint on the fec_enet_interrupt() function and every time the breakpoint is hit, the FEC ENET_EIR register has the value 0x00800000. You also see the IEEE_R_MACERR event counter increasing which shows that receive FIFO overflows are happening. In my view this is consistent with the RDAR register being 0 (the MAC receives frames but cannot write them to main memory because there are no empty descriptors in the receive ring).

When the FEC has stopped generating receive interrupts, I used the ping command on the Sabre board to send some frames from the Sabre board to the Linux PC. The FEC then starts to generate receive interrupts again. The driver source code shows that the receive buffer is also emptied when a transmit interrupt occurs. So it seems that the FEC stops generating receive interrupts when frames are received and there are no more empty descriptors.

ethernet_bug1.jpg

ethernet_bug2.jpg

Regards,

  Jaccon

0 Kudos

4,201 Views
Yuri
NXP Employee

I am under the impression that the problem relates to the performance erratum
“ERR004512 ENET: 1 Gb Ethernet MAC (ENET) system limitation”:

===

The theoretical maximum performance of 1 Gbps ENET is limited to 470 Mbps (total for Tx and  Rx).

Workarounds:

There is no workaround for the throughput limitation. To prevent overrun of the ENET RX FIFO, enable pause frame.

===


Have a great day,
Yuri

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

0 Kudos

4,201 Views
jacconbastiaans
Contributor III


Hello Yuri,

This isn't a throughput problem. The receiver simply stops generating receive interrupts, even when the amount of incoming data is way below 470 Mbps.

Regards,

  Jaccon 

0 Kudos

4,201 Views
Yuri
NXP Employee

Jaccon Bastiaansen wrote:

“So it seems that the FEC stops generating receive interrupts when frames are received and there are no more empty descriptors”.

This is correct behavior, since a free buffer descriptor with the INT bit set is needed.

0 Kudos

4,203 Views
jacconbastiaans
Contributor III

Hello Yuri,

  

I expect that the FEC will still generate an interrupt, because the receive ring is completely full. How can we make the FEC generate a receive interrupt when the receive ring is completely full?

  

Regards,

  Jaccon

0 Kudos

4,201 Views
Yuri
NXP Employee

There are two receiver-related events: Receive Frame Interrupt (ENET_EIR[RXF]) and Receive Buffer Interrupt (ENET_EIR[RXB]).

So it is possible to get receive interrupts either for every buffer that is not the last in the frame, or for the whole frame (last buffer is filled). According to the note in section 23.4.1 [Interrupt Event Register (ENET_EIR)] of the i.MX6 DQ Reference Manual, “TxBD[INT] and RxBD[INT] must be set to 1 to allow setting the corresponding EIR register flags …”. The mask bits in the Interrupt Mask Register (ENET_EIMR) select which receiver interrupts are served. It is the programmer's responsibility to provide the required number of buffers (buffer descriptors) in order to avoid data loss.


Have a great day,
Yuri


0 Kudos

4,200 Views
jacconbastiaans
Contributor III

Hello Yuri,

It seems like we are not understanding each other.

In our case, we assume we have configured the FEC correctly for sending and receiving frames. You can see debugger screenshots showing the FEC register in my post of the 29th of April. Please tell me if there is anything wrong in this configuration.

Our driver sometimes needs to disable the RXF interrupt. We end up in a situation where no free receive buffers are available and new frames are still being received. These new frames are dropped (because of the lack of free receive buffers). The fact that these frames are dropped is something we understand. Higher protocol layers can handle the frame drop. But when we, in this situation, enable the RXF interrupt again, no interrupt is generated! I would expect that the FEC would still generate a receive interrupt, because frames have been received. If no interrupt is generated, the driver will never read the receive buffers again.

I hope this makes our problem clear.

Regards,

  Jaccon

0 Kudos

4,201 Views
Yuri
NXP Employee

A graceful stop is needed, as described in section 23.5.9.4 (Graceful stop) of the i.MX6 DQ Reference Manual.

0 Kudos

4,201 Views
jacconbastiaans
Contributor III

Hello Yuri,

If I do a graceful stop, how do I get the FEC running again? I couldn't find this information in the reference manual.

Regards,

  Jaccon

0 Kudos

4,201 Views
Yuri
NXP Employee

"MAC is placed in Sleep mode either by the software or the processor is in Stop mode)."

So, please refer to section 23.5.7.1 (Sleep mode) for how to enter Sleep mode.

Next, according to section 23.5.7.3 (Wakeup), ENETn_ECR[SLEEP] should be cleared to resume normal operation of the MAC.

0 Kudos

4,201 Views
jacconbastiaans
Contributor III

Hello All,

I have been adding debug code to the Linux FEC driver to get some more insight into this issue.

What I did was: read the RDAR register value immediately after the driver enables the RXF interrupt. When this value is zero (meaning that it is quite likely that we enabled the RXF interrupt when the RX ring was full), I try to empty a number of RX ring buffers before writing to RDAR again. In this way I want to get the FEC to start receiving frames again.

But what I see is that the RX ring still contains empty buffers (where the E bit in the buffer descriptor is 1). So I have a situation where the RDAR register is zero and the RX ring contains buffers where the E bit in the buffer descriptor is 1!

When reading the description of the RDAR register in the reference manual, it seems to me that this shouldn’t happen. Or is the RDAR register already set to zero when the FEC is still filling the last empty RX ring buffer?

Regards,

  Jaccon

0 Kudos

4,201 Views
mustafakiyar
Contributor I

Yes, I am also using 100 Mbps mode and there is no high network load. But again, preventing RX overrun may be beneficial.

On May 5, 2014 12:57 PM, "Jaccon Bastiaansen" <admin@community.freescale.com>

0 Kudos

4,201 Views
fabio_estevam
NXP Employee

Could you try Russell's latest FEC patches, available at the following link?

http://ftp.arm.linux.org.uk/cgit/linux-arm.git/log/?h=fec-testing