TCP TX stuck with bonding interface



ederibaucourt
Contributor II

Hello,

We are using the FEC interface of an i.MX8MN as the primary slave of a bonding interface in active-backup mode. We are running performance tests with iperf3 as a TCP client. When we unplug the cable, throughput drops to 0 Mb/s instead of failing over to the backup interface. This does not happen when the other interface (a LAN78XX) is the primary, when using UDP, or when the TCP traffic flows in the download direction (iperf3 -R).
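For anyone reproducing this, one way to confirm whether the bond actually fails over is to read the bonding driver's status file. The following is a minimal Python sketch, not the tooling used in this post: the bond name bond0 and the helper names are assumptions for illustration, and the parser only relies on the "Bonding Mode", "Primary Slave" and "Currently Active Slave" lines that the driver prints in /proc/net/bonding/<bond>.

```python
import re


def parse_bond_status(text):
    """Extract mode, primary and active slave from /proc/net/bonding/<bond> text."""
    fields = {
        "mode": r"^Bonding Mode:\s*(.+)$",
        "primary": r"^Primary Slave:\s*(\S+)",
        "active": r"^Currently Active Slave:\s*(\S+)",
    }
    status = {}
    for key, pattern in fields.items():
        match = re.search(pattern, text, re.MULTILINE)
        # A missing line is reported as None rather than raising.
        status[key] = match.group(1).strip() if match else None
    return status


def read_bond_status(bond="bond0"):
    """Read and parse the status file; "bond0" is an assumed interface name."""
    with open(f"/proc/net/bonding/{bond}") as f:
        return parse_bond_status(f.read())
```

Calling read_bond_status() before and after unplugging the cable shows whether the active slave changed, which helps separate a bonding failover problem from a stuck TCP socket.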

We suspect a timing issue, because adding a delay in fec_enet_adjust_link() greatly reduces how often the problem occurs. Once the connection is stuck, poll() no longer reports the TCP socket file descriptors as writable, tcp_check_space() is no longer called, and the TCP TX space becomes negative. We observed the same issue with other applications that transmit over TCP, such as scp. Limiting the bandwidth to 40 Mb/s also reduces how often it occurs.
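The writability check mentioned above can be sketched as follows. This is an illustrative Python version under the assumption that the socket is already connected; is_writable is a hypothetical helper name, not code from iperf3 or from our test setup.

```python
import select
import socket


def is_writable(sock, timeout_ms=0):
    """Return True if poll() reports POLLOUT for sock within timeout_ms."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLOUT)
    events = poller.poll(timeout_ms)
    # Each entry is (fd, event mask); POLLOUT means send() would not block.
    return any(mask & select.POLLOUT for _fd, mask in events)
```

On a healthy connection this returns True as soon as send buffer space is available. In the stuck state described here it would be expected to keep returning False even with a long timeout.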

We are using linux-boundary 5.4.110 (branch boundary-imx_5.4.x_2.3.0, commit e5cde7b05e548bebb7cc21113505538b98c0984e) as part of a Linux BSP.

Our hypothesis is that the TCP TX queue gets stuck because of a synchronization error between fec_enet_adjust_link(), fec_enet_interrupt() and napi_complete() within the softirq. Could you help us diagnose and fix this issue?

Thank you very much,

Regards
