How to improve IMX7 FEC driver performance

dannyvaiselberg
Contributor I

Hi,

I am getting very low packet-per-second performance (with 64-byte packets) on the i.MX7 with the FEC Ethernet driver. Any traffic rate above about 5% of 1GE exhibits packet drops and excessive softirq load (>70%). See the iperf results below:

iperf3.exe -c 10.0.0.195 -i 1 -l 64
....
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec sender
[ 4] 0.00-10.00 sec 53.4 MBytes 44.8 Mbits/sec receiver

I am using a Linux kernel based on 4.9.11. I set up a bridge between eth0 and eth1 and measured short-packet (68-byte) performance across it. Again, any traffic rate above roughly 5% of 1GE exhibits packet drops and excessive softirqs (>70%):

top - 13:50:46 up 8 min, 1 user, load average: 0.73, 0.29, 0.10
Tasks: 110 total, 2 running, 108 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 0.0 ni, 56.8 id, 0.0 wa, 0.0 hi, 42.8 si, 0.0 st
KiB Mem : 1023124 total, 890160 free, 56144 used, 76820 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 932920 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3 root 20 0 0 0 0 R 76.7 0.0 1:33.28 KSOFTIRQD/0
673 root 20 0 0 0 0 S 0.3 0.0 0:00.78 kworker/0:1
674 root 20 0 6524 2632 2200 R 0.3 0.3 0:00.90 top
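For reference, the bridge itself was created roughly like this (a minimal sketch using iproute2; the interface names eth0/eth1 are as on my board):

ip link add name br0 type bridge
ip link set dev eth0 master br0
ip link set dev eth1 master br0
ip link set dev eth0 up
ip link set dev eth1 up
ip link set dev br0 up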

Looking at cat /proc/interrupts, I can see that the hardware interrupts go only to core 0, and only from queue 2 of each FEC:
CPU0 CPU1
59: 0 0 GPCV2 118 Edge 30be0000.ethernet
60: 0 0 GPCV2 119 Edge 30be0000.ethernet
61: 32429 0 GPCV2 120 Edge 30be0000.ethernet
62: 31 0 GPCV2 42 Edge 30b20000.usb
63: 0 0 GPCV2 100 Edge 30bf0000.ethernet
64: 0 0 GPCV2 101 Edge 30bf0000.ethernet
65: 276862 0 GPCV2 102 Edge 30bf0000.ethernet
.
Using perf_4.15 report, I found that most of the time is spent in "fec_enet_rx_napi":
Samples: 20K of event 'cpu-clock', Event count (approx.): 5173250000
Overhead Command Shared Object Symbol
34.42% swapper [kernel.kallsyms] [k] cpuidle_enter_state
14.64% ksoftirqd/0 [kernel.kallsyms] [k] fec_enet_rx_napi
4.41% ksoftirqd/0 [kernel.kallsyms] [k] fec_enet_start_xmit
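For completeness, the profile above was collected roughly like this (a sketch; the exact options may differ from what I actually ran):

perf_4.15 record -a -g -- sleep 10
perf_4.15 report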

Using perf_4.15 sched latency, you can see that most of the runtime goes to ksoftirqd instead of kworker:

-------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum
-------------------------------------------------------------------------------
kworker/0:1:673 | 26.999 ms | 65 | avg: 0.518 ms | max:
kworker/u4:0:6 | 0.443 ms | 3 | avg: 0.181 ms | max:
kworker/0:0:1346 | 0.166 ms | 3 | avg: 0.145 ms | max:
jbd2/mmcblk2p2-:113 | 0.473 ms | 3 | avg: 0.034 ms | max:
kthreadd:(2) | 0.200 ms | 5 | avg: 0.024 ms | max:
rcu_preempt:7 | 0.175 ms | 6 | avg: 0.018 ms | max:
kworker/1:1:1367 | 1.532 ms | 63 | avg: 0.017 ms | max:
ksoftirqd/0:3 | 5749.556 ms | 349 | avg: 0.016 ms | max:
mmcqd/2:100 | 2.472 ms | 6 | avg: 0.016 ms | max:
perf_4.15:1371 | 2.953 ms | 1 | avg: 0.014 ms | max:
ksoftirqd/1:15 | 0.073 ms | 3 | avg: 0.012 ms | max:
kworker/1:1H:112 | 0.054 ms | 2 | avg: 0.010 ms | max:
kworker/1:3:693 | 0.207 ms | 4 | avg: 0.004 ms | max:
-------------------------------------------------------------------------------
TOTAL: | 5785.300 ms | 513 |
---------------------------------------------------
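The scheduling data came from something along these lines (again a sketch, not the exact command line):

perf_4.15 sched record -- sleep 10
perf_4.15 sched latency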

The packet drops are seen only on the RX side, not on the TX side.

I have also set the CPU affinity of the RX and TX interrupt queues to 3, so that both cores handle the softirqs, but that only got me to about 7% of the full 1GE rate.
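The affinity change was along these lines (a sketch; IRQs 61 and 65 are the active FEC interrupts from the /proc/interrupts listing above):

echo 3 > /proc/irq/61/smp_affinity
echo 3 > /proc/irq/65/smp_affinity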

Any other suggestions?

Danny

dannyvaiselberg
Contributor I

Hi Artur,

I am using the NXP i.MX7D chip on a Compulab CL-SOM-iMX7 SOM, either on a custom carrier board designed in-house or on their evaluation board; the performance issue is the same with L4.1.15 and with L4.9.11. Linux system details are available at http://www.compulab.com/wp-content/uploads/2018/03/cl-som-imx7_linux_2018-03-27.zip

Since the MAC is internal to the chip and only an external PHY (AR8033) is added, I really think the bottom half of the on-chip FEC driver is not optimized for performance: the PPS (packets per second) is about the same for short packets and for long packets.

As a typical network carries many short (<100 B) packets, the short-packet performance is the real concern.
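As a rough back-of-envelope (assuming the 64-byte iperf write size roughly matches the wire payload):

1GE line rate for minimum-size frames: 10^9 / ((64 + 20) * 8) ≈ 1.49 Mpps (20 B = preamble + inter-frame gap)
Measured: ~45 Mbit/s / (64 * 8) ≈ 88 kpps

So the FEC path is delivering only around 6% of the small-packet line rate.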

Danny

art
NXP Employee

The issue you describe is not a known one.

What i.MX7D-based hardware do you use? Is it the i.MX7D SABRE SD board by NXP or your own custom hardware? What Linux package/build do you use? Is it the L4.1.15 BSP by NXP or a custom build? Does the issue occur only with short (64-byte) packets, or with larger packets too?

Please specify.


Have a great day,
Artur

dannyvaiselberg
Contributor I

Hi Artur,

I finally got my hands on an i.MX7D SABRE SD board by NXP with the L4.1.15 BSP and redid the iperf test with the built-in iperf binary. As expected, the results are the same: very low performance with short (64-byte) UDP packets.

Please see the results below:

root@imx7dsabresd:~# iperf -c 10.0.0.177 -l64 -u -b35M
------------------------------------------------------------
Client connecting to 10.0.0.177, UDP port 5001
Sending 64 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.170 port 37709 connected with 10.0.0.177 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 41.2 MBytes 34.6 Mbits/sec
[ 3] Sent 675597 datagrams
[ 3] Server Report:
[ 3] 0.0- 9.9 sec 41.2 MBytes 34.7 Mbits/sec 0.010 ms 1188/675596 (0.18%)
[ 3] 0.0- 9.9 sec 1 datagrams received out-of-order
root@imx7dsabresd:~# iperf -c 10.0.0.177 -l64 -u -b40M
------------------------------------------------------------
Client connecting to 10.0.0.177, UDP port 5001
Sending 64 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.170 port 45928 connected with 10.0.0.177 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 41.5 MBytes 34.8 Mbits/sec
[ 3] Sent 680648 datagrams
[ 3] Server Report:
[ 3] 0.0- 9.9 sec 41.5 MBytes 35.0 Mbits/sec 0.012 ms 1215/680647 (0.18%)
[ 3] 0.0- 9.9 sec 1 datagrams received out-of-order
root@imx7dsabresd:~# iperf -c 10.0.0.177 -l64 -u -b40M
------------------------------------------------------------
Client connecting to 10.0.0.177, UDP port 5001
Sending 64 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.170 port 38370 connected with 10.0.0.177 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 41.5 MBytes 34.8 Mbits/sec
[ 3] Sent 679122 datagrams
[ 3] Server Report:
[ 3] 0.0-10.0 sec 41.1 MBytes 34.5 Mbits/sec 0.018 ms 6124/679121 (0.9%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
root@imx7dsabresd:~# iperf -v
iperf version 2.0.5 (08 Jul 2010) pthreads
root@imx7dsabresd:~# uname -a
Linux imx7dsabresd 4.1.15-1.2.0+g77f6154 #1 SMP PREEMPT Thu Jun 30 05:39:53 CDT 2016 armv7l GNU/Linux
