
How to improve IMX7 FEC driver performance

Question asked by Danny Vaiselberg on May 13, 2018

Hi,

 

I am getting very low packets-per-second performance (with 64-byte packets) on the i.MX7 with the FEC Ethernet driver. See the iperf results below:

 

iperf3.exe -c 10.0.0.195 -i 1 -l 64
....
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec sender
[ 4] 0.00-10.00 sec 53.4 MBytes 44.8 Mbits/sec receiver

I am using a Linux kernel based on 4.9.11, with a bridge set up between eth0 and eth1, and measured short-packet (68 B) forwarding performance between them.
Any traffic rate above roughly 5% of the 1 GbE line rate exhibited packet drops and excessive softirq load (>70%):
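For reference, the bridge between the two FEC interfaces was created along these lines (a sketch using iproute2; the bridge name br0 is an assumption, adjust to the actual setup):

```shell
# Create a bridge and enslave both FEC interfaces (run as root).
ip link add name br0 type bridge
ip link set eth0 master br0
ip link set eth1 master br0
ip link set eth0 up
ip link set eth1 up
ip link set br0 up
```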

 

top - 13:50:46 up 8 min, 1 user, load average: 0.73, 0.29, 0.10
Tasks: 110 total, 2 running, 108 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 0.0 ni, 56.8 id, 0.0 wa, 0.0 hi, 42.8 si, 0.0 st
KiB Mem : 1023124 total, 890160 free, 56144 used, 76820 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 932920 avail Mem

 

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3 root 20 0 0 0 0 R 76.7 0.0 1:33.28 ksoftirqd/0
673 root 20 0 0 0 0 S 0.3 0.0 0:00.78 kworker/0:1
674 root 20 0 6524 2632 2200 R 0.3 0.3 0:00.90 top

 

Using cat /proc/interrupts I can see that hardware interrupts go only to core 0, and only from queue 2:
CPU0 CPU1
59: 0 0 GPCV2 118 Edge 30be0000.ethernet
60: 0 0 GPCV2 119 Edge 30be0000.ethernet
61: 32429 0 GPCV2 120 Edge 30be0000.ethernet
62: 31 0 GPCV2 42 Edge 30b20000.usb
63: 0 0 GPCV2 100 Edge 30bf0000.ethernet
64: 0 0 GPCV2 101 Edge 30bf0000.ethernet
65: 276862 0 GPCV2 102 Edge 30bf0000.ethernet
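The per-IRQ affinity can be changed through /proc/irq/&lt;n&gt;/smp_affinity; a sketch for the two busy FEC interrupts, assuming the IRQ numbers 61 and 65 from the listing above (writes require root, and the hardware/driver must support routing that IRQ to the requested CPU):

```shell
# Steer the active FEC interrupts (IRQs 61 and 65 above) to CPU1.
# The mask is a hex CPU bitmap: 1 = CPU0, 2 = CPU1, 3 = both.
echo 2 > /proc/irq/61/smp_affinity
echo 2 > /proc/irq/65/smp_affinity
cat /proc/irq/61/smp_affinity   # verify the new mask took effect
```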
Using perf_4.15 report I found that most of the time is spent in "fec_enet_rx_napi":
Samples: 20K of event 'cpu-clock', Event count (approx.): 5173250000
Overhead Command Shared Object Symbol
34.42% swapper [kernel.kallsyms] [k] cpuidle_enter_state
14.64% ksoftirqd/0 [kernel.kallsyms] [k] fec_enet_rx_napi
4.41% ksoftirqd/0 [kernel.kallsyms] [k] fec_enet_start_xmit

 

Using perf_4.15 sched latency you can see that most of the runtime is spent in ksoftirqd rather than in kworker:

 

-------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum
-------------------------------------------------------------------------------
kworker/0:1:673 | 26.999 ms | 65 | avg: 0.518 ms | max:
kworker/u4:0:6 | 0.443 ms | 3 | avg: 0.181 ms | max:
kworker/0:0:1346 | 0.166 ms | 3 | avg: 0.145 ms | max:
jbd2/mmcblk2p2-:113 | 0.473 ms | 3 | avg: 0.034 ms | max:
kthreadd:(2) | 0.200 ms | 5 | avg: 0.024 ms | max:
rcu_preempt:7 | 0.175 ms | 6 | avg: 0.018 ms | max:
kworker/1:1:1367 | 1.532 ms | 63 | avg: 0.017 ms | max:
ksoftirqd/0:3 | 5749.556 ms | 349 | avg: 0.016 ms | max:
mmcqd/2:100 | 2.472 ms | 6 | avg: 0.016 ms | max:
perf_4.15:1371 | 2.953 ms | 1 | avg: 0.014 ms | max:
ksoftirqd/1:15 | 0.073 ms | 3 | avg: 0.012 ms | max:
kworker/1:1H:112 | 0.054 ms | 2 | avg: 0.010 ms | max:
kworker/1:3:693 | 0.207 ms | 4 | avg: 0.004 ms | max:
-------------------------------------------------------------------------------
TOTAL: | 5785.300 ms | 513 |
---------------------------------------------------

 

The packet drops are seen only on the RX side, not on the TX side.

I have set the CPU affinity mask of the RX and TX queues to 3 so that both CPUs handle the softirqs, but this achieved only about 7% of the full 1 GbE rate.
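The mask was applied through the standard RPS/XPS sysfs entries, roughly as follows (a sketch; the sysfs writes need root, and eth0's queue paths are the usual single-queue ones, so they are commented out here):

```shell
# Build a CPU mask covering CPU0 and CPU1 (bits 0 and 1) -> hex "3".
mask=$(printf '%x' $(( (1 << 0) | (1 << 1) )))
echo "$mask"
# Spread receive and transmit softirq work across both cores (as root):
# echo "$mask" > /sys/class/net/eth0/queues/rx-0/rps_cpus
# echo "$mask" > /sys/class/net/eth0/queues/tx-0/xps_cpus
```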

Any other suggestions?

 

Danny

 
