I'm using a TWR-LS1021A board with SDK v1.8 (Linux 3.12.37-rt51).
I've configured eth0 following the recommendations from Test Procedure:
root@ls1021atwr:~# ethtool -C eth0 rx-frames 22
root@ls1021atwr:~# ethtool -C eth0 tx-frames 22
root@ls1021atwr:~# ethtool -C eth0 rx-usecs 32
root@ls1021atwr:~# ethtool -C eth0 tx-usecs 32
root@ls1021atwr:~# ethtool -K eth0 gro on gso on sg on
root@ls1021atwr:~# netperf -H 10.0.0.20 -l 10 -T 1 -c 100 -C 100 -n2 -t TCP_SENDFILE -v 2 -- -C
TCP SENDFILE TEST from 0.0.0.0 () port 0 AF_INET to 10.0.0.20 () port 0 AF_INET : cpu bind
Recv   Send   Send                         Utilization     Service Demand
Socket Socket Message Elapsed              Send    Recv    Send    Recv
Size   Size   Size    Time    Throughput   local   remote  local   remote
bytes  bytes  bytes   secs.   10^6bits/s   % S     % S     us/KB   us/KB

87380  16384  16384   10.03   886.20       76.23   18.60   14.094  3.439

Alignment     Offset        Bytes      Bytes       Sends  Bytes       Recvs
Local  Remote Local  Remote Xfered     Per                Per
Send   Recv   Send   Recv              Send (avg)         Recv (avg)
8      8      0      0      1.111e+09  16384.00    67799  23129.53    48026

Maximum
Segment
Size (bytes)
1448
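As a sanity check on the figures above, the service demand column can be reproduced from the throughput and CPU utilization. This is only a sketch under a few assumptions: that netperf reports utilization averaged over all CPUs, that both hosts have two cores (the LS1021A is dual-core, as is a Core 2 Duo), and that a KB here is 1024 bytes. The function name is illustrative and not part of netperf.

```python
def service_demand_us_per_kb(throughput_mbit_s, cpu_util_percent, num_cpus):
    """Estimate CPU microseconds consumed per KB transferred.

    Assumes cpu_util_percent is an average across num_cpus cores
    and that 1 KB = 1024 bytes.
    """
    kb_per_sec = throughput_mbit_s * 1e6 / 8 / 1024
    cpu_us_per_sec = cpu_util_percent / 100.0 * num_cpus * 1e6
    return cpu_us_per_sec / kb_per_sec


# Values copied from the netperf output; num_cpus=2 is an assumption.
local = service_demand_us_per_kb(886.20, 76.23, num_cpus=2)
remote = service_demand_us_per_kb(886.20, 18.60, num_cpus=2)
print(f"local: {local:.2f} us/KB, remote: {remote:.2f} us/KB")
```

Under these assumptions the results land within rounding of the reported 14.094 and 3.439 us/KB. If the assumptions hold, the 76.23% figure is an average over both cores, so the sender is spending roughly 1.5 cores' worth of CPU time on this transfer.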
In this setup, the other side is a Core 2 Duo x86 laptop at 2.9 GHz.
Using strace, I can confirm that netperf is indeed using the sendfile64 syscall:
root@ls1021atwr:~# grep -c sendfile64 netperf.strace
36850
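To check whether sendfile64 is the only syscall in the hot path, it can help to tally every syscall in the trace instead of grepping for one. The following is a minimal sketch, assuming netperf.strace holds plain strace output with one call per line, optionally prefixed by a PID or timestamp. The helper name and the regular expression are illustrative, and the sample lines are made up to show the format, not taken from the actual trace.

```python
import re
from collections import Counter

# Syscall name directly before "(", allowing an optional "[pid N]" or
# bare PID prefix, followed by an optional timestamp.
SYSCALL_RE = re.compile(
    r"^\s*(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?([a-z_][a-z0-9_]*)\("
)


def count_syscalls(lines):
    """Return a Counter mapping syscall name to number of calls."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; for the real trace, pass the open file:
#   with open("netperf.strace") as f: counts = count_syscalls(f)
sample = [
    "sendfile64(4, 3, [0], 16384) = 16384",
    "[pid  812] sendfile64(4, 3, [16384], 16384) = 16384",
    "select(5, [4], NULL, NULL, {0, 0}) = 0 (Timeout)",
    "--- SIGALRM {si_signo=SIGALRM} ---",
]
for name, n in count_syscalls(sample).most_common():
    print(name, n)
```

Signal and exit lines are skipped because they do not start with a syscall name followed by a parenthesis.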
The achieved bandwidth (886 Mbit/s) is less than we hoped, and the CPU utilization (76% on the board) is unacceptably high. This is already after:
Why is the CPU utilization still so extremely high, and what can be done to improve this further?