This is with two cores enabled, but effectively only one core is in use, as there is just one iperf process running and nothing else of note. The MTU is set to 1500 and iperf fills the packets (it has a buffer size of 8k), so the frame length is about 1500 bytes and the data length a little less. With an MTU of 9000, the data rate versus CPU usage is much better, as you would expect.
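To put rough numbers on why the larger MTU helps, here is a back-of-the-envelope sketch in Python. It assumes a 1 Gbit/s link running at line rate, IPv4 and TCP headers without options, and standard Ethernet framing overhead. None of these figures come from our measurements, and the function names are purely illustrative.

```python
# Rough per-frame arithmetic for TCP over Ethernet.
# Assumptions (not measured): 1 Gbit/s link at line rate,
# IPv4 + TCP headers without options, standard Ethernet framing.

ETH_HEADER = 14        # destination MAC, source MAC, EtherType
ETH_FCS = 4            # frame check sequence
ETH_PREAMBLE_IFG = 20  # 8 bytes preamble/SFD + 12 bytes inter-frame gap
IPV4_HEADER = 20       # no IP options
TCP_HEADER = 20        # no TCP options (timestamps would add 12 bytes)


def payload_per_frame(mtu):
    """TCP payload bytes carried by one full-sized frame."""
    return mtu - IPV4_HEADER - TCP_HEADER


def wire_bytes_per_frame(mtu):
    """Bytes of link time one full-sized frame occupies."""
    return mtu + ETH_HEADER + ETH_FCS + ETH_PREAMBLE_IFG


def frames_per_second(mtu, link_bps=1_000_000_000):
    """Full-sized frames per second at the assumed line rate."""
    return link_bps / (wire_bytes_per_frame(mtu) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: {payload_per_frame(mtu)} payload bytes/frame, "
            f"{frames_per_second(mtu):.0f} frames/s at line rate"
        )
```

Under those assumptions the CPU has to handle roughly six times fewer frames per second at MTU 9000 for the same data rate, which would be consistent with the per-packet cost (interrupts, driver, stack) dominating rather than the per-byte cost.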
Your figures for the P1020 seem to match what we are seeing.
So the question remains: is this high CPU usage a Linux Ethernet driver/stack/interrupt efficiency issue (we could look at optimising the code), or is it just down to the P1022's raw performance? An Intel 2GHz Xeon uses about 6% CPU on one core doing the same (not like for like, I know). Would the P20xx series be significantly better than the P1022 at this?
Any ideas/info?