Hi
We are testing the 10G port speed on the LS1046ARDB, and the measured throughput did not reach 10 Gbps.
Details of the measurement results:
Measurement environment:
Only TCP TX reached about 7 Gbps, while TCP RX and UDP were about 2 Gbps.
The logs are attached.
I was able to get 10G speed with 9000 MTU and 97% of 10G speed with 1500 MTU with the following setup:
I used two PCs with ixgbe cards to generate and receive traffic. I set up the LS1046 to route packets between the PCs. I tested with a direct connection between the PCs first and verified that I got 10G throughput.
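For reference, here is a minimal sketch of how such a router setup can be configured on the LS1046 (my own example, not from the original post; the interface names and addresses are hypothetical):

# enable IPv4 forwarding between the two 10G ports
$ sysctl -w net.ipv4.ip_forward=1
# put each port on the subnet of the PC it faces
$ ip addr add 10.0.1.1/24 dev eth0
$ ip addr add 10.0.2.1/24 dev eth1
$ ip link set eth0 up && ip link set eth1 up

Each PC then routes the other PC's subnet via the LS1046 address on its own subnet.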
On one PC I used the following command. You can find this in the Linux kernel source:
samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh -i eth0 -d <ip_address> -m <mac_address> -n 0 -t $(nproc) -s <frame_size>
The IP address and MAC address should be for the LS1046. On the other PC I ran the following command:
tcpstat -i eth0 -p -o 'Time:%S\tpps=%p\tavg=%a\tstddev=%d\tbps=%b\n' 5
This will print out ingress traffic statistics every 5 seconds. Note that "bps" only includes data bytes. That is, it does not include L1/L2 overhead.
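As a back-of-the-envelope check (my own arithmetic, not from the original post): each 1500-byte Ethernet payload carries 38 bytes of L1/L2 overhead on the wire (14-byte header + 4-byte FCS + 8-byte preamble + 12-byte inter-frame gap), so the best possible data rate at 1500 MTU is roughly

$ echo 'scale=3; 1500 * 10 / (1500 + 38)' | bc
9.752

i.e. about 97.5% of the 10G line rate, which matches the ~97% figure above.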
Although I didn't get full 10G throughput at 1500 MTU, I suspect that it is possible. I didn't configure any kind of hardware offloading on the LS1046.
Hi @yipingwang, thanks for your quick comment!
I tested again following your comment (fmc and the iperf options). TCP RX improved significantly, from about 2.6 Gbps to 9.06 Gbps; TCP TX stays at about 7 Gbps, but UDP TX/RX is still below half of 10 Gbps.
Is this the result you expected, or should the throughput be higher?
Measurement environment:
Please deploy ubuntu main rootfs to SD card, then boot LS1046ARDB Linux and rootfs from SD card and try whether the iperf performance would be improved.
On your host PC:
$ wget https://www.nxp.com/lgfiles/sdk/lsdk2108/flex-installer && chmod +x flex-installer && sudo mv flex-installer /usr/bin
$ flex-installer -i pf -d /dev/sdx
$ flex-installer -i auto -m ls1046ardb -d /dev/sdx
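(A side note from me, not part of the original instructions: /dev/sdx above is a placeholder. Before partitioning, it may be worth confirming which device node the SD card actually is, e.g.:

$ lsblk -d -o NAME,SIZE,MODEL

so that flex-installer does not write to the wrong disk.)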
Then plug the SD card into the LS1046ARDB, go to the U-Boot prompt, and run the "boot" command.
=>boot
Hi @yipingwang, I tested following your comment, but there seems to be no improvement...
Could you please check the result below?
Please use the iperf command, not iperf3, and use the commands which I provided in my first post to do the test.
Please apply the fmc policy before running iperf.
Hi @yipingwang, I re-tested with "iperf", the "-u" option on the server side, and the fmc command. The TCP RX/TX results now reach almost 10 Gbps, but the UDP RX/TX results look a little strange.
Could you please check?
In parallel, we will check our network env again.
On LS1046ARDB:
$ iperf -s -u
On your PC:
$ iperf -c <ip address> -P 10 -t 30 -u -b 10G
Then check the result on the LS1046ARDB; look at the "[SUM]" line.
Hi @yipingwang, thanks for your kind advice!
The iperf build we are using doesn't support the "-R" (reverse) option, but we can get great results for both TCP TX/RX and UDP TX/RX on Ubuntu main and Yocto tiny!
Could you please check the following results? Are they the expected results, or should the throughput be higher?
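(A side note from me: "-R" is an iperf3 option. In iperf 2, reverse-direction traffic can be exercised with its own flags instead; assuming a standard 2.x build:

$ iperf -c <ip address> -P 10 -t 30 -r    # tradeoff mode: test each direction in turn
$ iperf -c <ip address> -P 10 -t 30 -d    # dual mode: test both directions at once

or simply by swapping the server and client roles between the two machines.)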
The test result with Yocto tiny rootfs seems OK.
Hi @yipingwang, OK, thanks for your support.
So, we understand that our results improved because of these three factors (the parallel option in iperf, the fmc command, and using iperf v2).
Finally, we have some questions; could you please answer them?
We would like to confirm how each factor affects the results, just in case.
1. Yes, the number of CPU cores is the bottleneck.
2. This is because in DPAA1, each interface by default uses one pool channel shared across all software portals, as well as the dedicated channel of each CPU. In the Linux kernel, PCD frame queues use dedicated channels. You could refer to the section "5. Dedicated and Pool Channels Usage in Linux Kernel" in Using QMAN Dedicated and Pool Channels in USDPAA and Linux Kernel.
3. It seems that there is a problem with your iperf3 command itself.
In the Ubuntu main rootfs, you could run "apt-get install iperf3" to install iperf3 online.
Hi @yipingwang, thanks for your answers!
I looked into iperf3: it was redesigned and runs single-threaded, whereas the iperf v2 series supports multi-threaded execution, so it seems that measurements with iperf3 are limited by single-core performance.
ref link: https://software.es.net/iperf/faq.html
In fact, I did a rough test by running four iperf3 processes in parallel, and I get a total of 9 Gbps for TCP TX/RX and 7 Gbps for UDP TX/RX, which is significantly improved from the result I reported earlier.
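(For the record, a minimal sketch of how such a four-process test can be run; the ports and the loop are my own choices, not from the original post:

# on the LS1046ARDB, start four iperf3 servers on separate ports
$ for p in 5201 5202 5203 5204; do iperf3 -s -p $p -D; done
# on the PC, run one client per port concurrently, then sum the per-port results
$ for p in 5201 5202 5203 5204; do iperf3 -c <ip address> -p $p -t 30 & done; wait
)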
Anyway, I will use the iperf v2 series together with the fmc command for the LS1046ARDB and our custom board.
Thanks for your kind support!
OK, thanks for your information.
1. Please boot the target board with the Ubuntu rootfs, or apply the fmc policy as follows.
fmc -c /etc/fmc/config/private/ls1046ardb/RR_FFSSPPPH_1133_5559/config.xml -p /etc/fmc/config/private/ls1046ardb/RR_FFSSPPPH_1133_5559/policy_ipv4.xml -a
2. Please run the following iperf commands on the server and client sides.
Server: iperf -s
Client: iperf -c 10.10.10.2 -P 10 -t 30
For the UDP iperf test, you need to specify the bandwidth with the "-b" option.
Server: iperf -s -u
Client: iperf -c 100.1.1.99 -P 10 -t 30 -u -b 10G
If your problem persists, please provide the console log for the target board.
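(One extra check from my side, not part of the original instructions: since the number of CPU cores turned out to be the bottleneck, it can help to watch per-core utilization while iperf runs, e.g. with the sysstat tool:

$ mpstat -P ALL 1

If one core sits near 100% while the others stay idle, the fmc policy or the parallel streams are not taking effect.)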