<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Layerscape topic: Re: Network Gbit Performance on TWR-LS1021A - high CPU utilization</title>
    <link>https://community.nxp.com/t5/Layerscape/Network-Gbit-Performance-on-TWR-LS1021A-high-CPU-utilization/m-p/434582#M374</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello &lt;SPAN class="replyToName"&gt;Reinhard Tartler,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;On the ls1021atwr CPU Rev 1.0 board, throughput is about 850 Mbps when the frame size is 16384 bytes, so this test result looks normal.&lt;/P&gt;&lt;P&gt;Could you please let me know what performance you expect?&lt;/P&gt;&lt;P&gt;If you are still in the evaluation stage, I suggest choosing the ls1021atwr CPU Rev 2.0 board with "Linux SDK for LS1021A v0.4"; you will get much better network performance and lower CPU utilization.&lt;/P&gt;&lt;P&gt;Have a great day,&lt;BR /&gt;Yiping&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 11 Sep 2015 08:23:20 GMT</pubDate>
    <dc:creator>yipingwang</dc:creator>
    <dc:date>2015-09-11T08:23:20Z</dc:date>
    <item>
      <title>Network Gbit Performance on TWR-LS1021A - high CPU utilization</title>
      <link>https://community.nxp.com/t5/Layerscape/Network-Gbit-Performance-on-TWR-LS1021A-high-CPU-utilization/m-p/434581#M373</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Using a TWR-LS1021A board with SDK v1.8 (Linux 3.12.37-rt51)&lt;/P&gt;&lt;P&gt;I've configured eth0 following the recommendations from &lt;A href="http://www.freescale.com/infocenter/topic/QORIQSDK/4085174.html" title="http://www.freescale.com/infocenter/topic/QORIQSDK/4085174.html"&gt;Test Procedure&lt;/A&gt;:&lt;/P&gt;&lt;PRE style="padding-left: 30px;"&gt;root@ls1021atwr:~# ethtool -C eth0 rx-frames 22
root@ls1021atwr:~# ethtool -C eth0 tx-frames 22
root@ls1021atwr:~# ethtool -C eth0 rx-usecs 32
root@ls1021atwr:~# ethtool -C eth0 tx-usecs 32
root@ls1021atwr:~# ethtool -K eth0 gro on gso on sg on

root@ls1021atwr:~# netperf -H 10.0.0.20 -l 10 &lt;STRONG&gt;-T 1&lt;/STRONG&gt; -c 100 -C 100 -n2 -t TCP_SENDFILE -v 2 -- -C
TCP SENDFILE TEST from 0.0.0.0 () port 0 AF_INET to 10.0.0.20 () port 0 AF_INET : cpu bind
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.03       &lt;STRONG&gt;886.20&lt;/STRONG&gt;   &lt;STRONG&gt;76.23&lt;/STRONG&gt;    18.60    14.094  3.439

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8       8      0       0 1.111e+09  16384.00     67799   23129.53  48026

Maximum
Segment
Size (bytes)
  1448&lt;/PRE&gt;&lt;P&gt;In this setup, the other side is a Core2Duo x86 laptop at 2.9 GHz.&lt;/P&gt;&lt;P&gt;Using strace, I can confirm that netperf is indeed using the sendfile64 syscall:&lt;/P&gt;&lt;PRE style="padding-left: 30px;"&gt;root@ls1021atwr:~# grep -c sendfile64 netperf.strace
36850&lt;/PRE&gt;&lt;P&gt;The achieved bandwidth is less than we hoped, and the CPU utilization is unacceptably high. This is already after:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Ensuring that all IRQs go to CPU0 (that seems to be the default; distributing the eth0 IRQ as described in &lt;A href="http://www.freescale.com/infocenter/topic/QORIQSDK/4085174.html" title="http://www.freescale.com/infocenter/topic/QORIQSDK/4085174.html"&gt;http://www.freescale.com/infocenter/topic/QORIQSDK/4085174.html&lt;/A&gt; seems to worsen performance)&lt;/LI&gt;&lt;LI&gt;Pinning netperf to CPU1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Why is the CPU utilization still so extremely high, and what can be done to improve this further?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 08 Sep 2015 22:42:50 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Layerscape/Network-Gbit-Performance-on-TWR-LS1021A-high-CPU-utilization/m-p/434581#M373</guid>
      <dc:creator>reinhardtartler</dc:creator>
      <dc:date>2015-09-08T22:42:50Z</dc:date>
    </item>
    <item>
      <title>Re: Network Gbit Performance on TWR-LS1021A - high CPU utilization</title>
      <link>https://community.nxp.com/t5/Layerscape/Network-Gbit-Performance-on-TWR-LS1021A-high-CPU-utilization/m-p/434582#M374</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello &lt;SPAN class="replyToName"&gt;Reinhard Tartler,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;On the ls1021atwr CPU Rev 1.0 board, throughput is about 850 Mbps when the frame size is 16384 bytes, so this test result looks normal.&lt;/P&gt;&lt;P&gt;Could you please let me know what performance you expect?&lt;/P&gt;&lt;P&gt;If you are still in the evaluation stage, I suggest choosing the ls1021atwr CPU Rev 2.0 board with "Linux SDK for LS1021A v0.4"; you will get much better network performance and lower CPU utilization.&lt;/P&gt;&lt;P&gt;Have a great day,&lt;BR /&gt;Yiping&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 11 Sep 2015 08:23:20 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Layerscape/Network-Gbit-Performance-on-TWR-LS1021A-high-CPU-utilization/m-p/434582#M374</guid>
      <dc:creator>yipingwang</dc:creator>
      <dc:date>2015-09-11T08:23:20Z</dc:date>
    </item>
  </channel>
</rss>

