<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: IPU/PCIe throughput problem on i.MX6 in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448468#M69276</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Yuri,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was reviewing that as well. &amp;nbsp;The memcpy() provided by our IDE is pretty well optimized but only used 4 registers for the loads and stores. &amp;nbsp;I had previously tried using 8 registers and got a small performance gain but not up to Linux speeds. &amp;nbsp;I just added a preload instruction to the loop and improved our PCIe read performance to 62 MB/s (was 32 MB/s). &amp;nbsp;Writes remain at 240 MB/s. &amp;nbsp;Much closer to Linux speeds but still room to improve. &amp;nbsp;I'll look more closely at the Linux implementation as there may be more I can do.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Many of my PCIe transfers are effectively 4KB page copies - is there any way to use the IPU to transfer this data (and would it get better performance)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I appreciate your help,&lt;/P&gt;&lt;P&gt;-Carl&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 12 Oct 2016 15:03:34 GMT</pubDate>
    <dc:creator>carlpii</dc:creator>
    <dc:date>2016-10-12T15:03:34Z</dc:date>
    <item>
      <title>IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448457#M69265</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;Hello, &lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;I am currently working on a project that is trying to transfer raw video from the i.MX6 Solo ARM processor&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;over a PCIe link to an FPGA. The i.MX6 in this case serves as the root complex (RC), and the FPGA is the Endpoint.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;We are using a PCIe x1 Gen1 link.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;The firmware running on the i.MX6 is based on this demo &lt;A _jive_internal="true" href="https://community.nxp.com/docs/DOC-95014"&gt;https://community.freescale.com/docs/DOC-95014&lt;/A&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;Basically we want to use the IPU to transfer frames to memory that is connected to the FPGA and read them back.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;I have attached screenshots from the FPGA's on-chip logic analyzer showing that writing data works fine and with&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;reasonable link utilization. 
In the logic analyzer screenshots the x_st signals indicate the first 32-bit doubleword&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;in a packet, and x_end indicates the last 32-bit doubleword in the packet.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;However, we have a problem with reading data from the FPGA.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;In my opinion the problem is that the IPU doesn't read data the way it is supposed to according to&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;the PCIe specification or the way it is usually used. The PCIe specification allows the root complex&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;to request a lot of data (for example 4KB), which the endpoint then completes with multiple 128B packets (TLPs).&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;When using the IPU with software inspired by your PCIe validation/throughput demo, we can see that the root complex&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;generates small requests - at most 64 bytes at a time. This is an issue because for some reason our current&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;PCIe setup then issues at most 4 memory read TLPs at a time.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;You can see from the screenshots that this leads to very poor throughput. 
&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;I have also reproduced this problem in HDL simulation with a root complex simulation model,&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;so I believe this is a limitation of PCIe.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;I am currently digging into the specification to verify this.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;My question for you is whether there is a way to get the i.MX6's PCIe and IPU modules to generate&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;PCIe read requests which are at least 1KB in size.&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;I believe this would greatly help with our bandwidth problem.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;Please feel free to ask me for any additional details or clarifications about this problem,&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;so that it can be resolved.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;Kind regards,&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;Stjepan Henc&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 09 Oct 2015 15:48:22 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448457#M69265</guid>
      <dc:creator>stjepanhenc</dc:creator>
      <dc:date>2015-10-09T15:48:22Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448458#M69266</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Verdana','sans-serif';"&gt;64 bytes is the maximum length of a single burst on the i.MX6.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Verdana','sans-serif';"&gt;Please refer to the following thread, where similar considerations for burst length are discussed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.nxp.com/docs/DOC-106467"&gt;i.MX6 maximum EIM burst length and performance&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Have a great day,&lt;BR /&gt;Yuri&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 14 Oct 2015 08:45:59 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448458#M69266</guid>
      <dc:creator>Yuri</dc:creator>
      <dc:date>2015-10-14T08:45:59Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448459#M69267</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for your answer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So you are saying that this is a limitation of the i.MX6 architecture and PCIe core, and that there is no way to issue a request for more than 64 bytes of data at a time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I believe the high overhead of PCIe read requests requires that a lot of data be requested with a single request (that's why the limit is 4KB) to achieve speeds comparable to the write channel. I see that this is not easily achievable if the burst size is so limited.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But this is just one part of the problem. We get the biggest performance hit because the i.MX6 root complex issues only 4 read requests at a time.&lt;BR /&gt;Can you confirm that this is not caused by your PCIe root complex?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The PCIe IP core from Lattice we are using in the FPGA is also suspect, so I would like to eliminate one of the possibilities.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In the meantime we have decided to replace i.MX6 reads from the FPGA with FPGA writes into the i.MX6, in the hope of achieving better performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Stjepan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 15 Oct 2015 15:45:53 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448459#M69267</guid>
      <dc:creator>stjepanhenc</dc:creator>
      <dc:date>2015-10-15T15:45:53Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448460#M69268</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;According to section 48.4.1.3 (Features List) of the i.MX6 D/Q RM:&lt;/P&gt;&lt;P&gt;"Programmable and extended AXI burst lengths to support up to 4K read/write burst lengths over AXI master and slave interfaces."&lt;BR /&gt;But, because of "independent maximum read request and transfer sizes between AXI and PCI Express", transfers can be split into multiple transfers.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Regards,&lt;/P&gt;&lt;P&gt;Yuri.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 18 Feb 2016 09:10:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448460#M69268</guid>
      <dc:creator>Yuri</dc:creator>
      <dc:date>2016-02-18T09:10:44Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448461#M69269</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Well, yes.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The way to get decent performance on PCI Express is to have the PCI Express endpoint/device act as a master on the bus and run its own DMA, reading from and writing to the processor's memory. This way the endpoint has full control over packet size and number.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;The processor (in this case the i.MX6) should only read and write control information to the device, for example configuring the DMA in the PCIe device to start a transfer.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 12 May 2016 07:29:37 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448461#M69269</guid>
      <dc:creator>stjepanhenc</dc:creator>
      <dc:date>2016-05-12T07:29:37Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448462#M69270</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello Stjepan and Yuri,&lt;/P&gt;&lt;P&gt;I'm currently working on a scenario very similar to the one Stjepan described in the first post of this thread: I'm trying to transmit data from the iMX6 to a Xilinx FPGA over a PCIe x1 link, using the IPU DMA to get reasonable performance, given that the PCIe RC in the iMX6 doesn't have DMA.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The write operations from the iMX6 to the FPGA seem to work fine, but instead of reaching the 344 MB/s stated in the demo link, with a TLP data size of 64 bytes, I only get about 54 MB/s because the TLPs I receive on the FPGA side carry only 16 bytes of data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Stjepan, you say that in your case the write operations had reasonable link utilization. Could you give more specific numbers? Did you see the same performance degradation as I do compared with the demo? If not, it would be very useful for me to check what the differences are between your setup and mine. Stjepan and Yuri, which PCIe configuration parameters should I change on the iMX6 side to increase the size of the transmitted TLPs?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks very much in advance for your answer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Eduardo.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Sep 2016 09:39:16 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448462#M69270</guid>
      <dc:creator>eduardodelcasti</dc:creator>
      <dc:date>2016-09-26T09:39:16Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448463#M69271</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please check whether your system settings follow the recommendations in&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.nxp.com/docs/DOC-95014"&gt;i.MX6Q PCIe EP/RC Validation System&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In particular: &lt;SPAN style="color: #51626f; background-color: #ffffff;"&gt;Use mem=768M&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Yuri.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Sep 2016 05:11:27 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448463#M69271</guid>
      <dc:creator>Yuri</dc:creator>
      <dc:date>2016-09-28T05:11:27Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448464#M69272</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Yuri,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't see that the mem=768M boot parameter is going to have any effect on performance - it simply tells the kernel not to use all of the DDR available on the board (leaves 256M available for the EP driver test).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;After experimenting with the EP/RC (non-IPU) driver code, I have been able to replicate the "cache disabled" and "cache enabled" write&amp;nbsp;and read performance numbers but I am concerned that the "cache enabled" approach may not be safe&amp;nbsp;in a real system. &amp;nbsp;The&amp;nbsp;"cache disabled" approach uses ioremap() to map the iATU region for PCIe writes and reads but the "cache enabled" approach uses&amp;nbsp;ioremap_cache().&amp;nbsp; Can you confirm that marking this region as cacheable&amp;nbsp;is actually safe to use? &amp;nbsp;It seems like it could create both ordering and coherency problems if PCIe writes and reads are going through the caches.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;-Carl&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 04 Oct 2016 23:44:25 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448464#M69272</guid>
      <dc:creator>carlpii</dc:creator>
      <dc:date>2016-10-04T23:44:25Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448465#M69273</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Strictly speaking, you are right: &lt;SPAN style="color: #51626f; background-color: #ffffff;"&gt;if PCIe writes and reads go through the caches, this improves performance, but coherency problems must be taken into account. The cache needs to be flushed afterwards.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Yuri.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 05 Oct 2016 05:14:16 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448465#M69273</guid>
      <dc:creator>Yuri</dc:creator>
      <dc:date>2016-10-05T05:14:16Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448466#M69274</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Yuri,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am now attempting to replicate the "cached" performance from&amp;nbsp;a non-Linux OS (RTOS) running on an i.MX6Q endpoint. &amp;nbsp;The endpoint is attached to an SDB running Linux 3.14.28. &amp;nbsp;My Linux driver on the SDB can access endpoint memory with&amp;nbsp;expected performance but accesses initiated by the endpoint to the SDB are significantly slower. &amp;nbsp;With the iATU mapped as Write-back, no Write-alloc to match the Linux ioremap_cache() case, I can only get around 32 MB/s reads and 240 MB/s writes versus the roughly 100 MB/s read and 300 MB/s write performance seen from the Linux side. &amp;nbsp;With the iATU mapped as Device memory, I get around 17&amp;nbsp;MB/s reads and 42 MB/s writes versus 29 MB/s and 110 MB/s for Linux using ioremap().&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The endpoint initialization code is largely derived from the iMX6 Platform SDK. &amp;nbsp;I've attempted to compare settings for the L2 cache, SCU, and SCTLR register between Linux and our RTOS but can't find anything that seems to make a difference. &amp;nbsp;Can you provide any suggestions on how to find our missing performance?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;-Carl&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 11 Oct 2016 17:48:39 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448466#M69274</guid>
      <dc:creator>carlpii</dc:creator>
      <dc:date>2016-10-11T17:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448467#M69275</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you use load and store multiple register instructions on the cached area to achieve maximum throughput?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Yuri.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 12 Oct 2016 05:00:03 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448467#M69275</guid>
      <dc:creator>Yuri</dc:creator>
      <dc:date>2016-10-12T05:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: IPU/PCIe throughput problem on i.MX6</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448468#M69276</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Yuri,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was reviewing that as well. &amp;nbsp;The memcpy() provided by our IDE is pretty well optimized but only used 4 registers for the loads and stores. &amp;nbsp;I had previously tried using 8 registers and got a small performance gain but not up to Linux speeds. &amp;nbsp;I just added a preload instruction to the loop and improved our PCIe read performance to 62 MB/s (was 32 MB/s). &amp;nbsp;Writes remain at 240 MB/s. &amp;nbsp;Much closer to Linux speeds but still room to improve. &amp;nbsp;I'll look more closely at the Linux implementation as there may be more I can do.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Many of my PCIe transfers are effectively 4KB page copies - is there any way to use the IPU to transfer this data (and would it get better performance)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I appreciate your help,&lt;/P&gt;&lt;P&gt;-Carl&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 12 Oct 2016 15:03:34 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/IPU-PCIe-throughput-problem-on-i-MX6/m-p/448468#M69276</guid>
      <dc:creator>carlpii</dc:creator>
      <dc:date>2016-10-12T15:03:34Z</dc:date>
    </item>
  </channel>
</rss>

