<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface. in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387202#M56157</link>
    <description>Forum thread from the NXP i.MX Processors community discussing i.MX6 DDR3 RAM performance with a 32-bit vs. a 64-bit memory interface. The individual posts are contained in the items below.</description>
    <pubDate>Mon, 01 Sep 2014 06:43:42 GMT</pubDate>
    <dc:creator>MOW</dc:creator>
    <dc:date>2014-09-01T06:43:42Z</dc:date>
    <item>
      <title>i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387200#M56155</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're currently porting Linux, Android, and Windows-CE BSPs to our own i.MX6-based board series, which can be equipped with either i.MX6Q, i.MX6D, i.MX6DL or i.MX6S SoCs, and either 32-bit or 64-bit DDR3 RAM in DDR3-800 and DDR3-1066 configurations connected to a single chip-select. All combinations realized so far (i.MX6Q and i.MX6D with 64-bit DDR3-1066, and i.MX6S with 32-bit DDR3-800) work fine: all memory configurations run stably, the entire memory can be accessed properly, and Freescale's DDR3 Stress Test tool passes just as well as our more elaborate custom RAM tests and all three OSes. However, we can't seem to find any performance difference between 64-bit and 32-bit DDR3 configurations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're using hand-tuned assembler memset() and memcpy() functions for benchmarking, using only ARMv7 integer code (no NEON). These match up quite nicely with theoretical bandwidth values on Freescale's i.MX53 QSB board and our own i.MX53 board designs with 32-bit DDR3-800 RAM, which we used for comparison with our i.MX6 design: while DDR3-800 provides a (very) theoretical bandwidth of 3.2 GB/s, the i.MX53 SoC's internal 200 MHz, 64-bit (single-data-rate) AXI bus connecting the Cortex-A8 core to the RAM controller apparently already limits the bandwidth to 1.6 GB/s, and our benchmark actually measures&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;0.9 GB/s when only a single DDR3 memory bank is involved&lt;/LI&gt;&lt;LI&gt;1.2 GB/s when two different DDR3 memory banks on the same chip-select are involved&lt;/LI&gt;&lt;LI&gt;1.4 GB/s when two DDR3 memory banks on two different chip-selects are used (tested on Freescale's i.MX53 QSB board)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Running the same benchmark on our i.MX6S system with 32-bit DDR3-800, we measure ~10-20% higher performance in all tests, which can probably be explained 
by the more efficient out-of-order Cortex-A9 core compared to the in-order Cortex-A8 in the i.MX53. (Maybe also again limited by the SoC-internal AXI bus on the i.MX6S?! We can't find any documentation on the internal bus speeds in the i.MX6 series...)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The same benchmark running on a single core on our i.MX6D/Q systems with 64-bit DDR3-1066 RAM shows another ~10-20% performance increase, which seems to be caused by the higher DDR3 clock (533 MHz instead of 400 MHz). Overclocking the i.MX6S system to 32-bit DDR3-1066 or underclocking the i.MX6D/Q system to 64-bit DDR3-800 results in pretty much identical performance values: 32-bit vs. 64-bit doesn't show a difference at all!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Even if we run the same benchmarks on multiple CPU cores in parallel on the i.MX6D/Q systems, the aggregated performance of all two/four cores together only adds up to ~1.8-2.0 GB/s bandwidth with DDR3-1066.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;During all benchmarks&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;the IPU, VPU, GPUs and any other bus-masters in the system were turned off,&lt;/LI&gt;&lt;LI&gt;all L1-caches and the L2-cache were running,&lt;/LI&gt;&lt;LI&gt;the SCU of the Cortex-A9 MPCore complex was enabled,&lt;/LI&gt;&lt;LI&gt;all performance-optimizing features of the Cortex-A9/L2C310 combination (instruction and data prefetching, early BRESP, full line of zero, etc.) were turned on on both sides,&lt;/LI&gt;&lt;LI&gt;each A9 core ran its instance of the benchmarks on its own separate DDR3 memory bank untouched by any of the other cores (bank-interleaving turned off), so there shouldn't be any thrashing w.r.t. the page open/close policy of the RAM controller with multiple cores running at the same time. Enabling bank-interleaving actually showed slightly lower performance,&lt;/LI&gt;&lt;LI&gt;the CPU cores were running at 792 MHz.&lt;/LI&gt;&lt;LI&gt;our memcpy()-function is based on ARM's sample code "5. 
Load-Multiple memory copy with preload" shown here: &lt;A href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html" title="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html"&gt;ARM Information Center&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Implementing the "6. NEON memory copy with preload" sample from the &lt;A href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html" title="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html"&gt;ARM Information Center&lt;/A&gt; instead, which according to ARM seems to be the fastest way to copy (at least on a Cortex-A8), didn't show any performance difference at all, i.e. the copy function itself doesn't seem to be the limiting factor here.&lt;/LI&gt;&lt;LI&gt;It doesn't seem to be an issue with thrashing in the L2-cache either:&lt;UL&gt;&lt;LI&gt;four cores memcpy()-ing in parallel in entirely different areas of the RAM shouldn't cause any thrashing issues in a 16-way L2-cache in the first place, and&lt;/LI&gt;&lt;LI&gt;testing with an "MP4 system lockdown" configuration of the L2CC, as described in L2C310_r3p2-TRM section "2.3.6 Cache Lockdown (Table 2-15)", to give each A9 core its own private 256KB L2-cache actually shows ~10% lower performance.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;While we're not expecting to measure the full (theoretical) 64-bit DDR3-1066 bandwidth of 8.5 GB/s, looking at the actually measured performance on 32-bit DDR3-800 we would expect to reach at least ~4.0 GB/s aggregated bandwidth over 4 cores running in parallel on the 64-bit interface.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Except for the performance, our 64-bit DDR3 configuration is running fine: the entire RAM can be accessed, it passes any RAM tests without problems, and all our operating systems (Linux, Android, Windows-CE) can fully use the entire RAM without any issues.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Is there anything else in the MMDC RAM controller, besides the DSIZ setting in the MDCTL register of MMDC0, that must be configured to see a proper performance benefit of 64-bit RAM vs. 32-bit RAM?&lt;/LI&gt;&lt;LI&gt;Can anybody measure any difference between 64-bit and 32-bit DDR3 RAM interfaces on any i.MX6 board?&lt;/LI&gt;&lt;LI&gt;Has anybody managed to measure more than 2 GB/s RAM performance?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kind Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 28 Aug 2014 09:37:45 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387200#M56155</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-08-28T09:37:45Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387201#M56156</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Some public performance data can be found at the link below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.nxp.com/docs/DOC-94571"&gt;LMbench Benchmarks on i.MX&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can also request some data by creating a ticket. Your data is close to the values obtained internally. The limitation seems to be caused by RALAT, MMDC arbitration (register MAARCR) and internal NIC-301 arbitration latencies. The RALAT parameter, which is board-dependent, can be decreased on good boards.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Though these performance values show no difference between 32-bit and 64-bit, that should not be interpreted to mean there is no point in using a wider memory bus. In a real OS system, where many modules (GPU, VPU, USB, SDMA) act as bus masters, 32-bit vs. 64-bit does show a difference in overall performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 28 Aug 2014 14:44:16 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387201#M56156</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-08-28T14:44:16Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387202#M56157</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your answer.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We're indeed currently testing with different RALAT/WALAT settings and do see performance differences between them. Instead of the default RALAT/WALAT settings of 5/1, which Freescale is using in U-Boot, we can actually use 2/0 for DDR3-800 on our boards, and most of our DDR3-1066 boards (although unfortunately not all) work fine at 3/0. We will be doing some layout optimizations for our next board revision, which might allow all of our DDR3-1066 boards to work with 3/0.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I still can't see that the 64-bit DDR3 interface makes any performance difference, though. On the contrary:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;If I understand the i.MX6 documentation correctly, both the Cortex-A9-MPCore complex and the L2CC are connected to the rest of the system via two 64-bit AXI busses, each running at half the CPU core clock. Reference Manual section "12.5.5 L2 Cache and controller (PL310)" mentions that the "L2 cache also utilizes 2x AXI-64 to access the L3 memory or other SoC peripherals in a symmetric way". This leaves some room for speculation, as no proper block diagram seems to be provided, but I guess it is supposed to mean that the MMDC can (and will) be accessed via both ports?! 
This should give the A9-MPCore complex a memory bandwidth of at least ~6.4 GB/s (with the cores running at 800 MHz), and with all 4 cores of an i.MX6Q running in parallel at the same time, there should be more than enough active bus-masters for the MMDC to choose from. But we still only see ~2.0 GB/s maximum aggregated RAM bandwidth: with all 4 cores running simultaneously, each one only sees about 500 MB/s RAM bandwidth, even for a DDR3-1066 64-bit interface.&lt;/LI&gt;&lt;LI&gt;If the memory bandwidth usable by the Cortex-A9-MPCore complex is limited by arbitration and latency issues in the NIC-301 and the MMDC, but the MMDC can actually provide more bandwidth with DDR3-1066 64-bit RAM when multiple masters are active, I'd expect the RALAT/WALAT settings to make hardly any measurable difference (as seen from a CPU core)?!&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;But I will try to see what impact activating additional bus-masters has on the measured performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Sep 2014 06:43:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387202#M56157</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-01T06:43:42Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387203#M56158</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;In my RAM-benchmarking environment it doesn't seem to be easily possible to cause enough traffic with other bus-masters to see an impact on the RAM performance measured by the Cortex-A9-MPCore complex. This suggests that there is indeed more bandwidth available than the ARM cores can see/use, but so far I couldn't impact RAM performance on a 32-bit DDR3 interface with further bus-masters, either (tried with the IPU configured for 1920x1080x32bpp @60Hz).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've done some experiments with the MMDC profiling registers:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Only the MADPSRx registers in MMDC0 seem to be running in both 32-bit and 64-bit modes. I assume this is correct, as MMDC1 is linked with MMDC0 in the 64-bit configuration?!&lt;/LI&gt;&lt;LI&gt;All RAM traffic generated by the Cortex-A9-MPCore complex during my benchmarks seems to be distributed evenly between the ARM_S0 and ARM_S1 AXI IDs. 
Therefore I suppose the A9-MPCore complex, the SCU and the L2CC are all using both of their AXI channels.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What was the intended purpose of the following (commented-out) piece of code found in Freescale's U-Boot for the i.MX6?&lt;/P&gt;&lt;PRE&gt;int dram_init(void)
{
	/*
	 * Switch PL301_FAST2 to DDR Dual-channel mapping
	 * however this block the boot up, temperory redraw
	 */
	/*
	 * u32 reg = 1;
	 * writel(reg, GPV0_BASE_ADDR);
	 */

	gd-&amp;gt;bd-&amp;gt;bi_dram[0].start = PHYS_SDRAM_1;
	gd-&amp;gt;bd-&amp;gt;bi_dram[0].size = PHYS_SDRAM_1_SIZE;

	return 0;
}&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The comment sounds like it might have something to do with the issue, but enabling this piece of code does indeed seem to lock up the system. I can't find any documentation on this in the reference manual (only "Registers at offset 0x0-0xffc are reserved for internal use" in the NIC-301 chapter), and otherwise neither U-Boot nor the Linux kernel seems to touch the NIC-301 configuration (although the "NOTE" section in the NIC-301 Overview claims that "Freescale's board support package" configures the NIC-301).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Sep 2014 10:41:38 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387203#M56158</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-01T10:41:38Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387204#M56159</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;this code switches PL301 FAST2 to the LPDDR2-specific dual-channel mapping;&lt;/P&gt;&lt;P&gt;attached some info. The GPV is described in the ARM documentation below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;AMBA® Network Interconnect (NIC-301) Technical Reference Manual&lt;/P&gt;&lt;P&gt;&lt;A href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0397f/index.html" title="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0397f/index.html"&gt;ARM Information Center&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and Table 45-1 "GPV ports memory allocations" in the&lt;/P&gt;&lt;P&gt;&lt;A href="http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf?fasp=1&amp;amp;WT_TYPE=Reference%20Manuals&amp;amp;WT_VENDOR=FREESCALE&amp;amp;WT_FILE_FORMAT=pdf&amp;amp;WT_ASSET=Documentation&amp;amp;fileExt=.pdf"&gt;IMX6DQRM&lt;/A&gt; i.MX 6Dual/6Quad Applications Processor Reference Manual&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Sep 2014 15:05:39 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387204#M56159</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-01T15:05:39Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387205#M56160</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I had already found in ARM's documentation that the code apparently accesses the remap register in the MX6FAST2 NIC-301, but I couldn't (and still can't) find any documentation on what the RAM mapping actually looks like when this code is executed, compared to the default mapping.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Would it make any difference for DDR3 RAM (at least in 64-bit mode)? Is it even supposed to work with DDR3, or does the system lock up when executing this piece of code because it is equipped with DDR3 instead of two-channel LPDDR2?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Sep 2014 06:30:35 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387205#M56160</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-02T06:30:35Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387206#M56161</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The "two-channel" concept is used only for LPDDR2; from p. 3820 of the &lt;A href="http://cache.freescale.com/files/32bit/doc/ref_manual/IMX6DQRM.pdf?fasp=1&amp;amp;WT_TYPE=Reference%20Manuals&amp;amp;WT_VENDOR=FREESCALE&amp;amp;WT_FILE_FORMAT=pdf&amp;amp;WT_ASSET=Documentation&amp;amp;fileExt=.pdf"&gt;IMX6DQRM&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"The core is composed of two channels, but both channels are only active in LPDDR2 mode. If DDR3 mode is selected, channel1 is not activated and the MMDC communicates with the system through AXI port0."&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can also look at "DRAM Controller Optimization for i.MX":&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A class="loading" href="http://cache.freescale.com/files/training/doc/ftf/2014/FTF-SDS-F0170.pdf" title="http://cache.freescale.com/files/training/doc/ftf/2014/FTF-SDS-F0170.pdf"&gt;http://cache.freescale.com/files/training/doc/ftf/2014/FTF-SDS-F0170.pdf&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Sep 2014 07:17:11 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387206#M56161</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-02T07:17:11Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387207#M56162</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I see. So this setting is useless for DDR3, and this is probably the reason for the lock-up.&lt;/P&gt;&lt;P&gt;Back to my original question: 32-bit vs. 64-bit DDR3 performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My tests yesterday with the IPU as an additional bus-master seem not to have worked properly: the IPU wasn't really running. But now I have a setup where the IPU is indeed running on a 1920x1080x32bpp framebuffer, requiring a bandwidth of ~0.5 GB/s for screen refresh. Running our benchmarks again on this setup with single and quad cores on DDR3-800x32-bit and DDR3-1066x64-bit still shows similar results:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;When the CPU cores access the same RAM bank as the IPU, CPU RAM performance drops by pretty much exactly the bandwidth required for display refresh.&lt;/LI&gt;&lt;LI&gt;When the CPU cores access different RAM banks than the IPU, the bandwidth required for the display refresh is mostly hidden.&lt;/LI&gt;&lt;LI&gt;Driving the IPU bandwidth requirement further up to 1.0 GB/s impacts CPU RAM performance even when other RAM banks are used for the benchmarks.&lt;/LI&gt;&lt;LI&gt;Aggregated RAM bandwidth over all active bus-masters (incl. the IPU) is in all cases still 2.0-2.5 GB/s maximum.&lt;/LI&gt;&lt;LI&gt;Still no measurable difference between the 32-bit and 64-bit DDR3 interface.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;It seems that, beyond the RAM bandwidth actually usable by the CPU cores, there is hardly any additional "unused" RAM bandwidth available to other bus-masters, even with a 64-bit DDR3 interface.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Sep 2014 08:46:50 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387207#M56162</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-02T08:46:50Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387208#M56163</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Probing further with my benchmarks using different setups, I must correct my last posting somewhat: I now can actually measure a difference between 32-bit and 64-bit DDR3 interfaces when the IPU is used as an additional bus-master, but the difference is pretty small (sum of measured bandwidths used by the IPU and 4 active ARM cores running at 792 MHz):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;DDR3-800 32-bit reaches a maximum of ~2.3 GB/s in total&lt;/LI&gt;&lt;LI&gt;DDR3-800 64-bit reaches a maximum of ~2.5 GB/s in total&lt;/LI&gt;&lt;LI&gt;DDR3-1066 32-bit reaches a maximum of ~2.5 GB/s in total&lt;/LI&gt;&lt;LI&gt;DDR3-1066 64-bit reaches a maximum of ~2.8 GB/s in total&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;For the 32-bit interfaces, actual performance is pretty good compared to the purely theoretical bandwidths of 3.2 or 4.3 GB/s. While it is reassuring to finally see a performance difference between the two interface widths, which implies that the systems are configured more or less properly, the performance benefit of only ~10% is quite disappointing.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(BTW: Running the CPU cores at 1.0 or 1.2 GHz does improve the RAM bandwidth measured by a single CPU core, but the total measured RAM bandwidths with 4 cores and the IPU running at the same time, as given above, stay pretty much the same.)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 02 Sep 2014 09:50:14 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387208#M56163</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-02T09:50:14Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387209#M56164</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I found some modeling data showing performance for 64-bit memcpy (4 MB block) as MX6D: 1055.2 MB/s per core, MX6Q: 670.9 MB/s per core. Though I do not have an explanation for this.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 03 Sep 2014 14:19:28 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387209#M56164</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-03T14:19:28Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387210#M56165</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This sounds quite similar to what I do and see: we're also performing memcpy() of 4 MB blocks, and with 2 or 4 cores running the benchmark in parallel, we see quite similar performance values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Without additional bus-masters running we get the same results on 32-bit RAM as well, though, and with additional bus-masters running, the performance of each core drops correspondingly to "make room" for the additional bandwidth requirements. The total bandwidth available to all active bus-masters (CPU cores, IPU, GPU, VPU, etc.) still seems to be limited to ~2.8 GB/s in total on DDR3-1066 with the 64-bit interface.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is only ~10% faster than what we can see with a 32-bit interface and therefore rather disappointing in comparison. I had hoped we might have just missed some additional configuration setting necessary to enable a proper speed-up from the doubled RAM interface width?!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 03 Sep 2014 14:30:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387210#M56165</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-03T14:30:42Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387211#M56166</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;the modeling data shows performance for 32-bit memcpy (4 MB block) as 494.4 MB/s per core on MX6Q (670.9 MB/s @ 64-bit).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 02:04:47 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387211#M56166</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-04T02:04:47Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387212#M56167</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It seems I'm getting slightly faster results then, matching your modeling data, if I compare my fastest 64-bit measurement against my slowest 32-bit measurement... ;-)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With DDR3-1066F timings (CL7), RALAT/WALAT of 3/0, all 4 CPU-cores running at 1.2 GHz at the same time, each core using its private set of 2 RAM-banks (i.e. each RAM-bank is accessed by only a single core), no other active bus-master, and using 4 MB memcpy, I get (for each core):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;64-bit RAM interface, identical source and destination RAM-banks for the memcpy: 544 MB/s&lt;/LI&gt;&lt;LI&gt;64-bit RAM interface, different source and destination RAM-banks for the memcpy: 676 MB/s&lt;/LI&gt;&lt;LI&gt;32-bit RAM interface, identical source and destination RAM-banks for the memcpy: 506 MB/s&lt;/LI&gt;&lt;LI&gt;32-bit RAM interface, different source and destination RAM-banks for the memcpy: 613 MB/s&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The 64-bit measurements are just 10% faster than 32-bit at best...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With an otherwise identical setup, but additionally enabling an IPU consuming ~1.0 GB/s of bandwidth (where two performance numbers are given, the first is a measurement where at least one CPU-core accesses the same RAM-bank as the IPU, and the second value is for all active bus-masters accessing different RAM-banks; as we have only one chip-select and one rank of DDR3, we only have 8 banks available in total, so measurements with different source/destination banks for memcpy always cause one CPU-core to "collide" with the IPU on one bank):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;64-bit RAM interface, identical source and destination RAM-banks for the memcpy: 401/459 MB/s&lt;/LI&gt;&lt;LI&gt;64-bit RAM interface, different source and destination RAM-banks for the memcpy: 496 MB/s&lt;/LI&gt;&lt;LI&gt;32-bit RAM interface, identical source and destination RAM-banks for the memcpy: 374/408 MB/s&lt;/LI&gt;&lt;LI&gt;32-bit RAM interface, different source and destination RAM-banks for the memcpy: 443 MB/s&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Again, the 64-bit measurements are just 12% faster than 32-bit at best...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that I didn't confirm whether the IPU is indeed able to achieve its configured 1.0 GB/s; I'm just optimistically assuming this here without checking any "underflow" status bits. Adding up all the numbers under this (maybe too optimistic) assumption leads to a total overall RAM bandwidth of 2.7 GB/s for 32-bit and 2.9 GB/s for 64-bit.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These numbers are ~0.1-0.2 GB/s higher than posted before, because for this benchmark I am now using the absolute fastest configuration my current DUT is able to achieve. The final configuration (used during my previous benchmarks) will use RALAT/WALAT of 4/0, a CPU-core clock limited to 1.0 GHz, and DDR3-1066G timings (CL8 instead of CL7).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Very good performance for the 32-bit interface, rather poor performance for the 64-bit interface. Looking at these numbers and at the sensitivity of the benchmark measurements to even small timing-configuration changes in the MMDC, it seems the performance limit here is indeed the MMDC itself.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But as my numbers seem to match up more or less with your modeling data, I suppose this is about all that the 64-bit DDR3 interface can achieve?!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 07:25:32 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387212#M56167</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-04T07:25:32Z</dc:date>
    </item>
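A note on the measurement itself: the per-core numbers in this thread come from timing 4 MB memcpy() blocks. As a rough illustration, here is a minimal single-core sketch (the block size is from the thread; the iteration count, language, and everything else are my own assumptions, not the original test code):

```python
import time

BLOCK = 4 * 1024 * 1024   # 4 MB copy blocks, as used in the thread
ITERATIONS = 64           # assumed repeat count, not from the thread

def copy_bandwidth_mb_s():
    src = bytearray(BLOCK)
    dst = bytearray(BLOCK)
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        dst[:] = src      # bulk copy, the Python analogue of memcpy()
    elapsed = time.perf_counter() - start
    # report copied bytes per second (each byte is read once and written once)
    return BLOCK * ITERATIONS / elapsed / 1e6

print("approx. copy bandwidth: %.0f MB/s" % copy_bandwidth_mb_s())
```

The thread's version ran one such loop per core, with each core restricted to its own pair of RAM-banks; this sketch only shows the timing arithmetic for one core.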
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387213#M56168</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Probably yes; on one customer board, 98% bus loading&lt;/P&gt;&lt;P&gt;was observed with the mmdc_prof tool&lt;/P&gt;&lt;P&gt;(LPDDR2 400 MHz, 32-bit, test data about 1500 MB/s).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 08:31:51 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387213#M56168</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-04T08:31:51Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387214#M56169</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chip&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;While we hadn't expected to see twice the performance with 64-bit DDR3 RAM, we had expected more than just a ~10-12% performance improvement. Alas, if that is all the MMDC can do, then that's a hard limit.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But as you mention a customer with 32-bit LPDDR2: is the performance difference between a 2-channel LPDDR2 configuration (each channel 32 bits wide) and single-channel LPDDR2 similar to the performance difference between the 64-bit and 32-bit DDR3 interfaces? I'm wondering because of the special NIC-301 mapping for 2-channel LPDDR2 and the second MMDC AXI-bus, which is unused in all DDR3 configurations?!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 10:07:23 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387214#M56169</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-04T10:07:23Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387215#M56170</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MX6Q LPDDR2 2-channel (2x32 interleaved) performance&lt;/P&gt;&lt;P&gt;is close to (though lower than) 64-bit DDR3 at the same frequency.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Best regards&lt;/P&gt;&lt;P&gt;chip&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 11:22:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387215#M56170</guid>
      <dc:creator>igorpadykov</dc:creator>
      <dc:date>2014-09-04T11:22:06Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387216#M56171</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, I thought maybe due to the second MMDC-channel and the special NIC-301 mapping it might be faster than 64-bit DDR3.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your help and advice!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Marc&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 04 Sep 2014 11:42:43 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387216#M56171</guid>
      <dc:creator>MOW</dc:creator>
      <dc:date>2014-09-04T11:42:43Z</dc:date>
    </item>
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387217#M56172</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Marc,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm going to jump in here with some comments, and this seemed like the best place to add them, since they are going to be along the lines of WALAT/RALAT.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I understand how you came up with a very theoretical bandwidth value of 3.2 GB/s. And yes, if you specified upfront to start at a particular physical DDR memory address and conducted Reads and Writes only to the same Bank/Row address (different column addresses allowed), you might be able to achieve something close to your theoretical value, assuming that you are also not conducting refreshes (as required by JEDEC).&lt;/P&gt;&lt;P&gt;I'm sure you are going to tell me that the above is obvious and you never were expecting to get to 3.2 GB/s, but I state the above only as a means of pointing out where you are going to get performance improvements.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The first step is minimizing the length of the data traces. It doesn't matter so much for Writes, but for Reads you have to account for extra time to complete the round trip. That is essentially what RALAT is doing for you. It gives you extra time to complete the data return trip, from the time the controller releases the byte lane for a read to the time the DDR has completed sending the data and it has finally reached the processor pins and been "clocked" in. Setting RALAT = 5 means you are adding five extra clocks to each 8-burst read cycle. So bringing it down to 3 clock cycles means that you no longer waste the additional 2 clocks. But that only works to a limit. You can't set RALAT = 2 because of the physical limitation of the layout: you simply have not told the controller to wait long enough to complete a read cycle. In other words, to get down to a point where RALAT = 2 will work for you, you are going to have to modify the layout. If you are using a Tee topology and the lengths of your byte lanes closely match the length of your clock trace(s), then WALAT should be 0, and you can save that extra clock for write cases.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But that is low-hanging fruit: where else can you save extra clock cycles?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is where you are going to have to experiment with DDR timing settings, and you are probably going to want to use more reliable DDR devices, like Micron, to see if you can push the limits of their Read and Write latencies.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For refreshes, make sure that you are using the minimum JEDEC-required refresh interval of 7.8 us.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The other timing parameters that may potentially help you get a performance boost are:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;tCL (CAS Read Latency)&lt;/LI&gt;&lt;LI&gt;tRFC (Refresh command to Active or Refresh command time)&lt;/LI&gt;&lt;LI&gt;tRCD (Active command to internal Read or Write delay time)&lt;/LI&gt;&lt;LI&gt;tRP (Precharge command period)&lt;/LI&gt;&lt;LI&gt;tRC (Active to Active or Refresh command period)&lt;/LI&gt;&lt;LI&gt;tRAS (Active to Precharge command period)&lt;/LI&gt;&lt;LI&gt;tRPA (Precharge-All command period)&lt;/LI&gt;&lt;LI&gt;tWR (Write recovery time)&lt;/LI&gt;&lt;LI&gt;tCWL (CAS Write Latency)&lt;/LI&gt;&lt;LI&gt;tRTP (Internal Read command to Precharge command delay)&lt;/LI&gt;&lt;LI&gt;tWTR (Internal Write to Read command delay)&lt;/LI&gt;&lt;LI&gt;tRRD (Active to Active command period)&lt;/LI&gt;&lt;LI&gt;RTW_SAME (Read to Write delay for the same chip select)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You are using only one chip select, correct? Two chip selects add delays and therefore lower performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mostly what you are looking to achieve is minimizing the time it takes to close one Active Bank/Row and open a different one. This is all overhead which takes away from performance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;That is essentially all you are going to be able to do without modifying the test code to limit the number of Active Bank/Row changes required during testing.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I really don't think the AXI pipeline itself is holding you back any.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Mark&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 10 Nov 2014 21:20:40 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387217#M56172</guid>
      <dc:creator>TheAdmiral</dc:creator>
      <dc:date>2014-11-10T21:20:40Z</dc:date>
    </item>
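For reference, the "very theoretical bandwidth value of 3.2 GB/s" discussed above is just the raw pin bandwidth of the DDR interface (data transfers per second times bus width), before refresh, bank-management, and turnaround overhead. A minimal sketch of that arithmetic:

```python
def ddr_peak_gb_s(transfers_mt_s, bus_width_bits):
    # raw pin bandwidth: one bus-width transfer per data-rate edge,
    # ignoring refresh, bank management, and bus-turnaround overhead
    return transfers_mt_s * 1e6 * (bus_width_bits // 8) / 1e9

print(ddr_peak_gb_s(800, 32))    # DDR3-800, 32-bit: 3.2 GB/s
print(ddr_peak_gb_s(1066, 64))   # DDR3-1066, 64-bit: 8.528 GB/s
```

Comparing these ceilings with the ~2.0-2.9 GB/s aggregates measured earlier in the thread is what makes the MMDC's overhead visible.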
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387218#M56173</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm hitting a similar situation, but my workload is mostly GPU-related.&lt;/P&gt;&lt;P&gt;Below is a dump of mmdc_prof. The total read/write is ~2.4 GB/sec and the bus utilization is 30% (hence hinting at a theoretical max around 8 GB/sec).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;On the other hand, it shows Bus Load = 99% - what is this Bus Load? Is it the internal AXI bus? (And in general, where can we get the source code of mmdc_prof?)&lt;/P&gt;&lt;P&gt;Does this mean that we are Bus-Load limited? (hinting at AXI)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Performance numbers per unit taken from mmdc_prof:&lt;/P&gt;&lt;P&gt;(UNIT) = (Read) + (Write) = (Total)&lt;/P&gt;&lt;P&gt;CPU = 61 + 19 = 80&lt;/P&gt;&lt;P&gt;IPU = 395 + 0 = 395 (refreshing 1080p50)&lt;/P&gt;&lt;P&gt;VPU = 290 + 146 = 436&lt;/P&gt;&lt;P&gt;GPU = 763 + 795 = 1558&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Total = 1509 + 960 = 2469 MB/sec&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MMDC new Profiling results:&lt;/P&gt;&lt;P&gt;***********************&lt;/P&gt;&lt;P&gt;Total cycles count: 528073967&lt;/P&gt;&lt;P&gt;Busy cycles count: 523154240&lt;/P&gt;&lt;P&gt;Read accesses count: 34949275&lt;/P&gt;&lt;P&gt;Write accesses count: 18807831&lt;/P&gt;&lt;P&gt;Read bytes count: 1564537198&lt;/P&gt;&lt;P&gt;Write bytes count: 1004157153&lt;/P&gt;&lt;P&gt;Avg. Read burst size: 44&lt;/P&gt;&lt;P&gt;Avg. Write burst size: 53&lt;/P&gt;&lt;P&gt;Read: 1492.06 MB/s /&amp;nbsp; Write: 957.64 MB/s&amp;nbsp; Total: 2449.70 MB/s &lt;/P&gt;&lt;P&gt;Utilization: 30%&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Bus Load: 99%&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Bytes Access: 47&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 01 Dec 2014 12:21:59 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387218#M56173</guid>
      <dc:creator>rabeeh</dc:creator>
      <dc:date>2014-12-01T12:21:59Z</dc:date>
    </item>
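The MB/s lines in the mmdc_prof dump above can be reproduced from the raw counters. A hedged sketch of that arithmetic (the 528 MHz MMDC clock and the MiB-based "MB" are my assumptions about how the tool derives its numbers, not documented behavior):

```python
MMDC_CLK_HZ = 528e6   # assumed i.MX6 MMDC clock during profiling
MB = 1024 * 1024      # mmdc_prof appears to report MiB-based "MB"

def mb_per_s(byte_count, total_cycles):
    seconds = total_cycles / MMDC_CLK_HZ   # length of the profiling window
    return byte_count / seconds / MB

# raw counters from the dump above
read = mb_per_s(1564537198, 528073967)
write = mb_per_s(1004157153, 528073967)
print("Read: %.1f MB/s  Write: %.1f MB/s  Total: %.1f MB/s"
      % (read, write, read + write))
```

Under these assumptions the result lands within a fraction of a percent of the tool's reported 1492.06 / 957.64 MB/s, the small residue being the profiling window running slightly over one second.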
    <item>
      <title>Re: i.MX6 DDR3 RAM-Performance 32 bit vs. 64 bit interface.</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387219#M56174</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did you figure out what "Bus Load" means? I am investigating very strange behavior on our custom board; the results for the CPU are:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;./mmdc_prof -m 0x00060001&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MMDC new Profiling results:&lt;/P&gt;&lt;P&gt;***********************&lt;/P&gt;&lt;P&gt;Total cycles count: 528037303&lt;/P&gt;&lt;P&gt;Busy cycles count: 39096421&lt;/P&gt;&lt;P&gt;Read accesses count: 2197&lt;/P&gt;&lt;P&gt;Write accesses count: 1517&lt;/P&gt;&lt;P&gt;Read bytes count: 63976&lt;/P&gt;&lt;P&gt;Write bytes count: 45432&lt;/P&gt;&lt;P&gt;Avg. Read burst size: 29&lt;/P&gt;&lt;P&gt;Avg. Write burst size: 29&lt;/P&gt;&lt;P&gt;Read: 0.06 MB/s /&amp;nbsp; Write: 0.04 MB/s&amp;nbsp; Total: 0.10 MB/s &lt;/P&gt;&lt;P&gt;Utilization: 0%&lt;/P&gt;&lt;P&gt;Bus Load: 7%&lt;/P&gt;&lt;P&gt;Bytes Access: 29&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MMDC new Profiling results:&lt;/P&gt;&lt;P&gt;***********************&lt;/P&gt;&lt;P&gt;Total cycles count: 528039111&lt;/P&gt;&lt;P&gt;Busy cycles count: 39322154&lt;/P&gt;&lt;P&gt;Read accesses count: 7221&lt;/P&gt;&lt;P&gt;Write accesses count: 2494&lt;/P&gt;&lt;P&gt;Read bytes count: 202392&lt;/P&gt;&lt;P&gt;Write bytes count: 77600&lt;/P&gt;&lt;P&gt;Avg. Read burst size: 28&lt;/P&gt;&lt;P&gt;Avg. Write burst size: 31&lt;/P&gt;&lt;P&gt;Read: 0.19 MB/s /&amp;nbsp; Write: 0.07 MB/s&amp;nbsp; Total: 0.27 MB/s &lt;/P&gt;&lt;P&gt;Utilization: 0%&lt;/P&gt;&lt;P&gt;Bus Load: 7%&lt;/P&gt;&lt;P&gt;Bytes Access: 28&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you have any guesses why the Bus Load is 7% with only 0.27 MB/s?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For the IPU:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;./mmdc_prof -m 0x00060004&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;MMDC new Profiling results:&lt;/P&gt;&lt;P&gt;***********************&lt;/P&gt;&lt;P&gt;Total cycles count: 528047999&lt;/P&gt;&lt;P&gt;Busy cycles count: 40319160&lt;/P&gt;&lt;P&gt;Read accesses count: 489640&lt;/P&gt;&lt;P&gt;Write accesses count: 0&lt;/P&gt;&lt;P&gt;Read bytes count: 31336960&lt;/P&gt;&lt;P&gt;Write bytes count: 0&lt;/P&gt;&lt;P&gt;Avg. Read burst size: 0&lt;/P&gt;&lt;P&gt;Avg. Write burst size: 0&lt;/P&gt;&lt;P&gt;Read: 29.89 MB/s /&amp;nbsp; Write: 0.00 MB/s&amp;nbsp; Total: 29.89 MB/s &lt;/P&gt;&lt;P&gt;Utilization: 0%&lt;/P&gt;&lt;P&gt;Bus Load: 0%&lt;/P&gt;&lt;P&gt;Bytes Access: 0&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 11 Jun 2015 02:22:17 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-MX6-DDR3-RAM-Performance-32-bit-vs-64-bit-interface/m-p/387219#M56174</guid>
      <dc:creator>vladislavkaluts</dc:creator>
      <dc:date>2015-06-11T02:22:17Z</dc:date>
    </item>
  </channel>
</rss>

