<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx? in ColdFire/68K Microcontrollers and Processors</title>
    <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164172#M5501</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This is not an answer to your question, but have you tried modifying the XBS module to play with the master priorities? The bus arbiter can make a difference to the eDMA. Also, assembler and burst size can make a difference.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this post creates a new thread of discussion for your issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyone else?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 29 Dec 2009 03:06:40 GMT</pubDate>
    <dc:creator>PaoloRenzo</dc:creator>
    <dc:date>2009-12-29T03:06:40Z</dc:date>
    <item>
      <title>Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164171#M5500</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I'm using the MCF5329.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have posted previously about the limited speed of the supplied (with the gcc compiler) library memcpy() function on this hardware. The SDRAM bus at 240MHz has a bandwidth of 128MB/s, but with the supplied copy function I'm getting a maximum of 80MB/s, and usually less.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Coldfire 3 User Manual (from Freescale's site) says in part:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 5.4.3 RAM Initialization&lt;BR /&gt;...&lt;/P&gt;&lt;P&gt;... There are various instructions to support&lt;BR /&gt;this function, including memory-to-memory&lt;/P&gt;&lt;P&gt;move instructions, or the MOVEM opcode.&lt;BR /&gt;The MOVEM instruction is optimized to&lt;/P&gt;&lt;P&gt;generate line-sized burst fetches on 0-modulo-&lt;BR /&gt;16 addresses, so this opcode generally&lt;/P&gt;&lt;P&gt;provides maximum performance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I should be using MOVEM.L-based library copies.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I could write my own, but it'd be better to get some debugged and optimised ones of these.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone have any good library copy routines for the Coldfire chips that use MOVEM.L instructions in the inner loops? I can't find any examples on Freescale's site.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Even better would be some that are set up to use the EDMA channels. It'd be good to start big copies running on the DMA and then get some other work done with the CPU.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for any pointers, URLs, code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 28 Dec 2009 20:05:39 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164171#M5500</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2009-12-28T20:05:39Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164172#M5501</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This is not an answer to your question, but have you tried modifying the XBS module to play with the master priorities? The bus arbiter can make a difference to the eDMA. Also, assembler and burst size can make a difference.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this post creates a new thread of discussion for your issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Anyone else?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 29 Dec 2009 03:06:40 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164172#M5501</guid>
      <dc:creator>PaoloRenzo</dc:creator>
      <dc:date>2009-12-29T03:06:40Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164173#M5502</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;/* quickly copy multiples of 16 bytes.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;*/&lt;/P&gt;&lt;P&gt;_memcpy16:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; link&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a6,#-16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* save a6 and room for 4 longs */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movem.l&amp;nbsp; d4-d7,(sp)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* save registers 4x4 */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; move.l&amp;nbsp;&amp;nbsp; 8(a6),a0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* destination */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; move.l&amp;nbsp;&amp;nbsp; 12(a6),a1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* source */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; move.l&amp;nbsp;&amp;nbsp; 16(a6),d0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* length */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; moveq.l&amp;nbsp; #16,d1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* d1 is constant 16 */&lt;/P&gt;&lt;P&gt;.loopm:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movem.l&amp;nbsp; (a1),d4-d7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* read a line */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; adda.l&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;d1,a1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* src += 16 */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; 
movem.l&amp;nbsp; d4-d7,(a0)&amp;nbsp;&amp;nbsp;&amp;nbsp; /* write the line */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; adda.l&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;d1,a0&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* dest += 16 */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; sub.l&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;d1,d0&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; /* length -= 16 */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; bgt.b&amp;nbsp;&amp;nbsp;&amp;nbsp; .loopm&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; /* loop while positive */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; movem.l&amp;nbsp; (sp),d4-d7&amp;nbsp;&amp;nbsp;&amp;nbsp; /* restore registers */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; unlk&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a6&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; rts&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 30 Dec 2009 13:39:07 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164173#M5502</guid>
      <dc:creator>bkatt</dc:creator>
      <dc:date>2009-12-30T13:39:07Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164174#M5503</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;bkatt wrote:&lt;BR /&gt;&lt;P&gt;/* quickly copy multiples of 16 bytes.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;*/&lt;/P&gt;&lt;P&gt;_memcpy16:&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&amp;nbsp;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Thanks. Neat and efficient. I'm looking at using the above, or modifying it into a more general memcpy().&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But for that I'd like to look at the ABI to know what registers are saved and so on. I note from a post from you in this forum dated October 12 as "Re: Coldfire register ABI documentation" that you said:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;gt; Note that GCC uses something like the standard&lt;/P&gt;&lt;P&gt;&amp;gt; ABI, but with register D2 preserved by functions&lt;/P&gt;&lt;P&gt;&amp;gt; and pointers returned in D0.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm using gcc. Any idea where its version of the ABI is documented?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using m68k-elf-objdump on the libc code supports the "observation" that D0, D1, A0 and A1 seem to be the temporaries, but I'd rather have it written down somewhere than be "coding by reverse engineering" on a current supported platform.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 02 Jan 2010 14:45:38 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164174#M5503</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2010-01-02T14:45:38Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164175#M5504</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;bkatt wrote:&lt;BR /&gt;&lt;P&gt;/* quickly copy multiples of 16 bytes.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;*/&lt;/P&gt;&lt;P&gt;_memcpy16:&lt;/P&gt;&amp;nbsp;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;I've just tested this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My MCF5329 is supposed to have a raw memory bandwidth of 128 MB/s.&lt;/P&gt;&lt;P&gt;That corresponds to the 80MHz SDRAM clock with 10 clocks per 16-byte&lt;/P&gt;&lt;P&gt;read or write (80MHz / 10 * 16). For a memory copy that should correspond&lt;/P&gt;&lt;P&gt;to 64MB/s copying speed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is what I'm measuring:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;Function&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MB/s&amp;nbsp;&amp;nbsp; % of 64MB/s&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;memcpy&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 38.85&amp;nbsp; 60.70%&lt;BR /&gt;memcpy_gcc_2_9&amp;nbsp;&amp;nbsp;&amp;nbsp; 38.19&amp;nbsp; 59.67%&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;memcpy_gcc_4_3&amp;nbsp;&amp;nbsp;&amp;nbsp; 37.85&amp;nbsp; 59.14%&lt;BR /&gt;memcpy_gcc_4_4&amp;nbsp;&amp;nbsp;&amp;nbsp; 34.77&amp;nbsp; 54.33%&lt;BR /&gt;memcpy_moveml&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 46.34&amp;nbsp; 72.41%&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The well-written library memcpy() is getting about 39MB/s, within the&lt;/P&gt;&lt;P&gt;measurement error/variability of the gcc V2.9 one I have.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;BKatt's one is getting a bit over 46MB/s. 
That's quite an improvement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The other ones are for code reportedly generated by GCC 2.9, 4.3 and 4.4. The code is&lt;/P&gt;&lt;P&gt;getting a lot worse (slower, bigger, less efficient) with time, but this only starts&lt;/P&gt;&lt;P&gt;affecting the benchmark results on this chip with the poor 4.4 code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Jan 2010 07:51:20 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164175#M5504</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2010-01-05T07:51:20Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164176#M5505</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I've been running more tests to try and find the most efficient memory copy functions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is on a 240MHz MCF5329.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The SDRAM is clocked at 80MHz and can read 4 bytes per clock, so that's an "ultimate bandwidth" of 320MB/s. The CPU is theoretically 960MB/s. But the normal memcpy() can only manage about 7% of that!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One of the App Notes claims the LCDC can read the RAM at 128MB/s, which equates to 10 80MHz clocks to read 4 32-bit words, so 6 clocks of overhead for 4 working clocks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's a table of memory copy functions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;Function&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Min&amp;nbsp;&amp;nbsp; Max&amp;nbsp;&amp;nbsp; Aver&amp;nbsp; StDev Max&amp;nbsp;&amp;nbsp;&amp;nbsp; Avg Speed&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; us&amp;nbsp;&amp;nbsp;&amp;nbsp; us&amp;nbsp;&amp;nbsp;&amp;nbsp; us&amp;nbsp;&amp;nbsp;&amp;nbsp; us&amp;nbsp;&amp;nbsp;&amp;nbsp; kB/s&amp;nbsp;&amp;nbsp; kB/s&lt;BR /&gt;===========================================================&lt;BR /&gt;memcpy_gcc_4_4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4073&amp;nbsp; 4246&amp;nbsp; 4202&amp;nbsp; 78.1&amp;nbsp; 32180&amp;nbsp; 31202&lt;BR /&gt;memcpy_gcc_4_3_O1&amp;nbsp; 3788&amp;nbsp; 3939&amp;nbsp; 3919&amp;nbsp; 53.0&amp;nbsp; 34601&amp;nbsp; 33448&lt;BR /&gt;memcpy_gcc_4_3_O2&amp;nbsp; 3717&amp;nbsp; 3937&amp;nbsp; 3909&amp;nbsp; 77.5&amp;nbsp; 35262&amp;nbsp; 33543&lt;BR 
/&gt;memcpy_gcc_2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3717&amp;nbsp; 3935&amp;nbsp; 3816&amp;nbsp; 102.0 35262&amp;nbsp; 34367&lt;BR /&gt;&lt;BR /&gt;memcpy(131072)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3734&amp;nbsp; 3915&amp;nbsp; 3829&amp;nbsp; 56.4&amp;nbsp; 35102&amp;nbsp; 34241 Reference&lt;BR /&gt;memcpy_moveml&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3132&amp;nbsp; 3305&amp;nbsp; 3283&amp;nbsp; 61.2&amp;nbsp; 41849&amp;nbsp; 39932&lt;/FONT&gt; &lt;FONT face="courier new,courier" size="1"&gt;+17%&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier" size="1"&gt;memcpy_dma&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2993&amp;nbsp; 2994&amp;nbsp; 2994&amp;nbsp; 0.5&amp;nbsp;&amp;nbsp; 43792&amp;nbsp; 43783 +28%&lt;BR /&gt;memcpy_moveml_32&amp;nbsp;&amp;nbsp; 2500&amp;nbsp; 2612&amp;nbsp; 2543&amp;nbsp; 42.1&amp;nbsp; 52428&amp;nbsp; 51564 +51%&lt;BR /&gt;memcpy_stack&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2390&amp;nbsp; 2475&amp;nbsp; 2438&amp;nbsp; 26.6&amp;nbsp; 54841&amp;nbsp; 53762 +57%&lt;BR /&gt;memcpy_stack_32&amp;nbsp;&amp;nbsp;&amp;nbsp; 2265&amp;nbsp; 2344&amp;nbsp; 2317&amp;nbsp; 25.2&amp;nbsp; 57868&amp;nbsp; 56572 +65%&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The above table gives the minimum, maximum and average time to copy 128 kbytes from SDRAM to SDRAM. These measurements were conducted with interrupts and all DMA disabled. The Cache is set to write-through. All copies are multiples of 16 bytes, all aligned on 16-byte boundaries to match the cache line length. 
This is an artificial situation for general memory copies, but I'm copying bitmaps around, and they're all 16-byte aligned in memory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The variation (the Standard Deviation of 8 separate measurements for each test) is due to the cache being rather indeterminate in which "way" it is going to invalidate on successive copies of the same data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The different "gcc" tests are what different versions of gcc do to a simple C-based memcpy() function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;memcpy_dma() uses the DMA controller and waits for it to finish.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;memcpy() is the library one. The inner loop is the old favourite from the 68000 (and PDP-11) days:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;40161034:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20d9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movel %a1@+,%a0@+&lt;BR /&gt;40161036:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20d9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movel %a1@+,%a0@+&lt;BR /&gt;40161038:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20d9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movel %a1@+,%a0@+&lt;BR /&gt;4016103a:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 20d9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; movel %a1@+,%a0@+&lt;BR /&gt;4016103c:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5380&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; subql #1,%d0&lt;BR 
/&gt;4016103e:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6a00 fff4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; bplw 40161034 &amp;lt;memcpy+0x50&amp;gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;memcpy_moveml has the following inner loop:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/FONT&gt; &lt;FONT face="courier new,courier" size="1"&gt;moveq.l&amp;nbsp;&amp;nbsp; &amp;nbsp;#16,%d1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* d1 is constant 16 */&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="courier new,courier" size="1"&gt;.L10:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; movem.l&amp;nbsp;&amp;nbsp; (%a1),%d4-%d7&amp;nbsp;&amp;nbsp; &amp;nbsp;/* read a line */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp;&amp;nbsp; %d1,%a1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* src += 16 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; %d4-%d7,(%a0)&amp;nbsp;&amp;nbsp; &amp;nbsp;/* write the line */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp;&amp;nbsp; %d1,%a0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* dest += 16 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sub.l&amp;nbsp;&amp;nbsp; &amp;nbsp; %d1,%d0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* length -= 16 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;bgt.b&amp;nbsp;&amp;nbsp; &amp;nbsp; .L10&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; /* loop while positive */&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;memcpy_moveml_32() copies 32 bytes at a time and has the following inner loop:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" 
size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/FONT&gt; &lt;FONT face="courier new,courier" size="1"&gt;moveq.l&amp;nbsp;&amp;nbsp; &amp;nbsp;#32,%d1&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* d1 is constant 32 */&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="courier new,courier" size="1"&gt;.L13:&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; movem.l&amp;nbsp;&amp;nbsp; (%a1),%d4-%d7/%a2-%a5&amp;nbsp;&amp;nbsp; &amp;nbsp;/* read a line */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; %d4-%d7/%a2-%a5,(%a0)&amp;nbsp;&amp;nbsp; &amp;nbsp;/* write the line */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp; &amp;nbsp;%d1,%a1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* src += 32 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp; &amp;nbsp;%d1,%a0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* dest += 32 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sub.l&amp;nbsp;&amp;nbsp; &amp;nbsp; %d1,%d0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* length -= 32 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;bgt.b&amp;nbsp;&amp;nbsp; &amp;nbsp; .L13&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; /* loop while positive */&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;memcpy_stack() is surprisingly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint32_t&amp;nbsp;&amp;nbsp; &amp;nbsp;vnStackBuf[MEMCPY_STACK_SIZE + 
4];&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp; &amp;nbsp; ...&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (size &amp;gt;= 16)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;nBurst = MIN(size, MEMCPY_STACK_SIZE);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;memcpy_moveml(pStackBuf, src, nBurst);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;memcpy_moveml(dst, pStackBuf, nBurst);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;size -= nBurst;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;src = (void *)(((char *)src) + nBurst);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;dst = (void *)(((char *)dst) + nBurst);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;memcpy_stack_32() is the same but calls memcpy_moveml_32().&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The fastest copy functions copy from SDRAM to SRAM (the stack is in SRAM) and then repeat the copy from SRAM back to SDRAM. This has the CPU doing double the number of operations, but it ends up faster as it seems to keep the SDRAM controller on the same "open page", so it isn't wasting clocks switching pages and banks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The fastest ones also use MOVEM.L instructions, as they convert into direct burst memory cycles, and copying 32 bytes at a time is faster than 16.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 21 Apr 2011 21:10:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164176#M5505</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2011-04-21T21:10:06Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164177#M5506</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;More testing:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;Function&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Min&amp;nbsp;&amp;nbsp; Max&amp;nbsp;&amp;nbsp; Aver&amp;nbsp; StD&amp;nbsp; Max Spd Avg&amp;nbsp;&amp;nbsp; Memclk&lt;BR /&gt;===============================================================&lt;BR /&gt;memcpy_gcc_4_4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4277&amp;nbsp; 4278&amp;nbsp; 4277&amp;nbsp; 0.5&amp;nbsp; 30885&amp;nbsp; 30883&amp;nbsp; 41.77&lt;BR /&gt;memcpy_gcc_4_3_O1&amp;nbsp;&amp;nbsp; 3956&amp;nbsp; 3958&amp;nbsp; 3957&amp;nbsp; 0.5&amp;nbsp; 33391&amp;nbsp; 33382&amp;nbsp; 38.64&lt;BR /&gt;memcpy_gcc_4_3_O2&amp;nbsp;&amp;nbsp; 3956&amp;nbsp; 3957&amp;nbsp; 3957&amp;nbsp; 0.5&amp;nbsp; 33391&amp;nbsp; 33385&amp;nbsp; 38.64&lt;BR /&gt;memcpy_gcc_2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3956&amp;nbsp; 3957&amp;nbsp; 3956&amp;nbsp; 0.4&amp;nbsp; 33391&amp;nbsp; 33390&amp;nbsp; 38.63&lt;BR /&gt;&lt;BR /&gt;memcpy(132096)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3957&amp;nbsp; 3958&amp;nbsp; 3957&amp;nbsp; 0.5&amp;nbsp; 33382&amp;nbsp; 33379&amp;nbsp; 38.65&lt;BR /&gt;memcpy_moveml&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3323&amp;nbsp; 3323&amp;nbsp; 3323&amp;nbsp; 0.0&amp;nbsp; 39752&amp;nbsp; 39752&amp;nbsp; 32.45&lt;BR /&gt;memcpy_dma&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3022&amp;nbsp; 3023&amp;nbsp; 3022&amp;nbsp; 0.4&amp;nbsp; 43711&amp;nbsp; 43709&amp;nbsp; 29.51&lt;BR /&gt;memcpy_moveml_32&amp;nbsp;&amp;nbsp;&amp;nbsp; 2661&amp;nbsp; 2663&amp;nbsp; 2662&amp;nbsp; 0.7&amp;nbsp; 49641&amp;nbsp; 49618&amp;nbsp; 26.00&lt;BR /&gt;memcpy_stack&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2495&amp;nbsp; 
2497&amp;nbsp; 2497&amp;nbsp; 0.8&amp;nbsp; 52944&amp;nbsp; 52912&amp;nbsp; 24.38&lt;BR /&gt;memcpy_moveml_192&amp;nbsp;&amp;nbsp; 2443&amp;nbsp; 2445&amp;nbsp; 2444&amp;nbsp; 0.6&amp;nbsp; 54071&amp;nbsp; 54052&amp;nbsp; 23.87&lt;BR /&gt;memcpy_moveml_48&amp;nbsp;&amp;nbsp;&amp;nbsp; 2442&amp;nbsp; 2442&amp;nbsp; 2442&amp;nbsp; 0.0&amp;nbsp; 54093&amp;nbsp; 54093&amp;nbsp; 23.85&lt;BR /&gt;memcpy_stack_48&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2401&amp;nbsp; 2402&amp;nbsp; 2402&amp;nbsp; 0.4&amp;nbsp; 55017&amp;nbsp; 54997&amp;nbsp; 23.46&lt;BR /&gt;memcpy_stack_32_mis 2398&amp;nbsp; 2399&amp;nbsp; 2398&amp;nbsp; 0.5&amp;nbsp; 55085&amp;nbsp; 55079&amp;nbsp; 23.42&lt;BR /&gt;memcpy_stack_32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2396&amp;nbsp; 2397&amp;nbsp; 2396&amp;nbsp; 0.5&amp;nbsp; 55131&amp;nbsp; 55125&amp;nbsp; 23.40&lt;BR /&gt;memcpy_stack_192&amp;nbsp;&amp;nbsp;&amp;nbsp; 2369&amp;nbsp; 2371&amp;nbsp; 2370&amp;nbsp; 0.5&amp;nbsp; 55760&amp;nbsp; 55736&amp;nbsp; 23.14&lt;BR /&gt;memcpy_moveml_96_ps 2328&amp;nbsp; 2329&amp;nbsp; 2328&amp;nbsp; 0.4&amp;nbsp; 56742&amp;nbsp; 56739&amp;nbsp; 22.74&lt;BR /&gt;memRead_stack_32&amp;nbsp;&amp;nbsp;&amp;nbsp; 1553&amp;nbsp; 1554&amp;nbsp; 1554&amp;nbsp; 0.5&amp;nbsp; 85058&amp;nbsp; 85017&amp;nbsp; 15.17&lt;BR /&gt;memRead_moveml_32&amp;nbsp;&amp;nbsp; 1515&amp;nbsp; 1516&amp;nbsp; 1516&amp;nbsp; 0.4&amp;nbsp; 87192&amp;nbsp; 87141&amp;nbsp; 14.80&lt;BR /&gt;memWrite_stack_32&amp;nbsp;&amp;nbsp;&amp;nbsp; 671&amp;nbsp;&amp;nbsp; 671&amp;nbsp;&amp;nbsp; 671&amp;nbsp; 0.0 196864 196864&amp;nbsp;&amp;nbsp; 6.55&lt;BR /&gt;memWrite_moveml_32&amp;nbsp;&amp;nbsp; 636&amp;nbsp;&amp;nbsp; 637&amp;nbsp;&amp;nbsp; 637&amp;nbsp; 0.5 207698 207535&amp;nbsp;&amp;nbsp; 6.22&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&lt;BR /&gt;memcpy:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Library memcpy() function&lt;BR 
/&gt;moveml:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4 register movem.l&lt;BR /&gt;moveml_32:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8 register movem.l&lt;BR /&gt;moveml_48:&amp;nbsp;&amp;nbsp;&amp;nbsp; 12 register movem.l&lt;BR /&gt;moveml_96_ps: 12 register movem.l doubled-up (see code below)&lt;BR /&gt;moveml_192:&amp;nbsp;&amp;nbsp; moveml_48 unrolled 4 times&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier" size="1"&gt;stack:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SDRAM -&amp;gt; Stack (in SRAM), then Stack -&amp;gt; SDRAM.&lt;BR /&gt;Read:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Read-only test&lt;BR /&gt;Write:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Write-only test&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;Memclk:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; The number of memory clocks per cache line.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;The last column shows how many memory clocks each copy took per cache line (16 bytes). The "Read" and "Write" tests are the most interesting. They are reading the SDRAM to registers (and throwing the result away) and likewise writing from registers to SDRAM. The SDRAM is capable of 32 bits per clock, or four clocks per cache line. It can be burst-written at about 6 clocks per cache line, quite close to theory. It can only be read at FOURTEEN clocks per cache line. 
Even when the CPU is trying as hard as it can, it looks like the SDRAM controller is closing the bank and precharging on every read, as this mode of operation is known to take 10 or 11 clocks per cache line.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The fastest memory copy function has this as the inner loop:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier" size="1"&gt;.L16:&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;(%a1),%d1-%d7/%a2-%a6&amp;nbsp;&amp;nbsp; &amp;nbsp;/* read first chunk */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;%d1-%d7/%a2-%a6,(%sp)&amp;nbsp;&amp;nbsp; &amp;nbsp;/* write to SRAM */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;48(%a1),%d1-%d7/%a2-%a6&amp;nbsp; /* read second chunk */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;%d1-%d7/%a2-%a6,48(%a0)&amp;nbsp; /* write second chunk */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;(%sp),%d1-%d7/%a2-%a6&amp;nbsp;&amp;nbsp; &amp;nbsp;/* get first line back */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;movem.l&amp;nbsp;&amp;nbsp; &amp;nbsp;%d1-%d7/%a2-%a6,(%a0)&amp;nbsp;&amp;nbsp; &amp;nbsp;/* write FIRST chunk */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;moveq.l&amp;nbsp;&amp;nbsp; &amp;nbsp;#96,%d1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* d1 is constant 96 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp; &amp;nbsp; %d1,%a1&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* src += 96 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;adda.l&amp;nbsp;&amp;nbsp; &amp;nbsp; %d1,%a0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; 
&amp;nbsp;&amp;nbsp; /* dest += 96 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sub.l&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; %d1,%d0&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; /* length -= 96 */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;bgt.b&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; .L16&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; /* loop while positive */&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It has the disadvantage that it only moves multiples of 96 bytes, and it is only slightly faster (3%) than the one that copies multiples of 32 bytes via the stack in SRAM.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Apr 2011 08:53:49 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164177#M5506</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2011-04-27T08:53:49Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164178#M5507</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;From my previous testing, the SDRAM controller seems to be able to keep the SDRAM page open for WRITE accesses, but not for READ accesses.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've got the Crossbar parking on the CPU and have set the CPU as the highest-priority master, so it shouldn't be causing a problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Reference Manual states:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="georgia,palatino"&gt;18.5.1.2 Read Command (READ)&lt;BR /&gt;When the SDRAMC receives a read request via the internal bus, it first checks the row and bank of the new access. If the address falls within the active row of an active bank, it is a page hit, and the read is issued as soon as possible (pending any delays required by previous commands). If the address is within an inactive bank, the memory controller issues an ACTV followed by the read command. If the address is not within the active row of an active bank, the memory controller issues a pre command to close the active row. Then, the SDRAMC issues ACTV to activate the necessary row and bank for the new access, followed by the read to the SDRAM.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, according to the above, the SDRAM controller should be able to keep the bank open, but it doesn't seem to be doing that in my tests.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone suggest what might be preventing the SDRAM controller from running at what should be "full speed"?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 27 Apr 2011 08:55:21 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164178#M5507</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2011-04-27T08:55:21Z</dc:date>
    </item>
    <item>
      <title>Re: Does anyone have MOVEM.L-based memcpy() libraries for MCF53xx?</title>
      <link>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164179#M5508</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I wrote:&lt;/P&gt;&lt;P&gt;&amp;gt; From my previous testing, the SDRAM controller seems to be able to keep the&lt;/P&gt;&lt;P&gt;&amp;gt; SDRAM page open for WRITE accesses, but not for READ accesses.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Not so.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I reported my tests back to Freescale via our local rep and got a great response by the next day.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It included traces showing that the SDRAM controller does keep pages open between reads, but that &lt;STRONG&gt;SOMETHING&lt;/STRONG&gt; (unknown) between the CPU and the SDRAM Controller is adding longish delays between the end of one SDRAM burst read and the start of the next one: some 18-20 or more CPU clocks' worth of delay.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At least the tests at Freescale show that I haven't made some stupid configuration error somewhere that was making it run slowly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Simple and wrong theory" suggests a maximum write speed of 320MB/s (320,000,000 and not 320 * 1024 * 1024).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Actual tests achieve about 208MB/s. Not bad.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"Simple and wrong theory" suggests a maximum read speed of less than 320MB/s with this CPU, as the core blocks waiting for the data from the previous read before starting the next one. It should be able to manage a four-burst read in 8 clocks, giving 160MB/s.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Actual tests take 13 or more clocks per burst, which means less than 98MB/s. In my tests reported previously in this thread I'm measuring less than 90MB/s.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The DMA controller isn't faster than the CPU, so it isn't an option for speeding this up, unless copies can be overlapped with other CPU activities.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Tom&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 06 May 2011 12:38:27 GMT</pubDate>
      <guid>https://community.nxp.com/t5/ColdFire-68K-Microcontrollers/Does-anyone-have-MOVEM-L-based-memcpy-libraries-for-MCF53xx/m-p/164179#M5508</guid>
      <dc:creator>TomE</dc:creator>
      <dc:date>2011-05-06T12:38:27Z</dc:date>
    </item>
  </channel>
</rss>

