<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MSC8156 memory bandwidth in Other NXP Products</title>
    <link>https://community.nxp.com/t5/Other-NXP-Products/MSC8156-memory-bandwidth/m-p/181934#M1350</link>
    <pubDate>Thu, 29 Oct 2020 09:32:24 GMT</pubDate>
    <dc:creator>matt8156</dc:creator>
    <dc:date>2020-10-29T09:32:24Z</dc:date>
    <item>
      <title>MSC8156 memory bandwidth</title>
      <link>https://community.nxp.com/t5/Other-NXP-Products/MSC8156-memory-bandwidth/m-p/181934#M1350</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have been experimenting with the MSC8156EVM board over the last few days using CodeWarrior 10.2.2, and I am struggling to get the expected memory bandwidth from the device. Do you have any tips?&lt;/P&gt;&lt;P&gt;My naive test application is below. All levels of cache are enabled, and a timing overhead has been pre-calculated. Each measurement is an average over many iterations (10000+), and all measurements were performed with a release build.&lt;/P&gt;&lt;P&gt;Copying 16 KBytes of data from M2 to M3 takes 68 us, which I believe is about 240 MBytes/s. I was expecting something closer to 4+ GBytes/s, given that M3 memory is 128 bits wide and clocked at 500 MHz (theoretical maximum of 8 GBytes/s).&lt;/P&gt;&lt;P&gt;Copying 16 KBytes of data from M3 to M2 takes 100 us (163 MBytes/s).&lt;/P&gt;&lt;P&gt;Copying 16 KBytes of data from M2 to M2 takes 5 us (3276 MBytes/s) &amp;lt;- expected 8000 MBytes/s?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;PRE&gt;
/* Source buffer in M2 (local memory), destination buffer in M3 (shared memory). */
M2_Array = (uint32_t *) osAlignedMalloc((testLength * sizeof(uint32_t)), OS_MEM_LOCAL, ALIGNED_16_BYTES);
OS_ASSERT_COND(M2_Array != NULL);
M3_Array = (uint32_t *) osAlignedMalloc((testLength * sizeof(uint32_t)), OS_MEM_SHARED, ALIGNED_16_BYTES);
OS_ASSERT_COND(M3_Array != NULL);

/* Fill the source buffer with test data. */
for (i = 0; i &amp;lt; testLength; i++)
{
    srand(i*2);
    M2_Array[i] = rand();
}

/* Flush the caches before starting the timer. */
#if (DCACHE_ENABLE == ON)
status = osCacheDataSweepGlobal(CACHE_FLUSH);
if (status != OS_SUCCESS) OS_ASSERT;
#endif
#if (L2CACHE_ENABLE == ON)
status = osCacheL2UnifiedSweepGlobal(CACHE_FLUSH);
if (status != OS_SUCCESS) OS_ASSERT;
#endif

timeStart = ReadFullPerfMonCount();

for (i = 0; i &amp;lt; NUM_TEST_ITERATIONS; i++)
{
    memcpy(&amp;amp;M3_Array[0], &amp;amp;M2_Array[0], (testLength * sizeof(uint32_t)));   // M2 to M3
}

/* These flushes run before the timer is stopped, so their cost is included in the measurement. */
#if (DCACHE_ENABLE == ON)
status = osCacheDataSweepGlobal(CACHE_FLUSH);
if (status != OS_SUCCESS) OS_ASSERT;
#endif
#if (L2CACHE_ENABLE == ON)
status = osCacheL2UnifiedSweepGlobal(CACHE_FLUSH);
if (status != OS_SUCCESS) OS_ASSERT;
#endif

timeEnd = ReadFullPerfMonCount();

/* Average duration of one copy, in clock cycles. */
duration = timeEnd - timeStart - overhead;
duration /= NUM_TEST_ITERATIONS;

printf("Memory Copy %d byte M2 -&amp;gt; M3 duration %llu HSSI clock cycles, %f microseconds\n", (int)(testLength * sizeof(uint32_t)), duration, ((double)duration / osHssiClockGet()));
&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 29 Oct 2020 09:32:24 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Other-NXP-Products/MSC8156-memory-bandwidth/m-p/181934#M1350</guid>
      <dc:creator>matt8156</dc:creator>
      <dc:date>2020-10-29T09:32:24Z</dc:date>
    </item>
    <item>
      <title>Re: MSC8156 memory bandwidth</title>
      <link>https://community.nxp.com/t5/Other-NXP-Products/MSC8156-memory-bandwidth/m-p/181935#M1351</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;A few things to think about here.&lt;/P&gt;&lt;P&gt;First of all, the SC3850 can execute 6 instructions in parallel, including 2 AGU operations for moves. You also want to pipeline these moves so that you are moving data every cycle.&lt;/P&gt;&lt;P&gt;There are many ways to improve the throughput, but the first thing I would suggest is to turn on software optimization in the CodeWarrior compiler at level -O3.&lt;/P&gt;&lt;P&gt;Also consider that multiple cores can access system-level memory.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;-Andrew&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 17 Dec 2011 06:33:13 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Other-NXP-Products/MSC8156-memory-bandwidth/m-p/181935#M1351</guid>
      <dc:creator>AndrewinApps</dc:creator>
      <dc:date>2011-12-17T06:33:13Z</dc:date>
    </item>
  </channel>
</rss>

