<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic Re: RT1170: question about memcpy benchmark in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2004352#M231362</link>
    <description>&lt;P&gt;Here is the project.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Nov 2024 12:59:44 GMT</pubDate>
    <dc:creator>zixunli</dc:creator>
    <dc:date>2024-11-29T12:59:44Z</dc:date>
    <item>
      <title>RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2004350#M231361</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I ran a memcpy benchmark on the MIMXRT1170-EVKB to compare copy speed between DTCM, cached OCRAM and non-cached OCRAM, but the results are confusing.&lt;/P&gt;&lt;P&gt;I expected the cached region to perform like DTCM, but the cache does not seem to provide any benefit.&lt;/P&gt;&lt;P&gt;Each test is run several times so that the cache is filled.&lt;/P&gt;&lt;LI-CODE lang="c"&gt;memcpy benchmark
DTCM - DTCM
Loop:0   Cycle:3258
Loop:1   Cycle:432
Loop:2   Cycle:431
Loop:3   Cycle:431
Loop:4   Cycle:431
Loop:5   Cycle:431
Loop:6   Cycle:431
Loop:7   Cycle:431
DTCM - NonCache
Loop:0   Cycle:534
Loop:1   Cycle:528
Loop:2   Cycle:528
Loop:3   Cycle:528
Loop:4   Cycle:528
Loop:5   Cycle:527
Loop:6   Cycle:528
Loop:7   Cycle:528
DTCM - Cache
Loop:0   Cycle:538
Loop:1   Cycle:532
Loop:2   Cycle:533
Loop:3   Cycle:532
Loop:4   Cycle:532
Loop:5   Cycle:532
Loop:6   Cycle:532
Loop:7   Cycle:532
DTCM - Cache+Flush
Loop:0   Cycle:863
Loop:1   Cycle:857
Loop:2   Cycle:865
Loop:3   Cycle:857
Loop:4   Cycle:865
Loop:5   Cycle:856
Loop:6   Cycle:865
Loop:7   Cycle:857&lt;/LI-CODE&gt;&lt;P&gt;The attached example can be placed into &lt;EM&gt;SDKROOT\boards\evkbmimxrt1170&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;I modified the linker scripts so that BOARD_ConfigMPU() actually sets up the non-cached region, because __NCACHE_REGION_SIZE is 0 by default.&lt;/P&gt;&lt;P&gt;Only main.c, MIMXRT1176xxxxx_cm7_flexspi_nor.icf and MIMXRT1176xxxxx_cm7_ram.icf are modified. All other files are the kSDK defaults.&lt;/P&gt;&lt;P&gt;The linker output also looks correct:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;buffer1                 0x2000'0020  0x400  Data  Gb  main.o [5]
buffer2                 0x2000'0420  0x400  Data  Gb  main.o [5]
buffer_cached           0x202c'0000  0x400  Data  Gb  main.o [5]
buffer_ncache           0x2032'0000  0x400  Data  Gb  main.o [5]&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 29 Nov 2024 12:58:46 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2004350#M231361</guid>
      <dc:creator>zixunli</dc:creator>
      <dc:date>2024-11-29T12:58:46Z</dc:date>
    </item>
    <item>
      <title>Re: RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2004352#M231362</link>
      <description>&lt;P&gt;Here is the project.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Nov 2024 12:59:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2004352#M231362</guid>
      <dc:creator>zixunli</dc:creator>
      <dc:date>2024-11-29T12:59:44Z</dc:date>
    </item>
    <item>
      <title>Re: RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2005462#M231437</link>
      <description>&lt;P&gt;The cache has no effect on the TCM regions. The TCM interfaces are synchronous to the Cortex-M7 and run at the same frequency, so accesses to the xTCM memories are expected to complete in a single cycle.&lt;BR /&gt;With the cache enabled, OCRAM performs close to TCM because cache hits are single-cycle accesses. With the cache disabled, OCRAM performance is much lower than TCM.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Omar&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2024 22:17:15 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2005462#M231437</guid>
      <dc:creator>Omar_Anguiano</dc:creator>
      <dc:date>2024-12-02T22:17:15Z</dc:date>
    </item>
    <item>
      <title>Re: RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2005469#M231438</link>
      <description>&lt;P&gt;Hi Omar,&lt;/P&gt;&lt;P&gt;Thanks for your reply. However, it doesn't answer the question at all.&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;Cache doesn’t have effect on TCM fields. TCM interfaces are synchronous to the Cortex M7 and run at the same frequency. Hence it is expected that the access to the xTCM memories is single cycle.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Yes, that's true. In my benchmark I use the DTCM-to-DTCM transfer as the baseline against which the other memory types are compared.&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;OCRAM performance with cache performs closely to TCM as cache is single access, with cache disable the OCRAM performance is very low compared to TCM.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;As the DTCM-NonCache and DTCM-Cache results in my benchmark show, transferring to a cached region gives no performance improvement at all (about 528 vs. 532 cycles per copy).&lt;/P&gt;&lt;P&gt;The question is why the cache doesn't improve OCRAM performance.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Dec 2024 22:32:04 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2005469#M231438</guid>
      <dc:creator>zixunli</dc:creator>
      <dc:date>2024-12-02T22:32:04Z</dc:date>
    </item>
    <item>
      <title>Re: RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2008592#M231626</link>
      <description>&lt;P&gt;If OCRAM performance does not improve with the cache, it means the cache is not configured correctly. Please make sure that the MPU region covering OCRAM is non-shareable, because on i.MX RT a shareable region is treated as non-cacheable by default.&lt;/P&gt;
&lt;P&gt;Also, please perform a cache clean operation on the OCRAM area before using it.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Omar&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 23:24:57 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2008592#M231626</guid>
      <dc:creator>Omar_Anguiano</dc:creator>
      <dc:date>2024-12-05T23:24:57Z</dc:date>
    </item>
    <item>
      <title>Re: RT1170: question about memcpy benchmark</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2008901#M231640</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;P&gt;If OCRAM performance is not improving with cache it means that cache is not well implemented. Please make sure that MPU on OCRAM is non-shareable&amp;nbsp; as shareable in i.MXRT means non-cacheable by default.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;That's &lt;STRONG&gt;exactly&lt;/STRONG&gt; what the example I attached does. The MPU is configured by &lt;STRONG&gt;BOARD_ConfigMPU()&lt;/STRONG&gt; from the kSDK, and as far as I can see, that function does configure OCRAM as non-shareable:&lt;/P&gt;&lt;LI-CODE lang="c"&gt;    /* Region 6 setting: Memory with Normal type, not shareable, outer/inner write back */
    MPU-&amp;gt;RBAR = ARM_MPU_RBAR(6, 0x20200000U);
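    /* ARM_MPU_RASR(DisableExec, AccessPermission, TypeExtField, IsShareable, IsCacheable, IsBufferable, SubRegionDisable, Size) */
    MPU-&amp;gt;RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_1MB);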

...
...

    /* Enable I cache and D cache */
#if defined(__DCACHE_PRESENT) &amp;amp;&amp;amp; __DCACHE_PRESENT
    SCB_EnableDCache();
#endif
#if defined(__ICACHE_PRESENT) &amp;amp;&amp;amp; __ICACHE_PRESENT
    SCB_EnableICache();
#endif&lt;/LI-CODE&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;Also before using the OCRAM area please perform a clean operation.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;I believe you mean a cache invalidate. That is already done by the CMSIS function &lt;STRONG&gt;SCB_EnableDCache()&lt;/STRONG&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 07:28:36 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/RT1170-question-about-memcpy-benchmark/m-p/2008901#M231640</guid>
      <dc:creator>zixunli</dc:creator>
      <dc:date>2024-12-06T07:28:36Z</dc:date>
    </item>
  </channel>
</rss>

