RT1170 EVKB M4 memory speed

I'm benchmarking M4 core memory access on the RT1170 EVKB. Are these numbers expected when the M4 accesses SDRAM and OCRAM on the board? They seem extremely slow.

I have read that M4 accesses take longer because of the AXI fabric etc., but this seems very slow. Have I missed something in my configuration that makes it slower than necessary?

The benchmark covers a few different memory sources/destinations, but what I'm most interested in improving is eDMA between SDRAM and OCRAM. I'm implementing a multi-track audio streaming engine where the M4 sets up eDMA transfers from non-cacheable SDRAM to non-cacheable OCRAM. The M7 core then consumes buffers from shared OCRAM.

The test runs standalone, though, so there should be little or no bus arbitration happening.

Both the SDRAM and the OCRAM regions are set non-cacheable, and the OCRAM region is also set shareable in the MPU config, if I'm not mistaken?

Audio streamer copy benchmark, Cortex-M4 @ 400 MHz

ops: playback 16 x 4096 B SDRAM_NC->OCRAM, record 8 x 4096 B OCRAM->SDRAM_NC

bytes: playback=65536 record=32768 total=98304

 

CPU memcpy:

playback copies               1532202 cyc 3830.505 us

record copies                  530291 cyc 1325.727 us

scatter total                 2062493 cyc 5156.232 us

single 96K bulk               2301811 cyc 5754.527 us

checksum=239

 

Memory region copy matrix (32768 B per row):

regions src: SDRAM_C=80008000 SDRAM_NC=80818000 SHARED=202B8EC0 NCACHE=20248AA0 OCRAM_CACHE=2026E3C0

MPU/cache: CTRL=0x00000007 TYPE=0x00000800 PSCCR=0x00000003 R4=0x80000004/0x03100033 R5=0x80000005/0x0303002d

EDMA is only run when both endpoints are non-cacheable/DMA-safe.

SDRAM_C->OCRAM_CACHE         bytes=32768 cpu= 1201410 cyc 3003.525 us 10.90 MB/s | edma=skip cacheable checksum=338

SDRAM_NC->OCRAM_CACHE        bytes=32768 cpu= 1037590 cyc 2593.975 us 12.63 MB/s | edma=skip cacheable checksum=602

SDRAM_NC->SHARED_OCRAM       bytes=32768 cpu=  781109 cyc 1952.772 us 16.78 MB/s | edma=ok   779805 cyc 1949.512 us 16.80 MB/s checksum=602/602 | edma+cache=ok   858144 cyc 2145.360 us 15.27 MB/s checksum=602

SDRAM_NC->NCACHE_OCRAM       bytes=32768 cpu=  777897 cyc 1944.742 us 16.84 MB/s | edma=ok   751101 cyc 1877.752 us 17.45 MB/s checksum=602/602 | edma+cache=ok   853273 cyc 2133.182 us 15.36 MB/s checksum=602

OCRAM_CACHE->SDRAM_NC        bytes=32768 cpu=  750396 cyc 1875.990 us 17.46 MB/s | edma=skip cacheable checksum=1204

SHARED_OCRAM->SDRAM_NC       bytes=32768 cpu=  531143 cyc 1327.857 us 24.67 MB/s | edma=ok   570792 cyc 1426.980 us 22.96 MB/s checksum=610/610 | edma+cache=ok   670564 cyc 1676.410 us 19.54 MB/s checksum=610

NCACHE_OCRAM->SDRAM_NC       bytes=32768 cpu=  541568 cyc 1353.920 us 24.20 MB/s | edma=ok   549911 cyc 1374.777 us 23.83 MB/s checksum=940/940 | edma+cache=ok   649852 cyc 1624.630 us 20.16 MB/s checksum=940

SDRAM_NC->SDRAM_NC           bytes=32768 cpu=  896656 cyc 2241.640 us 14.61 MB/s | edma=ok   997686 cyc 2494.215 us 13.13 MB/s checksum=602/602 | edma+cache=ok  1130798 cyc 2826.995 us 11.59 MB/s checksum=602

OCRAM_CACHE->OCRAM_CACHE     bytes=32768 cpu=  768904 cyc 1922.260 us 17.04 MB/s | edma=skip cacheable checksum=1204

SHARED_OCRAM->SHARED_OCRAM   bytes=32768 cpu=  425929 cyc 1064.822 us 30.77 MB/s | edma=ok   433498 cyc 1083.745 us 30.23 MB/s checksum=610/610 | edma+cache=ok   542913 cyc 1357.282 us 24.14 MB/s checksum=610
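
For reference, the us and MB/s columns in the matrix above follow directly from the cycle counts. A minimal sketch of that arithmetic, assuming the 400 MHz core clock from the log header and 1 MB = 10^6 bytes (the logged MB/s values appear to be truncated rather than rounded). The function names are illustrative and not taken from the benchmark source:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: core clock taken from the log header (Cortex-M4 @ 400 MHz). */
#define CORE_HZ 400000000.0

/* Convert a cycle count to microseconds at CORE_HZ. */
double cycles_to_us(uint32_t cycles)
{
    return (double)cycles * 1e6 / CORE_HZ;
}

/* Throughput in MB/s (1 MB = 10^6 bytes), which equals bytes per microsecond. */
double throughput_mb_s(uint32_t bytes, uint32_t cycles)
{
    assert(cycles != 0U);
    return (double)bytes / cycles_to_us(cycles);
}

/* Average core cycles spent per byte copied. */
double cycles_per_byte(uint32_t bytes, uint32_t cycles)
{
    assert(bytes != 0U);
    return (double)cycles / (double)bytes;
}
```

For the SDRAM_NC->SHARED_OCRAM CPU row (32768 B in 781109 cycles) this works out to about 23.8 cycles per byte, or roughly 95 core cycles per 32-bit word.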

 

EDMA scatter/gather 24 x 4K, NBYTES=4096:

ops=24 nbytes=4096 ok=1

descriptor setup+start          58942 cyc 147.355 us

wait until done/error         2031292 cyc 5078.230 us

total                         2090234 cyc 5225.585 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=1 biter=1

 

EDMA scatter/gather 24 x 4K, NBYTES=32:

ops=24 nbytes=32 ok=1

descriptor setup+start          47526 cyc 118.815 us

wait until done/error         2055017 cyc 5137.542 us

total                         2102543 cyc 5256.357 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=128 biter=128

 

EDMA single 4K, NBYTES=4096:

ops=1 nbytes=4096 ok=1

descriptor setup+start           5713 cyc 14.282 us

wait until done/error           93592 cyc 233.980 us

total                           99305 cyc 248.262 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=1 biter=1

 

EDMA single 4K, NBYTES=32:

ops=1 nbytes=32 ok=1

descriptor setup+start           5344 cyc 13.360 us

wait until done/error           94609 cyc 236.522 us

total                           99953 cyc 249.882 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=128 biter=128

 

EDMA single 96K, NBYTES=4096:

ops=1 nbytes=4096 ok=1

descriptor setup+start           5867 cyc 14.667 us

wait until done/error         2230360 cyc 5575.900 us

total                         2236227 cyc 5590.567 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=24 biter=24

 

EDMA single 96K, NBYTES=32:

ops=1 nbytes=32 ok=1

descriptor setup+start           5713 cyc 14.282 us

wait until done/error         2256234 cyc 5640.585 us

total                         2261947 cyc 5654.867 us

checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=3072 biter=3072

DMAMUX1_CHCFG[2]=0xa0000000 DMA1_ERQ=0x80 DMA1_ES=0x0
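
The citer/biter values in the eDMA logs above are consistent with the major-loop count being the transfer size divided by NBYTES. A small sketch of that relation follows; `edma_major_count` and `edma_major_count_fits` are hypothetical helpers, not SDK functions, and the 15-bit limit assumes the CITER layout without channel linking:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed maximum major-loop count when channel linking is not used. */
#define EDMA_CITER_MAX_NO_LINK 0x7FFFu

/* Major-loop iteration count for a transfer of total_bytes split into
 * minor loops of nbytes each. total_bytes must be a multiple of nbytes. */
uint32_t edma_major_count(uint32_t total_bytes, uint32_t nbytes)
{
    assert(nbytes != 0U);
    assert((total_bytes % nbytes) == 0U);
    return total_bytes / nbytes;
}

/* Returns 1 if the count fits the assumed 15-bit CITER field, else 0. */
int edma_major_count_fits(uint32_t count)
{
    return (count >= 1U) && (count <= EDMA_CITER_MAX_NO_LINK);
}
```

With the sizes from the logs, 4096 B at NBYTES=32 gives 128, and 98304 B gives 24 at NBYTES=4096 and 3072 at NBYTES=32, matching the reported citer values.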

============================================================


My linker:

/* M4/armgcc/MIMXRT1176xxxxx_cm4_flexspi_nor.ld
* M4 XIP-from-FlexSPI with ITCM/DTCM hot sections + SDRAM
* OCRAM partition (1MB @ 0x2024_0000):
* - 128KB DMA-safe / Device/non-cache : 0x20240000..0x2025FFFF
* - 128KB M4 cacheable local scratch : 0x20260000..0x2027FFFF
* - 768KB SHARED (M4<->M7) non-cache : 0x20280000..0x2033FFFF
* - LAST 8KB of SHARED reserved for M7-only : 0x2033E000..0x2033FFFF (NOT in M4 linker region)
*
* Additional fixed carve-out (within shared-for-both, before M7 DMA slice):
* - 36KB fixed SHARED SysView/RTT window for M4 : 0x20335000..0x2033DFFF
* - Shared main (for IPC etc) : 0x20280000..0x20334FFF
*/

ENTRY(Reset_Handler)

HEAP_SIZE = DEFINED(__heap_size__) ? __heap_size__ : 0x0400;
STACK_SIZE = DEFINED(__stack_size__) ? __stack_size__ : 0x0400;

SDRAM_BASE = 0x80000000;
SDRAM_SIZE = 0x04000000; /* 64 MB */
SDRAM_C_SIZE = 0x00800000; /* 8 MB cacheable */
SDRAM_NC_BASE = SDRAM_BASE + SDRAM_C_SIZE; /* 0x80800000 */
SDRAM_NC_SIZE = SDRAM_SIZE - SDRAM_C_SIZE; /* 56 MB */

M4_FLASH_BASE = 0x33600000;
M4_FLASH_SIZE = 0x00A00000;

/* OCRAM: 1MB */
OCRAM_BASE = 0x20240000;
OCRAM_SIZE = 0x00100000;

/* Local OCRAM total = DMA 128KB + cacheable scratch 128KB = 256KB */
OCRAM_LOCAL_TOTAL_SIZE = 0x00040000; /* 256KB */

/* DMA-safe non-cache local window (first 128KB) */
OCRAM_DMA_NC_BASE = 0x20240000;
OCRAM_DMA_NC_SIZE = 0x00020000; /* 128KB */

/* M4 cacheable scratch window (next 128KB) */
OCRAM_LOCAL_C_BASE = 0x20260000;
OCRAM_LOCAL_C_SIZE = 0x00020000; /* 128KB */

/* Shared OCRAM starts at 0x20280000 (rest = 768KB) */
OCRAM_SHARED_BASE = OCRAM_BASE + OCRAM_LOCAL_TOTAL_SIZE; /* 0x20280000 */
OCRAM_SHARED_SIZE_FULL = OCRAM_SIZE - OCRAM_LOCAL_TOTAL_SIZE; /* 0x000C0000 */

/* Reserve LAST 8KB of shared OCRAM for M7-only DMA buffers (exclude from M4 region) */
M7_DMA_OCRAM_BYTES = 0x00002000; /* 8KB */
OCRAM_SHARED_FOR_BOTH_SIZE = OCRAM_SHARED_SIZE_FULL - M7_DMA_OCRAM_BYTES; /* 0x000BE000 */

/* Fixed SysView/RTT carve-out for M4 at end of shared-for-both (36KB) */
SYSVIEW_M4_BYTES = 0x00009000; /* 36KB */

/* Shared main size (everything except the fixed 36KB window) */
OCRAM_SHARED_MAIN_SIZE = OCRAM_SHARED_FOR_BOTH_SIZE - SYSVIEW_M4_BYTES; /* 0x000B5000 */

/* SysView M4 base (fixed) */
OCRAM_SYSVIEW_M4_BASE = OCRAM_SHARED_BASE + OCRAM_SHARED_MAIN_SIZE; /* 0x20335000 */
OCRAM_SYSVIEW_M4_SIZE = SYSVIEW_M4_BYTES; /* 0x00009000 */

MEMORY
{
m_interrupts (RX) : ORIGIN = M4_FLASH_BASE, LENGTH = 0x00000400
m_text (RX) : ORIGIN = M4_FLASH_BASE + 0x400, LENGTH = M4_FLASH_SIZE - 0x00000400

m_itcm (RX) : ORIGIN = 0x1FFE0000, LENGTH = 0x00020000
m_dtcm (RW) : ORIGIN = 0x20000000, LENGTH = 0x00020000

/* OCRAM partition */
m_ncache_local (RW) : ORIGIN = OCRAM_DMA_NC_BASE, LENGTH = OCRAM_DMA_NC_SIZE
m_ocram_cache (RW) : ORIGIN = OCRAM_LOCAL_C_BASE, LENGTH = OCRAM_LOCAL_C_SIZE

m_shared_ocram_main (RW) : ORIGIN = OCRAM_SHARED_BASE, LENGTH = OCRAM_SHARED_MAIN_SIZE
m_shared_sysview_m4 (RW) : ORIGIN = OCRAM_SYSVIEW_M4_BASE, LENGTH = OCRAM_SYSVIEW_M4_SIZE

m_sdram_c (RW) : ORIGIN = SDRAM_BASE, LENGTH = SDRAM_C_SIZE
m_sdram_nc (RW) : ORIGIN = SDRAM_NC_BASE, LENGTH = SDRAM_NC_SIZE
}

SECTIONS
{
__NCACHE_REGION_START = OCRAM_BASE;
__NCACHE_REGION_SIZE = OCRAM_SIZE;

__SHARED_OCRAM_START = ORIGIN(m_shared_ocram_main);
__SHARED_OCRAM_SIZE = LENGTH(m_shared_ocram_main);

__SYSVIEW_M4_BASE = ORIGIN(m_shared_sysview_m4);
__SYSVIEW_M4_SIZE = LENGTH(m_shared_sysview_m4);

__OCRAM_CACHE_START = ORIGIN(m_ocram_cache);
__OCRAM_CACHE_SIZE = LENGTH(m_ocram_cache);

.shared_ocram (NOLOAD) :
{
. = ALIGN(32);
__SHARED_OCRAM_START__ = .;
*(.shared_ocram*)
*(SharedOcram*)
. = ALIGN(32);
__SHARED_OCRAM_END__ = .;
ASSERT((__SHARED_OCRAM_END__ - __SHARED_OCRAM_START__) <= LENGTH(m_shared_ocram_main),
"shared_ocram section too large");
} > m_shared_ocram_main

.shared_sysview_m4 (NOLOAD) :
{
. = ALIGN(32);
__SYSVIEW_M4_START__ = .;
KEEP(*(.shared_sysview_m4))
KEEP(*(.shared_sysview_m4.*))
. = ALIGN(32);
__SYSVIEW_M4_END__ = .;
ASSERT((__SYSVIEW_M4_END__ - __SYSVIEW_M4_START__) <= LENGTH(m_shared_sysview_m4),
"shared_sysview_m4 section too large (max 36KB)");
} > m_shared_sysview_m4

.ocram_cache (NOLOAD) :
{
. = ALIGN(32);
__OCRAM_CACHE_START__ = .;
*(.ocram_cache*)
*(OcramCache*)
. = ALIGN(32);
__OCRAM_CACHE_END__ = .;
ASSERT((__OCRAM_CACHE_END__ - __OCRAM_CACHE_START__) <= LENGTH(m_ocram_cache),
"ocram_cache section too large");
} > m_ocram_cache

.interrupts :
{
__VECTOR_TABLE = .;
__Vectors = .;
. = ALIGN(4);
KEEP(*(.isr_vector))
. = ALIGN(4);
} > m_interrupts

/* ================================================================
* HOT CODE IN ITCM (VMA=ITCM, LMA computed later)
* ================================================================ */
.itcm_text : AT(__itcm_text_load__)
{
. = ALIGN(32);
__itcm_text_start__ = .;

*(.itcm_text*)
*(ITCM_TEXT*)

/* your hot TUs */
*/DspParameterManager.cpp.obj(.text .text.*)
*/audio_param_task.cpp.obj(.text .text.*)
*/StepSequencer.cpp.obj(.text .text.*)
*/sequencer_task.cpp.obj(.text .text.*)

/* MU ISR(s) - by symbol section name */
*(.text.MUB_IRQHandler)

/* FreeRTOS hot bits - by symbol section name */
*(.text.SVC_Handler)
*(.text.PendSV_Handler)
*(.text.SysTick_Handler)

*(.text.vPortEnterCritical)
*(.text.vPortExitCritical)
*(.text.vPortValidateInterruptPriority)
*(.text.xPortStartScheduler)
*(.text.vPortSetupTimerInterrupt)

*(.text.xTaskIncrementTick)
*(.text.vTaskSwitchContext)
*(.text.vTaskGenericNotifyGiveFromISR)
*(.text.xTaskGenericNotify)

*(.text.xQueueGenericSendFromISR)
*(.text.xQueueGiveFromISR)

. = ALIGN(32);
__itcm_text_end__ = .;
} > m_itcm

/* ================================================================
* HOT RODATA IN DTCM (LMA computed later)
* ================================================================ */
.dtcm_rodata : AT(__dtcm_rodata_load__)
{
. = ALIGN(32);
__dtcm_rodata_start__ = .;

*/DspParameterManager.cpp.obj(.rodata .rodata.*)
*/audio_param_task.cpp.obj(.rodata .rodata.*)
*/StepSequencer.cpp.obj(.rodata .rodata.*)
*/sequencer_task.cpp.obj(.rodata .rodata.*)

. = ALIGN(32);
__dtcm_rodata_end__ = .;
} > m_dtcm

/* ================================================================
* HOT DATA IN DTCM (LMA computed later)
* ================================================================ */
.dtcm_data : AT(__dtcm_data_load__)
{
. = ALIGN(32);
__dtcm_data_start__ = .;
*(.dtcm_data*)
*(DTCM_DATA*)
. = ALIGN(32);
__dtcm_data_end__ = .;
} > m_dtcm

/* ================================================================
* FLASH XIP TEXT/RODATA (catch-all)
* ================================================================ */
.text :
{
. = ALIGN(4);

*(.text .text*)
*(.rodata .rodata*)

*(.glue_7)
*(.glue_7t)
*(.eh_frame)
KEEP(*(.init))
KEEP(*(.fini))

. = ALIGN(4);
} > m_text

.ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } > m_text

.ARM :
{
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} > m_text

.ctors :
{
__CTOR_LIST__ = .;
KEEP (*crtbegin.o(.ctors))
KEEP (*crtbegin?.o(.ctors))
KEEP (*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors))
KEEP (*(SORT(.ctors.*)))
KEEP (*(.ctors))
__CTOR_END__ = .;
} > m_text

.dtors :
{
__DTOR_LIST__ = .;
KEEP (*crtbegin.o(.dtors))
KEEP (*crtbegin?.o(.dtors))
KEEP (*(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors))
KEEP (*(SORT(.dtors.*)))
KEEP (*(.dtors))
__DTOR_END__ = .;
} > m_text

.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} > m_text

.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} > m_text

.fini_array :
{
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
} > m_text

/* ================================================================
* CRITICAL FIX:
* Start load-images AFTER ALL flash-resident sections above.
* At this point, '.' is at the real end of flash content.
* ================================================================ */
__DATA_ROM = .;

__itcm_text_load__ = __DATA_ROM;
__dtcm_rodata_load__ = __itcm_text_load__ + SIZEOF(.itcm_text);
__dtcm_data_load__ = __dtcm_rodata_load__ + SIZEOF(.dtcm_rodata);

/* NXP startup expects .data LMA at __etext */
__etext = __dtcm_data_load__ + SIZEOF(.dtcm_data);

.data : AT(__etext)
{
. = ALIGN(4);
__DATA_RAM = .;
__data_start__ = .;
*(.data)
*(.data*)
*(DataQuickAccess)
KEEP(*(.jcr*))
. = ALIGN(4);
__data_end__ = .;
} > m_dtcm

__NDATA_ROM = __etext + (__data_end__ - __data_start__);

.ncache.init : AT(__NDATA_ROM)
{
__noncachedata_start__ = .;
*(NonCacheable.init)
. = ALIGN(4);
__noncachedata_init_end__ = .;
} > m_ncache_local

.ncache (NOLOAD) :
{
*(NonCacheable)
. = ALIGN(4);
__noncachedata_end__ = .;
} > m_ncache_local

.bss (NOLOAD) :
{
. = ALIGN(4);
__bss_start__ = .;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
__bss_end__ = .;
} > m_dtcm

.dtcm_bss (NOLOAD) :
{
. = ALIGN(32);
__dtcm_bss_start__ = .;
*(.dtcm_bss*)
*(DTCM_BSS*)
. = ALIGN(32);
__dtcm_bss_end__ = .;
} > m_dtcm

__SDRAM_C_DATA_ROM = __NDATA_ROM + (__noncachedata_init_end__ - __noncachedata_start__);

.sdram_c_data : AT(__SDRAM_C_DATA_ROM)
{
. = ALIGN(32);
__sdram_c_data_start__ = .;
*(.sdram_c_data*)
*(SDRAM_C_DATA*)
. = ALIGN(32);
__sdram_c_data_end__ = .;
} > m_sdram_c

__sdram_c_data_load__ = __SDRAM_C_DATA_ROM;

.sdram_c_bss (NOLOAD) :
{
. = ALIGN(32);
__sdram_c_bss_start__ = .;
*(.sdram_c_bss*)
*(SDRAM_C_BSS*)
. = ALIGN(32);
__sdram_c_bss_end__ = .;
} > m_sdram_c

.sdram_nc_bss (NOLOAD) :
{
. = ALIGN(32);
__sdram_nc_bss_start__ = .;
*(.sdram_nc_bss*)
*(SDRAM_NC_BSS*)
. = ALIGN(32);
__sdram_nc_bss_end__ = .;
} > m_sdram_nc

.heap :
{
. = ALIGN(8);
__end__ = .;
PROVIDE(end = .);
__HeapBase = .;
. += HEAP_SIZE;
__HeapLimit = .;
__heap_limit = .;
} > m_dtcm

.stack :
{
. = ALIGN(8);
. += STACK_SIZE;
__StackEnd = .;
} > m_dtcm

__StackTop = ORIGIN(m_dtcm) + LENGTH(m_dtcm);
__StackLimit = __StackTop - STACK_SIZE;
PROVIDE(__stack = __StackTop);

.ARM.attributes 0 : { *(.ARM.attributes) }

__SDRAM_BASE = SDRAM_BASE;
__SDRAM_SIZE = SDRAM_SIZE;
__SDRAM_C_BASE = SDRAM_BASE;
__SDRAM_C_SIZE = SDRAM_C_SIZE;
__SDRAM_NC_BASE = SDRAM_NC_BASE;
__SDRAM_NC_SIZE = SDRAM_NC_SIZE;
}
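
The partition comments in the linker script can be cross-checked with a few lines of arithmetic. The sketch below recomputes the derived addresses from the constants in the script; the helper function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the linker script above. */
#define OCRAM_BASE             0x20240000u
#define OCRAM_SIZE             0x00100000u
#define OCRAM_LOCAL_TOTAL_SIZE 0x00040000u /* 128KB DMA + 128KB scratch */
#define M7_DMA_OCRAM_BYTES     0x00002000u /* last 8KB, M7 only */
#define SYSVIEW_M4_BYTES       0x00009000u /* 36KB SysView/RTT window */

/* Start of the shared OCRAM area (after the two local windows). */
uint32_t shared_base(void)
{
    return OCRAM_BASE + OCRAM_LOCAL_TOTAL_SIZE;
}

/* Shared main size: shared area minus the M7 slice and the SysView window. */
uint32_t shared_main_size(void)
{
    return OCRAM_SIZE - OCRAM_LOCAL_TOTAL_SIZE
         - M7_DMA_OCRAM_BYTES - SYSVIEW_M4_BYTES;
}

/* The SysView window sits directly after the shared main area. */
uint32_t sysview_base(void)
{
    return shared_base() + shared_main_size();
}

/* The M7-only slice sits directly after the SysView window. */
uint32_t m7_dma_base(void)
{
    return sysview_base() + SYSVIEW_M4_BYTES;
}
```

The results agree with the header comment: shared main is 0xB5000 bytes, SysView starts at 0x20335000, and the M7 slice starts at 0x2033E000 and ends exactly at the end of OCRAM (0x20340000).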


MPU config:
/* -------------------------------------------------------------------------- */
/* CM7 MPU config */
/* -------------------------------------------------------------------------- */
#if __CORTEX_M == 7

void BOARD_ConfigMPU(void)
{
#if defined(__ICCARM__) || defined(__GNUC__)
extern uint32_t __NCACHE_REGION_START[];
extern uint32_t __NCACHE_REGION_SIZE[];
uint32_t nonCacheStart = (uint32_t)__NCACHE_REGION_START;
uint32_t size = (uint32_t)__NCACHE_REGION_SIZE;
#else
uint32_t nonCacheStart = OCRAM_BASE_EXPECTED;
uint32_t size = OCRAM_SIZE_EXPECTED;
#endif
(void)nonCacheStart;
(void)size;

#if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT
if (SCB_CCR_IC_Msk == (SCB_CCR_IC_Msk & SCB->CCR))
{
SCB_DisableICache();
}
#endif
#if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT
if (SCB_CCR_DC_Msk == (SCB_CCR_DC_Msk & SCB->CCR))
{
SCB_DisableDCache();
}
#endif

ARM_MPU_Disable();

/* Region 0: deny all (speculative prefetch workaround) */
MPU->RBAR = ARM_MPU_RBAR(0, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(1, ARM_MPU_AP_NONE, 0, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_4GB);

/* Region 1: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(1, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB);

/* Region 2: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(2, 0x60000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB);

/* Region 3: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(3, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1GB);

/* Region 4: Normal, WB */
MPU->RBAR = ARM_MPU_RBAR(4, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB);

/* Region 5: Normal, WB */
MPU->RBAR = ARM_MPU_RBAR(5, 0x20000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB);

/* OCRAM on CM7: Normal + Shareable + Non-cacheable (TEX=1,S=1,C=0,B=0) */

/* Region 6: 0x20240000..0x2027FFFF (256KB) */
MPU->RBAR = ARM_MPU_RBAR(6, 0x20240000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB);

/* Region 7: 0x20280000..0x202FFFFF (512KB) */
MPU->RBAR = ARM_MPU_RBAR(7, 0x20280000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_512KB);

/* Region 10: 0x20300000..0x2033FFFF (256KB) */
MPU->RBAR = ARM_MPU_RBAR(10, 0x20300000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB);

#if USE_SDRAM
/* Region 11: SDRAM default NON-cacheable 64MB (Normal non-cache) */
MPU->RBAR = ARM_MPU_RBAR(11, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_64MB);

#if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH
/* Region 12: overlay first 8MB as cacheable (WT) */
MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 0, 0, ARM_MPU_REGION_SIZE_8MB);
#else
/* Region 12: overlay first 8MB as cacheable (WB) */
MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_8MB);
#endif
#endif /* USE_SDRAM */

#if defined(XIP_EXTERNAL_FLASH) && (XIP_EXTERNAL_FLASH == 1)
/* Region 8: XIP external flash, RO, cacheable WB, cover full 64MB */
MPU->RBAR = ARM_MPU_RBAR(8, 0x30000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_RO, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_64MB);
#endif

/* Peripheral windows */
MPU->RBAR = ARM_MPU_RBAR(13, 0x40000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_16MB);

MPU->RBAR = ARM_MPU_RBAR(14, 0x41000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_2MB);

MPU->RBAR = ARM_MPU_RBAR(15, 0x41400000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1MB);

ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk);

#if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT
SCB_EnableDCache();
#endif
#if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT
SCB_EnableICache();
#endif
}
#endif /* __CORTEX_M == 7 */

/* -------------------------------------------------------------------------- */
/* CM4 MPU config */
/* -------------------------------------------------------------------------- */
#if __CORTEX_M == 4

void BOARD_ConfigMPU(void)
{
/* ---- Disable code bus cache (LMEM) ---- */
if (LMEM_PCCCR_ENCACHE_MASK == (LMEM_PCCCR_ENCACHE_MASK & LMEM->PCCCR))
{
LMEM->PCCCR |= LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK | LMEM_PCCCR_GO_MASK;
while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {}
LMEM->PCCCR &= ~(LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK);
LMEM->PCCCR &= ~LMEM_PCCCR_ENCACHE_MASK;
}

/* ---- Disable system bus cache (LMEM) ---- */
if (LMEM_PSCCR_ENCACHE_MASK == (LMEM_PSCCR_ENCACHE_MASK & LMEM->PSCCR))
{
LMEM->PSCCR |= LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK | LMEM_PSCCR_GO_MASK;
while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {}
LMEM->PSCCR &= ~(LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK);
LMEM->PSCCR &= ~LMEM_PSCCR_ENCACHE_MASK;
}

ARM_MPU_Disable();

/* Region 0: 0x20240000..0x2025FFFF (128KB) DMA -> Device/non-cache.
* Keep the original attributes here: UART/SPI/SD DMA buffers live in this
* window and require the proven non-cache behavior with CM4 LMEM. */
MPU->RBAR = ARM_MPU_RBAR(0, OCRAM_DMA_NC_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0,
1, /* shareable recommended */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_128KB);

/* Region 1: 0x20260000..0x2027FFFF (128KB) local scratch -> Normal WB cacheable */
MPU->RBAR = ARM_MPU_RBAR(1, OCRAM_LOCAL_C_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
0, /* not shareable */
1, /* cacheable */
1, /* bufferable (WB) */
0,
ARM_MPU_REGION_SIZE_128KB);

/* Region 2: 0x20280000..0x202FFFFF (512KB) shared -> original non-cache attrs */
MPU->RBAR = ARM_MPU_RBAR(2, 0x20280000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0,
1, /* shareable */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_512KB);

/* Region 3: 0x20300000..0x2033FFFF (256KB) shared -> original non-cache attrs */
MPU->RBAR = ARM_MPU_RBAR(3, 0x20300000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0,
1, /* shareable */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_256KB);

#if USE_SDRAM
/* Linker split:
* m_sdram_c = 0x80000000..0x807FFFFF (8MB)
* m_sdram_nc = 0x80800000..0x83FFFFFF (56MB)
*
* The MPU cannot describe 56MB directly, so region 4 makes the full 64MB
* SDRAM window Device/non-cache and region 5 overlays the first 8MB as
* cacheable. Higher-numbered MPU regions take precedence. */
(void)SDRAM_NC_BASE;
(void)SDRAM_NC_SIZE;

/* Region 4: full SDRAM default, 0x80000000..0x83FFFFFF -> Device/non-cache */
MPU->RBAR = ARM_MPU_RBAR(4, SDRAM_BASE_EXPECTED);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
2, /* Device */
0, /* not shareable */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_64MB);

#if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH
/* Region 5: linker m_sdram_c, 0x80000000..0x807FFFFF -> Normal write-through */
MPU->RBAR = ARM_MPU_RBAR(5, SDRAM_C_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
0, /* not shareable */
1, /* cacheable */
0, /* write-through */
0,
ARM_MPU_REGION_SIZE_8MB);
#else
/* Region 5: linker m_sdram_c, 0x80000000..0x807FFFFF -> Normal write-back */
MPU->RBAR = ARM_MPU_RBAR(5, SDRAM_C_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
0, /* not shareable */
1, /* cacheable */
1, /* write-back */
0,
ARM_MPU_REGION_SIZE_8MB);
#endif
#endif /* USE_SDRAM */

ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk);

/* Invalidate and enable system bus cache (PSCCR) */
LMEM->PSCCR |= LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK | LMEM_PSCCR_GO_MASK;
while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {}
LMEM->PSCCR &= ~(LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK);
LMEM->PSCCR |= LMEM_PSCCR_ENCACHE_MASK;

/* Invalidate and enable code bus cache (PCCCR) */
LMEM->PCCCR |= LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK | LMEM_PCCCR_GO_MASK;
while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {}
LMEM->PCCCR &= ~(LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK);
LMEM->PCCCR |= LMEM_PCCCR_ENCACHE_MASK;
}
#endif /* __CORTEX_M == 4 */
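
The CM4 config above relies on overlapping regions, with the higher-numbered region winning. A rough host-side sketch of that lookup, using the region bases and sizes from the CM4 `BOARD_ConfigMPU()` above (assuming `USE_SDRAM` is defined). The table, struct and function names are illustrative only and do not exist in the SDK:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;     /* region base address */
    uint32_t size;     /* region size in bytes, power of two */
    const char *role;  /* purpose as described in the config comments */
} mpu_region_t;

/* Index in this table = MPU region number in the CM4 config. */
const mpu_region_t cm4_regions[] = {
    { 0x20240000u, 128u * 1024u,         "OCRAM DMA window, non-cacheable" },
    { 0x20260000u, 128u * 1024u,         "OCRAM local scratch, write-back" },
    { 0x20280000u, 512u * 1024u,         "OCRAM shared, non-cacheable" },
    { 0x20300000u, 256u * 1024u,         "OCRAM shared, non-cacheable" },
    { 0x80000000u, 64u * 1024u * 1024u,  "SDRAM default, Device" },
    { 0x80000000u, 8u * 1024u * 1024u,   "SDRAM first 8MB, cacheable" },
};

/* ARMv7-M MPU regions must be aligned to their own (power-of-two) size. */
int mpu_region_aligned(uint32_t base, uint32_t size)
{
    return (base & (size - 1u)) == 0u;
}

/* Returns the highest-numbered region containing addr, or -1 if no region
 * matches (the default memory map then applies when PRIVDEFENA is set). */
int mpu_lookup(uint32_t addr)
{
    int hit = -1;
    for (size_t i = 0; i < sizeof cm4_regions / sizeof cm4_regions[0]; ++i) {
        const mpu_region_t *r = &cm4_regions[i];
        if (addr >= r->base && (addr - r->base) < r->size) {
            hit = (int)i; /* later (higher-numbered) match overrides */
        }
    }
    return hit;
}
```

With the buffer addresses from the benchmark dump, SDRAM_C (0x80008000) resolves to region 5 and SDRAM_NC (0x80818000) to region 4, which matches the intent described in the config comments.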

I also tried this, with similar results, but got mismatching checksums when writing from non-cacheable OCRAM to non-cacheable SDRAM using DMA:
#if __CORTEX_M == 7

void BOARD_ConfigMPU(void)
{
#if defined(__ICCARM__) || defined(__GNUC__)
extern uint32_t __NCACHE_REGION_START[];
extern uint32_t __NCACHE_REGION_SIZE[];
uint32_t nonCacheStart = (uint32_t)__NCACHE_REGION_START;
uint32_t size = (uint32_t)__NCACHE_REGION_SIZE;
#else
uint32_t nonCacheStart = OCRAM_BASE_EXPECTED;
uint32_t size = OCRAM_SIZE_EXPECTED;
#endif
(void)nonCacheStart;
(void)size;

#if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT
if (SCB_CCR_IC_Msk == (SCB_CCR_IC_Msk & SCB->CCR))
{
SCB_DisableICache();
}
#endif
#if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT
if (SCB_CCR_DC_Msk == (SCB_CCR_DC_Msk & SCB->CCR))
{
SCB_DisableDCache();
}
#endif

ARM_MPU_Disable();

/* Region 0: deny all (speculative prefetch workaround) */
MPU->RBAR = ARM_MPU_RBAR(0, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(1, ARM_MPU_AP_NONE, 0, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_4GB);

/* Region 1: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(1, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB);

/* Region 2: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(2, 0x60000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB);

/* Region 3: Device, non-shareable, non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(3, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1GB);

/* Region 4: Normal, WB */
MPU->RBAR = ARM_MPU_RBAR(4, 0x00000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB);

/* Region 5: Normal, WB */
MPU->RBAR = ARM_MPU_RBAR(5, 0x20000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB);

/* OCRAM on CM7: Normal + Shareable + Non-cacheable (TEX=1,S=1,C=0,B=0) */

/* Region 6: 0x20240000..0x2027FFFF (256KB) */
MPU->RBAR = ARM_MPU_RBAR(6, 0x20240000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB);

/* Region 7: 0x20280000..0x202FFFFF (512KB) */
MPU->RBAR = ARM_MPU_RBAR(7, 0x20280000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_512KB);

/* Region 10: 0x20300000..0x2033FFFF (256KB) */
MPU->RBAR = ARM_MPU_RBAR(10, 0x20300000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB);

#if USE_SDRAM
/* Region 11: SDRAM default NON-cacheable 64MB (Normal non-cache) */
MPU->RBAR = ARM_MPU_RBAR(11, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_64MB);

#if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH
/* Region 12: overlay first 8MB as cacheable (WT) */
MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 0, 0, ARM_MPU_REGION_SIZE_8MB);
#else
/* Region 12: overlay first 8MB as cacheable (WB) */
MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_8MB);
#endif
#endif /* USE_SDRAM */

#if defined(XIP_EXTERNAL_FLASH) && (XIP_EXTERNAL_FLASH == 1)
/* Region 8: XIP external flash, RO, cacheable WB, cover full 64MB */
MPU->RBAR = ARM_MPU_RBAR(8, 0x30000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_RO, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_64MB);
#endif

/* Peripheral windows */
MPU->RBAR = ARM_MPU_RBAR(13, 0x40000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_16MB);

MPU->RBAR = ARM_MPU_RBAR(14, 0x41000000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_2MB);

MPU->RBAR = ARM_MPU_RBAR(15, 0x41400000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1MB);

ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk);

#if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT
SCB_EnableDCache();
#endif
#if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT
SCB_EnableICache();
#endif
}
#endif /* __CORTEX_M == 7 */

/* -------------------------------------------------------------------------- */
/* CM4 MPU config */
/* -------------------------------------------------------------------------- */
#if __CORTEX_M == 4

void BOARD_ConfigMPU(void)
{
/* ---- Disable code bus cache (LMEM) ---- */
if (LMEM_PCCCR_ENCACHE_MASK == (LMEM_PCCCR_ENCACHE_MASK & LMEM->PCCCR))
{
LMEM->PCCCR |= LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK | LMEM_PCCCR_GO_MASK;
while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {}
LMEM->PCCCR &= ~(LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK);
LMEM->PCCCR &= ~LMEM_PCCCR_ENCACHE_MASK;
}

/* ---- Disable system bus cache (LMEM) ---- */
if (LMEM_PSCCR_ENCACHE_MASK == (LMEM_PSCCR_ENCACHE_MASK & LMEM->PSCCR))
{
LMEM->PSCCR |= LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK | LMEM_PSCCR_GO_MASK;
while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {}
LMEM->PSCCR &= ~(LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK);
LMEM->PSCCR &= ~LMEM_PSCCR_ENCACHE_MASK;
}

ARM_MPU_Disable();

/* Region 0: 0x20240000..0x2025FFFF (128KB) DMA -> Normal non-cacheable */
MPU->RBAR = ARM_MPU_RBAR(0, OCRAM_DMA_NC_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
1, /* shareable recommended */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_128KB);

/* Region 1: 0x20260000..0x2027FFFF (128KB) local scratch -> Normal WB cacheable */
MPU->RBAR = ARM_MPU_RBAR(1, OCRAM_LOCAL_C_BASE);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
0, /* not shareable */
1, /* cacheable */
1, /* bufferable (WB) */
0,
ARM_MPU_REGION_SIZE_128KB);

/* Region 2: 0x20280000..0x202FFFFF (512KB) shared -> Normal (shareable) */
MPU->RBAR = ARM_MPU_RBAR(2, 0x20280000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal */
1, /* shareable */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_512KB);

/* Region 3: 0x20300000..0x2033FFFF (256KB) shared -> Normal (shareable) */
MPU->RBAR = ARM_MPU_RBAR(3, 0x20300000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL,
0, /* Normal*/
1, /* shareable */
0, /* non-cacheable */
0,
0,
ARM_MPU_REGION_SIZE_256KB);

ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk);

/* Invalidate and enable system bus cache (PSCCR) */
LMEM->PSCCR |= LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK | LMEM_PSCCR_GO_MASK;
while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {}
LMEM->PSCCR &= ~(LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK);
LMEM->PSCCR |= LMEM_PSCCR_ENCACHE_MASK;

/* Invalidate and enable code bus cache (PCCCR) */
LMEM->PCCCR |= LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK | LMEM_PCCCR_GO_MASK;
while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {}
LMEM->PCCCR &= ~(LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK);
LMEM->PCCCR |= LMEM_PCCCR_ENCACHE_MASK;
}
#endif /* __CORTEX_M == 4 */









Re: RT1170 EVKB M4 memory speed

Hi @cyberhelmer ,

Thanks for your interest in NXP MIMXRT series!

NXP has provided test results based on the RT1050. You can view this application note and the corresponding software: https://www.nxp.com/docs/en/application-note/AN12437.pdf

https://www.nxp.com/products/i.MX-RT1050

[Image attachment: Gavin_Jia_0-1779342138046.png]

I did a quick comparison and found that, for the SDRAM write test, the application note software (AN SW) used the following settings: Normal memory property, pure 32-bit sequential writes, and a DSB after the write.

We recommend that you use the AN SW as a reference when running the corresponding experiment.

Best regards,
Gavin
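
As a follow-up to the "Normal memory property" remark: on ARMv7-M the memory type of an MPU region is selected by the TEX, C and B bits of MPU_RASR. The sketch below decodes the two RASR values printed in the benchmark log (R4=0x03100033, R5=0x0303002d) and maps TEX/C/B to a memory type according to the ARMv7-M encoding table. The type and function names are illustrative and are not part of CMSIS or the SDK:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned tex, s, c, b;
    unsigned size_log2; /* region size = 2^size_log2 bytes */
} rasr_fields_t;

/* Extract the attribute and size fields from a raw MPU_RASR value. */
rasr_fields_t rasr_decode(uint32_t rasr)
{
    rasr_fields_t f;
    f.tex = (rasr >> 19) & 0x7u;
    f.s   = (rasr >> 18) & 0x1u;
    f.c   = (rasr >> 17) & 0x1u;
    f.b   = (rasr >> 16) & 0x1u;
    f.size_log2 = ((rasr >> 1) & 0x1Fu) + 1u; /* SIZE field encodes log2 - 1 */
    return f;
}

/* Map TEX/C/B to the ARMv7-M memory type. */
const char *mem_type(unsigned tex, unsigned c, unsigned b)
{
    if (tex == 0u) {
        if (c == 0u) {
            return (b == 0u) ? "Strongly-ordered" : "Device (shareable)";
        }
        return (b == 0u) ? "Normal write-through" : "Normal write-back";
    }
    if (tex == 1u) {
        if (c == 0u && b == 0u) return "Normal non-cacheable";
        if (c == 1u && b == 1u) return "Normal write-back, write-allocate";
        return "reserved/implementation defined";
    }
    if (tex == 2u && c == 0u && b == 0u) return "Device (non-shareable)";
    if (tex >= 4u) return "Normal (separate inner/outer cache policy)";
    return "reserved";
}
```

Under this table, R4 (the 64MB SDRAM default on the CM4) decodes as Device memory and R5 as Normal write-back. The CM4 OCRAM regions configured with TEX=0, C=0, B=0 encode Strongly-ordered in both posted variants, whereas Normal non-cacheable is TEX=1, C=0, B=0, as used for OCRAM in the CM7 config. Whether switching the CM4 regions to Normal non-cacheable changes the measured numbers would need to be confirmed by re-running the benchmark.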

