RT1170 EVKB M4 memory speed I'm doing some benchmarks for M4 core memory access on the RT1170 evkb. Are these numbers expected when the M4 accesses sdram and ocram on the board? It seems extremely slow. I have read some information on longer access times for M4 because of the axi fabric etc but this seem very slow. Have I missed something in my configs somewhere making it slower than neccesary? The benchmark includes a few different memory sources/destinations but what I'm most interested in improving is edma between sdram and ocram. I'm implementing a multi track audio streaming engine where M4 sets of edma transfers from sdram non cacheable to ocram non cacheable. The M7 core then consumes buffers from shared ocram. But the test is run freestanding and there should be none or very little arbitration happeningen. Both sdram and the ocram regions are set non cacheable, ocram region is also set sharable in MPUConfig, if I'm not misstaken? Audio streamer copy benchmark, Cortex-M4 @ 400 MHz ops: playback 16 x 4096 B SDRAM_NC->OCRAM, record 8 x 4096 B OCRAM->SDRAM_NC bytes: playback=65536 record=32768 total=98304 CPU memcpy: playback copies 1532202 cyc 3830.505 us record copies 530291 cyc 1325.727 us scatter total 2062493 cyc 5156.232 us single 96K bulk 2301811 cyc 5754.527 us checksum=239 Memory region copy matrix (32768 B per row): regions src: SDRAM_C=80008000 SDRAM_NC=80818000 SHARED=202B8EC0 NCACHE=20248AA0 OCRAM_CACHE=2026E3C0 MPU/cache: CTRL=0x00000007 TYPE=0x00000800 PSCCR=0x00000003 R4=0x80000004/0x03100033 R5=0x80000005/0x0303002d EDMA is only run when both endpoints are non-cacheable/DMA-safe. SDRAM_C->OCRAM_CACHE bytes=32768 cpu= 1201410 cyc 3003.525 us 10.90 MB/s | edma=skip cacheable checksum=338 SDRAM_NC->OCRAM_CACHE bytes=32768 cpu= 1037590 cyc 2593.975 us 12.63 MB/s | edma=skip cacheable checksum=602 SDRAM_NC->SHARED_OCRAM bytes=32768 cpu= 781109 cyc 1952.772 us 16.78 MB/s | edma=ok 779805 cyc 1949.512 us 16.80 MB/s checksum=602/602 | edma+cache=ok 858144 cyc 2145.360 us 15.27 MB/s checksum=602 SDRAM_NC->NCACHE_OCRAM bytes=32768 cpu= 777897 cyc 1944.742 us 16.84 MB/s | edma=ok 751101 cyc 1877.752 us 17.45 MB/s checksum=602/602 | edma+cache=ok 853273 cyc 2133.182 us 15.36 MB/s checksum=602 OCRAM_CACHE->SDRAM_NC bytes=32768 cpu= 750396 cyc 1875.990 us 17.46 MB/s | edma=skip cacheable checksum=1204 SHARED_OCRAM->SDRAM_NC bytes=32768 cpu= 531143 cyc 1327.857 us 24.67 MB/s | edma=ok 570792 cyc 1426.980 us 22.96 MB/s checksum=610/610 | edma+cache=ok 670564 cyc 1676.410 us 19.54 MB/s checksum=610 NCACHE_OCRAM->SDRAM_NC bytes=32768 cpu= 541568 cyc 1353.920 us 24.20 MB/s | edma=ok 549911 cyc 1374.777 us 23.83 MB/s checksum=940/940 | edma+cache=ok 649852 cyc 1624.630 us 20.16 MB/s checksum=940 SDRAM_NC->SDRAM_NC bytes=32768 cpu= 896656 cyc 2241.640 us 14.61 MB/s | edma=ok 997686 cyc 2494.215 us 13.13 MB/s checksum=602/602 | edma+cache=ok 1130798 cyc 2826.995 us 11.59 MB/s checksum=602 OCRAM_CACHE->OCRAM_CACHE bytes=32768 cpu= 768904 cyc 1922.260 us 17.04 MB/s | edma=skip cacheable checksum=1204 SHARED_OCRAM->SHARED_OCRAM bytes=32768 cpu= 425929 cyc 1064.822 us 30.77 MB/s | edma=ok 433498 cyc 1083.745 us 30.23 MB/s checksum=610/610 | edma+cache=ok 542913 cyc 1357.282 us 24.14 MB/s checksum=610 EDMA scatter/gather 24 x 4K, NBYTES=4096: ops=24 nbytes=4096 ok=1 descriptor setup+start 58942 cyc 147.355 us wait until done/error 2031292 cyc 5078.230 us total 2090234 cyc 5225.585 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=1 biter=1 EDMA scatter/gather 24 x 4K, NBYTES=32: ops=24 nbytes=32 ok=1 descriptor setup+start 47526 cyc 118.815 us wait until done/error 2055017 cyc 5137.542 us total 2102543 cyc 5256.357 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=128 biter=128 EDMA single 4K, NBYTES=4096: ops=1 nbytes=4096 ok=1 descriptor setup+start 5713 cyc 14.282 us wait until done/error 93592 cyc 233.980 us total 99305 cyc 248.262 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=1 biter=1 EDMA single 4K, NBYTES=32: ops=1 nbytes=32 ok=1 descriptor setup+start 5344 cyc 13.360 us wait until done/error 94609 cyc 236.522 us total 99953 cyc 249.882 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=128 biter=128 EDMA single 96K, NBYTES=4096: ops=1 nbytes=4096 ok=1 descriptor setup+start 5867 cyc 14.667 us wait until done/error 2230360 cyc 5575.900 us total 2236227 cyc 5590.567 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=24 biter=24 EDMA single 96K, NBYTES=32: ops=1 nbytes=32 ok=1 descriptor setup+start 5713 cyc 14.282 us wait until done/error 2256234 cyc 5640.585 us total 2261947 cyc 5654.867 us checksum=239 channelFlags=0x1 errorFlags=0x0 remaining=0 csr=0x0088 citer=3072 biter=3072 DMAMUX1_CHCFG[2]=0xa0000000 DMA1_ERQ=0x80 DMA1_ES=0x0 ============================================================ My linker: /* M4/armgcc/MIMXRT1176xxxxx_cm4_flexspi_nor.ld * M4 XIP-from-FlexSPI with ITCM/DTCM hot sections + SDRAM * OCRAM partition (1MB @ 0x2024_0000): * - 128KB DMA-safe / Device/non-cache : 0x20240000..0x2025FFFF * - 128KB M4 cacheable local scratch : 0x20260000..0x2027FFFF * - 768KB SHARED (M4<->M7) non-cache : 0x20280000..0x2033FFFF * - LAST 8KB of SHARED reserved for M7-only : 0x2033E000..0x2033FFFF (NOT in M4 linker region) * * Additional fixed carve-out (within shared-for-both, before M7 DMA slice): * - 36KB fixed SHARED SysView/RTT window for M4 : 0x20335000..0x2033DFFF * - Shared main (for IPC etc) : 0x20280000..0x20334FFF */ ENTRY(Reset_Handler) HEAP_SIZE = DEFINED(__heap_size__) ? __heap_size__ : 0x0400; STACK_SIZE = DEFINED(__stack_size__) ? __stack_size__ : 0x0400; SDRAM_BASE = 0x80000000; SDRAM_SIZE = 0x04000000; /* 64 MB */ SDRAM_C_SIZE = 0x00800000; /* 8 MB cacheable */ SDRAM_NC_BASE = SDRAM_BASE + SDRAM_C_SIZE; /* 0x80800000 */ SDRAM_NC_SIZE = SDRAM_SIZE - SDRAM_C_SIZE; /* 56 MB */ M4_FLASH_BASE = 0x33600000; M4_FLASH_SIZE = 0x00A00000; /* OCRAM: 1MB */ OCRAM_BASE = 0x20240000; OCRAM_SIZE = 0x00100000; /* Local OCRAM total = DMA 128KB + cacheable scratch 128KB = 256KB */ OCRAM_LOCAL_TOTAL_SIZE = 0x00040000; /* 256KB */ /* DMA-safe non-cache local window (first 128KB) */ OCRAM_DMA_NC_BASE = 0x20240000; OCRAM_DMA_NC_SIZE = 0x00020000; /* 128KB */ /* M4 cacheable scratch window (next 128KB) */ OCRAM_LOCAL_C_BASE = 0x20260000; OCRAM_LOCAL_C_SIZE = 0x00020000; /* 128KB */ /* Shared OCRAM starts at 0x20280000 (rest = 768KB) */ OCRAM_SHARED_BASE = OCRAM_BASE + OCRAM_LOCAL_TOTAL_SIZE; /* 0x20280000 */ OCRAM_SHARED_SIZE_FULL = OCRAM_SIZE - OCRAM_LOCAL_TOTAL_SIZE; /* 0x000C0000 */ /* Reserve LAST 8KB of shared OCRAM for M7-only DMA buffers (exclude from M4 region) */ M7_DMA_OCRAM_BYTES = 0x00002000; /* 8KB */ OCRAM_SHARED_FOR_BOTH_SIZE = OCRAM_SHARED_SIZE_FULL - M7_DMA_OCRAM_BYTES; /* 0x000BE000 */ /* Fixed SysView/RTT carve-out for M4 at end of shared-for-both (36KB) */ SYSVIEW_M4_BYTES = 0x00009000; /* 36KB */ /* Shared main size (everything except the fixed 36KB window) */ OCRAM_SHARED_MAIN_SIZE = OCRAM_SHARED_FOR_BOTH_SIZE - SYSVIEW_M4_BYTES; /* 0x000B5000 */ /* SysView M4 base (fixed) */ OCRAM_SYSVIEW_M4_BASE = OCRAM_SHARED_BASE + OCRAM_SHARED_MAIN_SIZE; /* 0x20335000 */ OCRAM_SYSVIEW_M4_SIZE = SYSVIEW_M4_BYTES; /* 0x00009000 */ MEMORY { m_interrupts (RX) : ORIGIN = M4_FLASH_BASE, LENGTH = 0x00000400 m_text (RX) : ORIGIN = M4_FLASH_BASE + 0x400, LENGTH = M4_FLASH_SIZE - 0x00000400 m_itcm (RX) : ORIGIN = 0x1FFE0000, LENGTH = 0x00020000 m_dtcm (RW) : ORIGIN = 0x20000000, LENGTH = 0x00020000 /* OCRAM partition */ m_ncache_local (RW) : ORIGIN = OCRAM_DMA_NC_BASE, LENGTH = OCRAM_DMA_NC_SIZE m_ocram_cache (RW) : ORIGIN = OCRAM_LOCAL_C_BASE, LENGTH = OCRAM_LOCAL_C_SIZE m_shared_ocram_main (RW) : ORIGIN = OCRAM_SHARED_BASE, LENGTH = OCRAM_SHARED_MAIN_SIZE m_shared_sysview_m4 (RW) : ORIGIN = OCRAM_SYSVIEW_M4_BASE, LENGTH = OCRAM_SYSVIEW_M4_SIZE m_sdram_c (RW) : ORIGIN = SDRAM_BASE, LENGTH = SDRAM_C_SIZE m_sdram_nc (RW) : ORIGIN = SDRAM_NC_BASE, LENGTH = SDRAM_NC_SIZE } SECTIONS { __NCACHE_REGION_START = OCRAM_BASE; __NCACHE_REGION_SIZE = OCRAM_SIZE; __SHARED_OCRAM_START = ORIGIN(m_shared_ocram_main); __SHARED_OCRAM_SIZE = LENGTH(m_shared_ocram_main); __SYSVIEW_M4_BASE = ORIGIN(m_shared_sysview_m4); __SYSVIEW_M4_SIZE = LENGTH(m_shared_sysview_m4); __OCRAM_CACHE_START = ORIGIN(m_ocram_cache); __OCRAM_CACHE_SIZE = LENGTH(m_ocram_cache); .shared_ocram (NOLOAD) : { . = ALIGN(32); __SHARED_OCRAM_START__ = .; *(.shared_ocram*) *(SharedOcram*) . = ALIGN(32); __SHARED_OCRAM_END__ = .; ASSERT((__SHARED_OCRAM_END__ - __SHARED_OCRAM_START__) <= LENGTH(m_shared_ocram_main), "shared_ocram section too large"); } > m_shared_ocram_main .shared_sysview_m4 (NOLOAD) : { . = ALIGN(32); __SYSVIEW_M4_START__ = .; KEEP(*(.shared_sysview_m4)) KEEP(*(.shared_sysview_m4.*)) . = ALIGN(32); __SYSVIEW_M4_END__ = .; ASSERT((__SYSVIEW_M4_END__ - __SYSVIEW_M4_START__) <= LENGTH(m_shared_sysview_m4), "shared_sysview_m4 section too large (max 36KB)"); } > m_shared_sysview_m4 .ocram_cache (NOLOAD) : { . = ALIGN(32); __OCRAM_CACHE_START__ = .; *(.ocram_cache*) *(OcramCache*) . = ALIGN(32); __OCRAM_CACHE_END__ = .; ASSERT((__OCRAM_CACHE_END__ - __OCRAM_CACHE_START__) <= LENGTH(m_ocram_cache), "ocram_cache section too large"); } > m_ocram_cache .interrupts : { __VECTOR_TABLE = .; __Vectors = .; . = ALIGN(4); KEEP(*(.isr_vector)) . = ALIGN(4); } > m_interrupts /* ================================================================ * HOT CODE IN ITCM (VMA=ITCM, LMA computed later) * ================================================================ */ .itcm_text : AT(__itcm_text_load__) { . = ALIGN(32); __itcm_text_start__ = .; *(.itcm_text*) *(ITCM_TEXT*) /* your hot TUs */ */DspParameterManager.cpp.obj(.text .text.*) */audio_param_task.cpp.obj(.text .text.*) */StepSequencer.cpp.obj(.text .text.*) */sequencer_task.cpp.obj(.text .text.*) /* MU ISR(s) - by symbol section name */ *(.text.MUB_IRQHandler) /* FreeRTOS hot bits - by symbol section name */ *(.text.SVC_Handler) *(.text.PendSV_Handler) *(.text.SysTick_Handler) *(.text.vPortEnterCritical) *(.text.vPortExitCritical) *(.text.vPortValidateInterruptPriority) *(.text.xPortStartScheduler) *(.text.vPortSetupTimerInterrupt) *(.text.xTaskIncrementTick) *(.text.vTaskSwitchContext) *(.text.vTaskGenericNotifyGiveFromISR) *(.text.xTaskGenericNotify) *(.text.xQueueGenericSendFromISR) *(.text.xQueueGiveFromISR) . = ALIGN(32); __itcm_text_end__ = .; } > m_itcm /* ================================================================ * HOT RODATA IN DTCM (LMA computed later) * ================================================================ */ .dtcm_rodata : AT(__dtcm_rodata_load__) { . = ALIGN(32); __dtcm_rodata_start__ = .; */DspParameterManager.cpp.obj(.rodata .rodata.*) */audio_param_task.cpp.obj(.rodata .rodata.*) */StepSequencer.cpp.obj(.rodata .rodata.*) */sequencer_task.cpp.obj(.rodata .rodata.*) . = ALIGN(32); __dtcm_rodata_end__ = .; } > m_dtcm /* ================================================================ * HOT DATA IN DTCM (LMA computed later) * ================================================================ */ .dtcm_data : AT(__dtcm_data_load__) { . = ALIGN(32); __dtcm_data_start__ = .; *(.dtcm_data*) *(DTCM_DATA*) . = ALIGN(32); __dtcm_data_end__ = .; } > m_dtcm /* ================================================================ * FLASH XIP TEXT/RODATA (catch-all) * ================================================================ */ .text : { . = ALIGN(4); *(.text .text*) *(.rodata .rodata*) *(.glue_7) *(.glue_7t) *(.eh_frame) KEEP(*(.init)) KEEP(*(.fini)) . = ALIGN(4); } > m_text .ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } > m_text .ARM : { __exidx_start = .; *(.ARM.exidx*) __exidx_end = .; } > m_text .ctors : { __CTOR_LIST__ = .; KEEP (*crtbegin.o(.ctors)) KEEP (*crtbegin?.o(.ctors)) KEEP (*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)) KEEP (*(SORT(.ctors.*))) KEEP (*(.ctors)) __CTOR_END__ = .; } > m_text .dtors : { __DTOR_LIST__ = .; KEEP (*crtbegin.o(.dtors)) KEEP (*crtbegin?.o(.dtors)) KEEP (*(EXCLUDE_FILE(*crtend?.o *crtend.o) .dtors)) KEEP (*(SORT(.dtors.*))) KEEP (*(.dtors)) __DTOR_END__ = .; } > m_text .preinit_array : { PROVIDE_HIDDEN (__preinit_array_start = .); KEEP (*(.preinit_array*)) PROVIDE_HIDDEN (__preinit_array_end = .); } > m_text .init_array : { PROVIDE_HIDDEN (__init_array_start = .); KEEP (*(SORT(.init_array.*))) KEEP (*(.init_array*)) PROVIDE_HIDDEN (__init_array_end = .); } > m_text .fini_array : { PROVIDE_HIDDEN (__fini_array_start = .); KEEP (*(SORT(.fini_array.*))) KEEP (*(.fini_array*)) PROVIDE_HIDDEN (__fini_array_end = .); } > m_text /* ================================================================ * CRITICAL FIX: * Start load-images AFTER ALL flash-resident sections above. * At this point, '.' is at the real end of flash content. * ================================================================ */ __DATA_ROM = .; __itcm_text_load__ = __DATA_ROM; __dtcm_rodata_load__ = __itcm_text_load__ + SIZEOF(.itcm_text); __dtcm_data_load__ = __dtcm_rodata_load__ + SIZEOF(.dtcm_rodata); /* NXP startup expects .data LMA at __etext */ __etext = __dtcm_data_load__ + SIZEOF(.dtcm_data); .data : AT(__etext) { . = ALIGN(4); __DATA_RAM = .; __data_start__ = .; *(.data) *(.data*) *(DataQuickAccess) KEEP(*(.jcr*)) . = ALIGN(4); __data_end__ = .; } > m_dtcm __NDATA_ROM = __etext + (__data_end__ - __data_start__); .ncache.init : AT(__NDATA_ROM) { __noncachedata_start__ = .; *(NonCacheable.init) . = ALIGN(4); __noncachedata_init_end__ = .; } > m_ncache_local .ncache (NOLOAD) : { *(NonCacheable) . = ALIGN(4); __noncachedata_end__ = .; } > m_ncache_local .bss (NOLOAD) : { . = ALIGN(4); __bss_start__ = .; *(.bss) *(.bss*) *(COMMON) . = ALIGN(4); __bss_end__ = .; } > m_dtcm .dtcm_bss (NOLOAD) : { . = ALIGN(32); __dtcm_bss_start__ = .; *(.dtcm_bss*) *(DTCM_BSS*) . = ALIGN(32); __dtcm_bss_end__ = .; } > m_dtcm __SDRAM_C_DATA_ROM = __NDATA_ROM + (__noncachedata_init_end__ - __noncachedata_start__); .sdram_c_data : AT(__SDRAM_C_DATA_ROM) { . = ALIGN(32); __sdram_c_data_start__ = .; *(.sdram_c_data*) *(SDRAM_C_DATA*) . = ALIGN(32); __sdram_c_data_end__ = .; } > m_sdram_c __sdram_c_data_load__ = __SDRAM_C_DATA_ROM; .sdram_c_bss (NOLOAD) : { . = ALIGN(32); __sdram_c_bss_start__ = .; *(.sdram_c_bss*) *(SDRAM_C_BSS*) . = ALIGN(32); __sdram_c_bss_end__ = .; } > m_sdram_c .sdram_nc_bss (NOLOAD) : { . = ALIGN(32); __sdram_nc_bss_start__ = .; *(.sdram_nc_bss*) *(SDRAM_NC_BSS*) . = ALIGN(32); __sdram_nc_bss_end__ = .; } > m_sdram_nc .heap : { . = ALIGN(8); __end__ = .; PROVIDE(end = .); __HeapBase = .; . += HEAP_SIZE; __HeapLimit = .; __heap_limit = .; } > m_dtcm .stack : { . = ALIGN(8); . += STACK_SIZE; __StackEnd = .; } > m_dtcm __StackTop = ORIGIN(m_dtcm) + LENGTH(m_dtcm); __StackLimit = __StackTop - STACK_SIZE; PROVIDE(__stack = __StackTop); .ARM.attributes 0 : { *(.ARM.attributes) } __SDRAM_BASE = SDRAM_BASE; __SDRAM_SIZE = SDRAM_SIZE; __SDRAM_C_BASE = SDRAM_BASE; __SDRAM_C_SIZE = SDRAM_C_SIZE; __SDRAM_NC_BASE = SDRAM_NC_BASE; __SDRAM_NC_SIZE = SDRAM_NC_SIZE; } MPUconfig: /* -------------------------------------------------------------------------- */ /* CM7 MPU config */ /* -------------------------------------------------------------------------- */ #if __CORTEX_M == 7 void BOARD_ConfigMPU(void) { #if defined(__ICCARM__) || defined(__GNUC__) extern uint32_t __NCACHE_REGION_START[]; extern uint32_t __NCACHE_REGION_SIZE[]; uint32_t nonCacheStart = (uint32_t)__NCACHE_REGION_START; uint32_t size = (uint32_t)__NCACHE_REGION_SIZE; #else uint32_t nonCacheStart = OCRAM_BASE_EXPECTED; uint32_t size = OCRAM_SIZE_EXPECTED; #endif (void)nonCacheStart; (void)size; #if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT if (SCB_CCR_IC_Msk == (SCB_CCR_IC_Msk & SCB->CCR)) { SCB_DisableICache(); } #endif #if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT if (SCB_CCR_DC_Msk == (SCB_CCR_DC_Msk & SCB->CCR)) { SCB_DisableDCache(); } #endif ARM_MPU_Disable(); /* Region 0: deny all (speculative prefetch workaround) */ MPU->RBAR = ARM_MPU_RBAR(0, 0x00000000U); MPU->RASR = ARM_MPU_RASR(1, ARM_MPU_AP_NONE, 0, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_4GB); /* Region 1: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(1, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB); /* Region 2: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(2, 0x60000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB); /* Region 3: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(3, 0x00000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1GB); /* Region 4: Normal, WB */ MPU->RBAR = ARM_MPU_RBAR(4, 0x00000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB); /* Region 5: Normal, WB */ MPU->RBAR = ARM_MPU_RBAR(5, 0x20000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB); /* OCRAM on CM7: Normal + Shareable + Non-cacheable (TEX=1,S=1,C=0,B=0) */ /* Region 6: 0x20240000..0x2027FFFF (256KB) */ MPU->RBAR = ARM_MPU_RBAR(6, 0x20240000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB); /* Region 7: 0x20280000..0x202FFFFF (512KB) */ MPU->RBAR = ARM_MPU_RBAR(7, 0x20280000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_512KB); /* Region 10: 0x20300000..0x2033FFFF (256KB) */ MPU->RBAR = ARM_MPU_RBAR(10, 0x20300000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB); #if USE_SDRAM /* Region 11: SDRAM default NON-cacheable 64MB (Normal non-cache) */ MPU->RBAR = ARM_MPU_RBAR(11, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_64MB); #if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH /* Region 12: overlay first 8MB as cacheable (WT) */ MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 0, 0, ARM_MPU_REGION_SIZE_8MB); #else /* Region 12: overlay first 8MB as cacheable (WB) */ MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_8MB); #endif #endif /* USE_SDRAM */ #if defined(XIP_EXTERNAL_FLASH) && (XIP_EXTERNAL_FLASH == 1) /* Region 8: XIP external flash, RO, cacheable WB, cover full 64MB */ MPU->RBAR = ARM_MPU_RBAR(8, 0x30000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_RO, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_64MB); #endif /* Peripheral windows */ MPU->RBAR = ARM_MPU_RBAR(13, 0x40000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_16MB); MPU->RBAR = ARM_MPU_RBAR(14, 0x41000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_2MB); MPU->RBAR = ARM_MPU_RBAR(15, 0x41400000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1MB); ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk); #if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT SCB_EnableDCache(); #endif #if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT SCB_EnableICache(); #endif } #endif /* __CORTEX_M == 7 */ /* -------------------------------------------------------------------------- */ /* CM4 MPU config */ /* -------------------------------------------------------------------------- */ #if __CORTEX_M == 4 void BOARD_ConfigMPU(void) { /* ---- Disable code bus cache (LMEM) ---- */ if (LMEM_PCCCR_ENCACHE_MASK == (LMEM_PCCCR_ENCACHE_MASK & LMEM->PCCCR)) { LMEM->PCCCR |= LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK | LMEM_PCCCR_GO_MASK; while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {} LMEM->PCCCR &= ~(LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK); LMEM->PCCCR &= ~LMEM_PCCCR_ENCACHE_MASK; } /* ---- Disable system bus cache (LMEM) ---- */ if (LMEM_PSCCR_ENCACHE_MASK == (LMEM_PSCCR_ENCACHE_MASK & LMEM->PSCCR)) { LMEM->PSCCR |= LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK | LMEM_PSCCR_GO_MASK; while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {} LMEM->PSCCR &= ~(LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK); LMEM->PSCCR &= ~LMEM_PSCCR_ENCACHE_MASK; } ARM_MPU_Disable(); /* Region 0: 0x20240000..0x2025FFFF (128KB) DMA -> Device/non-cache. * Keep the original attributes here: UART/SPI/SD DMA buffers live in this * window and require the proven non-cache behavior with CM4 LMEM. */ MPU->RBAR = ARM_MPU_RBAR(0, OCRAM_DMA_NC_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 1, /* shareable recommended */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_128KB); /* Region 1: 0x20260000..0x2027FFFF (128KB) local scratch -> Normal WB cacheable */ MPU->RBAR = ARM_MPU_RBAR(1, OCRAM_LOCAL_C_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 0, /* not shareable */ 1, /* cacheable */ 1, /* bufferable (WB) */ 0, ARM_MPU_REGION_SIZE_128KB); /* Region 2: 0x20280000..0x202FFFFF (512KB) shared -> original non-cache attrs */ MPU->RBAR = ARM_MPU_RBAR(2, 0x20280000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 1, /* shareable */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_512KB); /* Region 3: 0x20300000..0x2033FFFF (256KB) shared -> original non-cache attrs */ MPU->RBAR = ARM_MPU_RBAR(3, 0x20300000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 1, /* shareable */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_256KB); #if USE_SDRAM /* Linker split: * m_sdram_c = 0x80000000..0x807FFFFF (8MB) * m_sdram_nc = 0x80800000..0x83FFFFFF (56MB) * * The MPU cannot describe 56MB directly, so region 4 makes the full 64MB * SDRAM window Device/non-cache and region 5 overlays the first 8MB as * cacheable. Higher-numbered MPU regions take precedence. */ (void)SDRAM_NC_BASE; (void)SDRAM_NC_SIZE; /* Region 4: full SDRAM default, 0x80000000..0x83FFFFFF -> Device/non-cache */ MPU->RBAR = ARM_MPU_RBAR(4, SDRAM_BASE_EXPECTED); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, /* Device */ 0, /* not shareable */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_64MB); #if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH /* Region 5: linker m_sdram_c, 0x80000000..0x807FFFFF -> Normal write-through */ MPU->RBAR = ARM_MPU_RBAR(5, SDRAM_C_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 0, /* not shareable */ 1, /* cacheable */ 0, /* write-through */ 0, ARM_MPU_REGION_SIZE_8MB); #else /* Region 5: linker m_sdram_c, 0x80000000..0x807FFFFF -> Normal write-back */ MPU->RBAR = ARM_MPU_RBAR(5, SDRAM_C_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 0, /* not shareable */ 1, /* cacheable */ 1, /* write-back */ 0, ARM_MPU_REGION_SIZE_8MB); #endif #endif /* USE_SDRAM */ ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk); /* Invalidate and enable system bus cache (PSCCR) */ LMEM->PSCCR |= LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK | LMEM_PSCCR_GO_MASK; while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {} LMEM->PSCCR &= ~(LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK); LMEM->PSCCR |= LMEM_PSCCR_ENCACHE_MASK; /* Invalidate and enable code bus cache (PCCCR) */ LMEM->PCCCR |= LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK | LMEM_PCCCR_GO_MASK; while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {} LMEM->PCCCR &= ~(LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK); LMEM->PCCCR |= LMEM_PCCCR_ENCACHE_MASK; } #endif /* __CORTEX_M == 4 */ I also tried this with similar results but got missmatching checksums when writing from non cacheable ocram to non cacheable sdram using dma: #if __CORTEX_M == 7 void BOARD_ConfigMPU(void) { #if defined(__ICCARM__) || defined(__GNUC__) extern uint32_t __NCACHE_REGION_START[]; extern uint32_t __NCACHE_REGION_SIZE[]; uint32_t nonCacheStart = (uint32_t)__NCACHE_REGION_START; uint32_t size = (uint32_t)__NCACHE_REGION_SIZE; #else uint32_t nonCacheStart = OCRAM_BASE_EXPECTED; uint32_t size = OCRAM_SIZE_EXPECTED; #endif (void)nonCacheStart; (void)size; #if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT if (SCB_CCR_IC_Msk == (SCB_CCR_IC_Msk & SCB->CCR)) { SCB_DisableICache(); } #endif #if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT if (SCB_CCR_DC_Msk == (SCB_CCR_DC_Msk & SCB->CCR)) { SCB_DisableDCache(); } #endif ARM_MPU_Disable(); /* Region 0: deny all (speculative prefetch workaround) */ MPU->RBAR = ARM_MPU_RBAR(0, 0x00000000U); MPU->RASR = ARM_MPU_RASR(1, ARM_MPU_AP_NONE, 0, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_4GB); /* Region 1: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(1, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB); /* Region 2: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(2, 0x60000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_512MB); /* Region 3: Device, non-shareable, non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(3, 0x00000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1GB); /* Region 4: Normal, WB */ MPU->RBAR = ARM_MPU_RBAR(4, 0x00000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB); /* Region 5: Normal, WB */ MPU->RBAR = ARM_MPU_RBAR(5, 0x20000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB); /* OCRAM on CM7: Normal + Shareable + Non-cacheable (TEX=1,S=1,C=0,B=0) */ /* Region 6: 0x20240000..0x2027FFFF (256KB) */ MPU->RBAR = ARM_MPU_RBAR(6, 0x20240000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB); /* Region 7: 0x20280000..0x202FFFFF (512KB) */ MPU->RBAR = ARM_MPU_RBAR(7, 0x20280000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_512KB); /* Region 10: 0x20300000..0x2033FFFF (256KB) */ MPU->RBAR = ARM_MPU_RBAR(10, 0x20300000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB); #if USE_SDRAM /* Region 11: SDRAM default NON-cacheable 64MB (Normal non-cache) */ MPU->RBAR = ARM_MPU_RBAR(11, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_64MB); #if defined(CACHE_MODE_WRITE_THROUGH) && CACHE_MODE_WRITE_THROUGH /* Region 12: overlay first 8MB as cacheable (WT) */ MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 0, 0, ARM_MPU_REGION_SIZE_8MB); #else /* Region 12: overlay first 8MB as cacheable (WB) */ MPU->RBAR = ARM_MPU_RBAR(12, 0x80000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_8MB); #endif #endif /* USE_SDRAM */ #if defined(XIP_EXTERNAL_FLASH) && (XIP_EXTERNAL_FLASH == 1) /* Region 8: XIP external flash, RO, cacheable WB, cover full 64MB */ MPU->RBAR = ARM_MPU_RBAR(8, 0x30000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_RO, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_64MB); #endif /* Peripheral windows */ MPU->RBAR = ARM_MPU_RBAR(13, 0x40000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_16MB); MPU->RBAR = ARM_MPU_RBAR(14, 0x41000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_2MB); MPU->RBAR = ARM_MPU_RBAR(15, 0x41400000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 2, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_1MB); ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk); #if defined(__DCACHE_PRESENT) && __DCACHE_PRESENT SCB_EnableDCache(); #endif #if defined(__ICACHE_PRESENT) && __ICACHE_PRESENT SCB_EnableICache(); #endif } #endif /* __CORTEX_M == 7 */ /* -------------------------------------------------------------------------- */ /* CM4 MPU config */ /* -------------------------------------------------------------------------- */ #if __CORTEX_M == 4 void BOARD_ConfigMPU(void) { /* ---- Disable code bus cache (LMEM) ---- */ if (LMEM_PCCCR_ENCACHE_MASK == (LMEM_PCCCR_ENCACHE_MASK & LMEM->PCCCR)) { LMEM->PCCCR |= LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK | LMEM_PCCCR_GO_MASK; while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {} LMEM->PCCCR &= ~(LMEM_PCCCR_PUSHW0_MASK | LMEM_PCCCR_PUSHW1_MASK); LMEM->PCCCR &= ~LMEM_PCCCR_ENCACHE_MASK; } /* ---- Disable system bus cache (LMEM) ---- */ if (LMEM_PSCCR_ENCACHE_MASK == (LMEM_PSCCR_ENCACHE_MASK & LMEM->PSCCR)) { LMEM->PSCCR |= LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK | LMEM_PSCCR_GO_MASK; while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {} LMEM->PSCCR &= ~(LMEM_PSCCR_PUSHW0_MASK | LMEM_PSCCR_PUSHW1_MASK); LMEM->PSCCR &= ~LMEM_PSCCR_ENCACHE_MASK; } ARM_MPU_Disable(); /* Region 0: 0x20240000..0x2025FFFF (128KB) DMA -> Normal non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(0, OCRAM_DMA_NC_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 1, /* shareable recommended */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_128KB); /* Region 1: 0x20260000..0x2027FFFF (128KB) local scratch -> Normal WB cacheable */ MPU->RBAR = ARM_MPU_RBAR(1, OCRAM_LOCAL_C_BASE); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 0, /* not shareable */ 1, /* cacheable */ 1, /* bufferable (WB) */ 0, ARM_MPU_REGION_SIZE_128KB); /* Region 2: 0x20280000..0x202FFFFF (512KB) shared -> Normal (shareable) */ MPU->RBAR = ARM_MPU_RBAR(2, 0x20280000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal */ 1, /* shareable */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_512KB); /* Region 3: 0x20300000..0x2033FFFF (256KB) shared -> Normal (shareable) */ MPU->RBAR = ARM_MPU_RBAR(3, 0x20300000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, /* Normal*/ 1, /* shareable */ 0, /* non-cacheable */ 0, 0, ARM_MPU_REGION_SIZE_256KB); ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_HFNMIENA_Msk); /* Invalidate and enable system bus cache (PSCCR) */ LMEM->PSCCR |= LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK | LMEM_PSCCR_GO_MASK; while ((LMEM->PSCCR & LMEM_PSCCR_GO_MASK) != 0U) {} LMEM->PSCCR &= ~(LMEM_PSCCR_INVW0_MASK | LMEM_PSCCR_INVW1_MASK); LMEM->PSCCR |= LMEM_PSCCR_ENCACHE_MASK; /* Invalidate and enable code bus cache (PCCCR) */ LMEM->PCCCR |= LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK | LMEM_PCCCR_GO_MASK; while ((LMEM->PCCCR & LMEM_PCCCR_GO_MASK) != 0U) {} LMEM->PCCCR &= ~(LMEM_PCCCR_INVW0_MASK | LMEM_PCCCR_INVW1_MASK); LMEM->PCCCR |= LMEM_PCCCR_ENCACHE_MASK; } #endif /* __CORTEX_M == 4 */ Evaluation Board Re: RT1170 EVKB M4 memory speed Hi @cyberhelmer ,
Thanks for your interest in NXP MIMXRT series!
NXP has provided test results based on the RT1050. You can view this application note and the corresponding software: https://www.nxp.com/docs/en/application-note/AN12437.pdf
https://www.nxp.com/products/i.MX-RT1050
I did a quick comparison and found that during the SDRAM write test, ANSW used the following settings: Normal memory property, pure 32-bit sequential writes, and DSB after write.
We recommend that you refer to this ANSW to conduct the corresponding experiment.
Best regards, Gavin
記事全体を表示