I'd like to share my struggles and eventual success in merging the fragmented default FlexRAM layout of the i.MX RT1010 into a single block.
The reason for doing this is that there are large third-party libraries that require large chunks of RAM, so we need to use all of the available RAM as data RAM.
So I went about reading the FlexRAM application notes, and tried to set the registers based on that description. That didn't work, so I checked the FlexRAM driver, and there's actually a FLEXRAM_AllocateRam() function in fsl_flexram.c which does exactly what I need. The remaining question is when to call this function. Since the application data already extends beyond the default DTCM size, the RAM needs to be remapped before the RAM variables are initialized at startup (otherwise the initial values would be written to nonexistent addresses). Therefore this call should be added to a strong SystemInitHook() definition, which is called through SystemInit(), from ResetISR(), right before the RAM data is initialized.
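To make the remap a bit more concrete, here is a minimal host-side sketch of how a per-bank configuration word for the four 32K banks could be assembled. It is not the SDK code: the function name flexram_bank_cfg() is hypothetical, and both the 2-bit encoding (0 = unused, 1 = OCRAM, 2 = DTCM, 3 = ITCM) and the OCRAM, DTCM, ITCM bank ordering are assumptions that should be checked against the reference manual and fsl_flexram.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2-bit bank type encoding; verify against the reference manual. */
enum { BANK_UNUSED = 0u, BANK_OCRAM = 1u, BANK_DTCM = 2u, BANK_ITCM = 3u };

#define TOTAL_BANKS 4u /* i.MX RT1010: 4 banks, 128K in total */

/* Hypothetical helper: packs one 2-bit type per bank into a configuration
 * word, lowest bank first. Assumed order: OCRAM banks, then DTCM, then ITCM. */
uint32_t flexram_bank_cfg(uint32_t ocram, uint32_t dtcm, uint32_t itcm)
{
    uint32_t cfg = 0u;

    /* The request must fit into the banks that physically exist. */
    assert(ocram + dtcm + itcm <= TOTAL_BANKS);

    for (uint32_t i = 0u; i < TOTAL_BANKS; i++) {
        uint32_t type;

        if (i < ocram) {
            type = BANK_OCRAM;
        } else if (i < ocram + dtcm) {
            type = BANK_DTCM;
        } else if (i < ocram + dtcm + itcm) {
            type = BANK_ITCM;
        } else {
            type = BANK_UNUSED;
        }
        cfg |= type << (i * 2u);
    }
    return cfg;
}
```

With these assumptions, mapping all four banks to DTCM is flexram_bank_cfg(0, 4, 0), which sets every 2-bit field to the DTCM value.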
So I've added the FLEXRAM_AllocateRam() call to remap all 4 RAM banks to DTCM, and I've modified the linker memory layout to have all of the 128K RAM at SRAM_DTCM. At this point the project builds, but I'm unable to even step into ResetISR(). It took me a while to realize the only thing that could cause this: before executing ResetISR(), the CPU first loads the stack pointer from the previous word in memory (the first word of the interrupt vector table). Well, as the stack is placed at the very end of the RAM, it doesn't exist at reset, since the FlexRAM isn't remapped yet. The best thing I could do was to introduce a separate stack for startup only, and then copy and relocate the stack after the FlexRAM is remapped:
These are my changes in startup_<chipname>.c:

    // TODO: make sure this is sized properly (no overflow until __set_MSP() call)
    __attribute__((used, section(".StartupStack")))
    void* startupStack[64];
    void* const startupStackEnd = &startupStack[(sizeof(startupStack)/sizeof(startupStack[0]))];

    __attribute__ ((used, section(".isr_vector")))
    void (* const g_pfnVectors[])(void) = {
        // Core Level - CM7
        startupStackEnd,    // The initial stack pointer
        [...]
[inside ResetISR()]

    SystemInit();

    // relocate stack to official position as it's now mapped
    {
        unsigned int msp;
        unsigned int* currentStack = startupStackEnd, *newStack = &_vStackTop;
        __asm volatile ("MRS %0, msp" : "=r" (msp) );
        while (currentStack > (unsigned int*)msp)
        {
            newStack--;
            currentStack--;
            *newStack = *currentStack;
        }
        __asm volatile ("MSR msp, %0" : : "r" (newStack) : );
    }

    // Copy the data sections from flash to SRAM.
    [...]
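The copy loop can be tried out on a host machine. The following is a minimal sketch, assuming a full-descending stack as on Cortex-M; relocate_stack() is a hypothetical name, and plain arrays stand in for the two stack regions.

```c
#include <assert.h>
#include <stdint.h>

/* Copies the used part of a full-descending stack, [old_sp, old_top),
 * so that it ends at new_top. Returns the new stack pointer value. */
uint32_t *relocate_stack(const uint32_t *old_top, const uint32_t *old_sp,
                         uint32_t *new_top)
{
    const uint32_t *src = old_top;
    uint32_t *dst = new_top;

    /* The stack pointer can never be above the top of its own stack. */
    assert(old_sp <= old_top);

    /* Walk downwards from both tops, one word at a time. */
    while (src > old_sp) {
        *--dst = *--src;
    }
    return dst;
}
```

Each word keeps its distance from the top of the stack, so anything addressed relative to the stack pointer still works after the move. Absolute pointers into the old stack (a saved frame pointer, for example) are not fixed up by a copy like this, which is one more reason to do the relocation as early and in as shallow a call chain as possible.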
By adding the ".StartupStack" section to .data in the managed linker script, the startup stack ends up at the very beginning of the DTCM (assuming there are no other custom sections preceding it), which exists at startup, as the DTCM has 1 bank allocated by default.
At this point, the project would run. Sometimes. It never ran with optimizations disabled, and with maximum optimizations it would stop working after being recompiled for some trivial reason. When it failed, it ended up in HardFault due to memory access exception(s). When I tried to debug through FLEXRAM_AllocateRam(), the strangest thing happened: the same image wouldn't fault when I stepped through it. With no optimizations it would always fault, so I stepped through the disassembly and found the error in this code:
    IOMUXC_GPR->GPR14 &= ~IOMUXC_GPR_GPR14_CM7_CFGDTCMSZ_MASK;
    IOMUXC_GPR->GPR14 |= IOMUXC_GPR_GPR14_CM7_CFGDTCMSZ(FLEXRAM_MapTcmSizeToRegister(dtcmBankNum));
The register field is first cleared, then there's a function call, and only then is its result written to the register. Clearing these fields and writing them in a separate access leaves a window open during which the RAM is in an undefined state. Changing this and similar lines to perform the clear and the write in a single register access made the faults disappear:
    IOMUXC_GPR->GPR14 = (IOMUXC_GPR->GPR14 & (~IOMUXC_GPR_GPR14_CM7_CFGDTCMSZ_MASK)) |
                        IOMUXC_GPR_GPR14_CM7_CFGDTCMSZ(FLEXRAM_MapTcmSizeToRegister(dtcmBankNum));
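The same idea can be written as a small generic helper. This is only a sketch: field_replace() and reg_update_field() are hypothetical names, not SDK functions, and the mask and shift values in the usage note are illustrative rather than taken from the device header.

```c
#include <assert.h>
#include <stdint.h>

/* Returns `reg` with the field selected by `mask` replaced by
 * `value << shift`. Pure function: no register is touched here. */
uint32_t field_replace(uint32_t reg, uint32_t mask, uint32_t shift,
                       uint32_t value)
{
    /* The new value must fit inside the field. */
    assert(((value << shift) & ~mask) == 0u);
    return (reg & ~mask) | (value << shift);
}

/* Updates one field of a memory-mapped register with a single store.
 * The new register value is fully computed first, so the register
 * never holds the intermediate "field cleared" state. */
void reg_update_field(volatile uint32_t *reg, uint32_t mask, uint32_t shift,
                      uint32_t value)
{
    uint32_t next = field_replace(*reg, mask, shift, value);
    *reg = next;
}
```

For a 4-bit field at bits 23:20, reg_update_field(&reg, 0x00F00000u, 20u, 8u) leaves all other bits untouched. On the target, any helper that computes the field value should be called before the update so that nothing runs between the read and the write.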
I've attached the entire fsl_flexram patch.
Edit: One more thing, the MPU configuration has to be adjusted to the new RAM space as well to allow unaligned accesses through the whole RAM region.
This is all it takes to achieve something that really should be made simpler: map all onboard RAM for data use.
I would have greatly appreciated it if such an example were already available in the NXP SDK, as this was a long and painful process. We also contacted an NXP FAE, but got no follow-up on my initial questions. So my conclusion is: don't expect more than what you pay for with this chip.
But wait, there's more: if you're using FreeRTOS, you will need to adapt it as well. Specifically, when starting the scheduler, the port discards the rest of the main stack and sets the MSP back to its initial value. In the remapped case that is the startup stack rather than the real one, so you'll need to change it to the desired top of the stack:
    extern void _vStackTop(void);

    static void prvPortStartFirstTask( void )
    {
        /* Start the first task.  This also clears the bit that indicates the FPU is
        in use in case the FPU was used before the scheduler was started - which
        would otherwise result in the unnecessary leaving of space in the SVC stack
        for lazy saving of FPU registers. */
    #if !INCLUDE_vTaskEndScheduler
        /* The start of the stack doesn't match the initial stack! */
        unsigned int *stackTop = (unsigned int*)&_vStackTop;
        __asm volatile("MSR msp, %0" : : "r" (stackTop) : ); /* Set the msp back to the start of the stack. */
        __asm volatile(
            " mov r0, #0 \n" /* Clear the bit that indicates the FPU is in use, see comment above. */
            " msr control, r0 \n"
    #else
        __asm volatile(
    #endif
            " cpsie i \n" /* Globally enable interrupts. */
            " cpsie f \n"
            " dsb \n"
            " isb \n"
            " svc 0 \n" /* System call to start first task. */
            " nop \n"
        );
    }
Hi Benedek,
Thank you so much for sharing this! I will pass your feedback to the SDK team so they can consider adding something like this in future releases of the SDK.
Regards,
Victor
Hi Benedek
See https://community.nxp.com/thread/536967#comment-1338266
The uTasker project covers such needs out of the box and is compatible with 1010..1064.
It retains stack content across memory partition swaps (as illustrated in the document attached to the other thread).
Regards
Mark
[uTasker project developer for Kinetis and i.MX RT]