Hello,
I am having trouble understanding how the FlexSPI AHB RX buffer works, and I am seeing an unexpectedly large performance loss when executing code from the external QSPI flash. The board I am using is a MIMXRT1060-EVKB.
The code I am testing is a very simple for loop that periodically toggles a GPIO and the LED: for (uint32_t time=0; time < us; time++) { }. It takes 20 bytes in memory. The flash behaves as if there were a barrier every 32 bytes, with a performance hit each time execution crosses it, and as if the AHB RX buffer could hold only one 32-byte page. If the code is stored between 0x60002F60 and 0x60002F74, it does not cross a 32-byte boundary, and the loop runs at the same frequency whether the core cache is enabled or disabled, and whether or not prefetching is enabled in the FlexSPI. If the code is stored between 0x60002F70 and 0x60002F84, it crosses a boundary, and it executes 6x slower with the core cache disabled, and 100x slower with prefetch disabled. With prefetch enabled, checking the activity on the external flash with an oscilloscope confirms that the RX buffer holds all the code when it does not cross a boundary, and that it does not in the other case.
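To make the boundary arithmetic explicit, here is a minimal sketch of how I check whether a code range crosses an aligned 32-byte block. The function name crosses_block is my own, purely for illustration, and the 32-byte block size is just the value I observe, not something I found documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the byte range [start, start + len) touches more than one
 * aligned block of `block` bytes. `block` must be a power of two.
 * Hypothetical helper, not part of any SDK. */
bool crosses_block(uint32_t start, uint32_t len, uint32_t block)
{
    assert(len > 0u && block != 0u && (block & (block - 1u)) == 0u);

    /* Round the first and last byte addresses down to their block base. */
    uint32_t first_block = start & ~(block - 1u);
    uint32_t last_block  = (start + len - 1u) & ~(block - 1u);

    return first_block != last_block;
}
```

With the two placements above, crosses_block(0x60002F60, 20, 32) is false (all 20 bytes sit in the block starting at 0x60002F60), while crosses_block(0x60002F70, 20, 32) is true (the range runs past 0x60002F80).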
In the FlexSPI registers, AHBCR[READSZALIGN] is set to 0, and the flash is accessed in individual mode. AHBRXBUF0-2 are not configured, so AHBRXBUF3 should have a size equal to the total AHB RX buffer size, which is 128*64 bits according to AN12437. It should also be used for all read accesses, and this seems to be the case, since disabling PREFETCHEN in AHBRXBUF3CR0 leads to the same results as disabling PREFETCHEN in AHBCR. The MPU is not activated. The AHBRXBUF0-2 control registers also mention AHBBUFREGIONiSTART and AHBBUFREGIONiEND, but I cannot find any information on these fields in the reference manual.
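For reference, this is the size I expect for AHBRXBUF3, written as a small sketch. It assumes that buffer 3 receives whatever buffers 0-2 do not claim and that sizes are counted in 64-bit units, which is my reading of the documentation and not something I have confirmed. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total AHB RX buffer size in 64-bit units, per AN12437 as quoted above. */
#define AHB_RX_BUF_TOTAL_UNITS 128u
#define BYTES_PER_UNIT         8u

/* Bytes left for buffer 3, ASSUMING it gets the remainder after buffers 0-2.
 * Hypothetical helper, not an SDK function. */
uint32_t rxbuf3_bytes(uint32_t buf0_units, uint32_t buf1_units, uint32_t buf2_units)
{
    uint32_t used = buf0_units + buf1_units + buf2_units;
    assert(used <= AHB_RX_BUF_TOTAL_UNITS);
    return (AHB_RX_BUF_TOTAL_UNITS - used) * BYTES_PER_UNIT;
}
```

With buffers 0-2 left at zero this gives 1024 bytes, i.e. room for 32 blocks of 32 bytes, which is why I did not expect a 20-byte loop spanning two blocks to fall out of the buffer.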
Activating the core caches or placing the code in the ITCM, as suggested in AN12437, fixes the issue, but I would like to understand why this behaviour occurs.
Thanks in advance.