Nancy,
I tried bencharking the same with MMU table set up to not D-cache QSPI. Value=100000 completes now in 420ms with I-cache enabled and in 440ms with I-cache disabled. It seems your QSPI is not D-cacheable.
Looks like you are using MQX. MMU table is set up in init_bsp.c. You need to find _mmu_vadd_vregion() calls and then add a call for QSPI:
| _mmu_add_vregion((pointer)0x20000000/*QSPI base*/, (pointer)0x20000000/*QSPI base*/, (_mem_size) 0x00100000/*QSPI size, change to appropriate*/, PSP_PAGE_TABLE_SECTION_SIZE(PSP_PAGE_TABLE_SECTION_SIZE_1MB) | PSP_PAGE_TYPE(PSP_PAGE_TYPE_CACHE_WBNWA) | PSP_PAGE_DESCR(PSP_PAGE_DESCR_ACCESS_RW_ALL)); |
If you don't want to recompile BSP, I think you need to 1) disable both I- and D-caches (_ICACHE_DISABLE, _DCACHE_DISABLE), 2) dissable MMU (_mmu_vdisable()), 3) call above _mmu_add_vregion() and reenable MMU and caches. I'm using bare metal, didn't try this with MQX, so please sorry in advance if above won't work.
INVALIDATE calls shouldn't help there! They are used when you need to synchronize CPU memory view with what's really in memory. For example after DMA transfer is complete, or when you receive something from USB, or when you get video frame from VIU3 module etc.. In such cases, provided memory to which transfer is done is cacheable, INVALIDATE calls make cache reloaded from memory and CPU again sees right picture. You say it hangs when executing from DDR without invalidate. You need to figure what makes cache out of sync with memory and hanging. For example it could be M4 core writing lparam to cacheable memory and A5 seeing some old huge buffer size value cached. If you see no similar explanation and only A5 is writing and reading loop variables etc in you code, then perhaps you have HW problems with DDR!
Hm, is DDR slower than QPSI? Didn't you make a mistake, 50ms from QSPI and 175ms from DDR with same loop size setting?