Hi,
I am testing software prefetching on the NXP1176 board.
My benchmark calculates the sum of a large array allocated on the heap. It runs about 5 times slower when I put the heap in SRAM instead of the TCM. To improve performance, I tried adding __builtin_prefetch() calls to the source code. I can see the PLD instruction in the binary, but the prefetch has no visible effect on performance.
So I am wondering: is data prefetching enabled on the M7, or does the M7 have no data prefetch at all?
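For reference, a benchmark of the kind described above might look like the sketch below. It is not the original benchmark code: the function names and the PREFETCH_DISTANCE value (one 32-byte Cortex-M7 cache line, i.e. 8 words) are assumptions for illustration. __builtin_prefetch(addr, rw, locality) is the GCC/Clang builtin; it is only a hint, so the compiler emitting PLD does not guarantee the memory system acts on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance: 8 x 4-byte words = one 32-byte
 * Cortex-M7 cache line. The best value depends on the memory. */
#define PREFETCH_DISTANCE 8u

/* Baseline: plain sequential sum. */
uint32_t sum_plain(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Same sum, hinting the next cache line ahead of use.
 * On Armv7-M the compiler typically emits PLD for the hint. */
uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Issue one hint per cache line and never point past the array. */
        if ((i % PREFETCH_DISTANCE) == 0u && i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE],
                               0 /* read */, 0 /* low temporal locality */);
        }
        sum += data[i];
    }
    return sum;
}
```

Timing both functions on the same buffer in SRAM and in TCM shows whether the hint changes anything. The two results must be identical, since the prefetch never alters program semantics.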
Thanks
Hello
Hope you are well.
The SDRAM controller has no prefetch; prefetching is available on FlexSPI with NOR/HyperFlash devices. ITCM performance is expected to be considerably higher than that of SDRAM or any other available memory.
When SDRAM is used for XIP:
I-CACHE disabled: Using SDRAM for XIP with code optimized for size can significantly reduce performance. The core pipeline's prefetch stage issues single requests directly to the AXI system bus, and each single request must wait for its response, which adds latency.
I-CACHE enabled: The core pipeline's prefetch stage issues single requests to the I-CACHE. A cache hit can be treated as a single access; on a cache miss, the cache controller issues a burst transfer request on the AXI system bus.
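The difference between the cache-disabled and cache-enabled cases can be illustrated with a rough cycle-count model. The bus latency value, the one-beat-per-word burst and the 1-cycle hit cost are assumptions for illustration, not measured i.MX RT figures; only the 32-byte line size reflects the Cortex-M7 L1 cache. The function names are hypothetical.

```c
#include <stdint.h>

/* Cortex-M7 L1 cache line: 32 bytes = 8 x 32-bit words. */
#define WORDS_PER_LINE 8u

/* Cache disabled: every word is a separate single AXI request that
 * pays the full (assumed) bus latency plus one data beat. */
uint32_t cycles_uncached(uint32_t n_words, uint32_t latency)
{
    return n_words * (latency + 1u);
}

/* Cache enabled: each line is filled by one burst that pays the latency
 * once plus one beat per word; the remaining words in the line then hit
 * in the cache at an assumed 1 cycle each. Assumes the data starts on a
 * line boundary. */
uint32_t cycles_cached(uint32_t n_words, uint32_t latency)
{
    uint32_t misses = (n_words + WORDS_PER_LINE - 1u) / WORDS_PER_LINE;
    uint32_t hits = n_words - misses;
    return misses * (latency + WORDS_PER_LINE) + hits;
}
```

With an assumed latency of 10 cycles, reading 64 sequential words costs 704 cycles uncached versus 200 cycles cached in this model: the burst amortizes the latency over a whole line instead of paying it on every word.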
If you have more questions, do not hesitate to ask.
Best regards,
Omar