dimepu wrote: I guess disabling the cache was causing the code to be stored in one of the external memories, and the CPU spent more time reading the code from there.
The point about I/O pin speed is very important. Last year I was working on a Marvell PXA320. To get to an I/O pin takes TWO HUNDRED WAIT STATES on that chip!
So you program an internal timer, monitor it in software, and toggle an I/O pin at 1/65536 of the timer rate. Then measure that with a scope to check your clock and CPU-speed programming. Once that's verified it should stay correct.
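Something along these lines, in C, with made-up register addresses (TIMER_COUNT and GPIO_DATA are placeholders for whatever free-running counter and GPIO output data register your chip actually has):

#include <stdint.h>

/* Hypothetical register addresses -- substitute the free-running 32-bit
 * timer counter and GPIO output data register from your chip's manual. */
#define TIMER_COUNT  (*(volatile uint32_t *)0x40001000u)
#define GPIO_DATA    (*(volatile uint32_t *)0x40002000u)
#define TEST_PIN     (1u << 3)   /* any pin you can reach with a scope */

void clock_check_loop(void)
{
    for (;;) {
        /* Bit 16 of the counter changes state every 65536 timer ticks,
         * so the pin toggles at 1/65536 of the timer rate.  Measure the
         * resulting square wave and compare it with what you expect
         * from your PLL/clock settings. */
        if (TIMER_COUNT & 0x10000u)
            GPIO_DATA |=  TEST_PIN;
        else
            GPIO_DATA &= ~TEST_PIN;
    }
}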
The cache setting can't control what memory the code is loaded into. That's up to where you link it.
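For instance, with GCC you can pin a routine into a named section and let the linker script decide where it lives. The section name ".sram_code" and the script fragment in the comment are just examples to adapt:

/* Put a time-critical routine in a section the linker maps to internal
 * SRAM.  ".sram_code" is an invented name; your linker script has to
 * place it, e.g.   .sram_code : { *(.sram_code) } > SRAM AT > FLASH
 * and your startup code has to copy it from FLASH to SRAM before use. */
__attribute__((section(".sram_code")))
void time_critical_routine(void)
{
    /* ...code you want executing from fast internal memory... */
}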
My experience is with the MCF5329, but there should be similarities. I'm listing what we have to set up on it, in case that helps you find the equivalent sections in your reference manual.
1 - The external FLASH we have on the 5329 has to be set up with the right number of wait states; otherwise it defaults to a very slow rate: 63 wait states at 80 MHz, or 189 times slower than the CPU. (Items 1 to 3 are sketched in the first code example after this list.)
2 - We then configure the external FLASH to run at NINE wait states at 80 MHz, so it is running at 1/27 of the CPU clock rate. Code running from there is glacial.
3 - For the internal SRAM we have to set RAMBAR: enable it, set the base address, and allow the CPU direct access to the SRAM. Without the latter the CPU reaches the SRAM through the "Crossbar Back Door", which is very slow. You also have to program the crossbar if you have one, and there's even a register somewhere to allow "burst access" to the memory, which defaults to off.
4 - Set up the cache properly. We're using "Write-through" and find it faster with our application mix than "Write-back" (there's a rough sketch of that after the list too).
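Here's a very rough sketch of items 1 to 3 in C. The register address, field positions and the RAMBAR value are assumptions, so treat it as a map of what to look up in the FlexBus and SRAM chapters of your manual rather than working code for your part:

#include <stdint.h>

/* Address and field positions below are assumptions -- take the real
 * ones from the FlexBus chapter of your reference manual. */
#define FBCS_CSCR0      (*(volatile uint32_t *)0xFC008008u) /* chip-select 0 control (address assumed) */
#define FBCS_CSCR_WS(n) (((uint32_t)(n) & 0x3Fu) << 10)     /* wait-state field (position assumed)     */

void flash_wait_states_setup(void)
{
    /* Items 1 and 2: the chip select wakes up at the maximum of 63 wait
     * states; reprogram it to 9 and leave the other fields alone. */
    uint32_t cscr = FBCS_CSCR0;
    cscr &= ~FBCS_CSCR_WS(0x3F);
    cscr |=  FBCS_CSCR_WS(9);
    FBCS_CSCR0 = cscr;
}

void internal_sram_setup(void)
{
    /* Item 3: RAMBAR is a CPU control register, so it is written with
     * MOVEC rather than a memory-mapped store.  0x80000000 is an example
     * base address and 0x221 an example field encoding (valid bit plus
     * access-control bits) -- build the real value from your manual, and
     * note your toolchain may spell the register %rambar0/%rambar1. */
    uint32_t rambar = 0x80000000u | 0x221u;
    __asm__ volatile ("movec %0, %%rambar" : : "r"(rambar));

    /* The crossbar priority and burst-enable registers mentioned above
     * are chip specific and not shown here. */
}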
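And item 4, equally rough: on ColdFire the cache is controlled through CACR (plus the ACRn region registers), again written with MOVEC. The bit values below are placeholders for "enable", "invalidate all" and "default mode = write-through"; take the real encodings from the cache chapter:

#include <stdint.h>

#define CACR_ENABLE        0x80000000u  /* cache enable (bit position assumed)          */
#define CACR_INVALIDATE    0x01000000u  /* invalidate-all (bit position assumed)        */
#define CACR_WRITE_THROUGH 0x00000000u  /* default mode = write-through (value assumed) */

void cache_setup(void)
{
    /* Invalidate everything and enable the cache in write-through mode. */
    uint32_t cacr = CACR_ENABLE | CACR_INVALIDATE | CACR_WRITE_THROUGH;
    __asm__ volatile ("movec %0, %%cacr" : : "r"(cacr));
}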
Then you'll have fun trying to get any DMA working: either all of its buffers have to live in uncached SRAM, or you need to set up "uncached buffer regions", or you need cache flushes in the right places.
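One pattern that helps (the section name and the flush helper here are invented for the example): give the DMA engine its own uncached buffers, or flush the cache lines covering a cached buffer around each transfer.

#include <stddef.h>
#include <stdint.h>

/* Option A: let the linker place DMA buffers in a section that is mapped
 * to uncached memory.  ".dma_uncached" is an invented section name. */
__attribute__((section(".dma_uncached"), aligned(16)))
static uint8_t dma_rx_buffer[1536];

/* Option B: keep the buffer cached, but flush/invalidate the lines that
 * cover it around each transfer.  On ColdFire that boils down to CPUSHL
 * per cache line; this helper is hypothetical and would wrap that. */
extern void cache_flush_range(const void *addr, size_t len);

void start_dma_receive(void)
{
    cache_flush_range(dma_rx_buffer, sizeof dma_rx_buffer);
    /* ...then hand dma_rx_buffer to the DMA controller... */
}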
Have fun.