Content originally posted in LPCWare by wmues on Mon Jul 29 02:28:20 MST 2013
I don't think that this is a problem of the CPU core: it's a problem of the flash interface.
Buffering and caching are common techniques these days, and your code should not count on their absence. Because flash reads are slow, the internal flash has a wide word width (64 bits?) and a prefetch engine, so the timing of a piece of code can change with where it sits relative to those flash words.
You should isolate the time-critical code in a subroutine and use the linker script to place this subroutine at a fixed address. The execution timing will then depend only on the content of the code, not on its location.
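
As a rough sketch of what I mean, assuming a GCC-style toolchain: the section name ".timecritical", the function delay_loop, the memory region name FLASH and the address 0x00001000 are all made up for illustration, so adapt them to your project and check the result in the map file.

```c
#include <stdint.h>

/* Assumption: GCC or Clang producing ELF output. On other toolchains the
 * macro expands to nothing and the placement has to be done another way. */
#if defined(__GNUC__) && defined(__ELF__)
#define TIME_CRITICAL __attribute__((section(".timecritical"), noinline))
#else
#define TIME_CRITICAL
#endif

/* Hypothetical time-critical routine: counts up to 'count' and returns
 * the final counter value. 'volatile' keeps the loop from being removed. */
TIME_CRITICAL uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t i = 0;
    while (i < count) {
        i++;
    }
    return i;
}

/*
 * Matching linker script fragment (GNU ld syntax). The region name and
 * the address are examples only:
 *
 *   .timecritical 0x00001000 :
 *   {
 *       KEEP(*(.timecritical))
 *   } > FLASH
 */
```

The noinline attribute matters here: if the compiler inlines the routine into its caller, the code ends up wherever the caller is placed and the fixed address no longer applies.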