> Do You now the reason for these problem? [sic] Cache and RAM are different memories isn't it?
Yes, that's the problem. Your copied code won't be in the right memory for the CPU to execute from. I thought the problem was obvious from my explanation. You should read up some more about what caches do and how they work.
These processors, where they have caches, have separate code and data caches. This is what happens when you copy code with the caches enabled:
1 - CPU copy loop reads a word from the source address,
2 - A cache line of data is read into the data cache, CPU reads from there,
3 - CPU copy loop writes a word to the destination address,
4 - Usually an existing cache line is flushed to make room, and the word is written into the data cache,
5 - Copy has finished, but the data is still in the data cache, not in the main memory,
6 - CPU calls the copied function, and one of three things happens:
6a - It executes stale instructions from the Instruction Cache (if it has previously run code from that address), or
6b - The Instruction Cache misses and fetches stale data from main memory, or
6c - You get lucky, and other memory-using functions have already flushed the data and code caches "by accident".
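To make cases 6a and 6b concrete, here is a small host-side toy model of the steps above. It is only a sketch: it assumes a write-back data cache, one word per cache line, and no evictions, and every name in it (toy_system, toy_load, toy_store, toy_fetch) is made up for illustration. It is not ColdFire code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 8u

/* Toy model, not real hardware: one word per cache line, nothing is ever
   evicted, and the data cache is write-back. All names are hypothetical. */
typedef struct {
    uint32_t ram[MEM_WORDS];     /* main memory */
    uint32_t dcache[MEM_WORDS];  /* data cache contents */
    uint8_t  d_valid[MEM_WORDS];
    uint32_t icache[MEM_WORDS];  /* instruction cache contents */
    uint8_t  i_valid[MEM_WORDS];
} toy_system;

static void toy_init(toy_system *s) { memset(s, 0, sizeof *s); }

/* Steps 1-2: a data read fills the data cache from RAM on a miss. */
static uint32_t toy_load(toy_system *s, unsigned addr)
{
    assert(addr < MEM_WORDS);
    if (!s->d_valid[addr]) {
        s->dcache[addr] = s->ram[addr];
        s->d_valid[addr] = 1;
    }
    return s->dcache[addr];
}

/* Steps 3-4: a data write lands in the data cache only; RAM is not updated. */
static void toy_store(toy_system *s, unsigned addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    s->dcache[addr] = value;
    s->d_valid[addr] = 1;
}

/* Step 6: an instruction fetch never looks in the data cache. It uses the
   instruction cache, and fills that from RAM on a miss. */
static uint32_t toy_fetch(toy_system *s, unsigned addr)
{
    assert(addr < MEM_WORDS);
    if (!s->i_valid[addr]) {
        s->icache[addr] = s->ram[addr];
        s->i_valid[addr] = 1;
    }
    return s->icache[addr];
}
```

In this model, if you toy_store() a new word to an address and then toy_fetch() from it, you get the old contents back. That holds whether the address was fetched before the copy (6a) or not (6b), while a toy_load() from the same address returns the new word. The copy "worked" as far as data accesses can tell, but the instruction side never sees it.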
The only clean way to do this is to:
1 - Perform the copy,
2 - Flush the Data Cache to push the code into main memory,
3 - Clear the Instruction Cache to get rid of any old copies.
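The three steps above could be wrapped up something like this. It is a minimal sketch, not code for any particular chip: the cache_ops structure and both hooks are hypothetical, and the two log_* functions are host-side stand-ins that only record the call order so the sequence can be checked on a PC. On real hardware you would replace them with whatever your part's reference manual says to use for pushing data cache lines and invalidating the instruction cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform hooks: fill these in from your CPU's manual. */
typedef struct {
    void (*dcache_flush)(void *start, size_t len);      /* write dirty lines back to RAM */
    void (*icache_invalidate)(void *start, size_t len); /* discard cached instructions */
} cache_ops;

/* Copy code to dst, then make sure the CPU will fetch the new copy. */
static void copy_code(void *dst, const void *src, size_t len,
                      const cache_ops *ops)
{
    assert(dst != NULL && src != NULL && ops != NULL);
    assert(ops->dcache_flush != NULL && ops->icache_invalidate != NULL);

    memcpy(dst, src, len);            /* 1 - perform the copy */
    ops->dcache_flush(dst, len);      /* 2 - push the copy out to main memory */
    ops->icache_invalidate(dst, len); /* 3 - drop any old cached instructions */
}

/* Host-side stand-ins (assumption: for testing the order only). */
static char   call_log[4];
static size_t call_count;

static void log_dcache_flush(void *start, size_t len)
{
    (void)start; (void)len;
    if (call_count < sizeof call_log) call_log[call_count++] = 'D';
}

static void log_icache_invalidate(void *start, size_t len)
{
    (void)start; (void)len;
    if (call_count < sizeof call_log) call_log[call_count++] = 'I';
}
```

The order matters: the instruction cache must be invalidated after the data cache has been flushed, otherwise it can refill with stale contents from main memory. Only call the copied function once copy_code() has returned.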
Caches make these CPUs run faster. Think 10 to 20 times faster for the Instruction Cache. I was working on code on an MCF5235 which didn't have the data cache enabled, and was managing an Ethernet data transfer speed of 3.3 Mb/s. Enabling the data cache (with bursting) got it up to 6.5 Mb/s, almost twice as fast. Rewriting "memcpy()" and the UDP checksum (both in assembly) eventually got it up to 9 Mb/s.
Handling the caches also makes "sample code" more complicated, so it is usually left out. Sample code is just that: an example, which is seldom usable "as-is".
Tom