Timing behavior in MPC55xx


Timing behavior in MPC55xx

ricardofranca
Contributor II

Hello,

I am doing some experiments with an MPC5554 (running some sort of RTOS) and its timing behavior seems a little strange: the execution time of my functions differs from one execution to another, even though the function inputs are the same. Since there are no active interrupts, some friends pointed out that such behavior is due to the e200z6 unified cache and its line replacement policy. To check, I ran very simple applications with the following code patterns:

Pattern 1 (very stable execution time ~ 60100 processor cycles)

C:

register int i = 0;
register int j;

for (j = 0; j < 10000; j++) {
  i++;
}

ASM:

li      r30, 0
ori     r0, r0, 0
.L36:
addi    r29, r29, 1
addi    r30, r30, 1
cmpwi   r30, 10000
blt     .L36

============

Pattern 2 (a little less stable execution time, yet quite stable ~ 70100 cycles with a few occurrences of 70200 cycles)

C:

register int i = 0;
register int j;

for (i = 0; i < 10000; i++) {
  j = myglobal; // myglobal is a non-volatile global mapped to an external SRAM
}

ASM:

li      r29, 0
ori     r0, r0, 0
.L36:
lis     r12, %hiadj(myglobal)
lwz     r30, %lo(myglobal)(r12)
addi    r29, r29, 1
cmpwi   r29, 10000
blt     .L36

============

Pattern 3 (shaky execution time: varies between ~12k and ~14k cycles)

C:

register int i = 0;

i = i + 1;
i = i + 2;
i = i + 3;
(...)
i = i + 9999;

ASM:

lots of "addi r30, r30, [1-9999]"

============

In all cases, I compiled the code without optimizations. I don't know the details of the pseudo-round-robin line replacement algorithm, so I made the following suppositions:

- Pattern 1 does not use cached data and has all its instructions inside the cache, therefore there are no misses, no line replacement and no timing issues.

- Pattern 3 timing is strongly dependent on the initial cache state.

Do these suppositions make sense? Also, is there any simple (or not so simple :-)) explanation for the small timing "spikes" seen in Pattern 2?

Thanks for your attention!

1 Solution

lukaszadrapa
NXP TechSupport

Hi,

Yes, that makes sense. The cache is most effective when the code contains loops. So in the case of patterns 1 and 2, the code is loaded from flash into the cache in one or two flash reads, and is then fetched in a single cycle each time the loop executes again.

Pattern 3 is linear code, and its execution time will depend on the initial cache state. Generally speaking, large linear code that is executed only rarely (so that its lines in the cache are overwritten by other code in the meantime) does not take advantage of the cache at all.

Lukas
