Timing behavior in MPC55xx


Timing behavior in MPC55xx

ricardofranca
Contributor II

Hello,

I am doing some experiments with an MPC5554 (running some sort of RTOS) and its timing behavior seems a little strange: the execution time of my functions differs from one execution to another, even though the function inputs are the same. Since there are no active interrupts, some friends pointed out that such behavior is due to the e200z6 unified cache and its line replacement policy. To check, I ran very simple applications with the following code patterns:

Pattern 1 (very stable execution time ~ 60100 processor cycles)

C:

register int i = 0;
register int j;

for (j = 0; j < 10000; j++) {
  i++;
}

ASM:

li      r30, 0
ori     r0, r0, 0
.L36:
addi    r29, r29, 1
addi    r30, r30, 1
cmpwi   r30, 10000
blt     .L36

============

Pattern 2 (a little less stable execution time, yet quite stable ~ 70100 cycles with a few occurrences of 70200 cycles)

C:

register int i = 0;
register int j;

for (i = 0; i < 10000; i++) {
  j = myglobal; // myglobal is a non-volatile global mapped to an external SRAM
}

ASM:

li      r29, 0
ori     r0, r0, 0
.L36:
lis     r12, %hiadj(myglobal)
lwz     r30, %lo(myglobal)(r12)
addi    r29, r29, 1
cmpwi   r29, 10000
blt     .L36

============

Pattern 3 (shaky execution time: varies between ~12k and ~14k cycles)

C:

register int i = 0;

i = i + 1;
i = i + 2;
i = i + 3;
(...)
i = i + 9999;

ASM:

lots of "addi r30, r30, [1-9999]"

============

In all cases, I compiled the code without optimizations. I don't know the details of the pseudo-round-robin line replacement algorithm, so I made the following suppositions:

- Pattern 1 does not use cached data and has all its instructions inside the cache, therefore there are no misses, no line replacement and no timing issues.

- Pattern 3 timing is strongly dependent on the initial cache state.

Do these suppositions make sense? Also, is there any simple (or not so simple :-)) explanation for the small timing "spikes" seen in Pattern 2?

Thanks for your attention!

1 Solution

lukaszadrapa
NXP TechSupport

Hi,

Yes, that makes sense. The cache is most effective when the code contains loops. So in the case of patterns 1 and 2, the code is loaded from flash into the cache in one or two flash reads, and is then fetched in a single cycle each time the loop executes again.

Pattern 3 is linear code, and its execution time will depend on the initial cache state. Generally speaking, large linear code that is executed only rarely (so that its lines in the cache are overwritten by other code in the meantime) does not take advantage of the cache at all.

Lukas
