P2020 code alignment with respect to L1 cache lines


The following loop:

li r9,0
loop:
addi r9,r9,1
stw r9,0(r3)
b loop

executes 799 iterations per (arbitrary) unit of time when it sits entirely inside one L1 cache line (32-byte address boundaries), but only 531 iterations in the same unit of time when an L1 cache line boundary falls inside the loop code.
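To make the two cases concrete, here is a minimal C sketch of the check I have in mind. It assumes the repeated part of the loop is the three 4-byte instructions after the li (12 bytes) and that the L1 line size is 32 bytes; the function name and the example addresses are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed L1 instruction cache line size, taken from the 32-byte figure above. */
#define CACHE_LINE_BYTES 32u

/*
 * Returns true when the code range [start, start + size_bytes) touches more
 * than one cache line. size_bytes must be greater than zero.
 */
static bool crosses_cache_line(uint32_t start, uint32_t size_bytes)
{
    uint32_t first_line = start / CACHE_LINE_BYTES;
    uint32_t last_line = (start + size_bytes - 1u) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

With a 12-byte loop body, start offsets 0x00 through 0x14 within a line keep the loop inside that line, while offsets 0x18 and 0x1C make it straddle the next boundary.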

My hypothesis is that this performance difference comes from the loop straddling a cache line boundary, but I could not confirm it using the Performance Monitor Counters.

Could you confirm that the change in loop speed is related to cache line alignment?

I have read AN2665 (Fetch Fact 2).

I suspect the effect is so visible because the loop is very short. What about longer loops or longer code sequences?

Should I use the compiler's function/loop/label alignment options? (This would insert many nop instructions.)
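One alternative I am considering, to limit the padding to hot code only, is aligning individual functions instead of using the global options. The sketch below assumes a GCC-compatible compiler and its `aligned` function attribute; the routine and helper names are hypothetical, and the C loop only mirrors the assembly above.

```c
#include <stdint.h>

/* Hypothetical hot routine: only this function's entry is 32-byte aligned. */
__attribute__((aligned(32), noinline))
void store_counter(volatile uint32_t *dst, uint32_t iterations)
{
    uint32_t count = 0;
    while (iterations--) {
        count++;       /* corresponds to addi r9,r9,1 */
        *dst = count;  /* corresponds to stw r9,0(r3) */
    }
}

/* Byte offset of a function's entry point within its 32-byte cache line. */
static uint32_t offset_in_cache_line(void (*fn)(volatile uint32_t *, uint32_t))
{
    return (uint32_t)((uintptr_t)fn % 32u);
}
```

As far as I understand, this only aligns the function entry; the loop inside it still lands wherever the compiler places it, which is what GCC's -falign-loops=32 would address, at the cost of the extra padding.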
