P2020 code align with respect to L1 cache lines


thierrybernier
Contributor I

The following loop:

        li   r9,0
  loop: addi r9,r9,1
        stw  r9,0(r3)
        b    loop

executes 799 iterations per (arbitrary) unit of time when it lies entirely within one L1 cache line (32-byte address boundaries), but only 531 iterations in the same unit of time when an L1 cache line boundary falls inside the loop code.
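To make the two cases concrete, here is a minimal sketch in C of how I check whether the loop body straddles a line. It assumes the 32-byte line size mentioned above and a 12-byte loop body (three 4-byte instructions: addi, stw, b). The helper name `spans_line_boundary` and the example addresses are purely illustrative, not taken from my actual build.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_LINE_BYTES 32u   /* L1 cache line size stated in this post */

/* True when the byte range [start, start + len) touches more than
 * one L1 cache line. Hypothetical helper, for illustration only. */
static bool spans_line_boundary(uint32_t start, uint32_t len)
{
    if (len == 0u) {
        return false;
    }
    uint32_t first_line = start / L1_LINE_BYTES;
    uint32_t last_line  = (start + len - 1u) / L1_LINE_BYTES;
    return first_line != last_line;
}
```

For example, a 12-byte loop body starting at offset 0x14 within a line still fits (it ends at 0x1F), while the same body starting at offset 0x18 crosses into the next line.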

My hypothesis is that this slowdown is caused by the loop straddling a cache line boundary, but I could not confirm it using the Performance Monitor Counters.

Could you confirm that the change in loop speed is related to cache line alignment?

I have read AN2665 (Fetch Fact 2).

I suspect the effect is so visible because the loop is very short. What about longer loops or longer code?

Should I use the compiler's function/loop/label alignment options? (This would insert many nop instructions.)
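To estimate the padding cost, here is a rough sketch in C of how many 4-byte nop instructions it would take to push a label up to the next 32-byte boundary. I am assuming a GCC-style toolchain, where options such as -falign-functions=32, -falign-loops=32 and -falign-labels=32 request this kind of alignment. The helper name `nops_to_align` is hypothetical, and real compilers may pad differently or skip padding above some threshold.

```c
#include <stdint.h>

#define L1_LINE_BYTES 32u   /* L1 cache line size stated in this post */
#define INSN_BYTES     4u   /* each instruction in the loop is 4 bytes */

/* Number of 4-byte nops needed to move a word-aligned code address
 * up to the next cache line boundary (0 if already aligned).
 * Hypothetical helper, for illustration only. */
static uint32_t nops_to_align(uint32_t start)
{
    uint32_t misalign = start % L1_LINE_BYTES;
    return (misalign == 0u) ? 0u : (L1_LINE_BYTES - misalign) / INSN_BYTES;
}
```

In the worst case (a label one instruction past a boundary) this is 7 nops per aligned label, which is why I am hesitant to enable alignment for every loop and label.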

 
