i.MX53 Cache and Memory Speeds and Latency

TomE
Specialist II

I've been chasing a problem where some graphics code we use runs slowly when a rotated object is at a particular orientation. At that orientation the rotation code runs about 3 times slower than normal.

The problem was that at that orientation the code was stepping through memory in 1724-byte increments. That's almost exactly 1/19 of the 32 KB L1 cache (19 × 1724 = 32,756 bytes) and 1/152 of the 256 KB L2 cache (152 × 1724 = 262,048 bytes).

So it was a "cache buster", forcing reads back to main memory once it had wrapped both caches.
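To see why a badly chosen stride hurts even when the data actually touched is small, here is a minimal sketch of a set-associative cache simulator in C. It is an illustration under stated assumptions, not a model of the real i.MX53: the 4-way associativity and true-LRU replacement used in the checks are assumptions rather than figures from the Freescale or ARM manuals, and `simulate_strided_misses` is a hypothetical helper. It reads the addresses 0, stride, 2 × stride, ... (`count` of them), repeats that `passes` times, and counts misses.

```c
#include <assert.h>
#include <stdlib.h>

/* Count misses when `count` addresses 0, stride, 2*stride, ... are read
 * `passes` times through a set-associative cache with true-LRU
 * replacement. Geometry and policy are illustrative assumptions. */
unsigned long simulate_strided_misses(size_t cache_bytes, size_t line_bytes,
                                      size_t ways, size_t stride,
                                      size_t count, size_t passes)
{
    assert(line_bytes > 0 && ways > 0);
    assert(cache_bytes % (line_bytes * ways) == 0);
    size_t sets = cache_bytes / (line_bytes * ways);
    assert(sets > 0);

    size_t slots = sets * ways;                  /* one slot per way per set */
    size_t *tag = calloc(slots, sizeof *tag);    /* memory line held by slot */
    unsigned long *last_use = calloc(slots, sizeof *last_use); /* LRU clock */
    unsigned char *valid = calloc(slots, 1);
    assert(tag && last_use && valid);

    unsigned long now = 0, misses = 0;
    for (size_t p = 0; p < passes; p++) {
        for (size_t i = 0; i < count; i++) {
            size_t line = (i * stride) / line_bytes; /* memory line number */
            size_t base = (line % sets) * ways;      /* first slot of its set */
            int hit = 0;
            now++;
            for (size_t w = base; w < base + ways; w++) {
                if (valid[w] && tag[w] == line) {    /* hit: refresh LRU */
                    last_use[w] = now;
                    hit = 1;
                    break;
                }
            }
            if (hit)
                continue;
            misses++;
            /* Fill a free way if there is one, else evict the LRU way. */
            size_t victim = base;
            for (size_t w = base; w < base + ways; w++) {
                if (!valid[w]) {
                    victim = w;
                    break;
                }
                if (last_use[w] < last_use[victim])
                    victim = w;
            }
            tag[victim] = line;
            valid[victim] = 1;
            last_use[victim] = now;
        }
    }
    free(tag);
    free(last_use);
    free(valid);
    return misses;
}
```

With 32 KB, 64-byte lines and 4 ways, an 8 KB stride sends every access to the same set: four addresses read twice give only 4 misses, but five addresses read twice miss on all 10 reads, even though five lines are a tiny fraction of the cache. Passing a stride of 1724 and whatever geometry you believe the real L1 and L2 have lets you check how that orientation's accesses spread across the sets.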

So what are the relative latencies of L1, L2 and main memory on the i.MX53?

The ARM chapter of the Freescale Reference Manual says "go read the ARM manuals". Those say "the L2 delay is programmable to suit the L2 RAM you're using", and the Freescale manual doesn't say what it is set to.

I can't find anything on the RAM timing or latency either, but I'd hope it would be under 100 ns, as Intel chips can manage that:

http://www.xbitlabs.com/articles/memory/display/core2duo-memory-guide_2.html

So, time to reverse-engineer. Anyone who wants to run these tests on their own hardware can download and compile the following program:

The Calibrator (v0.9e), a Cache-Memory and TLB Calibration Tool

Running it on our 1 GHz Freescale i.MX53 Evaluation Board gives:

root@lucid-desktop:/tmp# nice --20 ./calibrator 1000 10M report

Calibrator v0.9e
(by Stefan.Manegold@cwi.nl, http://www.cwi.nl/~manegold/)
CPU loop + L1 access:       3.12 ns =   3 cy
             ( delay:       0.37 ns =   0 cy )
caches:
level  size    linesize   miss-latency        replace-time
  1     32 KB   64 bytes    9.95 ns =  10 cy   10.54 ns =  11 cy
  2    256 KB   64 bytes  178.68 ns = 179 cy  179.07 ns = 179 cy
TLBs:
level #entries  pagesize  miss-latency
  1       32       4 KB    46.02 ns =  46 cy

To compile it, you may need to rename all instances of the "round" function in the source, presumably because it clashes with the C99 round() declared in math.h.

The program even generates gnuplot files of the results, which can then be graphed:

(Graph: report.cache-miss-latency.gif)

The graph shows the latency stepping up where the L1 cache runs out at 32 KB and again where the L2 runs out at 256 KB.
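A minimal pointer-chasing probe in C can reproduce steps like these without the calibrator. This is a generic sketch of the technique, not the calibrator's own code: each load reads the address of the next node, so loads can't overlap, and the average time per load would be expected to rise as the buffer grows past each cache level. `chase_ns_per_load` is a hypothetical helper, and a fixed forward stride like this one may be partly hidden by hardware prefetching, which is why real tools often randomize the chain order.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per dependent load while chasing a ring of pointers
 * laid out one every stride_bytes through a buffer of buf_bytes. */
double chase_ns_per_load(size_t buf_bytes, size_t stride_bytes, size_t loads)
{
    assert(stride_bytes >= sizeof(void *));
    assert(stride_bytes % sizeof(void *) == 0);
    size_t nodes = buf_bytes / stride_bytes;
    assert(nodes > 0 && loads > 0);

    void **buf = malloc(buf_bytes);
    assert(buf != NULL);
    size_t step = stride_bytes / sizeof(void *);   /* stride in pointer slots */

    /* Node k points at node k+1; the last node points back at node 0. */
    for (size_t k = 0; k < nodes; k++)
        buf[k * step] = &buf[((k + 1) % nodes) * step];

    struct timespec t0, t1;
    void **p = buf;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < loads; i++)
        p = (void **)*p;           /* each load needs the previous result */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Store the final pointer to a volatile so the loop can't be removed. */
    void **volatile sink = p;
    (void)sink;
    free(buf);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)loads;
}
```

Sweeping `buf_bytes` from, say, 16 KB to 1 MB at a 64-byte stride, running each size several times and keeping the minimum, should show knees near 32 KB and 256 KB if the L2 is enabled.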
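The L1 miss isn't so bad, but the L2 miss penalty of roughly 189 clocks (about 10 for the L1 miss, then another 179 for the L2 miss) is a lot longer than I'd expected. At 1 GHz that's close to 190 ns for every load that misses both caches, well above the 100 ns I'd hoped for.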
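To put these numbers in perspective, here is a small C model of the expected cycles per load. It assumes the costs simply add, as the post reads the table: 3 cycles for the loop plus L1 access, 10 more when the L1 misses, and 179 more when the L2 misses as well. The miss rates are inputs you would have to measure or estimate, `expected_cycles_per_load` is a hypothetical helper, and the model ignores TLB misses and any overlap between outstanding loads.

```c
#include <assert.h>

/* Cycle costs read from the calibrator run above (1 GHz, so 1 cy = 1 ns). */
#define BASE_CYCLES    3.0   /* "CPU loop + L1 access" */
#define L1_MISS_CYCLES 10.0  /* extra when a load misses L1 but hits L2 */
#define L2_MISS_CYCLES 179.0 /* further extra when it misses L2 as well */

/* Expected cycles per load under a simple additive model.
 * l1_miss_rate: fraction of loads that miss L1.
 * l2_miss_rate: fraction of those L1 misses that also miss L2. */
double expected_cycles_per_load(double l1_miss_rate, double l2_miss_rate)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_miss_rate >= 0.0 && l2_miss_rate <= 1.0);
    return BASE_CYCLES
         + l1_miss_rate * (L1_MISS_CYCLES + l2_miss_rate * L2_MISS_CYCLES);
}
```

For example, if just 1 load in 100 misses both caches, the model gives 3 + 0.01 × 189 ≈ 4.9 cycles per load, more than 60% above the 3-cycle base.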

It's also easy to write code that incurs TLB miss penalties (about 46 clocks each, according to the table) on top of that.
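The TLB reach is easy to work out: 32 entries of 4 KB pages cover 128 KB. The sketch below uses hypothetical helpers and assumes one TLB entry per 4 KB page and a stride no larger than a page; it counts how many strided reads fit within that reach. From a page-aligned start, a 1724-byte stride gets 77 reads before it needs a 33rd page, so a loop that walks further than that before returning to earlier pages can take TLB misses on a large share of its reads.

```c
#include <assert.h>
#include <stddef.h>

/* Distinct pages touched by `count` reads at start, start+stride, ...
 * Requires stride <= page_bytes: then consecutive reads can't skip a page,
 * so every page in the span is touched. */
size_t pages_touched(size_t start, size_t stride, size_t count,
                     size_t page_bytes)
{
    assert(count > 0 && page_bytes > 0 && stride <= page_bytes);
    size_t first = start / page_bytes;
    size_t last = (start + (count - 1) * stride) / page_bytes;
    return last - first + 1;
}

/* Longest run of such reads whose pages all fit in a TLB of `entries`
 * entries, assuming one entry per page (hypothetical helper). */
size_t max_reads_within_tlb(size_t start, size_t stride, size_t entries,
                            size_t page_bytes)
{
    assert(stride > 0 && entries > 0);
    size_t n = 1;
    while (pages_touched(start, stride, n + 1, page_bytes) <= entries)
        n++;
    return n;
}
```

For instance, `max_reads_within_tlb(0, 1724, 32, 4096)` evaluates to 77, since 76 × 1724 = 131,024 bytes still ends inside the 32nd page.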

Tom

fabio_estevam
NXP Employee
TomE
Specialist II

> Do you have this patch applied?

I don't know what bootloader the Freescale development board is running. I only took it out of its box to make sure there wasn't anything wrong with our Karo board, which runs RedBoot.

The step in the graph in my post between 32 KB and 256 KB can only be explained if the L2 is enabled. Calibrator also reported the L2 correctly.

Tom

YixingKong
Senior Contributor IV

Tom

This discussion is closed due to inactivity. If you still need help, please feel free to reply with an update to this discussion, or create another discussion.

Thanks,

Yixing

YixingKong
Senior Contributor IV

Tom

Has your issue been resolved? If so, we are going to close the discussion in 3 days. If you still need help, please feel free to contact Freescale.

Thanks,
Yixing
