i.MX53 Cache and Memory Speeds and Latency

TomE · 11-21-2013

I've been chasing a problem where some of our graphics code runs slowly when a rotated object is at a particular orientation: the rotation code then runs about 3 times slower than normal.

The problem was that at that orientation, the code was stepping through memory in 1724-byte increments, which is almost exactly 1/19 of the 32k L1 cache and 1/152 of the 256k L2 cache.
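The arithmetic behind that claim can be checked with a few lines of Python. This is only a sketch: the stride and cache sizes come from the post, the helper name `lap` is hypothetical, and, like the paragraph above, it treats each cache as one flat block of memory. Real L1/L2 caches are set-associative, so which lines actually collide depends on the set index, not just on the total size.

```python
# Sketch: how close does a constant stride come to wrapping each cache exactly?
# Treats each cache as a flat block (ignores associativity), as in the post.

STRIDE = 1724  # bytes between successive reads, from the post
CACHES = {"L1": 32 * 1024, "L2": 256 * 1024}  # cache sizes from the post


def lap(stride, cache_bytes):
    """Hypothetical helper: the whole number of strides that best spans
    cache_bytes, and how many bytes short of an exact wrap that lands."""
    steps = round(cache_bytes / stride)
    return steps, cache_bytes - steps * stride


for name, size in CACHES.items():
    steps, short = lap(STRIDE, size)
    print(f"{name}: {steps} strides = {steps * STRIDE} bytes, "
          f"{short} bytes short of {size}")
```

In this flat model, every 19th read lands 12 bytes below the same offset modulo 32 KB (so in the same or the preceding 64-byte line), and every 152nd read lands 96 bytes short of a full 256 KB wrap.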

So it was a "Cache Buster", forcing reads back to main memory once it wrapped both caches.

So what are the relative latencies of L1, L2 and Memory on the i.MX53?

The ARM chapter of the Freescale Reference Manual says "go read the ARM manuals", and those say "the L2 delay is programmable to suit the L2 RAM you're using". The Freescale manual doesn't say what it is actually set to.

There's nothing I can find on the RAM timing or latency, but I'd hope it would be less than 100ns, as Intel chips can manage that:

http://www.xbitlabs.com/articles/memory/display/core2duo-memory-guide_2.html

So time to reverse-engineer... Anybody who wants to run these tests on their own hardware should download the following program and compile it:

The Calibrator (v0.9e), a Cache-Memory and TLB Calibration Tool

Running it on our 1GHz Freescale i.MX53 Evaluation Board gives:

root@lucid-desktop:/tmp# nice --20 ./calibrator 1000 10M report

Calibrator v0.9e
(by Stefan.Manegold@cwi.nl, http://www.cwi.nl/~manegold/)

CPU loop + L1 access:    3.12 ns =   3 cy
         ( delay:        0.37 ns =   0 cy )

caches:
level   size    linesize   miss-latency         replace-time
  1      32 KB  64 bytes     9.95 ns =  10 cy    10.54 ns =  11 cy
  2     256 KB  64 bytes   178.68 ns = 179 cy   179.07 ns = 179 cy

TLBs:
level  #entries  pagesize  miss-latency
  1       32      4 KB     46.02 ns =  46 cy

In order to compile it, you may have to rename all instances of the "round" function in the source.
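If there are many occurrences, the rename can be scripted. A minimal Python sketch: the replacement name `calib_round` is a hypothetical choice, and the script assumes, as the note above implies, that every whole-word `round` in the file refers to the calibrator's own function.

```python
import re
import sys

NEW_NAME = "calib_round"  # hypothetical replacement name


def rename_round(source, new_name=NEW_NAME):
    """Replace every whole-word `round` (leaves `lround`, `around`, ... alone)."""
    return re.sub(r"\bround\b", new_name, source)


if __name__ == "__main__":
    # Rewrite each C source file named on the command line in place.
    for path in sys.argv[1:]:
        with open(path) as f:
            text = f.read()
        with open(path, "w") as f:
            f.write(rename_round(text))
```

Saved as, say, rename_round.py, it would be run as `python3 rename_round.py calibrator.c`, assuming that is the name of the source file in the download.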

The program even generates gnuplot files of the results, which can then be graphed:

The above shows the L1 cache running out at 32k and the L2 running out at 256k.

The L1 miss isn't so bad, but the roughly 189-clock L2 miss penalty (about 10 clocks for the L1 miss, then another 179 for the L2 miss) is a lot longer than I'd expected.
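The way those penalties stack can be written out explicitly. A sketch in Python using the numbers from the calibrator output above; it assumes, as the post does, that each level's miss latency adds on top of the previous one. `access_ns` and `amat_ns` are hypothetical helpers, and `amat_ns` is the textbook average-memory-access-time formula, not something calibrator reports.

```python
# Latencies measured by calibrator on the 1 GHz i.MX53 board (from the output above).
CPU_GHZ = 1.0          # cycles per ns at 1 GHz
BASE_NS = 3.12         # "CPU loop + L1 access"
L1_MISS_NS = 9.95      # extra cost when a read misses L1 and hits L2
L2_MISS_NS = 178.68    # extra cost when a read also misses L2 and goes to RAM


def access_ns(level):
    """Approximate time for one dependent read, by where the data is found
    ("L1", "L2" or "RAM"). Assumes the per-level miss latencies add up."""
    t = BASE_NS
    if level in ("L2", "RAM"):
        t += L1_MISS_NS
    if level == "RAM":
        t += L2_MISS_NS
    return t


def amat_ns(l1_miss_rate, l2_miss_rate):
    """Average memory access time. l2_miss_rate is the local miss rate:
    the fraction of L1 misses that also miss in L2."""
    return BASE_NS + l1_miss_rate * (L1_MISS_NS + l2_miss_rate * L2_MISS_NS)


for level in ("L1", "L2", "RAM"):
    ns = access_ns(level)
    print(f"{level}: {ns:.2f} ns = {ns * CPU_GHZ:.0f} cycles")
```

A read that goes all the way to RAM comes out at about 192 cycles including the 3-cycle loop overhead. Plugging hypothetical miss rates into `amat_ns` shows how quickly even a small fraction of RAM accesses dominates the average.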

It is also pretty easy to write code that incurs TLB miss penalties: 32 entries of 4 KB pages only cover 128 KB.
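To see how quickly a strided walk runs out of TLB entries, count the distinct 4 KB pages it touches. A Python sketch: the 32-entry and 4 KB-page figures are from the calibrator output, the 1724-byte stride is from the post, and `pages_touched` is a hypothetical helper that assumes the walk starts on a page boundary.

```python
PAGE_BYTES = 4 * 1024   # TLB page size reported by calibrator
TLB_ENTRIES = 32        # level-1 TLB entries reported by calibrator
STRIDE = 1724           # bytes per step, from the post

TLB_REACH = TLB_ENTRIES * PAGE_BYTES  # memory the TLB can map at once: 128 KB


def pages_touched(stride, n_reads, start=0):
    """Distinct pages hit by n_reads reads at start, start + stride, ..."""
    return len({(start + i * stride) // PAGE_BYTES for i in range(n_reads)})


# Longest run of reads (from a page boundary) whose pages still fit in the TLB.
n = 1
while pages_touched(STRIDE, n + 1) <= TLB_ENTRIES:
    n += 1

print(f"TLB reach {TLB_REACH // 1024} KB; {n} reads at a {STRIDE}-byte stride "
      f"fit in {TLB_ENTRIES} pages")
```

At this stride a new page starts every 2 to 3 reads, so a single pass of more than 77 reads touches more pages than the TLB holds. Assuming the replacement policy evicts older entries, the next pass over the same data then pays the roughly 46 ns TLB miss again.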

Tom

fabio_estevam · 11-21-2013

Tom,

Do you have this patch applied?

http://git.denx.de/?p=u-boot.git;a=commitdiff;h=4867b634b7c0e5ede258b4998fa4b2710e7daacf

Regards,

Fabio Estevam

TomE · 11-21-2013

> Do you have this patch applied?

I don't know which bootloader the Freescale development board is running. I only got it out of its box to make sure there wasn't anything wrong with our Karo board, which runs RedBoot.

The graph in my post shows a distinct intermediate level between 32k and 256k, which can only be explained by the L2 being enabled. Calibrator also detected and reported the L2 correctly.

Tom

YixingKong · 02-18-2014

Tom

Has your issue been resolved? If yes, we are going to close the discussion in 3 days. If you still need help, please feel free to contact Freescale.

Thanks,

Yixing

YixingKong · 02-26-2014

Tom

This discussion has been closed due to inactivity. If you still need help, please feel free to reply with an update to this discussion, or create another discussion.

Thanks,

Yixing
