Hi All,
Yes, as Bill mentioned, after we increased the CPU clock frequency (and hence the platform clock), we observed memory issues which showed up even at normal temperatures. I am not sure anymore how I obtained the latency settings, but I remember it was kind of a mix and match of previous values which proved to be working at that time.
I just dug a bit deeper into it; here is a table of the values we have seen so far. All values are tag/data latency in register format (add 1 to get cycles; the field order is write/read/setup, exactly as seen in the registers at 0x40006108/0x4000610c, see the ARM L2C-310 docs).
| Reset value | Linux 3.0 BSP | Vanilla Linux |
| 0x111/0x222 | 0x132/0x132 | 0x111/0x000 |
- The reset value was obtained by reading the register state before the L2 is used (using U-Boot without L2 cache support). I am not sure whether these are really reset values or whether the boot ROM writes them.
- The Linux 3.0 BSP values seem to have been introduced in July 2012. The initial version of the cache initialization function mxc_init_l2x0 looks an awful lot like the version from arch/arm/mach-mx6/mm.c, including the data/tag latency values... It seems these values have simply been copied over from the i.MX6...
- The vanilla Linux values were introduced in May 2013 with the first commit of the vf610.dtsi device tree file, which was then merged in Linux 3.11. The values seem to be an attempt to use the reset values, but the tag and data latencies have been mixed up, and the offset of 1 clock (register values vs. device tree values, which are in cycles) has not been accounted for...
So far, I would say the reset value is probably the best choice. According to these values, it would take 2 cycles to access the tags and 3 cycles to access data...
The application note AN4947 ("Understanding Vybrid Architecture") actually provides some more insight into the L2 cache architecture. Chapter 3.10 says:
During the first phase of L2 access, the tags are accessed in parallel and hit/miss is determined. The tag arrays are built in the L2 cache controller. No wait states are necessary
Looking at the values available for the L2C-310, I would translate that into "0b000 - 1 cycle of latency, there is no additional latency", for all three tag latency values (write/read/setup).
Furthermore, the manual says:
During the second phase of a L2 access, a single data array structure is accessed:
a) L2 read cache hit: the L2 data array provides one line of data to the L2 cache controller at the same time. For Vybrid, it takes three clock cycles to read or write the L2 data array structure (upper GRAM on platform frequency). Together, four clocks Cortex-A5 latency
Again, looking at the options in the L2C-310, I would translate that into "0b010 - 3 cycles of latency" for the data write/read latencies. I am not sure about the data setup latency...
So this suggests a value of 0x000/0x22X... This more or less aligns with the reset value; only the tag latencies are smaller.
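For reference, if one went with the reset values (which the suggestion above roughly matches for the data side), the vf610.dtsi fragment would presumably look something like this. This is only a sketch: it assumes the standard PL310 bindings, where the latencies are given in cycles in read/write/setup order (i.e. register field value + 1), and the `&L2` label is illustrative:

```dts
/* Sketch only: program the observed reset latencies via the
 * generic PL310 bindings instead of the current vanilla values.
 * Tag 0x111 -> 2 cycles each field, data 0x222 -> 3 cycles each. */
&L2 {
	arm,tag-latency = <2 2 2>;	/* read write setup, in cycles */
	arm,data-latency = <3 3 3>;
};
```

Since all fields within each reset register are equal, the read/write/setup ordering question conveniently does not matter here.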
However, this sentence in the L2C-310 manual (under RAM clocking and latencies) makes my interpretation questionable:
The programmed RAM latency values refer to the actual clock driving the RAMs, that is, the slow clock if the RAMs are clocked slower than the cache controller.
The latencies in "Understanding Vybrid Architecture" are in controller clock cycles... Given that the programmed latencies refer to RAM latencies, the values from "Understanding Vybrid Architecture" are probably not applicable here... So I do not know what the right values are. Clearly, 1 clock cycle for the data latency, as it is in the mainline kernel, seems to be too low.
Maybe the authors of the above-mentioned application note can comment on that, jiri-b36968 or Rastislav Pavlanin?
--
Stefan