When measuring the memory read performance of the T4240 under the Linux kernel available on the RDB, it takes approximately 128 cycles (as read from the Alternate Time Base, ATB) to read 1024 bytes of memory using 128 "lfd %0,offset(rX)" instructions.
However, when the same test is run as a piece of bare-board code started with the "release" and "go" commands from U-Boot, it takes much longer on cores 1-11, namely around 750 cycles. Core 0 is still able to read 1 kB in ~128 cycles.
All L1 and L2 cache registers are set identically for all cores and clusters. The L2 cache is set to allow all ways for all cores. Each core is also started using exactly the same startup code.
During the measurement all other cores are executing a "while (1) ;" loop and should not influence the core under test.
All cores have both threads enabled. Thread 0 and 1 on core 0 show expected behaviour, while both threads on all other cores show the much longer time needed to read the 1kB of data.
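
For reference, the measurement described above could be sketched roughly as follows. This is not the actual test code, only a minimal illustration under some assumptions: the names read_counter, time_1k_read, BUF_BYTES and NUM_LOADS are made up for this sketch, the lower ATB word is assumed to be readable as SPR 526, and a plain loop over volatile doubles stands in for the 128 unrolled "lfd" instructions (on a hard-float Power build each read would normally compile to an lfd, but the loop adds some overhead of its own). On non-Power hosts it falls back to clock() so the code still builds.

```c
#include <assert.h>   /* drop if the bare-board toolchain has no assert.h */
#include <stddef.h>
#include <stdint.h>

#if !(defined(__powerpc__) || defined(__PPC__))
#include <time.h>     /* host fallback only */
#endif

#define BUF_BYTES 1024u
#define NUM_LOADS (BUF_BYTES / sizeof(double))   /* 128 loads of 8 bytes each */

/* Read a free-running counter.
 * Power builds: lower word of the Alternate Time Base (assumed to be SPR 526).
 * Other hosts:  clock(), which is NOT a cycle count, just a stand-in. */
static inline uint32_t read_counter(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    uint32_t value;
    __asm__ volatile("mfspr %0, 526" : "=r"(value));
    return value;
#else
    return (uint32_t)clock();
#endif
}

/* Time NUM_LOADS 8-byte reads from buf and return the counter delta.
 * The volatile qualifier forces every read to actually be issued. */
static uint32_t time_1k_read(const volatile double *buf)
{
    assert(buf != NULL);

    uint32_t start = read_counter();
    for (size_t i = 0; i < NUM_LOADS; i++) {
        (void)buf[i];            /* one 8-byte load per iteration */
    }
    uint32_t end = read_counter();

    return end - start;          /* unsigned subtraction tolerates wrap-around */
}
```

Calling time_1k_read() on an 8-byte-aligned 1 kB buffer from each core (and each thread) in turn gives the per-core numbers being compared here, i.e. roughly 1 cycle per load on core 0 versus close to 6 cycles per load on cores 1-11.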