On an i.MX 6 board, we want to verify the speed at which code in DDR3-1066 is executed. To verify this, we executed one million instructions from DDR3-1066.
Our clock settings are as follows:
ARM_PLL is set to 996 MHz, by configuring the divider to (996 * 2)/24.
PLL2 is 528 MHz, and DDR3-1066 is clocked by PLL2.
The code executed from DDR3-1066 is as follows:
volatile unsigned long int test_mips_t_delay;
test_mips_t_delay = 100000UL;
while (test_mips_t_delay--) // Considered around 5 instructions
{
    // Braces keep all five nops inside the loop: 100000 iterations x ~10 instructions = ~1 million
    asm volatile ("nop\n\t"); // 6th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 7th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 8th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 9th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 10th instruction, assumed one cycle per instruction
}
The time taken to execute one million instructions from DDR3-1066 is around 22.998 milliseconds when the MMU is enabled. Without the MMU enabled, it takes around 23.955 milliseconds.
However, we expect around 1.004 milliseconds given the clock settings.
If the core runs at 996 MHz, it can execute 996 million single-cycle instructions per second. Since the memory is DDR3-1066, the data rate should be double the clock, i.e. 528 * 2 MHz. So the expected time to execute one million instructions is ((1/996) * 10^3) milliseconds, i.e. about 1 millisecond.
Kindly help us understand why the code in DDR3-1066 executes at a much lower speed than expected.
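The arithmetic above can be checked with a small C sketch. The function names `ideal_time_ms` and `cycles_per_instruction` are hypothetical helpers introduced only for illustration; the model assumes one instruction retired per core cycle, as the post does.

```c
#include <assert.h>

/* Ideal time in milliseconds to run `instructions` instructions,
 * assuming exactly one instruction per core cycle. */
double ideal_time_ms(double instructions, double core_mhz)
{
    assert(core_mhz > 0.0);
    return instructions / (core_mhz * 1e6) * 1e3;
}

/* Average number of core cycles spent per instruction,
 * derived from a measured run time in milliseconds. */
double cycles_per_instruction(double measured_ms, double core_mhz,
                              double instructions)
{
    assert(instructions > 0.0);
    return (measured_ms * 1e-3) * (core_mhz * 1e6) / instructions;
}
```

With the figures from this post, `ideal_time_ms(1e6, 996.0)` gives about 1.004 ms, while `cycles_per_instruction(22.998, 996.0, 1e6)` gives about 22.9 cycles per instruction (about 23.9 for the 23.955 ms run), which is the size of the gap being asked about.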
Thank you all for your support. For the time being, we have achieved a better execution time by fixing an issue in the MMU enable code. The code now appears to execute at a better rate. We are still analysing the execution time and will get back to you once we have completed this exercise. Thanks again for your valuable support.
Do you have the caches enabled? Enabling or disabling the caches may affect the data transfer rate significantly. Also, what code do you use to copy data? Do you execute the code under Linux (or another OS) or on a bare-metal platform? Which i.MX 6 family processor exactly do you use? What is the DDR3 data bus width?
Executing a million instructions in a loop will not provide any useful data, because the instructions are held in the processor's caches.
I suggest you abandon loop-like instruction tests.
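As one possible alternative to an instruction loop, a data-oriented test could time a large buffer copy. The following is only a minimal sketch: it assumes a hosted C environment where `malloc` and `clock()` are available, and the function name `copy_throughput_mb_per_s` is hypothetical. On a bare-metal target the timer and buffer allocation would have to be replaced with platform-specific equivalents, and the buffer should be much larger than the caches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copies `bytes` bytes between two heap buffers `repeats` times and
 * returns the copy rate in MB/s (1 MB = 10^6 bytes), or -1.0 on failure. */
double copy_throughput_mb_per_s(size_t bytes, int repeats)
{
    assert(bytes > 0 && repeats > 0);

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so the timed region measures the copy only. */
    memset(src, 0xA5, bytes);
    memset(dst, 0x00, bytes);

    clock_t start = clock();
    for (int i = 0; i < repeats; i++) {
        src[0] = (unsigned char)i;   /* make every pass distinct */
        memcpy(dst, src, bytes);
    }
    clock_t end = clock();

    /* Verify the last copy so the work cannot be discarded. */
    int ok = (memcmp(src, dst, bytes) == 0);
    free(src);
    free(dst);

    double seconds = (double)(end - start) / (double)CLOCKS_PER_SEC;
    if (!ok || seconds <= 0.0) {
        return -1.0;
    }
    return ((double)bytes * (double)repeats) / 1e6 / seconds;
}
```

Running the same measurement with caches enabled and disabled, and with different buffer sizes, should help separate cache effects from the DDR3 transfer rate itself.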
You can advance your data test:
Let us know how you progress.
Thanks a lot for the useful information. We tried copying 512 MB and it takes around 12.5 seconds. This is too slow. Could you please help us understand what might have gone wrong?
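To put the 12.5 second figure in perspective, the sketch below converts it into a copy rate and compares it with the theoretical peak transfer rate of the memory interface. The helper names are hypothetical, and the bus width is an assumption passed in as a parameter, since the thread does not state it; the data rate of 1056 MT/s follows from the 528 * 2 MHz figure given earlier.

```c
#include <assert.h>

/* Achieved copy rate in MB/s for `megabytes` copied in `seconds`. */
double copy_rate_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}

/* Theoretical peak transfer rate in MB/s (1 MB = 10^6 bytes) for a DDR
 * interface running at `mega_transfers` MT/s on a `bus_bits`-wide bus.
 * The bus width is an assumption; it is not given in the thread. */
double peak_rate_mb_per_s(double mega_transfers, int bus_bits)
{
    assert(bus_bits > 0 && bus_bits % 8 == 0);
    return mega_transfers * (double)(bus_bits / 8);
}
```

With the numbers above, `copy_rate_mb_per_s(512.0, 12.5)` is 40.96 MB/s. For an assumed 64-bit bus, `peak_rate_mb_per_s(1056.0, 64)` is 8448 MB/s, and for an assumed 32-bit bus it is 4224 MB/s, so the measured copy reaches well under 1% of the theoretical peak in either case.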