DDR3-1066 performance in i.MX6


1,364 Views
gopus
Contributor II

On an i.MX 6 board, we want to measure how fast code located in DDR3-1066 executes. To check this, we executed one million instructions from DDR3-1066.

Our clock settings are as follows:
ARM_PLL is set to 996 MHz by configuring the divider to (996 * 2)/24 = 83.
PLL2 is 528 MHz, and DDR3-1066 is clocked by PLL2.

The code executed from DDR3-1066 is as follows:

volatile unsigned long int test_mips_t_delay;
test_mips_t_delay = 100000UL;

Start_Timer();
while (test_mips_t_delay--) // Considered around 5 instructions
{
    asm volatile ("nop\n\t"); // 6th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 7th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 8th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 9th instruction, assumed one cycle per instruction
    asm volatile ("nop\n\t"); // 10th instruction, assumed one cycle per instruction
}
Stop_Timer();


Executing one million instructions from DDR3-1066 takes around 22.998 milliseconds with the MMU enabled. With the MMU disabled, it takes around 23.955 milliseconds.
But we expect around 1.004 milliseconds as per the clock settings.
That is:
If the core runs at 996 MHz, it can execute 996 million single-cycle instructions per second. Since it is DDR3-1066, the data rate should be double the clock, i.e. 528 * 2 MHz. So the expected time to execute one million instructions is (1/996) * 10^3 milliseconds, i.e. about 1 millisecond.
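The arithmetic above can be checked with a short sketch. It only restates the numbers from this post (996 MHz core clock, one million instructions, the two measured times); the function names are made up for illustration. Under the one-cycle-per-instruction assumption, the measured times work out to roughly 23 core cycles per instruction instead of 1.

```c
#include <assert.h>
#include <stdio.h>

/* Milliseconds needed to retire `instructions` at one instruction per cycle. */
double expected_ms(double instructions, double core_hz)
{
    assert(core_hz > 0.0);
    return instructions / core_hz * 1e3;
}

/* Average core cycles per instruction implied by a measured run time. */
double cycles_per_instruction(double measured_ms, double instructions,
                              double core_hz)
{
    assert(instructions > 0.0);
    return measured_ms * 1e-3 * core_hz / instructions;
}

/* Prints the comparison for the two measurements quoted in the post. */
void report_loop_timing(void)
{
    const double core_hz = 996e6;      /* ARM_PLL setting from the post */
    const double instructions = 1e6;   /* 100000 iterations x ~10 instructions */

    printf("expected:     %.3f ms\n", expected_ms(instructions, core_hz));
    printf("MMU enabled:  %.1f cycles/instruction\n",
           cycles_per_instruction(22.998, instructions, core_hz));
    printf("MMU disabled: %.1f cycles/instruction\n",
           cycles_per_instruction(23.955, instructions, core_hz));
}
```

Calling `report_loop_timing()` from `main` prints the expected time next to the implied cycle counts, which makes the gap between assumption and measurement explicit.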

Kindly help us understand why the code in DDR3-1066 executes so much slower than expected.

6 Replies

775 Views
gopus
Contributor II

Thank you all for your support. For the time being, we have achieved a better execution time by fixing an issue in the MMU enable code. The code now appears to execute at a better rate. We are still analysing the execution time and will get back to you once we have completed this exercise. Thanks again for your valuable support.

0 Kudos

775 Views
art
NXP Employee

Also, please refer to the following community thread:

https://community.nxp.com/message/430721 

It contains many useful hints about the matter.

Artur

0 Kudos

775 Views
art
NXP Employee

Do you have the caches enabled? Enabling or disabling the caches may affect the data transfer rate significantly. Also, what code do you use to copy data? Do you execute the code under Linux (or another OS) or on bare metal? Which exact i.MX6 family processor do you use? What is the DDR3 data bus width?
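One way to answer the cache question on bare metal is to read the ARMv7-A System Control Register (SCTLR) and look at its enable bits. The sketch below assumes an ARMv7-A core running in a privileged mode; the struct and function names are hypothetical. The decoder is plain C, and the register read is only compiled for ARM targets. The L2 cache controller has its own enable and is not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A SCTLR enable bits. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data and unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Hypothetical container for the decoded enable bits. */
struct cache_state {
    bool mmu;
    bool dcache;
    bool icache;
    bool branch_pred;
};

/* Decodes a raw SCTLR value into its individual enable flags. */
struct cache_state decode_sctlr(uint32_t sctlr)
{
    struct cache_state s;
    s.mmu         = (sctlr & SCTLR_M) != 0;
    s.dcache      = (sctlr & SCTLR_C) != 0;
    s.icache      = (sctlr & SCTLR_I) != 0;
    s.branch_pred = (sctlr & SCTLR_Z) != 0;
    return s;
}

#if defined(__arm__)
/* Reads SCTLR through CP15; valid only in a privileged mode (bare metal
 * or kernel), not from Linux user space. */
static inline uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}
#endif
```

On the target, `decode_sctlr(read_sctlr())` called just before `Start_Timer()` shows which of the MMU and L1 caches are actually on when the measurement runs.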

Best regards,

Artur

0 Kudos

775 Views
michalrisa
NXP Employee

Hello,

executing a million instructions in a loop will not provide any useful data. The instructions are held in the processor's caches, so the loop does not measure DDR3 access.

I suggest you abandon loop-style instruction tests.

You can improve your data test as follows:

  1. Avoid test task migrations among processor cores (set process/thread affinity).
  2. Allocate a big buffer (512 MB).
  3. Fill the buffer with data (to trigger first-touch allocation).
  4. Start time measurement.
  5. Do whatever test you want.
  6. Stop time measurement.
  7. Free the buffer.
  8. Report results.
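The steps above can be sketched as follows, assuming a hosted environment such as Linux with a C11 library. The function names are made up for illustration, the copy is a plain `memcpy`, and timing uses the C11 `timespec_get` wall clock, which is one choice among several. Core affinity (step 1) is left to the launcher here rather than done in code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds, taken from C11 timespec_get(). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Copies `bytes` between two heap buffers and returns the rate in MB/s
 * (1 MB = 2^20 bytes), or -1.0 if the test could not be run. */
double copy_rate_mb_per_s(size_t bytes)
{
    if (bytes == 0)
        return -1.0;

    /* Step 2: allocate the buffers. */
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Step 3: touch every page so allocation cost is not timed. */
    memset(src, 0xA5, bytes);
    memset(dst, 0x00, bytes);

    double start = now_seconds();   /* step 4 */
    memcpy(dst, src, bytes);        /* step 5: the test itself */
    double stop = now_seconds();    /* step 6 */

    /* Read the result back to discourage the compiler from dropping the copy. */
    volatile unsigned char sink = dst[bytes - 1];
    (void)sink;

    free(src);                      /* step 7 */
    free(dst);

    double elapsed = stop - start;
    if (elapsed <= 0.0)
        return -1.0;
    return ((double)bytes / (1024.0 * 1024.0)) / elapsed;
}
```

For step 8, a caller could print the result with `printf("%.1f MB/s\n", copy_rate_mb_per_s((size_t)512 << 20));`. On Linux, starting the program with `taskset -c 0` pins it to one core, which covers step 1.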

Let us know how you progress.
Best regards,

Michal Riša
Software Engineer
NXP Semiconductors

775 Views
gopus
Contributor II

Thanks a lot for the useful information. We tried copying 512 MB and it takes around 12.5 seconds. This is too slow. Could you please help us understand what might have gone wrong?

Regards, 

Gopu

0 Kudos

775 Views
gopus
Contributor II

In addition to this, we also tried copying 16 MB of data from DDR3 to DDR3, and it took around 400 milliseconds.
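Both copy measurements work out to about 40 MB/s, which a short sketch can set against the theoretical peak of the memory interface. The 528 MHz clock comes from the first post; the data bus width is not stated anywhere in this thread, so it is a parameter here, and the function names are hypothetical. The peak figure ignores refresh and other overheads, and a copy both reads and writes every byte, so it is only a rough upper bound.

```c
#include <assert.h>

/* Copy rate in MB/s from an amount in MB and a duration in seconds. */
double copy_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}

/* Theoretical peak transfer rate in MB/s (1 MB = 10^6 bytes):
 * two transfers per clock, bus_bits / 8 bytes per transfer.
 * bus_bits is an assumption; the thread does not give the bus width. */
double ddr_peak_mb_per_s(double clock_mhz, unsigned bus_bits)
{
    return clock_mhz * 2.0 * (double)bus_bits / 8.0;
}
```

For example, `copy_mb_per_s(512.0, 12.5)` and `copy_mb_per_s(16.0, 0.4)` can be compared with `ddr_peak_mb_per_s(528.0, 64)`, or with 32 or 16 in place of 64 for narrower buses. The two megabyte conventions differ by about 5 percent, which does not matter at this scale.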

0 Kudos