The latency of run code in OCRAM is big

NXP Employee
Hi everybody,

 

I wrote test code that runs a 10000-iteration accumulation loop, shown in the section below, but the measured latency differs greatly from the theoretical value. The test conditions are:


The CA5 runs at 400 MHz, and the platform clock is 133 MHz.
I ran this code section in both a bareboard project and an MQX project. The test results are:

 

code run conditions              time (us)
run code in bareboard in SRAM    6000
run code in MQX in SRAM          4500
run code in MQX in DDR           400

As you can see, running the code from SRAM takes far longer than running it from DDR, while the difference between bareboard and MQX is small. MQX performs better because it enables the L1 and L2 caches.
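To put the numbers above in perspective, the measured times can be converted into average CPU cycles per loop iteration. This is only a rough sketch: the function name is hypothetical, and it assumes the reported times are in microseconds and cover exactly the 10000 iterations at the stated 400 MHz core clock.

```c
#include <assert.h>
#include <stdint.h>

/* Average CPU cycles spent per loop iteration for a measured run.
 * elapsed_us : measured time in microseconds
 * cpu_mhz    : core clock in MHz (cycles per microsecond)
 * iterations : number of loop iterations that were timed
 */
uint32_t cycles_per_iteration(uint32_t elapsed_us, uint32_t cpu_mhz,
                              uint32_t iterations)
{
    assert(iterations != 0);
    /* microseconds * cycles-per-microsecond = total cycles;
     * a 64-bit intermediate avoids overflow for long runs. */
    uint64_t total_cycles = (uint64_t)elapsed_us * cpu_mhz;
    return (uint32_t)(total_cycles / iterations);
}
```

With the figures from the table, 6000 us works out to 240 cycles per iteration, 4500 us to 180, and 400 us to 16. A cost of hundreds of cycles for an empty loop iteration suggests that every access is going out to memory rather than hitting a cache.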

 

Since the OCRAM clock is derived from the system clock, I checked this clock on CKO1, and the clock is fine.

I also adjusted this clock to other values with BUS_CLK_DIV, and the result changed accordingly.

So I want to ask why the performance is so bad. I checked AN4947, and according to it the latency in SRAM should be normal. Besides the platform clock, is there any other factor that affects it?

 

I have also attached the test code for both bareboard and MQX. I did not change any other project code.

 

 

/* Make sure the clock to the LPTMR is enabled */
CCM->CCGR2 |= CCM_CCGR2_CG0(1);

/* Reset LPTMR settings */
LPTMR0->CSR=0;

/* Set the compare value to its maximum so the counter runs as long as possible */
LPTMR0->CMR = 0xffff;//set max value

/* Set up LPTMR to use 32kHz LPO with no prescaler as its clock source */
LPTMR0->PSR |= LPTMR_PSR_PCS(0x3) | LPTMR_PSR_PBYP_MASK;//use 32K as clock source

/* Start the timer */
LPTMR0->CSR |= LPTMR_CSR_TEN_MASK;

while(1)
{

printf("GPIO Example!%d,%d\n",diff1,diff2);
diff1=j2-j1;
diff2=j3-j2;
//Turn on LED's by driving 0 (active low)
//Toggle PTB1
GPIO0->PTOR|=PIN(23);

LPTMR0->CNR = 0;//must first write to the CNR with any value
j1 = LPTMR0->CNR*1000/128;
// tick1 = getticks();
//Delay
for(i=0;i<10000;i++);

//time_delay_ms(1000);
// tick2 = getticks();
// difftick=tick2-tick1;

LPTMR0->CNR = 0;//must first write to the CNR with any value
j2 = LPTMR0->CNR*1000/128;

//Turn off LED's by driving 1
//Toggle PTB1
GPIO0->PTOR|=PIN(23);

//Delay
for(i=0;i<10000;i++);
//time_delay_ms(1000);

LPTMR0->CNR = 0;//must first write to the CNR with any value
j3 = LPTMR0->CNR*1000/128;
}
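
A note on the timing arithmetic above: each reading is scaled by 1000/128 before the subtraction, which gives microseconds only if the timer is clocked at 128 kHz, while the comment mentions a 32 kHz source. Scaling before subtracting can also give a wrong difference if the 16-bit counter wraps between two readings. A minimal sketch of an alternative, with hypothetical helper names and the timer frequency passed in as a parameter so that either clock can be checked:

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 16-bit counter.
 * Unsigned 16-bit subtraction stays correct across one wraparound. */
uint16_t lptmr_elapsed_ticks(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert a tick count to microseconds for a given timer clock in Hz.
 * The result is truncated toward zero. */
uint32_t lptmr_ticks_to_us(uint16_t ticks, uint32_t timer_hz)
{
    assert(timer_hz != 0);
    return (uint32_t)(((uint64_t)ticks * 1000000u) / timer_hz);
}
```

The raw counter values would be read before and after the loop, subtracted with lptmr_elapsed_ticks, and only then converted. Note that one tick is about 7.8 us at 128 kHz and about 31 us at 32 kHz, so short intervals are measured only coarsely.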

Original Attachment has been moved to: gpio.c.zip

Original Attachment has been moved to: hello.c.zip

Senior Contributor IV
Quote: "the MQX perform better for it enable L1 & L2 cache"

There is a code cache and a data cache. While the code cache on the A5 can simply be enabled for the whole address space, the data cache is configured through the MMU tables. If you don't set up the MMU (and its tables) properly, you get non-cacheable data.

Some addresses can be cacheable, others can be made non-cacheable. The speed difference is huge, as your table confirms. I think MQX sets up SRAM as non-cacheable; check the sources.

NXP Employee

Hi Edward,

Many thanks.

I found the root cause in MQX: it sets OCRAM as non-cacheable. After I changed the setting, the performance is fine.

As for the bareboard project, I think it doesn't configure the MMU at all.

Below is the code from MQX, init_bsp.c:

//dawei//_mmu_add_vregion((void *)__INTERNAL_SRAM_BASE, (void *)__INTERNAL_SRAM_BASE, (_mem_size) 0x00100000, PSP_PAGE_TABLE_SECTION_SIZE(PSP_PAGE_TABLE_SECTION_SIZE_1MB) | PSP_PAGE_TYPE(PSP_PAGE_TYPE_CACHE_NON) | PSP_PAGE_DESCR(PSP_PAGE_DESCR_ACCESS_RW_ALL));
_mmu_add_vregion((void *)__INTERNAL_SRAM_BASE, (void *)__INTERNAL_SRAM_BASE, (_mem_size) 0x00100000, PSP_PAGE_TABLE_SECTION_SIZE(PSP_PAGE_TABLE_SECTION_SIZE_1MB) | PSP_PAGE_TYPE(PSP_PAGE_TYPE_CACHE_WBNWA) | PSP_PAGE_DESCR(PSP_PAGE_DESCR_ACCESS_RW_ALL));

BTW, I couldn't find more information on the MMU and L1 cache.

Do you have a more detailed document on the MMU? If so, could you send it to me?

Senior Contributor IV

You can find information about the MMU and L1 cache at arm.com.

https://developer.arm.com/products/architecture/a-profile/docs/ddi0406/latest/arm-architecture-refer...

The ARMv7-A docs are not very easy to study. What you are looking for is in chapter B3, Virtual Memory System Architecture. Check out translation tables ("MMU tables"), the short- and long-descriptor translation table formats (these are the "translation table entries"), etc.
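
To illustrate what such a translation table entry looks like, here is a minimal sketch that builds an ARMv7-A short-descriptor 1 MB section entry with different cacheability attributes. The enum and function names are hypothetical and this is not MQX code. It assumes TEX remap is disabled (SCTLR.TRE = 0), domain 0, full read/write access (AP = 0b011), and execution allowed.

```c
#include <assert.h>
#include <stdint.h>

/* Memory types selectable through the TEX, C and B bits of a section entry
 * (encodings valid when TEX remap is disabled). */
enum mem_attr {
    MEM_STRONGLY_ORDERED,     /* TEX=000 C=0 B=0 */
    MEM_NORMAL_NONCACHEABLE,  /* TEX=001 C=0 B=0 */
    MEM_NORMAL_WT,            /* TEX=000 C=1 B=0: write-through */
    MEM_NORMAL_WB_NO_WA,      /* TEX=000 C=1 B=1: write-back, no write-allocate */
    MEM_NORMAL_WB_WA          /* TEX=001 C=1 B=1: write-back, write-allocate */
};

/* Build a first-level section descriptor for a 1 MB-aligned physical address. */
uint32_t make_section_entry(uint32_t phys_addr, enum mem_attr attr)
{
    uint32_t tex = 0, c = 0, b = 0;

    assert((phys_addr & 0x000FFFFFu) == 0); /* must be 1 MB aligned */

    switch (attr) {
    case MEM_STRONGLY_ORDERED:                            break;
    case MEM_NORMAL_NONCACHEABLE: tex = 1;                break;
    case MEM_NORMAL_WT:                    c = 1;         break;
    case MEM_NORMAL_WB_NO_WA:              c = 1; b = 1;  break;
    case MEM_NORMAL_WB_WA:        tex = 1; c = 1; b = 1;  break;
    }

    return phys_addr          /* bits [31:20]: section base address */
         | (tex << 12)        /* bits [14:12]: TEX */
         | (0x3u << 10)       /* bits [11:10]: AP[1:0] = 11, read/write */
         | (c << 3)           /* bit 3: C */
         | (b << 2)           /* bit 2: B */
         | 0x2u;              /* bits [1:0] = 10: section descriptor */
}
```

For an example base address of 0x3F000000, the non-cacheable entry is 0x3F001C02 and the write-back, no write-allocate entry is 0x3F000C0E. Only the TEX, C and B bits differ, which is the kind of change the PSP_PAGE_TYPE_CACHE_NON to PSP_PAGE_TYPE_CACHE_WBNWA switch above asks MQX to make.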

Regards

Edward

NXP Employee

thx!