APB peripheral access cycle of LPC1114/301

930 Views
lpcware
NXP Employee
Content originally posted in LPCWare by cakehuang on Thu May 09 21:04:33 MST 2013
Hi team,

Can you please tell me how many AHB bus cycles it takes the Cortex-M0 to access peripherals, such as timers, that are connected to the APB bus?
For example, I am using the ADC example, which sets the Cortex-M0 and AHB clocks to 48 MHz. After timer initialization, I read or write the timer register TMR32B0TC.

Thanks!
10 Replies

Content originally posted in LPCWare by R2D2 on Thu May 16 03:57:56 MST 2013

Quote: cakehuang
How many wait states will be asserted?



3

Content originally posted in LPCWare by cakehuang on Wed May 15 19:47:27 MST 2013

Quote: micrio
Here is a long thread discussing the operation of the pre-fetch queue.
http://knowledgebase.nxp.com/showthread.php?t=460
I used the word cache but that is not strictly correct, pre-fetch unit is a
better word.

In that thread, I and others examine the performance issues around
code alignment, code size and loop speed. It is possible to get zero wait
states if you can run out of the 3 words of data that get pre-fetched from
flash. If your loop does not fit then you take a big hit in performance.

Pete.


Thanks!
Your reply reminds me that I should be aware that my assembly code might affect the pipeline. I can't wait to study this thread.

Content originally posted in LPCWare by cakehuang on Wed May 15 19:42:30 MST 2013

Quote: R2D2
How fast your peripheral access is depends on your code and especially on optimization level.
The fastest possible access is load / store with preloaded registers:

.L1:
str  r4, [r0, #0]    // write 0x01 (in r4) to the peripheral (r0 = GPIO2DATA): PIO2_0 high
str  r3, [r0, #0]    // write 0x00 (in r3) to the peripheral (r0 = GPIO2DATA): PIO2_0 low
b    .L1             // branch back to the top of the loop
Each load/store costs 2 cycles and the branch 3 cycles, so this is a 146 ns loop (7 cycles at 48 MHz).

Often the compiler does not use the fastest option, especially if you optimize with -Os (size-optimized). Additional cycles are then required to load registers (such as the peripheral address in r0 in this sample).

If this timing is critical, you have to write this part in assembler.

Note: UM10398, 'Chapter 28.7 Cortex-M0 instruction summary', lists all instructions and their cycle counts.


R2D2, thanks for your explanation.
In my assembly code, I have almost run out of low registers (r0-r7) in which to preload peripheral addresses. The Cortex-M0 has a limitation in load/store: only r0-r7 can be used as the source/destination address.
As for the instruction cycles listed in the UM, they assume zero wait states. For example, the 2 cycles for a load/store assume that the source or destination address is a zero-wait-state slave such as the GPIO AHB slave. But what if the source/destination address is within the APB space? How many wait states will be asserted?

Content originally posted in LPCWare by R2D2 on Wed May 15 08:22:11 MST 2013

Quote: cakehuang
Can you please tell me how many AHB bus cycles it takes the Cortex-M0 to access peripherals, such as timers, that are connected to the APB bus?



How fast your peripheral access is depends on your code and especially on optimization level.
The fastest possible access is load / store with preloaded registers:

.L1:
str  r4, [r0, #0]    // write 0x01 (in r4) to the peripheral (r0 = GPIO2DATA): PIO2_0 high
str  r3, [r0, #0]    // write 0x00 (in r3) to the peripheral (r0 = GPIO2DATA): PIO2_0 low
b    .L1             // branch back to the top of the loop
Each load/store costs 2 cycles and the branch 3 cycles, so this is a 146 ns loop (7 cycles at 48 MHz).

Often the compiler does not use the fastest option, especially if you optimize with -Os (size-optimized). Additional cycles are then required to load registers (such as the peripheral address in r0 in this sample).

If this timing is critical, you have to write this part in assembler.

Note: UM10398, 'Chapter 28.7 Cortex-M0 instruction summary', lists all instructions and their cycle counts.
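
For anyone who wants to redo the arithmetic for a different loop or clock, here is a minimal C sketch of the calculation. The cycle counts (2 per load/store, 3 per branch) are the ones quoted in this post. The function names and the wait_states parameter are made up for illustration, and adding the wait states directly to each store is an assumption, not a figure from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle counts quoted in this post for zero-wait-state accesses. */
enum { CYCLES_LDR_STR = 2, CYCLES_BRANCH = 3 };

/* Cycles for one pass of a loop made of n_stores stores plus one branch.
 * wait_states is added to every store; that is an assumption used for
 * illustration, not a value taken from the user manual. */
static uint32_t loop_cycles(uint32_t n_stores, uint32_t wait_states)
{
    return n_stores * (CYCLES_LDR_STR + wait_states) + CYCLES_BRANCH;
}

/* Convert a cycle count to nanoseconds, rounded to the nearest integer. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return (uint32_t)(((uint64_t)cycles * 1000000000ull + clock_hz / 2u)
                      / clock_hz);
}
```

With these helpers, cycles_to_ns(loop_cycles(2, 0), 48000000) reproduces the 146 ns figure above (7 cycles at 48 MHz).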

Content originally posted in LPCWare by micrio on Wed May 15 06:38:46 MST 2013
Here is a long thread discussing the operation of the pre-fetch queue.
http://knowledgebase.nxp.com/showthread.php?t=460
I used the word cache but that is not strictly correct, pre-fetch unit is a
better word.

In that thread, I and others examine the performance issues around
code alignment, code size and loop speed.   It is possible to get zero wait
states if you can run out of the 3 words of data that get pre-fetched from
flash.   If your loop does not fit then you take a big hit in performance.

Pete.

Content originally posted in LPCWare by cakehuang on Wed May 15 01:55:15 MST 2013
Thank you for your reply, micrio!

I see what you mean: the pipeline and cache play an important role; for example, a branch instruction might result in a pipeline flush or drain.
But you mentioned a cache. Does the LPC1114 have a cache implemented?
In my opinion, it does not. The Cortex-M0 core from ARM should not have a cache inside it (the pipeline is of course there), and NXP does not implement a cache outside the core either. (More than 8 years ago, I worked on one of Philips's audio chips. It was built upon the ARM7TDMI-S, which does not have a cache, but the Philips guys added a cache IP, like 2-way, 4-way associative... Some time later this chip was transferred to Philips's MCU team. A really long time ago! Haha!)

I would assume that an APB peripheral might impose more wait states as seen by the AHB bus. That's why I guess accessing the APB will take more time.

Content originally posted in LPCWare by micrio on Mon May 13 20:31:13 MST 2013
I have written code that will cycle an I/O pin at 24 MHz on a 48 MHz
system.   I have measured this on a scope.

However, there can be a problem.   The code normally executes out of flash.
There is a three line cache, if the loop runs entirely out of cache then
the CPU will run at full speed with no wait states.   If the loop is word
aligned and short then it can run completely out of the cache.  
Alternatively, you can move your critical code to RAM which runs
with no wait states.   I believe that only flash imposes wait states.

It is hard to write efficient assembly code.   When I must create assembly
I always write it in C and let the compiler process it.   Then I modify
the assembly to do what I want.   I have tried writing assembly from
scratch and it is always worse than what the compiler does.

If you want to test the peripheral in question write a loop in C and use
the alignment directives.   Look at the assembly to ensure that it will
fit in the three cache lines.   Point your loop at a RAM location to
prove that you are getting the expected performance.   If that works
then point it at the peripheral and do the same test.   I believe that you
will see no wait states.
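
A rough sketch of such a test loop is below. It assumes a GCC-style toolchain (the aligned and noinline attributes are GCC/Clang extensions), and the 16-byte alignment value and the name toggle_loop are picked only for illustration. The attribute aligns the start of the function, not the loop itself, so the generated assembly still has to be checked by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Write 1 then 0 to *target, 'iterations' times, and return the number of
 * writes issued. 'volatile' keeps the compiler from merging or dropping
 * the stores. Alignment of 16 bytes is an illustrative choice. */
__attribute__((aligned(16), noinline))
uint32_t toggle_loop(volatile uint32_t *target, uint32_t iterations)
{
    assert(target != 0);
    uint32_t writes = 0;
    while (iterations--) {
        *target = 1u;   /* first store  */
        *target = 0u;   /* second store */
        writes += 2u;
    }
    return writes;
}
```

Pass the address of a RAM variable first and time a known number of iterations (with a scope on a pin or with a free-running timer), then pass the address of the peripheral register and compare the two timings.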

We had a long discussion on this subject a while back.

Pete.

Content originally posted in LPCWare by cakehuang on Mon May 13 20:02:54 MST 2013

Quote: micrio
I believe that all I/O takes 1 cycle. Only the flash is slower at 3 cycles
but the cache helps with that. This should be in the documentation.

Pete.


Both I/O and flash are AHB slaves; they are connected to the AHB directly. I mean the APB peripherals, such as the TMR32B0TC register. The APB sits behind a bridge between the AHB and the slow peripherals. In the user manual I cannot find how many cycles it takes the Cortex-M0 to read one peripheral register such as TMR32B0TC.
The reason I ask is that I am trying to capture a relatively high-speed signal. I first enable the timer and then poll for both edges of this signal. When there is an edge, I read out the timer counter value and store it in RAM. I have implemented this in assembly code, but I found that it does not work well as the signal speed gets higher, while it is fine at relatively low speeds. Hence I would like to know the APB performance.
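
To make the polling scheme concrete, here is a small C sketch of the per-poll logic (my real code is in assembly; the names edge_log_t, edge_log_init and edge_poll are made up for this illustration). The pin level and timer count are passed in as plain values so the logic can be checked on a PC; on the target they would come from a masked GPIO read and a read of TMR32B0TC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State for a simple both-edges capture log (hypothetical helper type). */
typedef struct {
    uint32_t  last_level;  /* pin level seen on the previous poll   */
    uint32_t *stamps;      /* RAM buffer receiving timer counts     */
    size_t    count;       /* number of edges stored so far         */
    size_t    capacity;    /* size of the stamps buffer in entries  */
} edge_log_t;

static void edge_log_init(edge_log_t *log, uint32_t *stamps,
                          size_t capacity, uint32_t initial_level)
{
    assert(log != NULL && stamps != NULL);
    log->last_level = initial_level;
    log->stamps = stamps;
    log->count = 0;
    log->capacity = capacity;
}

/* One poll step: if the level changed since the last poll, store the timer
 * count. Returns 1 when an edge was stored, 0 otherwise (no change, or the
 * buffer is full and the edge is dropped). */
static int edge_poll(edge_log_t *log, uint32_t level, uint32_t timer_count)
{
    if (level == log->last_level)
        return 0;
    log->last_level = level;
    if (log->count >= log->capacity)
        return 0;
    log->stamps[log->count++] = timer_count;
    return 1;
}
```

Each poll costs a pin read, a compare and, on an edge, a timer read plus a RAM store, so two edges that arrive within one pass of the loop look like no change at all. That is why the cycle cost of the TMR32B0TC read matters to me.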

Content originally posted in LPCWare by micrio on Mon May 13 17:04:01 MST 2013
I believe that all I/O takes 1 cycle.   Only the flash is slower at 3 cycles
but the cache helps with that.   This should be in the documentation.

Pete.

Content originally posted in LPCWare by cakehuang on Sun May 12 19:54:54 MST 2013
Hi team,
Can anyone provide me this APB bus information?
Thanks!