Hi,
I am working on a small project based on an LPC1114FN28 processor. Although my program runs successfully, I keep noticing some weird timing behavior. For instance, if I write two identical functions in assembly with different names, their execution times are sometimes not exactly the same. Specifically, one function (as disassembled) is
16a: f3bf 8f6f isb sy
16e: 681c ldr r4, [r3, #0] ; (Read SysTick)
170: e004 b.n 0x17c
172: 46c0 nop ; (mov r8, r8)
174: 46c0 nop ; (mov r8, r8)
176: 46c0 nop ; (mov r8, r8)
178: 46c0 nop ; (mov r8, r8)
17a: 46c0 nop ; (mov r8, r8)
17c: 681d ldr r5, [r3, #0] ; (Read SysTick)
According to SysTick, this takes 7 cycles. However, for the same code at a different address
1c8: f3bf 8f6f isb sy
1cc: 681c ldr r4, [r3, #0] ; (Read SysTick)
1ce: e004 b.n 0x1da
1d0: 46c0 nop ; (mov r8, r8)
1d2: 46c0 nop ; (mov r8, r8)
1d4: 46c0 nop ; (mov r8, r8)
1d6: 46c0 nop ; (mov r8, r8)
1d8: 46c0 nop ; (mov r8, r8)
1da: 681d ldr r5, [r3, #0] ; (Read SysTick)
SysTick suggests this takes 5 cycles.
In theory, since b.n takes 3 cycles and ldr takes 2 cycles, the above code should always take 5 cycles to run. However, I consistently find that it sometimes takes 2 extra cycles. What are these 2 extra cycles for? Any ideas?
Thanks,
Si
I'm no expert, but this is likely to do with the flash cache built into many NXP parts. IIRC the flash cache is 16 bytes. When code is read from flash, 16 bytes are read and the cache is filled. When you read something outside of the cache, it has to refill, which takes extra cycles. So you might try re-aligning your functions so that they both start on a 16-byte boundary. I think you will then see that both functions have the same timing.
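To test that idea against the listings, here is a minimal sketch in C that reports whether two addresses fall in the same aligned block. The helper names are hypothetical, and the 16-byte size is only the figure recalled above, not something checked against the user manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: index of the aligned block that contains addr.
   line_bytes must be nonzero; 16 is the size recalled in the post above,
   an assumption rather than a verified value. */
static uint32_t line_index(uint32_t addr, uint32_t line_bytes)
{
    return addr / line_bytes;
}

/* True when both addresses lie in the same aligned block. */
static bool same_line(uint32_t a, uint32_t b, uint32_t line_bytes)
{
    return line_index(a, line_bytes) == line_index(b, line_bytes);
}
```

With the addresses from the question, same_line(0x170, 0x17c, 16) is true (b.n and its target in the first listing), while same_line(0x1ce, 0x1da, 16) is false (second listing). Tabulating this for a few more placements would show whether the timing really follows 16-byte boundaries.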
Hi Si Gao,
Thank you for your question.
Could you please tell me which IDE you are using?
You said: "I constantly found that sometimes it takes 2 extra cycles."
Could you tell me how you found that it takes 2 extra cycles? Please describe your test method.
Waiting for your updated information.
Have a great day,
Kerry
-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------
Hi Kerry,
Thanks for replying.
I did not use any IDE; the compiler was gcc-arm, if that’s what you mean. I don’t think it matters anyway: the code I posted in the question is disassembly.
As shown in the code, I used SysTick for the cycle count. We also have an oscilloscope, which helps to double-check that SysTick gives the correct cycle count.
Best,
Si
UM10398, Chapter 24 “LPC111x/LPC11Cxx System tick timer (SysTick)”
Hi Si Gao,
Thank you for your updated information.
Could you send me some pictures of your test results? Please also move the b.n to some other addresses, run more tests, and check whether any other cycle differences show up.
From the disassembly, the execution time should be the same.
You can also provide your test code, and I will try to reproduce it on my side.
Have a great day,
Kerry
Hi Kerry,
Please find my on-board code as well as my test code and results attached.
As you can see in “example_Delay.c”, my code does nothing other than receive 8 bytes from the COM port, call one function in ‘example_Delay.S’ (‘Normal’ or ‘Delay’, depending on the first byte of the received data), and then send back two SysTick values through the COM port. In ‘example_Delay.S’, the functions ‘Normal’ and ‘Delay’ are identical. The only thing I did not send you is the ‘library’ I use (i.e. the functions ‘target_init()’, ‘target_uart_rd()’ and ‘target_uart_wr()’): my colleague wrote this part, so I have to ask his permission before sharing his code. However, you can easily figure out what they do from their names and write your own versions to replace them if you would like to reproduce my results. Moreover, I think these functions should be irrelevant to the timing issue anyway.
In my test code (“Program.cs”, in C#), if I set ‘delay=true’ (i.e. send 1 to the COM port so that the function ‘Delay’ is called), it shows
“delay=True, Start=65536,End=65529,Duration=7”
If I set ‘delay=false’, it shows
“delay=True, Start=65536,End=65531,Duration=5”
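Since SysTick counts down, each Duration value above is simply Start minus End. A minimal sketch of that calculation in C, with a hypothetical helper name, assuming the counter wraps at most once between the two reads and that the caller knows the reload value (0x00FFFFFF is the maximum for the 24-bit counter):

```c
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two reads of a
   down-counting SysTick. Assumes at most one wrap between the reads.
   reload is the value programmed into the reload register. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    if (start >= end) {
        return start - end;            /* no wrap between the reads */
    }
    /* Counter ran from start down to 0, reloaded, then ran down to end. */
    return start + (reload + 1u) - end;
}
```

For the two results above, systick_elapsed(65536, 65529, 0x00FFFFFF) gives 7 and systick_elapsed(65536, 65531, 0x00FFFFFF) gives 5.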
From my previous observations, it seems the address does play a role in this issue. But it seems much more complicated than “a b.n to a more distant target takes longer to execute”. Sometimes it is 5 cycles, sometimes 7; I can hardly make out any pattern. My wild guess would be that it has something to do with an instruction lying at an address ending in hex ‘c’, but that guess may be completely wrong.
Sorry for this wordy and messy description. I hope it helps.
Best,
Si
P.S. In case you are wondering: for a clearer demonstration, I changed my code a little, so it looks different from what I posted before (fewer ‘nop’s between ‘b.n’ and ‘ldr’), but the problem remains the same.