Hi Philip,
I'm trying to understand how to count cycles on ARM Cortex-M0+ processors so that I can predict the exact timing behavior. I would be grateful for any comments.
I'm using the FRDM-KL25Z board and Kinetis Design Studio.
My code looks like this:
int main(void)
{
SIM_SCGC5 |= 0x400; //enable clock to port B
PORTB_PCR19 = 0x100; //make PTB19 a GPIO pin
GPIOB_PDDR |= 0x80000; //make PTB19 an output pin
while(1)
{
GPIOB_PDOR &= ~0x80000; //turn green led on
GPIOB_PDOR |= 0x80000;
}
return 0;
}
Which compiles to (just the while loop):
while(1)
{
GPIOB_PDOR &= ~0x80000; //turn green led on <Baldur: set pin low>
548: 4b09 ldr r3, [pc, #36] ; (570 <main+0x54>) <Baldur: 2 cycles>
54a: 4a09 ldr r2, [pc, #36] ; (570 <main+0x54>) <Baldur: 2 cycles>
54c: 6811 ldr r1, [r2, #0] <Baldur: 2 cycles>
54e: 4a09 ldr r2, [pc, #36] ; (574 <main+0x58>) <Baldur: 2 cycles>
550: 400a ands r2, r1 <Baldur: 1 cycle>
552: 601a str r2, [r3, #0] <Baldur: 2 cycles>
C:\Users\baldurtho\workspace.kds\TestProject_005\Debug/../Sources/main.c:87 (discriminator 1)
GPIOB_PDOR |= 0x80000;
554: 4b06 ldr r3, [pc, #24] ; (570 <main+0x54>) <Baldur: 2 cycles>
556: 4a06 ldr r2, [pc, #24] ; (570 <main+0x54>) <Baldur: 2 cycles>
558: 6812 ldr r2, [r2, #0] <Baldur: 2 cycles>
55a: 2180 movs r1, #128 ; 0x80 <Baldur: 1 cycle>
55c: 0309 lsls r1, r1, #12 <Baldur: 1 cycle>
55e: 430a orrs r2, r1 <Baldur: 1 cycle>
560: 601a str r2, [r3, #0] <Baldur: 2 cycles>
C:\Users\baldurtho\workspace.kds\TestProject_005\Debug/../Sources/main.c:90 (discriminator 1)
}
562: e7f1 b.n 548 <main+0x2c> <Baldur: 3 cycles>
564: 40047000 .word 0x40047000
568: 00001038 .word 0x00001038
56c: 4004a000 .word 0x4004a000
570: 400ff040 .word 0x400ff040
574: fff7ffff .word 0xfff7ffff
This code produced a square wave on the oscilloscope that was high for 717 ns and low for 618 ns.
I counted the instruction cycles and added them to the listing above in <> brackets.
(m0+ instruction set: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0432c/index.html)
To check my clock period, I added 10 "nop" instructions and found it to be about 47.7 ns.
So, for the code without the nops, 717 ns is equivalent to 15 cycles and 618 ns is equivalent to 13 cycles.
Low period: counting cycles from address 552 (pin set low) to address 560 (pin set high), I count 11 cycles (my comments in the <> brackets), but the scope says 13.
High period: counting cycles from address 560 to address 552, I find 14 cycles, but the scope says 15.
I have read that the M0+ processor has a two-stage pipeline. Because of that, I thought the four 2-cycle instructions in a row at 548 to 54e might each effectively execute in one cycle, so that my program would run faster than the sum of the per-instruction cycle counts. But what I see is the opposite: my code is slower than predicted.
I did some experiments and found that inserting one "ldr r3, [pc, #36]" instruction (inline assembly) into the C code lengthened the time by roughly two cycles, and inserting two "ldr r3, [pc, #36]" instructions in a row added four cycles. Could it be that the documented cycle counts already take the two-stage pipeline into account and assume linear execution?
The Freescale Kinetis processors are said to have no wait states, so that should not cause the slowdown.
I was not able to test the length of every instruction in my assembly output because of a strange problem: when I put __asm("orrs r2,r1"); into my C code, I got a message that this instruction is not supported in Thumb16 mode, yet I can see that very instruction in the disassembly above.
When I saw your post, I realized that the difference between my cycle count and the measured output could be due to bus access. Do you think that could be the case here?
Again, I would be very happy for any comment.
Best regards,
Baldur