Content originally posted in LPCWare by R2D2 on Sat Nov 07 16:28:29 MST 2015
It depends on your code and compiler...
Use the disassembly to see what's happening:
Example: setting and clearing GPIO3_0
Quote:
0x00000328: 0x4b03 main+40 ldr r3, [pc, #12] ; (0x338 <main+56>)
0x0000032a: 0x2101 main+42 movs r1, #1
0x0000032c: 0x6059 main+44 str r1, [r3, #4]
25 LPC_GPIO[3].DATA[1 << 0] = 0 << 0;
0x0000032e: 0x2200 main+46 movs r2, #0
0x00000330: 0x605a main+48 str r2, [r3, #4]
26 LPC_GPIO[3].DATA[1 << 0] = 1 << 0;
0x00000332: 0x6059 main+50 str r1, [r3, #4]
27 LPC_GPIO[3].DATA[1 << 0] = 0 << 0;
0x00000334: 0x605a main+52 str r2, [r3, #4]
28 }
0x00000336: 0xe7f7 main+54 b.n 0x328 <main+40>
0x00000338: 0x0000 main+56 movs r0, r0
0x0000033a: 0x5003 main+58 str r3, [r0, r0]
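To relate the listing back to the C source, it may help to redo the address arithmetic by hand. The two halfwords at main+56 and main+58 are the literal-pool word that `ldr r3, [pc, #12]` loads, and the `#4` in each `str` is the byte offset of `DATA[1 << 0]`. The sketch below reproduces this on a host PC. The helper names are made up for illustration, and the port base and spacing are assumptions that should be checked against the user manual of your part.

```c
#include <stdint.h>

/* Assumed values for illustration; verify against your part's user manual. */
#define GPIO_PORT0_BASE 0x50000000u  /* assumed base address of GPIO port 0 */
#define GPIO_PORT_SIZE  0x00010000u  /* assumed address spacing between ports */

/* Rebuild the 32-bit literal from the two little-endian halfwords that the
   disassembler printed as "instructions" at main+56 and main+58. */
static uint32_t literal_from_halfwords(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

/* Base address of a GPIO port under the assumed memory map. */
static uint32_t gpio_port_base(unsigned port)
{
    return GPIO_PORT0_BASE + port * GPIO_PORT_SIZE;
}

/* Byte offset of DATA[mask]: DATA is an array of 32-bit words. */
static uint32_t masked_data_offset(unsigned mask)
{
    return mask * (uint32_t)sizeof(uint32_t);
}
```

With the values from the listing, `literal_from_halfwords(0x0000, 0x5003)` and `gpio_port_base(3)` should agree, and `masked_data_offset(1u << 0)` should match the `#4` in the store instructions.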
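In main+50 and main+52 you can see the fastest option: a single store instruction, which takes 2 cycles :)
In main+46 - main+48 there's an additional move instruction to load 0 into r2, so that takes 2 (store) + 1 (move) = 3 cycles.
And in main+40 - main+44 there's also a load instruction (ldr) that fetches the GPIO3 base address from the literal pool at main+56, adding even more cycles. (The two halfwords at main+56 and main+58 are that address, 0x50030000, not real instructions.)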
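The cycle counts above can be tallied in a few lines of C. This is only a rough sketch: the store and move costs are the ones given in this post, while the load (2 cycles) and taken-branch (3 cycles) costs are assumed Cortex-M0 figures, and flash wait states are ignored. The names are hypothetical.

```c
/* Cycle costs per instruction. str = 2 and movs = 1 come from the post;
   ldr = 2 and a taken branch = 3 are assumed Cortex-M0 figures. */
enum { CYC_STR = 2, CYC_MOVS = 1, CYC_LDR = 2, CYC_BRANCH = 3 };

/* main+40..main+44: ldr + movs + str */
static int cycles_first_write(void)
{
    return CYC_LDR + CYC_MOVS + CYC_STR;
}

/* main+46..main+48: movs + str */
static int cycles_second_write(void)
{
    return CYC_MOVS + CYC_STR;
}

/* main+50 or main+52: a single str with both registers already loaded */
static int cycles_reused_write(void)
{
    return CYC_STR;
}

/* Whole loop body main+40..main+54, including the branch back to main+40. */
static int cycles_loop(void)
{
    return cycles_first_write() + cycles_second_write()
         + 2 * cycles_reused_write() + CYC_BRANCH;
}
```

Under these assumptions the first write costs 5 cycles, the second 3, and each later write 2, so the reloads at the top of the loop account for a noticeable share of one pass.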
In general, the compiler tries to minimize instructions and reuse register values. When that's not possible, registers have to be loaded again and more cycles are needed :(
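One way to make register reuse easier for the compiler is to pass the register address around as a pointer, so it only has to be loaded once. The function below is a minimal sketch, not code from the original project: `pulse` is a hypothetical helper, and whether the compiler really keeps the address and both constants in registers depends on the compiler and optimization level, so check the disassembly again.

```c
#include <stdint.h>

/* Write 1 then 0 to the same register `count` times. Taking the register
   address as a parameter gives the compiler the chance to keep it in one
   core register across all stores instead of reloading it. */
static void pulse(volatile uint32_t *reg, unsigned count)
{
    while (count--) {
        *reg = 1u;  /* set */
        *reg = 0u;  /* clear */
    }
}
```

On the target this could be called as `pulse(&LPC_GPIO[3].DATA[1 << 0], 4);`. On a host PC it can be tried with the address of an ordinary `volatile uint32_t` variable.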