Content originally posted in LPCWare by Pacman on Fri Sep 13 23:12:22 MST 2013
Very nice.
I think there might be another small optimization:
ldr r2,=1<<14
That costs one instruction plus a 4-byte literal-pool entry.
If you load #1 into, for instance, r7 instead of r0 in the UART init, you could use...
lsls r2,r7,#14
Then you'd save those 4 bytes.
It's very handy to always have a zero and a one in a register (if you can afford the registers).
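As a quick sanity check of the shift trick, here is a small Python model (not code from the loader). It assumes the register holding the constant 1 is r7, that all instructions are 16-bit Thumb encodings, and that `ldr r2,=1<<14` is assembled as a literal-pool load because the constant does not fit an 8-bit immediate.

```python
MASK32 = 0xFFFFFFFF

def lsls(value, shift):
    """Model of Thumb `lsls rd, rm, #shift` (result only, flags ignored)."""
    return (value << shift) & MASK32

r7 = 1                # assumed: a register that always holds 1
r2 = lsls(r7, 14)     # lsls r2,r7,#14
print(hex(r2))        # 0x4000, the same value as ldr r2,=1<<14

# Size comparison in bytes (16-bit Thumb encodings assumed)
LDR_LITERAL_BYTES = 2 + 4   # ldr instruction + 32-bit literal-pool entry
LSLS_BYTES = 2              # single 16-bit instruction
print(LDR_LITERAL_BYTES - LSLS_BYTES)   # bytes saved per replaced load
```

The saving only holds if the register really is still 1 at that point, so it is worth a comment in the source next to the lsls.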
I like that you're using tst r0,r0. =)
You could also place this line...
bl uart0Write
...right before 'main' and then branch to it from your ramloader.* subroutines, then you'd save an instruction each time.
(I've seen 6 places where this could be done, saving 5 instructions in total.)
You could save one more instruction by placing the movs r0,#'A' right before the bl uart0Write, as #'A' is used twice.
The four copies of this sequence in the address reader...
bl uart0Read
orrs r6,r6,r0
rors r6,r6,r1
...could be reduced by...
getb: orrs r6,r6,r0
rors r6,r6,r1
b uart0Read
...and then calling 'getb' a few times:
bl uart0Read @ LSB
bl getb
bl getb @ MSB
bl getb
(It would be possible to save an extra instruction (movs r6,#0) if the address were transmitted in big endian instead of little endian, by using lsls instead of rors before inserting each byte, but you probably don't want to break compatibility.)
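To see why the rors sequence yields a little-endian address and why the big-endian variant would not need r6 cleared first, here is a rough Python model. It assumes r1 holds 8 and that uart0Read returns the received byte in r0; the function names and the 0xDEADBEEF stale value are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF

def rors(value, shift):
    """Model of Thumb `rors rd, rs` for 0 < shift < 32 (flags ignored)."""
    return ((value >> shift) | (value << (32 - shift))) & MASK32

def read_address_le(data):
    """Current scheme: orrs the byte into r6, then rors r6 by r1."""
    r6 = 0                      # movs r6,#0 is required here
    r1 = 8                      # assumed rotate amount
    for r0 in data:             # bl uart0Read
        r6 |= r0                # orrs r6,r6,r0
        r6 = rors(r6, r1)       # rors r6,r6,r1
    return r6

def read_address_be(data, r6=0xDEADBEEF):
    """Big-endian variant: lsls by 8 before inserting each byte.

    r6 starts with stale contents on purpose; four shifts push them out.
    """
    for r0 in data:             # bl uart0Read
        r6 = (r6 << 8) & MASK32 # lsls r6,r6,#8
        r6 |= r0                # orrs r6,r6,r0
    return r6

print(hex(read_address_le([0x78, 0x56, 0x34, 0x12])))  # LSB sent first
print(hex(read_address_be([0x12, 0x34, 0x56, 0x78])))  # MSB sent first
```

Both calls produce 0x12345678. One thing the model glosses over: in the getb form, each bl getb merges the byte already in r0 and then fetches the next one, so the byte fetched by the last call still needs its own orrs/rors afterwards.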
This line in ramloader.execute...
ldr r0,=0xFFFFFFFF
...can definitely also save 2 bytes by...
movs r0,#0
subs r0,r0,#1
...but since r0 is already 0, you only need subs r0,r0,#1.
That instruction could be saved too, by changing the bne to a bhi.
-So that would be 6 bytes saved in total. ;)
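The byte counting can be checked with a tiny Python model. It assumes 16-bit Thumb encodings and that `ldr r0,=0xFFFFFFFF` is assembled as a literal-pool load; the bne-to-bhi step depends on the compare around it, which isn't shown here, so the model only covers the value and the sizes.

```python
MASK32 = 0xFFFFFFFF

def subs(a, b):
    """Model of Thumb `subs rd, rn, #imm` (32-bit wraparound, flags ignored)."""
    return (a - b) & MASK32

r0 = 0              # r0 is already 0 at this point (or movs r0,#0)
r0 = subs(r0, 1)    # subs r0,r0,#1
print(hex(r0))      # 0xffffffff

# Sizes in bytes for each way of getting there (assumed encodings)
SIZES = {
    "ldr r0,=0xFFFFFFFF": 2 + 4,   # instruction + literal-pool entry
    "movs + subs": 2 + 2,
    "subs only": 2,
    "none (bne changed to bhi)": 0,
}
for name, size in SIZES.items():
    print(name, "saves", SIZES["ldr r0,=0xFFFFFFFF"] - size, "bytes")
```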
Of course I haven't seen all the possible optimizations; there's probably still plenty possible, and there might be some that you don't want to do (if using this last one mentioned, you'd have to be careful if changing the code).
Byte-reduction optimization is fun; I've done a lot of it when space was tight or when I just wanted to reduce memory usage to the bare minimum.