Inline Assembly Help - Cortex M0


Content originally posted in LPCWare by moogde on Thu Sep 17 09:05:09 MST 2015
Hey there,


First of all, I apologize if my questions are trivial. It's my first time using C with microcontrollers (which actually works quite well). I'm currently switching to ARM (LPC1115) from the 8-bit AVR side, where I programmed in assembler only.

I am currently reading "The Definitive Guide to the ARM Cortex-M0" by Joseph Yiu and am trying to define an assembly function within C code, but the following code (suggested in the book) doesn't compile:

__asm uint32_t my_add_e(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4)
{
    ADDS R0, R0, R1
    ADDS R0, R0, R2
    ADDS R0, R0, R3
    BX   LR
}


In the book it says that the first four parameters passed to this function are placed by the compiler in R0-R3 and that the return value has to be in R0 (it names some other restrictions on this technique, too). Is the syntax different in the LPCXpresso software? The compiler says "expected '(' before 'uint32_t'" right at the first line. What would be the correct way to do this?

I understand that it is possible to use the __asm("") syntax. I've experimented with this a lot, and it partly worked. Maybe you could help me with this bit of code; I'm trying to use inline assembly to produce the shortest possible pulse on a port pin of the Cortex.

uint32_t addr= &LPC_GPIO3->DATA;
uint32_t on= 0x04;
uint32_t off= 0x00;

__asm("str %[on],  %[addr]\n\t"
"str %[off], %[addr]"

:  : [addr] "r" (addr), [on] "r" (on), [off] "r" (off)
);



The compiler gives me the error "Error: r15 based store not allowed -- `str r2,r3'" on that (what does this error mean?). Is there another way to pass addresses or data between the C side and the assembler side, in either direction, instead of using registers as shown (or at least as I've tried to do) in this code?

Third question: Is there a way to look at the disassembly of the generated code once the project compiles without errors? I'm coming from the AVR world and am very interested in how the compiler actually translates this into assembly. My aim is to be able to do timing-critical work on this platform, as I'm used to doing with the 8-bit controllers.


Thank you very much for your help!
-moogde
8 Replies

Content originally posted in LPCWare by starblue on Sat Sep 26 00:27:11 MST 2015

Quote: moogde
I think the possibility to explicitly locate variables in some global registers might be important.

In C there is the 'register' keyword, but it is mostly ignored by modern compilers.

Quote:
In ASM I just use a few registers for temporary stuff within functions (these registers are never touched by interrupt routines). So I think this might help here, too, because it means that some "fast access" standard variables are always available. They could then also be used to transfer data to and from functions.

To understand how registers are allocated you should read about the ABI (AAPCS, ARM document IHI0042). In fact, on Cortex-M they designed the interrupt handling so that interrupts behave like C functions, so you don't need to do anything special (e.g. on ARM7 you need an assembly wrapper around the interrupt).
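
As a rough illustration of that last point (a sketch, not code from this thread): on Cortex-M an interrupt handler can be written as an ordinary C function. The name SysTick_Handler is the usual CMSIS name and is assumed here to match the vector table in your startup file; tick_count and ticks_elapsed are made up for the example.

```c
#include <stdint.h>

/* Shared between the handler and the main program, hence volatile. */
static volatile uint32_t tick_count;

/* Assumption: the startup file's vector table refers to this handler
 * by the CMSIS name SysTick_Handler. No wrapper or special attribute
 * is needed: on exception entry the hardware stacks r0-r3, r12, lr,
 * pc and xPSR, which covers the registers the AAPCS allows a called
 * function to clobber. */
void SysTick_Handler(void)
{
    tick_count++;
}

/* Hypothetical accessor for the main loop. */
uint32_t ticks_elapsed(void)
{
    return tick_count;
}
```

Calling SysTick_Handler() directly from a host-side test works too, since it is just a C function.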

Content originally posted in LPCWare by R2D2 on Fri Sep 25 11:27:26 MST 2015

Quote: moogde
What happens if there is not enough Flash Memory to do it? Will it automatically detect that?



Yes, you will receive something like:


Quote:
d:/nxp/lpcxpresso_7.9.2_493/lpcxpresso/tools/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: LPC11_DEMO.axf section `.text' will not fit in region `MFlash16'
d:/nxp/lpcxpresso_7.9.2_493/lpcxpresso/tools/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: LPC11_DEMO.axf section `.bss' will not fit in region `RamLoc4'
d:/nxp/lpcxpresso_7.9.2_493/lpcxpresso/tools/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: region `MFlash16' overflowed by 7932 bytes
d:/nxp/lpcxpresso_7.9.2_493/lpcxpresso/tools/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/bin/ld.exe: region `RamLoc4' overflowed by 2540 bytes



in order to rethink your optimization settings  :)

Content originally posted in LPCWare by moogde on Fri Sep 25 10:45:26 MST 2015
Wow, okay, THAT is inlining. Got it :D What happens if there is not enough Flash Memory to do it? Will it automatically detect that? (just being curious here)

Content originally posted in LPCWare by R2D2 on Fri Sep 25 08:10:09 MST 2015

Quote: moogde
Is it possible that the Compiler at higher optimization settings is actually generating the same subroutine five times, when it gets called 5 times in the source code? (To omit all the additional steps for the calls, returns)



Of course it's inlining  :)

See:

http://stackoverflow.com/questions/1626248/does-gcc-inline-c-functions-without-the-inline-keyword
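
To make the effect visible in a listing file, here is a minimal sketch (the function names are invented for the example). With optimization enabled, GCC will usually paste the body of a small static inline function into each caller, so it may vanish from the listing as a separate routine; the GCC/Clang noinline attribute keeps a real call in place.

```c
#include <stdint.h>

/* Small helper: with optimization on, GCC will usually copy this body
 * into every caller instead of emitting a call. */
static inline uint32_t scale(uint32_t x)
{
    return x * 3u + 1u;
}

/* Same computation, but the noinline attribute (GCC/Clang specific)
 * forces a real call, so the function stays visible in the listing. */
__attribute__((noinline)) uint32_t scale_noinline(uint32_t x)
{
    return x * 3u + 1u;
}

uint32_t sum_scaled(uint32_t a, uint32_t b)
{
    return scale(a) + scale_noinline(b);
}
```

Comparing the listing of sum_scaled at different optimization levels should show whether scale was inlined on your toolchain.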

Content originally posted in LPCWare by moogde on Fri Sep 25 07:53:08 MST 2015
Thanks again for pointing out the right directions!

I'll be trying out some of that stuff. Coming from assembler, I think the possibility to explicitly locate variables in some global registers might be important. In ASM I just use a few registers for temporary stuff within functions (these registers are never touched by interrupt routines). So I think this might help here, too, because it means that some "fast access" standard variables are always available. They could then also be used to transfer data to and from functions.

But of course I'll check with the disassembly whether that really makes sense in the end.

I played with the optimization settings a bit (I didn't know they existed), and it helped a lot in terms of execution speed, but at the highest optimization setting some of my function's code was also missing from the listing file. Is it possible that the compiler at higher optimization settings is actually generating the same subroutine five times when it gets called five times in the source code (to omit all the additional steps for the calls and returns)?

(But that's not a really important question; I'll probably find this out myself.)

Thanks a lot so far! I might post other questions when they occur :)

Content originally posted in LPCWare by starblue on Wed Sep 23 01:10:57 MST 2015

Quote: moogde
1. If I write something like "uint32_t on= 0x04;" as in the example, why is the compiler not just using a local register in this function, which would be much faster here? Why is it putting this into RAM?

What is your optimization setting?


Quote:
2. I was using the compiler options to generate a list file, as you suggested. I finally found it on the hard disk's debug directory, but it was not shown on the internal project explorer's debug directory. Why is that?

The project explorer filters out some files, use the Navigator if you want to see everything.


Quote:
3. What is a good / or the "correct" editor to explore the disassembly list file?

Any good text editor, I use Emacs. You could also open it in Eclipse, but beware of very large files, Eclipse may hang due to lack of memory.

Quote:
When I open it with the windows explorer, it opens in Microsoft Visual Studio ;-)

You can change that in the Windows properties of a file.


Quote:
4. What kind of resources am I allowed to use within the __asm() definition? Can I use all the registers? Do everything? (And does the compiler take care that there are no conflicts with what it does?)

Read the GCC documentation:
https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html


Quote:
6. uint32_t addr= &LPC_GPIO3->DATA;
gets translated to
ldr r3, .L28+20
so the compiler seems to be storing this address as one number in flash and loads it here into r3. But:
LPC_GPIO3->DATA = on;
gets resolved into
ldr r2, .L28+24
ldr r3, .L28+28
and [r2, r3] are then used to store the data. My assumption is that these are the "LPC_GPIO3" base address and the "DATA" offset (am I getting this correct?). My question: Is there a way to write directly to the final address in C (like in the &LPC_GPIO3->DATA example)?

No. In ARM state, immediate values are limited to 8 bits rotated by an even number of bits, and the Thumb instructions the Cortex-M0 executes only take a plain 8-bit immediate for MOVS, so you can't put an arbitrary address there. Loading the address from a constant stored in flash (a literal pool) is the standard way to do it on ARM; in assembly you can use the ldr rN, =value pseudo-instruction for that. Some newer processors (ARMv7-M cores such as the Cortex-M3/M4, with MOVW/MOVT) can also load 32 bits by using two instructions, one for each 16-bit halfword, which uses the same amount of memory and is usually faster because of prefetching and caching (but not always).

That should be explained in the book, BTW (it is in the Definitive Guide to the Cortex-M3/M4).

You can also get the documents from ARM, e.g. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/index.html
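
For what it's worth, the closest thing in C to "writing to the final address" is probably to take the register's address once and keep it in a local pointer. A minimal sketch, assuming a stand-in register block so that it also builds on a PC (fake_gpio_t, FAKE_GPIO3 and pulse_pin are hypothetical; on the target you would use LPC_GPIO3 from the chip header):

```c
#include <stdint.h>

/* Hypothetical stand-in for the vendor's GPIO register block. */
typedef struct {
    volatile uint32_t DATA;
} fake_gpio_t;

static fake_gpio_t fake_gpio3;
#define FAKE_GPIO3 (&fake_gpio3)

/* Resolve the final register address once, then reuse the pointer for
 * both stores. The address constant itself still has to be loaded
 * from memory; the point is only that it need not be recomputed
 * between the two writes. */
uint32_t pulse_pin(uint32_t on, uint32_t off)
{
    volatile uint32_t *const data = &FAKE_GPIO3->DATA;

    *data = on;   /* rising edge */
    *data = off;  /* falling edge */
    return *data; /* read back the last value written */
}
```

Whether the compiler really keeps the pointer in a register between the two stores depends on the optimization level, so it is worth checking the listing.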

Content originally posted in LPCWare by moogde on Tue Sep 22 12:26:33 MST 2015
Wow, thank you very much, this helped a lot!

First of all let me show you the modified code:

uint32_t addr= &LPC_GPIO3->DATA;
uint32_t on= 0x04;
uint32_t off= 0x00;

LPC_GPIO3->DATA = on;
LPC_GPIO3->DATA = off;

__asm("str %[on],  [%[addr]]\n\t"
"str %[off], [%[addr]]"
:  : [addr] "r" (addr), [on] "r" (on), [off] "r" (off)
);


Please find the measured result this code produces on the hardware attached to this comment.

Image: https://www.lpcware.com/system/files/fasttrigger.jpg

...the slowness of the first pulse is no wonder if I take a look at what the compiler actually does (the disassembly of the C line "LPC_GPIO3->DATA = on"):

 539 0086 0D4A     ldr r2, .L28+24
 540 0088 0D4B     ldr r3, .L28+28
 541 008a 7968     ldr r1, [r7, #4]
 542 008c D150     str r1, [r2, r3]


So far, so good

Please let me ask you some other questions now:

1. If I write something like "uint32_t on= 0x04;" as in the example, why is the compiler not just using a local register in this function, which would be much faster here? Why is it putting this into RAM? Can I suggest to or force the compiler to use just a register, locally, for this?

2. I used the compiler options to generate a list file, as you suggested. I finally found it in the Debug directory on the hard disk, but it was not shown in the Debug directory of the internal project explorer. Why is that?

3. What is a good (or the "correct") editor to explore the disassembly list file? When I open it from Windows Explorer, it opens in Microsoft Visual Studio.

4. What kind of resources am I allowed to use within the __asm() definition? Can I use all the registers? Do everything? (And does the compiler take care that there are no conflicts with what it does?)

5. What if I want to load the address in the above example directly within the inline assembler, instead of loading it in C and then passing the register to the assembly routine?
Something like "ldr r3, &LPC_GPIO3->DATA"... You get the idea.

6. uint32_t addr= &LPC_GPIO3->DATA;
gets translated to
ldr r3, .L28+20
so the compiler seems to be storing this address as one number in flash and loads it here into r3. But:
LPC_GPIO3->DATA = on;
gets resolved into
ldr r2, .L28+24
ldr r3, .L28+28
and [r2, r3] are then used to store the data. My assumption is that these are the "LPC_GPIO3" base address and the "DATA" offset (am I getting this correct?). My question: Is there a way to write directly to the final address in C (like in the &LPC_GPIO3->DATA example)?


I know that this is a lot of questions; I hope it's okay! Thank you very much for your help!

Content originally posted in LPCWare by starblue on Mon Sep 21 02:40:26 MST 2015
The problem with your first example is that it is probably written for the Keil compiler. LPCXpresso uses GCC, which doesn't understand the syntax where the whole body of the function is in assembly language.
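
With a GCC toolchain, a function written entirely in assembly normally goes into a separate assembly source file that is assembled and linked with the project. For something as small as this, plain C is usually enough, because the calling convention is the same either way. A minimal sketch (the fourth parameter name x4 is my own choice):

```c
#include <stdint.h>

/* Under the AAPCS the four arguments arrive in r0-r3 and the result
 * is returned in r0, which is the same contract the assembly body in
 * the book relies on. The compiler is free to pick the instructions. */
uint32_t my_add_e(uint32_t x1, uint32_t x2, uint32_t x3, uint32_t x4)
{
    return x1 + x2 + x3 + x4;
}
```

Looking at the generated listing for this function is a quick way to confirm the register usage on your own toolchain.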

In the second example I think you need square brackets around %[addr] (look up the STR instruction in the ARM documentation).
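
A sketch of what I mean, untested on the LPC1115 itself. Without the brackets the assembler seems to read the second operand as something other than a base register, which would explain the complaint about r15 (the PC). The helper names store_word and pulse are made up, and the #else branch is only there so the sketch also builds on a PC.

```c
#include <stdint.h>

/* Store `value` to the word at `addr` with a single STR instruction. */
static inline void store_word(volatile uint32_t *addr, uint32_t value)
{
#if defined(__arm__) || defined(__thumb__)
    /* "l" asks for r0-r7 in Thumb state, which the 16-bit STR encoding
     * requires; the "memory" clobber tells GCC that the statement
     * writes memory it cannot see. */
    __asm volatile("str %[val], [%[addr]]"
                   :
                   : [addr] "l" (addr), [val] "l" (value)
                   : "memory");
#else
    *addr = value; /* host fallback so the sketch builds off-target */
#endif
}

/* Two back-to-back stores: the shortest pulse this approach can give. */
uint32_t pulse(volatile uint32_t *port, uint32_t on, uint32_t off)
{
    store_word(port, on);
    store_word(port, off);
    return *port;
}
```

On the target, the call would look something like pulse(&LPC_GPIO3->DATA, 0x04, 0x00), with the pointer type adjusted to whatever your chip header declares.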

You can let the compiler generate an assembly listing by passing it the right options:
https://www.lpcware.com/content/forum/output-assembler-file-when-compile