Alphanumeric LCD displays

lpcware · ‎06-15-2016

Content originally posted in LPCWare by IanB on Tue Sep 16 02:49:18 MST 2014
Just wondering - what's the favourite way of connecting an alphanumeric display?

1. 4-bit bus
2. A fantastic bit of pcb-tracking to get 8 pins from the same port to connect to the display in the right order?
3. A long software routine that uses 8 adjacent pins on the package and sends the data bits to three different ports?
4. Use the LPC122x because the pins are in a sensible order?
5. Buy a module with an I2C interface?

I've done two projects with LCD displays and used solution #3 on one of them and solution #4 on the other (both worked)

On a PIC or an Atmel it was easy, as all 8 pins from an 8-bit port were next to each other in the right order!

Content originally posted in LPCWare by IanB on Sat May 21 00:40:20 MST 2016
A long time since I last visited this topic (is anyone still reading?)

Successfully implemented this by adding a 74HC595 to the SPI port. Now I only need 6 pins (MOSI, CLK, D7, E, RS and RW) and they don't have to be in any order.
E connects to the output latch clock of the HC595, and RW connects to OE. Of course the interface is now output-only, so I connect D7 back to the micro so I can read the busy flag.
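
The write sequence can be tried out on a PC with a small software model. This is only a sketch, assuming the wiring described above (E also clocks the HC595 output latch); the `hc595_t` type and all function names are hypothetical, and on real hardware `spi_send` would be the SPI transfer and `set_e` a GPIO write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a 74HC595 feeding an HD44780 data bus. */
typedef struct {
    uint8_t shift;   /* contents of the shift register */
    uint8_t latch;   /* value on the Q0..Q7 outputs (the LCD data bus) */
    int     e;       /* level of the E line, which also drives the latch clock */
    uint8_t lcd_in;  /* last byte captured by the display model */
} hc595_t;

/* Shift one byte in, MSB first, as an SPI master would. */
void spi_send(hc595_t *h, uint8_t byte)
{
    for (int i = 7; i >= 0; i--)
        h->shift = (uint8_t)((h->shift << 1) | ((byte >> i) & 1u));
}

/* Drive E. Rising edge: the HC595 copies shift -> latch.
   Falling edge: the display captures whatever is on the bus. */
void set_e(hc595_t *h, int level)
{
    assert(level == 0 || level == 1);
    if (level && !h->e) h->latch = h->shift;
    if (!level && h->e) h->lcd_in = h->latch;
    h->e = level;
}

/* One complete byte write: shift the data out, then pulse E. */
void lcd_write(hc595_t *h, uint8_t byte)
{
    spi_send(h, byte);
    set_e(h, 1);
    set_e(h, 0);
}
```

Because the latch is only updated on the rising edge of E, the bus holds the previous byte while the next one is being shifted in.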

Content originally posted in LPCWare by Pacman on Fri Oct 03 03:42:50 MST 2014
On the LPC13xx, you can write through a mask, which I believe you could combine with your open-drain method, in order to avoid mixing bits before writing bytes.
(ldrb destroys the highword of the register, so you'd have to load '1' into bit 8 or 9 at some point).
Instead, since the port mask is 12-bit (bits 0 ... 11), you can write to LPC_GPIO0[].data[mask]; then the write will be masked. To write to all pins, just set mask to 0xfff.
(I haven't thought deeply about this, so I'm not 100% sure it would be possible - just writing something in my sleep at the moment).
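
The effect of a masked write can be modelled in plain C. This is a host-side sketch only: `gpio_port_t` and both function names are hypothetical, and on the real part the mask is encoded in the address of the data register rather than passed as an argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 12-bit GPIO port with masked access. */
typedef struct { uint16_t pins; } gpio_port_t;

/* Only the bits selected by `mask` change; all other pins keep their state. */
void gpio_masked_write(gpio_port_t *p, uint16_t mask, uint16_t value)
{
    assert(mask <= 0xFFFu);              /* the port mask is 12 bits wide */
    p->pins = (uint16_t)((p->pins & ~mask) | (value & mask));
}

/* Reads return the selected bits; unselected bits read as 0. */
uint16_t gpio_masked_read(const gpio_port_t *p, uint16_t mask)
{
    assert(mask <= 0xFFFu);
    return (uint16_t)(p->pins & mask);
}
```

With a mask of 0x0FF, a byte can be written to the low 8 pins without disturbing RS, RW or E on the higher bits.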

Content originally posted in LPCWare by Pacman on Thu Oct 02 12:50:07 MST 2014
Yes, it's sad that many instructions have limited access to registers. Unfortunately you can't fit 4 billion instructions into 16 bits very easily, so some of them had to go out the window.

So far I have only one real useful idea; it's in the section "If you're about to run out of registers" in the hints/snippets document, but I'll repeat it here.

        lsls        r0,r1,#4        ;[1] multiply by structure size
        add         r0,r0,r12       ;[1] add array base address
        ldm         r0,{r0-r3}      ;[5] get flags, data-pointer, I/O address and register value
        str         r3,[r2]         ;[2] output new value on GPIO pins

Here I am running out of registers, so I've moved the base address to r12, then I add the address to my index, and form a direct address pointer.
It's not often it's useful, however you *can* benefit from it now and then.
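
For readers more at home in C, the address calculation in that snippet looks roughly like this. It is a sketch under assumptions: the `entry_t` layout and field names are made up to match the four words fetched by the ldm, and all fields are kept 32 bits wide so the structure is 16 bytes on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte table entry matching the four words loaded by ldm. */
typedef struct {
    uint32_t flags;
    uint32_t data_ptr;   /* kept as a 32-bit value so the layout matches the MCU */
    uint32_t io_addr;
    uint32_t reg_value;
} entry_t;

/* Byte offset of entry `index`: the same `index << 4` that the lsls computes. */
uint32_t entry_offset(uint32_t index)
{
    assert(sizeof(entry_t) == 16);
    return index << 4;
}

/* Base address plus offset, like the add with r12. */
const entry_t *entry_at(const entry_t *base, uint32_t index)
{
    return (const entry_t *)((const uint8_t *)base + entry_offset(index));
}
```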

Also, those registers can be used for saving values temporarily instead of using load/store.
A register-to-register MOV costs only one clock cycle, whereas a load or a store costs 2 clock cycles on the Cortex-M0. That means if you're saving/restoring the values often, you'll be saving a clock cycle each time.

The following instructions accept r0-r12,r14 for all operands: MOV, ADD, CMP, BX, BLX, MSR and MRS.
Remember that you can also use the ADD in place of a subtract, if you negate the register.
Here are some ideas...
MOV: save / restore low registers in high registers.
ADD: add high registers to low registers (like I did in the above snippet)
ADD: subtract high registers from low registers by loading the high registers with negated values
CMP: Compare high registers with low registers (for instance to see if an ending-address is reached)
BX/BLX: Jump to an address saved in a high register
ADD: Jump n bytes forward (using ADD PC,PC,r10 for instance) - useful if you prepare a branch-distance; but this is an advanced topic.
You can use r0-r12/r14 with both MSR and MRS.

If you're running out of registers (or just want to preserve the low registers, so you don't have to save/restore them in your subroutines), you could also use r9 as an increment-value; add this to r10 and compare with r11.

BTW: I just added another document about making a quick Count Leading Zeroes macro/subroutine.

Content originally posted in LPCWare by IanB on Thu Oct 02 00:19:30 MST 2014
Back to the original topic - if using method #2 or #4 (pins in the right order), then I think quite a lot of code could be saved by setting the pins to "pseudo-open-drain" with pull-up. The HD44780 also has its own pull-ups. I just have to check that the rise-time is quick enough - but it's a pretty slow interface.

Pin capacitance = 2.8pF for the LPC, unspecified for the HD44780 - assume 5pF, so about 7.8pF in total.
Minimum pull-up current 50µA for the HD44780, 15µA for the LPC, total 65µA - therefore slew rate = 65µA / 7.8pF = 8.3V/µs - rise time is about 300ns - should be quick enough.
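
The arithmetic can be checked with a few lines of C. This is a rough sketch: it treats the pull-ups as a constant current source (real pull-up current drops as the pin voltage rises), and the 2.5V swing used in the example below is an assumption, not a figure from either datasheet.

```c
#include <assert.h>

/* Slew rate in V/us for a constant current (uA) charging a capacitance (pF).
   I/C in uA/pF is numerically equal to V/us. */
double slew_rate_v_per_us(double current_ua, double cap_pf)
{
    assert(cap_pf > 0.0);
    return current_ua / cap_pf;
}

/* Time in ns to ramp by `delta_v` volts at that slew rate. */
double rise_time_ns(double delta_v, double current_ua, double cap_pf)
{
    return 1000.0 * delta_v / slew_rate_v_per_us(current_ua, cap_pf);
}
```

With 65µA and 7.8pF this gives about 8.3V/µs, and a 2.5V swing takes about 300ns.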

Write 0xFF to the data pins when reading from the display: the display will then be able to pull the pins to Vss, because they are only driven high by the pull-ups. Thus - no need to keep changing GPIOnDIR.

If RS is connected to bit 8, then data is written with Bit 8 high, and commands with bit 8 low.

If RW is connected to bit 9, then to read the busy flag, write 0b1011111111 (0x2FF) to the bus, then read back GPIOnDATA and check whether bit 7 is high or low.

To read memory write 0b1111111111 (0x3FF), and read back GPIOnDATA.

(Still need to output the E pulse)
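
Putting the bit assignments above into C gives something like the following. It is a sketch only, assuming D0-D7 on bits 0-7, RS on bit 8 and RW on bit 9; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_RS   (1u << 8)   /* register select: 1 = data, 0 = command */
#define LCD_RW   (1u << 9)   /* 1 = read, 0 = write */
#define LCD_BUSY (1u << 7)   /* the busy flag appears on D7 */

/* Word to put on the 10-bit bus for a write. */
uint16_t lcd_write_word(int is_data, uint8_t byte)
{
    assert(is_data == 0 || is_data == 1);
    return (uint16_t)((is_data ? LCD_RS : 0u) | byte);
}

/* Word to put on the bus before a read: data pins released high (0xFF)
   so that the display can pull them low. */
uint16_t lcd_read_word(int is_data)
{
    assert(is_data == 0 || is_data == 1);
    return (uint16_t)(LCD_RW | (is_data ? LCD_RS : 0u) | 0xFFu);
}

/* Test the busy flag in a value read back from the port. */
int lcd_is_busy(uint16_t gpio_data)
{
    return (gpio_data & LCD_BUSY) != 0;
}
```

`lcd_read_word(0)` evaluates to 0x2FF (busy flag / address read) and `lcd_read_word(1)` to 0x3FF (memory read).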

Getting 10 GPIO pins in a row is much easier on the LPC122x - the pcb tracking on an LPC11xx would be amazing!

Content originally posted in LPCWare by IanB on Sun Sep 28 08:09:03 MST 2014
Neat!
Any smart ideas on making use of R8-R12 in the Cortex M0? Or do you agree that they might as well have deleted them?

Is there somewhere we should be putting all this stuff, so that anyone else interested in assembler can read it? Instead of under a heading of "LCD Displays"?

Content originally posted in LPCWare by Pacman on Thu Sep 25 11:02:20 MST 2014
Note: You can also use for instance adcs to collect a bunch of results in one (or more) register(s), instead of using sbcs.
That can be useful for table-lookup + jumps. So different comparisons can be mashed into one jump-table.

Those are worth mentioning, yes. I've used them so often that I didn't really think about putting them in the hints/tips/tricks document.
-But I'll update it and perhaps think about a few extras.

And yes; one of the benefits from the UXTH/UXTB instructions is that they don't update the flags; this can be very useful.

I often use lsls and lsrs to test bits both using carry and the N and Z flags.
This might come in handy as well:

    movs r1,#32
    rors r0,r0,r1  /* duplicate bit 31 to the carry flag */
    adcs r2,r2,r2 /* insert as low bit in r2 */

Something I've done a lot is to 'add through a mask' and 'subtract through a mask'.
-Say you have a GPIO port where you've connected 4 LEDs, and there are a few pins between each of them.
Pin 0, pin 2, pin 5 and pin 7 are each connected to their own LED.
Load the port's byte value into r0, then:

    movs r1,#0x5a
    orrs r1,r1,r0
    adds r1,r1,#1

You've now incremented your 'counter'; you'll need to mix the bits, but that's straight-forward using ANDS and ORRS.

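If you need to decrement, clear the gap bits with bics instead of setting them with orrs (so that the borrow ripples through them), and use subs instead of adds.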
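
The same trick written out in C, including the final bit-mixing step, might look like this. It is a sketch assuming the LEDs sit on pins 0, 2, 5 and 7 of an 8-bit port value; `LED_MASK` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LED_MASK 0xA5u   /* pins 0, 2, 5 and 7 */

/* Increment the 4-bit counter spread over the LED_MASK bits of `port`. */
uint8_t masked_increment(uint8_t port)
{
    uint8_t gaps   = (uint8_t)~LED_MASK;             /* 0x5a */
    uint8_t filled = (uint8_t)(port | gaps);         /* gaps = 1, so a carry ripples across them */
    uint8_t sum    = (uint8_t)(filled + 1u);
    uint8_t result = (uint8_t)((sum & LED_MASK) | (port & gaps));
    assert(((result ^ port) & gaps) == 0);           /* non-LED pins are untouched */
    return result;
}

/* Decrement the same counter. */
uint8_t masked_decrement(uint8_t port)
{
    uint8_t gaps    = (uint8_t)~LED_MASK;
    uint8_t cleared = (uint8_t)(port & LED_MASK);    /* gaps = 0, so a borrow ripples across them */
    uint8_t diff    = (uint8_t)(cleared - 1u);
    uint8_t result  = (uint8_t)((diff & LED_MASK) | (port & gaps));
    assert(((result ^ port) & gaps) == 0);
    return result;
}
```

Starting from 0x01 (only the pin 0 LED on), one increment gives 0x04 (only the pin 2 LED on), which is the counter going from 1 to 2.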

The startup-code ... I tried to make it as generic as I could. It should work with most linker-scripts.
There's no real difference between LowLevelInit and System_Init.
However, in my startup code, you can override the Reset_Handler, which you can't in most of those pre-fabricated sources (that's why I always use my own).
Feel free to customize the startup-code. After all, it's just a piece of code, which is executed once the MCU starts up.

.thumb_func : I did exactly that before I knew about the .thumb_func!

BTW: This can be very useful...

    subs r1,r0,#1
    lsrs r2,r1,#1
    orrs r1,r1,r2
    adds r1,r1,#1

-What does it do ?
It converts the values...
1 -> 1
2 -> 2
3 -> 4
4 -> 4.
Fun, eh ? - What's the use ?
Well, see it as 'round up to nearest power of two if necessary'.
It can also be used for auto-alignment.
You can extend it by adding another right-shift + or, so it handles values up to 8 or 16.

If using Cortex-M3, you could:

    subs r1,r0,#1
    orrs r1,r1,r1,lsr #1
    orrs r1,r1,r1,lsr #2
    orrs r1,r1,r1,lsr #4
    adds r1,r1,#1

...etc.
It's useful if you need to find a container size for n bytes.
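
Extended to the full 32 bits, the round-up trick reads like this in C. This is the commonly used bit-smearing form of the same idea, written as a sketch; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the nearest power of two (x itself if it already is one). */
uint32_t round_up_pow2(uint32_t x)
{
    assert(x >= 1u && x <= 0x80000000u);   /* result must fit in 32 bits */
    x -= 1u;
    x |= x >> 1;    /* smear the highest set bit downwards ... */
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;   /* ... until every lower bit is set */
    return x + 1u;
}
```

For example, a 1000-byte buffer would get a 1024-byte container.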

Speaking of which...

Align on an even address if necessary (trivial code, though):

    movs r1,#1
    ands r1,r1,r0
    adds r0,r0,r1

(I better stop, before the server runs out of harddisk space).

Content originally posted in LPCWare by IanB on Wed Sep 24 12:24:02 MST 2014
Thanks Pacman, for bringing us back on topic (and then immediately digressing!)

I like the "programming tips and tricks" article, especially the cunning use of "sbcs". The cycle count in square brackets is the work of someone who really does need to know the time the code takes to execute!

I'll contribute three more (OK - you can tell me that everyone knows these!)

1. AND immediate is sadly lacking from the Cortex M0, which makes testing bits in registers annoying:

LDR R3,=LPC_UART_BASE
LDR R2,[R3,LSR]
MOVS R1,#0b01000000
ANDS R2,R2,R1  (or TST R2,R1)
BEQ/BNE

This does the job with one less cycle (note that it is #7 to test bit 6)

LDR R3,=LPC_UART_BASE
LDR R2,[R3,LSR]
LSRS R2,R2,#7
BCC/BCS

but it really comes into its own when testing bits higher than 7.

2. UXTB R0,R0 and UXTH R0,R0 are synonyms for AND R0,#0xFF and AND R0,#0xFFFF and can move the data to another register at the same time.

3. To truncate to any number of bits, shift left by 32-<number of bits> and then shift right by the same amount.
(one instruction fewer than the AND routine if you have to use LDR R0,=<constant> )
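
Tip 3 in C, as a small sketch (the function name is hypothetical): shifting left and then right by the same amount discards the upper bits, which is what UXTB and UXTH do for the 8-bit and 16-bit cases.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `nbits` bits of x and clear the rest. */
uint32_t truncate_bits(uint32_t x, unsigned nbits)
{
    assert(nbits >= 1u && nbits <= 32u);
    unsigned shift = 32u - nbits;
    return (x << shift) >> shift;   /* LSLS then LSRS by 32 - nbits */
}
```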

I particularly like the startup code article - I have to admit that I didn't know what .thumb_func did. When the hard-fault error popped up, I realised what it was, and simply added 1 to the values in the branch table.
It did puzzle me when my interrupt routine wouldn't run!

Does that work with the standard (automatically generated) linker script, or is there a linker script that goes with it?
And what is the difference between Low_Level_Init and System_Init?

Content originally posted in LPCWare by Pacman on Wed Sep 24 01:23:06 MST 2014

Quote: mch0
Modern compilers (mostly) do a very good job on optimization anyway.

I agree. Remember one thing: Before blaming the compiler for an error, blame yourself.

-Many people blame the compiler for their own mistakes; sadly those people usually never say "I was wrong", which leads other readers to think that the compiler is unreliable.
GCC (up till 4.8.x) is quite robust and I have not seen it generate incorrect code (ever!).
Most people forget the 'volatile' keyword when accessing variables from both interrupt and task-time, which means that the compiler (correctly) avoids re-reading the variable when it has already read the value.
Declaring a variable 'volatile' forces the compiler to insert code to read/write the value on each access.
Also: Avoid empty for-loops. They'll be removed when turning -O2 or -O3 on, and you won't understand where your delay routine disappeared to.
If you really 'need' an empty for-loop, use volatile:

for(int32_t i = 0; i < 10000; i++){ volatile uint32_t dummy = 0; (void)dummy; } /* the C way */

...or...

for(int32_t i = 0; i < 10000; i++){ asm volatile("nop"); } /* the "assembler"-way */

...The assembler-way is not as cycle-accurate as writing it in a .s file, though.
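
A slightly fuller sketch of both uses of volatile follows. The names are hypothetical: a real interrupt handler takes its name from the vector table, and the delay it produces depends on the compiler and clock, so treat this as an illustration rather than a calibrated routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt handler and the main loop, hence volatile. */
volatile bool data_ready = false;

/* Hypothetical interrupt handler. */
void example_irq_handler(void)
{
    data_ready = true;
}

/* Each call reads data_ready from memory again instead of reusing an old value. */
bool take_data_ready(void)
{
    if (!data_ready) return false;
    data_ready = false;
    return true;
}

/* Busy-wait that is not optimized away: the volatile counter forces
   a real load and store on every pass through the loop. */
uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t i;
    for (i = 0; i < count; i++) { }
    assert(i == count);
    return i;
}
```

Without the volatile qualifier on `data_ready`, a loop such as `while (!take_data_ready()) { }` could legally be compiled into an endless loop once the function is inlined.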

Content originally posted in LPCWare by Pacman on Wed Sep 24 01:09:40 MST 2014
In order to keep a little to the topic, I've made this small Web-site dedicated to low-cost LCD display datasheets.
The SPI module in action:
[Image: http://scratch.gpio.dk/Alpha%20Ladybug-512H.jpg]
(This display is driven by the LPC1768's SSP running at 50MHz, enough to do a 30 FPS update; it's not a touch-screen, BTW).

Since you're interested in Cortex-M0 assembly language, you might want to read a document I once wrote.
I've moved it to the ARM Connected Community, where I've updated it.

When it comes to using hardware on the LPC, I've written my own assembly-language system-header files.
First thing I do, though, is to include a file containing convenience macros; Here's a sample of some of the macros (feel free to be inspired):

RS* are used for creating structures. The offset can be negative or positive, 'grow' forward or backwards.
BD* generates bit-fields and multiple labels for each value; same style as RS.
DS* are convenience/compatibility 'replacements' for .space
FUNCTION/ENDFUNC simplifies function generation.

* EQU (like .equ)
* MEQU (multiple labels for one value)
* SET (like .set)
* SECTION (.section .\name,"ax",%progbits)
* DSECTION (.section .\name,"a",%progbits)
* MOV_32 (movw+movt)
* BYTE, B8 (like .byte)
* HWORD, B16 (like .hword)
* LONG, B32 (like .long)
* B64 (like .8byte)
* RSSET (argument is the starting offset)
* RSRESET (like RSSET 0)
* RSEND (argument = symbol to hold structure size)
* RS (argument1 = label, argument2 = structure, argument3 = optional count (default=1))
* RS.B (argument1 = optional label, argument2 = optional count (default=1))
* RS.H (like RS.B, for 16-bit words)
* RS.L (like RS.B, for 32-bit words)
* RS.M (defines multiple labels for 8+16+32 bit access)
* RS.O (8-byte)
* RS.F (float, 32-bit)
* RS.D (double, 64-bit)
* RS.P (pointer, 32-bit)
* BDSET (argument is the starting bit#)
* BDRESET (like BDSET 0)
* BD: Bit-Define (argument1 = optional label, argument2 = optional length (default=1))
* ENUM (like .equ but if value omitted, it auto-increments for each use)
* SPACE
* DS.B (argument = count)
* DS.H
* DS.L
* DS.O
* DS.P
* DS.F
* DS.D
* LABEL (generates global label(s))
* FUNCTION (argument(s) = label(s), this is .text+.global+.type+.func+.thumb_func)
* ENDFUNC (argument = (primary) label; this is .size+.endfunc)

Those make the sources a lot 'cleaner' to look at. I've kept the names uppercase, so it's easy to see that it's a macro.
In addition, it makes it easier to port the sources to different assemblers, should the need arise. (Well, if there is a bug in GAS, the policy is to avoid fixing it, because fixing it may break compatibility with existing sources! That's the answer I got when I reported plainly incorrect behaviour. The same goes for implementing new features, such as nested structure support: they're afraid of breaking compatibility. I wonder why they're not afraid of adding new instruction sets for new architectures. They don't want to implement .elseif either, which I believe would be very useful. I also suggested a #define-like .equs, to assign a string to a label, but ...)

Simple example of use (in some cases I add a namespace)...

    RSRESET
    RS.L    LLI.srcAddr
    RS.L    LLI.dstAddr
    RS.L    LLI.nextLLI
    RS.L    LLI.control
    RSEND   LLI.size

    BDRESET                             ;/* I2Sx_DMAx */
    BD      I2Sx_DMAx_RX_DMAx_ENABLE
    BD      I2Sx_DMAx_TX_DMAx_ENABLE
    BD      ,6
    BD      I2Sx_DMAx_RX_DEPTH_DMAx,4
    BD      ,4
    BD      I2Sx_DMAx_TX_DEPTH_DMAx,4

    ENUM    CFG_Transparent,0
    ENUM    CFG_Blue
    ENUM    CFG_Red
    ENUM    CFG_Magenta
    ENUM    CFG_Green
    ENUM    CFG_MaxColors

    RSRESET                 /* GPIO_PORT, General Purpose Input/Output Ports */
    RS.B    GPIO.B,256      /* Byte pin registers port 0 to 5; pins PIOn_0 to PIOn_31 */
    RS.L    ,960            /* (reserved) */
    RS.L    GPIO.W,256      /* Word pin registers port 0 to 5 */
    RS.L    ,768            /* (reserved) */
    RS.L    GPIO.DIR,8      /* Direction registers port n */
    ......
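
For comparison, here is roughly what the first three examples correspond to in C. This is only a sketch of my reading of the macros (RS.L reserves a 32-bit word, BD with just a length skips bits, ENUM auto-increments); the type name `LLI_t`, the `*_BIT` names and `lli_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the RSRESET / RS.L / RSEND sequence for the LLI structure. */
typedef struct {
    uint32_t srcAddr;   /* offset 0 */
    uint32_t dstAddr;   /* offset 4 */
    uint32_t nextLLI;   /* offset 8 */
    uint32_t control;   /* offset 12 */
} LLI_t;

/* C counterpart of the ENUM lines: values auto-increment from 0. */
enum { CFG_Transparent = 0, CFG_Blue, CFG_Red, CFG_Magenta, CFG_Green, CFG_MaxColors };

/* C counterpart of the BD lines: each field starts where the previous one ended. */
enum {
    RX_DMA_ENABLE_BIT = 0,                     /* 1-bit field */
    TX_DMA_ENABLE_BIT = 1,                     /* 1-bit field */
    RX_DEPTH_BIT      = 2 + 6,                 /* after a 6-bit gap */
    TX_DEPTH_BIT      = RX_DEPTH_BIT + 4 + 4   /* after a 4-bit field and a 4-bit gap */
};

/* Counterpart of the LLI.size symbol produced by RSEND. */
size_t lli_size(void)
{
    assert(offsetof(LLI_t, control) == 12);
    return sizeof(LLI_t);
}
```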

Note: My font for the display is written completely as an assembly-source file. Here's an example on how I use macros for generating bitmap characters:

    PIX . . . . . . . .
    PIX . . . . . . . .
    PIX . . X X X X . .
    PIX . . . . . X X .
    PIX . . X X X X X .
    PIX . X X . . X X .
    PIX . . X X X X X .
    PIX . . . . . . . .

(Yes, that's the corrected version of the 'a')

-As you see, I'm pretty lazy. I made GAS convert 'X' to 1 and everything else to 0. In addition, I made GAS rotate each character 90 degrees, as the display is vertical, not horizontal.
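
The same conversion can be sketched in C for anyone who wants to generate font data on a PC. Everything here is an assumption for illustration: the function names, the choice of bit 7 as the leftmost pixel, and the rotation direction are not taken from the actual macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Turn one row such as ". . X X X X . ." into a byte: 'X' = 1, anything else = 0.
   Spaces are separators; the leftmost pixel becomes bit 7. */
uint8_t pix_row(const char *row)
{
    uint8_t bits = 0;
    int n = 0;
    for (; *row != '\0'; row++) {
        if (*row == ' ') continue;
        bits = (uint8_t)((bits << 1) | (*row == 'X'));
        n++;
    }
    assert(n == 8);   /* exactly 8 pixels per row */
    return bits;
}

/* Rotate an 8x8 glyph by 90 degrees: output byte c holds column c
   (counted from the left), with the top row in bit 0. */
void rotate_glyph(const uint8_t rows[8], uint8_t cols[8])
{
    memset(cols, 0, 8);
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            if (rows[r] & (0x80u >> c))
                cols[c] |= (uint8_t)(1u << r);
}
```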

Content originally posted in LPCWare by mch0 on Mon Sep 22 13:21:28 MST 2014
Hi IanB,

I don't think it's important whether one's C looks elegant or not.
One can even overdo it, and then the code quickly becomes unreadable.
Modern compilers (mostly) do a very good job on optimization anyway.
So it doesn't matter whether you use 3 intermediate variables for clarity or not or whether you cram a whole function into a single line or not.

Why do I say so? Give C a try some time (when you have got time) and don't think about fluency or lack of :)

About having to look things up:
Of course one still has to know how to tell the function the parameters. But names like, say, PARITY_EVEN or PARITY_NONE are obviously self explanatory. So in such cases I don't have to look up any more in the UM which bit field controls parity in which register and what values to use.

The time I spend on designing is split up between C (Host and Firmware), VHDL (for FPGAs) and board level designs. But the C part is usually dominating by far.

Mike

Content originally posted in LPCWare by IanB on Mon Sep 22 12:29:10 MST 2014
Thanks Mike.
I started programming ARMs in April this year - I thought for quite a while about whether to improve my C or to write in assembler.
I'm sure that there are many contributors to these fora for whom English is not their first language - my writing C is like their writing English: it gets the message across, but it's not elegant. However, I'm fluent in assembler. I've written in Basic, Fortran, Forth and C, and still prefer assembler (and I do it properly with the parameters passed in the correct registers, subroutines that preserve R4-R11 etc. etc.)
For the type of software I tend to write (lots of GPIO work, some of it timing critical, and not much heavy data processing), I think I made the best decision, but I'm still thinking about it - that's why I enjoy the debate and seeing other people's opinions.

I think my choice was influenced by the fact that the bastardised ARM instruction set that is the Cortex M0 looks just like Atmel assembly language, which I have used extensively.
If I have to go further up the LPC tree into more capable devices then I might have to change my mind. I would like to program in real ARM code - I might do that for fun!

To answer your question - I have used the LPC timers, UARTs, SPIs, A/Ds and didn't have a problem - my first project used every peripheral on an LPC1114!
Init_Uart() and functions of its ilk sound useful, but it will still involve looking things up in the manual: e.g. are the two UARTs called 0 and 1, or 1 and 2? To set no parity is it 0, or 1, or NO or NONE? Which order are the parameters in? To set the baud rate it must know the clock speed. You're right about the obscure bits - I have to admit to being caught out by CLKDIV registers and bits in the SYSAHBCLKCTRL, but it was my first time using the device - it didn't fool me on my second project!

By the way, do you spend all your time writing software, or do you design hardware as well?

(I think we're now a long way from the original question about LCD displays!)

Content originally posted in LPCWare by Pacman on Thu Sep 18 18:47:00 MST 2014

Quote: Pacman
there might be a difference in microseconds

..That should have been nanoseconds. Ehm, well, a fraction of a microsecond is a nanosecond or picosecond, so I guess it doesn't really matter. ;)

Content originally posted in LPCWare by mch0 on Thu Sep 18 08:53:08 MST 2014
Hi IanB,

I built my very first "computer" out of TTL chips (e.g. 74181 for the ALU) and my second one was a self-designed 8080 which had 256 bytes RAM and NOTHING else. I entered my input data by writing it (with switches) into a RAM location and when the program finished (HALT instruction) I looked at the RAM again for the result.
No assembler either, pure and simple machine code.

BUT:
These days are long gone for me, I have fond memories, but that's it.
Once in a blue moon the code the various C compilers spit out does not seem to do things as I would have expected.
Either things run too slow (and due to my knowledge from these early days I know what I could expect) or not at all.
For example, on some uCs you may have to write to a register within very few clock cycles after "unlocking" it and if the generated code uses one instruction too much the window has closed again.
In every case I was able to formulate the C code in such a way that the compiler had no choice but do it straightforward. In almost 30 years of C this has happened maybe 20 times, less than one time per year.
On the other side I have reused LOTS of code on different architectures as long as there was a C compiler (and, in the last 15 years, >16bit addressing).

Of course I could program any peripheral in asm. And in order to understand what I can get out of a peripheral in terms of functions or performance, I look at the registers in the manual.
But once I get the idea, I indeed now look for libraries such as the "chip drivers" supplied by NXP.

Once upon a time a timer was a single register you wrote to (PIC 8 bit, 8048) and then it counted (mostly) down. Some could even generate an INT when 0. :)

Did you look at the timers of these days? Before such a beast even thinks of starting you end up writing into 30 different registers. Some of them in other modules, like the clock generator chain or the pin muxing. Not to forget some "hidden" reset signals another module can generate for each peripheral.
If you miss one single item it just sits there doing nothing - and it doesn't have a way to tell you why.

And when you use a recent UART you first have to think about which value it wants to see in a fractional divider.

This is the reason I LIKE functions like Init_Uart(uart_number, baudrate, ...).
The manufacturer knows his chip and sets up even the more obscure bits. Of course, I still look at that code to see what it does, mostly out of curiosity, but I like the fact that it is there in the first place - in working order!

Take a look at the SGPIOs (Serial GPIOs) or the SCT (State Configurable Timer) of a LPC43xx for an example.

I do understand your POV and Pacman's, no questions asked.
But, as I said, I couldn't afford to do it that way any more - except for the fun value :).

Mike

Content originally posted in LPCWare by IanB on Thu Sep 18 07:02:50 MST 2014
C is probably a very good language for writing applications that run on computers and process lots of data. I don't know because I've never done that. My background is in electronics hardware, not computers; and I find C to be quite cumbersome to do the things microcontrollers do. I also find it rather complex and rather difficult to follow. It calls a subroutine (I think they are called "functions") which calls another subroutine, which calls yet another subroutine, but this one is in a different source file. I'm sure it's done this way as a form of software copy protection.

For typical microcontroller jobs, I find assembler a lot easier to code (after all, there are only about 2 dozen instructions to learn), and a lot more efficient, but I have a lot of experience of writing assembler - I wrote my first program on a Commodore PET and was amazed how fast it ran.

When I first started to design with the ARM microcontrollers, I noted the claims that they could be programmed in C, and thought about learning it to a higher proficiency. I looked at the output of the compiler to see the assembler code that it generated. For a routine to set or reset a port pin, the compiler used 22 instructions, my assembler code needed 3. I am amused that the sales blurb puts the code efficiency of the Thumb instruction set next to the ability to program in C - obviously the two cancel each other out.

I am baffled that the C programmers can't seem to be able to use the peripherals without downloading a driver for them. A Driver? To Check the UART status register and send the data when it is clear? Really?

I'm sure at some time I will have to write a routine that needs C - dealing with USB or Ethernet peripherals, perhaps - but I have interfaced to FAT16/32 formatted memory cards in assembler - it wasn't difficult - I found it easier than understanding how to use the C memory card drivers!

What I'd like is a Cortex M0 sized processor that I can program in ARM code!

Content originally posted in LPCWare by Pacman on Wed Sep 17 14:29:29 MST 2014
You're actually right about the "fun" part. I mainly write assembly-language for fun, however, I also use it for real work, where I need to spend a lot of time finding the right solution in making time-critical code.
For instance: You can't make a reliable display-driver in C. Here I'm not talking about a 1602 display or SPI/8-bit/16-bit parallel display.
I'm talking about "driving the scanline" yourself. Sometimes it even matters which 2-clock-cycle instruction you're using, because there might be a difference in microseconds, which distorts the picture if choosing the wrong one.
But yes, fun also is an important ingredient for working with microcontrollers (and by using NXP's LPC series, I do find it's both fun and easy).

Content originally posted in LPCWare by mch0 on Wed Sep 17 09:42:29 MST 2014
OK, I see.

To some degree I'll label your motivation "fun" and with that I never have a problem.
Did this kind of thing also in the past.

All of which you say is certainly true, including the ROLs and RORs.
Yet nowadays I find myself wandering away even from "C all on my own" and relying on shipped libraries instead. Which can take some time also, if there's something missing or not working as advertised.

I'm right now using USB, Ethernet and several less complex peripherals in parallel and I would never find the time to write anything remotely useful in ASM any more.

Mike

Content originally posted in LPCWare by Pacman on Wed Sep 17 06:14:07 MST 2014

Quote: mch0
I'm totally amazed that you program an ARM in assembler

He's not the only one. :)

Quote: mch0
May I ask, why?

If you asked me, my answer would be: Because it's such a cool instruction-set, and because no C compiler can make cycle-accurate code, when I write time-critical code.

There are plenty of reasons to write in assembly language, when you write ARM code.
For instance, did you know that you can write some Cortex-M0 code, build it and run it on a Cortex-A7 at many times the speed, in order to stress-test it ?

Sure you can do the same from C, but you can't be quite sure your routine will be 100% identical when built on two different platforms.

Apart from that... You have finer control over using the flags, such as Carry, Zero, oVerflow and Negative; plus it's much easier rotating values.
(Sometimes I wonder why there isn't a standard ROR and ROL command or macro in C; those two instructions existed since the early 80's; anyway...)
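
For reference, the usual way to spell a rotate in C is a pair of shifts, which compilers generally recognize and turn into a single rotate instruction where the CPU has one. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x right by n bits, n in 0..31. */
uint32_t ror32(uint32_t x, unsigned n)
{
    assert(n < 32u);
    /* The "& 31u" keeps the left shift in range when n is 0. */
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* Rotate x left by n bits, n in 0..31. */
uint32_t rol32(uint32_t x, unsigned n)
{
    assert(n < 32u);
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

What plain C still cannot express is the carry flag that the assembly versions give you for free.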

Content originally posted in LPCWare by mch0 on Wed Sep 17 05:04:45 MST 2014
Hi IanB,

no comment about the code. Just out of curiosity:
I'm totally amazed that you program an ARM in assembler!

May I ask, why?

I've been "in this business" since the days of the 8080, but I stopped writing in assembler maybe 15 years ago - and at that time it was a DSP where the need for speed would not allow for C.

But nowadays (and that's 10 years in the past) I feel for my projects that development speed is much more crucial than chip cost - in case you could really save a speed or memory grade by using asm.

Is it just for fun (I'd understand that any day :)) or really driven by commercial need?

Mike

Content originally posted in LPCWare by IanB on Wed Sep 17 01:53:35 MST 2014
I started with the idea of using #2, then, having found 8 convenient bits, discovered one of them was needed for the UART or the SPI.

This is the code that uses package pins 17-24 as D0-D7: it shifts each data bit into the bit 0 position, ANDs it with 1, and shifts it back to the right position in the GPIO port. The routine is entered with the byte to be written in R0. I'm sure there are other ways of doing it. It makes the 4-bit bus look efficient, but then the LPC processor has a 48 times speed advantage over the HD44780.

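dispwrite:
        PUSH {R4,LR}                /* send byte in R0 to display */
        LDR R3,=LPC_GPIO0_BASE      /* GPIO0: D5 -> bit 6, D6 -> bit 7 */
        MOVS R4,#1                  /* R4 = single-bit mask */
        LSRS R2,R0,#5
        ANDS R2,R2,R4
        LSLS R1,R2,#6
        LSRS R2,R0,#6
        ANDS R2,R2,R4
        LSLS R2,R2,#7
        ORRS R1,R1,R2
        MOVS R2,#0b11000000         /* pin mask for bits 6 and 7 */
        LSLS R2,R2,#2               /* masked-access offset = mask << 2 */
        STR R1,[R3,R2]

        LDR R3,=LPC_GPIO1_BASE      /* GPIO1: D0 -> bit 9 */
        MOV R2,R0
        ANDS R2,R2,R4
        LSLS R2,R2,#9
        MOVS R1,#1
        LSLS R1,R1,#11              /* offset for bit 9 = (1 << 9) << 2 */
        STR R2,[R3,R1]

        LDR R3,=LPC_GPIO2_BASE      /* GPIO2: D2 -> bit 4, D3 -> bit 5, D7 -> bit 9 */
        LSRS R2,R0,#2
        ANDS R2,R2,R4
        LSLS R1,R2,#4
        LSRS R2,R0,#3
        ANDS R2,R2,R4
        LSLS R2,R2,#5
        ORRS R1,R1,R2
        LSRS R2,R0,#7
        ANDS R2,R2,R4
        LSLS R2,R2,#9
        ORRS R1,R1,R2
        MOVS R2,#0b10001100
        LSLS R2,R2,#4               /* offset for bits 4, 5 and 9 */
        STR R1,[R3,R2]

        LDR R3,=LPC_GPIO3_BASE      /* GPIO3: D1 -> bit 4, D4 -> bit 5 */
        LSRS R2,R0,#1
        ANDS R2,R2,R4
        LSLS R1,R2,#4
        LSRS R2,R0,#4
        ANDS R2,R2,R4
        LSLS R2,R2,#5
        ORRS R1,R1,R2
        MOVS R2,#0b110000
        LSLS R2,R2,#2               /* offset for bits 4 and 5 */
        STR R1,[R3,R2]

        LDR R3,=LPC_GPIO0_BASE
        MOVS R2,#0b1000
        STR R2,[R3,#0b100000]       /* pulse E: GPIO0 bit 3 high */
        MOVS R0,#1
        BL wait                     /* delay routine defined elsewhere */
        MOVS R2,#0
        STR R2,[R3,#0b100000]       /* E low again */

        POP {R4,PC}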
/* ------------------------------------------------------------------------------------ */
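
For comparison, the same scatter can be written table-driven in C. This is a sketch, not a drop-in replacement: the pin map is read off the shifts in the routine above, while `pin_map_t`, `lcd_data_map` and `lcd_scatter` are hypothetical names, and the actual masked writes to the four ports are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Pin map read off the assembly routine: data bit Dn -> {port, pin}. */
typedef struct { uint8_t port, pin; } pin_map_t;

static const pin_map_t lcd_data_map[8] = {
    {1, 9}, {3, 4}, {2, 4}, {2, 5}, {3, 5}, {0, 6}, {0, 7}, {2, 9}
};

/* Fill value[p] and mask[p] for ports 0..3, so that each port
   needs only one masked write per byte. */
void lcd_scatter(uint8_t byte, uint16_t value[4], uint16_t mask[4])
{
    for (int p = 0; p < 4; p++) { value[p] = 0; mask[p] = 0; }
    for (int bit = 0; bit < 8; bit++) {
        const pin_map_t m = lcd_data_map[bit];
        assert(m.port < 4 && m.pin < 12);   /* 4 ports of 12 pins each */
        mask[m.port] |= (uint16_t)(1u << m.pin);
        if (byte & (1u << bit))
            value[m.port] |= (uint16_t)(1u << m.pin);
    }
}
```

Changing the wiring then only means editing the table, at the cost of a loop instead of straight-line code.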

Content originally posted in LPCWare by Pacman on Tue Sep 16 18:36:25 MST 2014
I've used #2 most of the time (because I've got enough I/O-pins and because it makes shorter read/write routines possible).
The devices I've been using are LPC1768, LPC1751, LPC1788 and LPC1114.
I've been asked to support #1 in my code, because there are people who want to save I/O-pins.
-So I mainly prefer #2 then #1. I'm actually using #3 on a different type of 8-bit parallel display, but the code is not ready yet. :)