Blink LED on LPCXpresso 1769

lpcware · ‎06-15-2016

Content originally posted in LPCWare by swmspam on Thu Mar 08 15:08:58 MST 2012
I downloaded the example programs for the LPC1769. The documentation says the GPIO is already enabled for the 1769.

However, I can't find an example that clearly shows how to turn on an LED.

Likewise, how to turn one off.

Which ".h" include files are needed?

My intention is to write a program that simply turns on an LED and then enters an infinite while loop.

Content originally posted in LPCWare by swmspam on Sat Mar 10 19:23:24 MST 2012
Apparently, #define PLL0CFG_Val 0x00050077 does the trick and raises the core clock from 100MHz to 120MHz. With a bit of optimization, the scorecard is:

STM32F4 (168MHz and FPU)   8.6 seconds
LPCXpresso 1769 (120MHz)  15.1 seconds
mbed (96MHz)              16.2 seconds

Obviously, the mbed build is better optimized than my tweaked LPCXpresso build. To match the mbed build on a per-MHz basis, the LPC1769 should take (16.2*96)/120 = 13.0 seconds. Changing the program architecture (moving the code to subroutines or functions), variables (constant, local, global), and -O settings did very little to the LPCXpresso times. Either the mbed compiler optimizes better, or subtle (and probably undocumented) tweaks remain to improve the LPCXpresso performance.

The TI Piccolo is proving very difficult to optimize. The TI CCS and ControlSUITE (sample code and device configurations) are a nightmare.

Quote:
#1 Scope CLKOUT as mentioned in your last post

I'm a consultant, and don't have my lab bench on-site!
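
For reference, the PLL0CFG value above can be decoded by hand. This is only a sketch: it assumes a 12 MHz main oscillator feeding PLL0 and a CPU clock divider of 4 (CCLKCFG = 3), neither of which is stated in this thread, and the function name is hypothetical. Per the LPC17xx user manual, bits 14:0 of PLL0CFG hold M-1, bits 23:16 hold N-1, and Fcco = 2*M*Fin/N.

```c
#include <stdint.h>

/* Hypothetical helper: CPU clock in Hz implied by a PLL0CFG value.
 * fin_hz  - PLL0 input clock (assumed here: 12 MHz crystal)
 * cclkdiv - CPU clock divider, i.e. CCLKCFG + 1 (assumed here: 4) */
uint32_t pll0_cclk_hz(uint32_t pll0cfg, uint32_t fin_hz, uint32_t cclkdiv)
{
    uint32_t m = (pll0cfg & 0x7FFFu) + 1u;        /* MSEL0, bits 14:0  */
    uint32_t n = ((pll0cfg >> 16) & 0xFFu) + 1u;  /* NSEL0, bits 23:16 */
    uint64_t fcco = 2ull * m * fin_hz / n;        /* PLL0 output, Fcco */
    return (uint32_t)(fcco / cclkdiv);
}
```

Under those assumptions, 0x00050077 decodes to M = 120, N = 6, Fcco = 480 MHz and a 120 MHz core clock, while 0x00050063 would give 100 MHz.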

Content originally posted in LPCWare by CodeRedSupport on Sat Mar 10 13:46:15 MST 2012
You should be able to check the variable SystemCoreClock, which CMSIS defines. See the FAQ "Changing clock speed of NXP LPC1xxx MCUs" for more info.

You might also want to do a search of the forum for "1769 120" or similar (info on how to search in the sticky post at the top of the forum thread list). This subject has come up several times in the past.

I would also suggest being careful with -O3. Sometimes it doesn't actually provide better performance; for example, it can cause the code to grow beyond the size of the flash prefetch buffer. This will obviously depend on your code, the target MCU and the actual compiler, but in my experience it is a good idea to compare results against -Os / -O2 as well.

Another thing to watch: if you want to split your main loop out into a separate function like this, you would be better off putting it into a separate source file. Otherwise you may well find that different compilers and different options allow or don't allow inlining.

You also want to watch what code is being generated. Given that the compiler can see the values of the const arrays, it may in some cases calculate the result of your multiplications at compile time rather than at run time, making your results pretty meaningless as a comparison of real world performance!
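
One way to guard against this, offered as a sketch rather than a definitive recipe: keep the multiplication in a function the compiler is told not to inline, and publish the result through a volatile variable so the work cannot be discarded. The noinline attribute is a GCC extension, all names below are hypothetical, and it is still worth checking the disassembly afterwards.

```c
#include <stddef.h>

#define DIM 5

/* Written through a volatile so the compiler must keep the computation. */
volatile float benchmark_sink;

/* noinline (GCC extension) discourages the compiler from folding the
 * products of arrays that are visible as constants at the call site. */
__attribute__((noinline))
void matmul5(const float a[DIM][DIM], const float b[DIM][DIM], float c[DIM][DIM])
{
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < DIM; k++) {
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}

/* Repeat the multiplication and publish one element of each result. */
void run_benchmark(const float a[DIM][DIM], const float b[DIM][DIM], long iterations)
{
    float c[DIM][DIM];
    for (long n = 0; n < iterations; n++) {
        matmul5(a, b, c);
        benchmark_sink = c[DIM - 1][DIM - 1];
    }
}
```

Placing matmul5 in its own source file, as suggested above, has a similar effect when link-time optimization is off.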

When doing this sort of benchmarking, it is always a good idea to look at the code generated. You can do this either within the debugger, or by disassembling the object/executable. For more details, see:

http://support.code-red-tech.com/CodeRedWiki/DisassObsjExes

Regards,
CodeRedSupport

Content originally posted in LPCWare by Ex-Zero on Sat Mar 10 13:28:17 MST 2012
#1 Scope CLKOUT as mentioned in your last post

See: http://knowledgebase.nxp.com/showthread.php?t=2975

#2: Forum Search (120MHz lpc1769) will show you valid 120MHz settings in system_LPC17xx.c:

http://knowledgebase.nxp.com/showthread.php?t=2673

Content originally posted in LPCWare by swmspam on Sat Mar 10 12:46:41 MST 2012
Just checking ... in the LPC17xx.h include, the PLL0 register addresses are defined. How can I be sure it is operating at 120MHz?

#define SCB_BASE_ADDR   0x400FC000
/* Phase Locked Loop (Main PLL0) */
#define PLL0CON        (*(volatile unsigned long *)(SCB_BASE_ADDR + 0x080))
#define PLL0CFG        (*(volatile unsigned long *)(SCB_BASE_ADDR + 0x084))
#define PLL0STAT       (*(volatile unsigned long *)(SCB_BASE_ADDR + 0x088))
#define PLL0FEED       (*(volatile unsigned long *)(SCB_BASE_ADDR + 0x08C))
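
A sketch of how the PLL0STAT register defined above could help answer this at run time. The bit positions follow the LPC17xx user manual (MSEL0 in bits 14:0, NSEL0 in bits 23:16, PLLE0_STAT bit 24, PLLC0_STAT bit 25, PLOCK0 bit 26). The function name is hypothetical, and it takes the register value as an argument so it can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLL0STAT_PLLE   (1u << 24)  /* PLL0 enabled   */
#define PLL0STAT_PLLC   (1u << 25)  /* PLL0 connected */
#define PLL0STAT_PLOCK  (1u << 26)  /* PLL0 locked    */

/* True when PLL0 is enabled, connected and locked with the expected
 * multiplier m and pre-divider n (the register fields hold m-1 and n-1). */
bool pll0_running_with(uint32_t pll0stat, uint32_t m, uint32_t n)
{
    const uint32_t ready = PLL0STAT_PLLE | PLL0STAT_PLLC | PLL0STAT_PLOCK;
    uint32_t msel = pll0stat & 0x7FFFu;
    uint32_t nsel = (pll0stat >> 16) & 0xFFu;
    return (pll0stat & ready) == ready && msel == m - 1u && nsel == n - 1u;
}
```

On the target, pll0_running_with(PLL0STAT, 120, 6) would check for M = 120 and N = 6. That only confirms the PLL configuration; the resulting core frequency still depends on the oscillator frequency and the CPU clock divider.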

Content originally posted in LPCWare by swmspam on Sat Mar 10 12:44:45 MST 2012
As suggested, the calculation was moved into a separate subroutine. Also, the working variable was declared as a global float array. I tried several combinations of architecture; the code snippet below is one of the runs. All clocked about 19.5 seconds.

The purpose of this code is to benchmark floating-point matrix multiplications. The benchmark was derived from the original FLOPS benchmarking code, which used [3x3] operations. My application is a state-space system with Kalman filtering, which requires floating-point matrix operations. The DSP library is integer only.

#include "LPC17xx.h"
#include "leds.h"

const float m1[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };
const float m2[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };
float m3[5][5];

int matrix(void) {
    int m, n, p;
    long j;
    // 100,000 iterations of [5x5] matrix multiplication
    for(j = 0; j < 100000; j++) {
        for(m = 0; m < 5; m++) {
            for(p = 0; p < 5; p++) {
                m3[m][p] = 0;
                for(n = 0; n < 5; n++) {
                    m3[m][p] += m1[m][n] * m2[n][p];
                }
            }
        }
    }
    return 0 ;
}

int main(void) {
    led2_init();
    led2_on();
    while(1) {
        matrix();
        led2_invert();
    }
    return 0 ;
}

Content originally posted in LPCWare by TheFallGuy on Fri Mar 09 14:52:10 MST 2012
Are you prepared to post your code, so we can see if there are optimizations you can make?

Also, I don't know what your algorithm is doing, but have you seen this DSP library for NXP?
www.nxp.com/documents/application_note/AN10913.pdf

Content originally posted in LPCWare by js-nxp on Fri Mar 09 14:34:02 MST 2012

Quote:

Unfortunately the example given by js-nxp is not usable on the lpc17xx

I should know better than to say something I don't really know. :o

Lesson learned: no uniformity across the LPC series???

Content originally posted in LPCWare by Pat on Fri Mar 09 12:33:02 MST 2012
Hi,

I'm also not so confident that your test code is actually a valid test of execution speed.

Let me elaborate a bit on what I mean by this.
The following declaration is used:
volatile float m3[5][5];

I guess m3 is declared volatile to stop the compiler from optimizing everything away, as the result is not used anywhere. The result (compiler dependent) can be that the inner and outer loops of the calculation are not optimised to fully exploit locality of reference. This could explain the differences you see.

One way to overcome this is to put your calculation code in a function outside of the loop, so that the compiler (which normally does not optimize across function call boundaries) can optimize the calculation better.
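
To illustrate the cost of the volatile declaration: in the posted loop, every element of m3 takes six volatile stores and five volatile loads (one clear plus five `+=` updates), none of which the compiler may combine. A sketch of an alternative that keeps the result volatile but touches each element only once; the function name is hypothetical.

```c
#define DIM 5

/* Multiply a*b into a volatile result, storing each element exactly once. */
void matmul5_volatile_out(const float a[DIM][DIM], const float b[DIM][DIM],
                          volatile float out[DIM][DIM])
{
    for (int m = 0; m < DIM; m++) {
        for (int p = 0; p < DIM; p++) {
            float acc = 0.0f;            /* plain local: may stay in a register */
            for (int n = 0; n < DIM; n++) {
                acc += a[m][n] * b[n][p];
            }
            out[m][p] = acc;             /* single volatile store per element */
        }
    }
}
```

How much this helps with software floating point is compiler dependent, so it is worth timing both variants.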

I do realize that floats are used, so the gain might not be as big as when integers are used.

Anyway my 5 cent. :)

Content originally posted in LPCWare by swmspam on Fri Mar 09 12:21:48 MST 2012

Quote:
I doubt what's happening here.
The mBed is not that slow compared to the STM with FPU.

Quote:
Do the same for the other controllers.

Yes, I repeated the action with the STM32F4. The Keil IDE cannot correctly flash the STM without an external link. This is not evident when using the Keil IDE. I only found out by reading the forums. It seems like every MCU has a set of undocumented "tricks" to get them to function properly (except the mbed, which does not use an IDE).

The standing thus far, with the STM optimized, and the LPCXpresso running as described in this thread (as release):

STM32F4 (168MHz and FPU)    8.6 seconds
mbed (96MHz)               16.2 seconds
LPCXpresso 1769 (120MHz)   19.4 seconds

I don't think the LPCXpresso is functioning properly yet. It should score similar to or better than the mbed.

Content originally posted in LPCWare by Rob65 on Fri Mar 09 10:38:54 MST 2012
Hm,

I doubt what's happening here.
The mBed is not that slow compared to the STM with FPU.
So either the STM's FPU is terribly slow or the mBed has a very good FPU emulation.

If the STM indeed runs at 168 MHz and the FPU's speed is likewise scaled, then the software FP functions in the mBed are better than the STM's FPU (an mBed at 168 MHz would take about 9.3 s) :eek:
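
The per-MHz comparisons in this thread all rest on one assumption: the cycle count stays the same, so run time is inversely proportional to clock frequency. A minimal sketch of that arithmetic, with a hypothetical function name; it ignores flash wait states and compiler differences.

```c
/* Run time expected at f_target_mhz if t_ref_s was measured at f_ref_mhz
 * and the number of CPU cycles does not change. */
double scale_time_s(double t_ref_s, double f_ref_mhz, double f_target_mhz)
{
    return t_ref_s * f_ref_mhz / f_target_mhz;
}
```

For the mbed figure quoted in this thread, scale_time_s(16.2, 96.0, 120.0) comes to about 13.0 seconds.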

Rob

Content originally posted in LPCWare by Ex-Zero on Fri Mar 09 10:03:39 MST 2012

Quote: swmspam
How do I confirm clock speed of the LPCXpresso target?

CLKOUT :eek:

//clkout (P1.27) to cpu clock
    LPC_PINCON->PINSEL3 &=~(3<<22);
    LPC_PINCON->PINSEL3 |= (1<<22);
    LPC_SC->CLKOUTCFG = (1<<8)|(11<<4); //enable and divide by 12
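
As a rough guide to reading the scope: with the divide-by-12 setting above, CLKOUT on P1.27 should show 10 MHz if the core really runs at 120 MHz, and about 8.33 MHz at 100 MHz. The helpers below are hypothetical and only make that arithmetic explicit; the field layout (source select in bits 3:0, divider minus one in bits 7:4, enable in bit 8) follows the LPC17xx user manual.

```c
#include <stdint.h>

/* Hypothetical helper: CLKOUTCFG value for a given source and divider.
 * sel - clock source, bits 3:0 (0 selects the CPU clock)
 * div - divide ratio 1..16, stored as div-1 in bits 7:4
 * Bit 8 (CLKOUT_EN) enables the output. */
uint32_t clkoutcfg_value(uint32_t sel, uint32_t div)
{
    return (1u << 8) | (((div - 1u) & 0xFu) << 4) | (sel & 0xFu);
}

/* Frequency to expect on the scope for a given CPU clock and divider. */
uint32_t clkout_hz(uint32_t cclk_hz, uint32_t div)
{
    return cclk_hz / div;
}
```

clkoutcfg_value(0, 12) reproduces the (1<<8)|(11<<4) written to CLKOUTCFG above.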

Content originally posted in LPCWare by swmspam on Fri Mar 09 08:53:26 MST 2012
TheFallGuy, thank you.

Changing the build configuration of CMSISv2p00_LPC17xx to "release" and doing a Build All was successful. I also enabled Optimize Most (-O3). I don't think I am using any unnecessary libraries that would slow performance.

#include "LPC17xx.h"
#include "leds.h"

const float m1[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };
const float m2[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };

int main(void) {
    
    led2_init();
    led2_on();

    int j, m, n, p;
    volatile float m3[5][5];
    
    while(1) {
        // Start 100,000 iterations of [5x5] matrix multiplication
        for(j = 0; j < 100000; j++) {
            for(m = 0; m < 5; m++) {
                for(p = 0; p < 5; p++) {
                    m3[m][p] = 0;
                    for(n = 0; n < 5; n++) {
                        m3[m][p] += m1[m][n] * m2[n][p];
                    }
                }
            }
        }
        // End benchmark and indicate with LED2
        led2_invert();
        // Loop and repeat forever
    }
    return 0 ;
}

The execution time dropped to 19.4 seconds. I disconnected the board from the PC and ran it from an external power supply to verify the code was running independently of the debugger.

This is still slower than the mbed performance. How do I confirm clock speed of the LPCXpresso target?

Content originally posted in LPCWare by TheFallGuy on Fri Mar 09 08:32:58 MST 2012
You need to build EVERYTHING release, including CMSIS.
See
http://support.code-red-tech.com/CodeRedWiki/ChangeBuildConfig

Content originally posted in LPCWare by Polux rsv on Fri Mar 09 08:32:05 MST 2012

Quote: TheFallGuy
1. Check that you really have set the clock speed to 120MHz
2. Build Release (and not Debug). If you are interested in performance (vs code size), ensure you select the "Optimize Most -O3" option as this will enable some more options that will make your code run faster, but will also make it larger.

Do the same for the other controllers.
And be careful with micros that have external memories, wait states, ...

Angelo

Content originally posted in LPCWare by swmspam on Fri Mar 09 08:25:58 MST 2012
1. How do I check clock speed is correct?
2. Won't build release:

make all 
Building target: LPCX176x_cmsis2_systick.axf
Invoking: MCU Linker
....
cannot find -lCMSISv2p00_LPC17xx
collect2: ld returned 1 exit status
make: *** [LPCX176x_cmsis2_systick.axf] Error 1

I can't find a file called "lCMSISv2p00_LPC17xx"

Content originally posted in LPCWare by TheFallGuy on Fri Mar 09 07:57:58 MST 2012
1. Check that you really have set the clock speed to 120MHz
2. Build Release (and not Debug). If you are interested in performance (vs code size), ensure you select the "Optimize Most -O3" option as this will enable some more options that will make your code run faster, but will also make it larger.

AFAIK, mbed always builds for 'release' (i.e. fully optimised) because they have no way of debugging.

Content originally posted in LPCWare by Pat on Fri Mar 09 07:08:14 MST 2012
>Why does the mbed (LPC1768 96MHz) execute so much faster than LCPXpresso (LPC1769 120MHz)?
>Any suggestions?

a) Did you optimise for speed?
b) Which library are you using in LPCXpresso?

This can make a big difference.

Content originally posted in LPCWare by swmspam on Fri Mar 09 06:54:57 MST 2012
Rob65, thank you. I was able to complete my experiment in a few minutes using your advice. I have run benchmarks on the TI Piccolo, STM32, Freescale Coldfire, and mbed module. This gave me the opportunity to measure each MCU's performance and familiarize myself with the IDE. Thus far, the most difficult has been the LPCXpresso. The others each took only about 1 hour to install the IDE, get familiar with its operation, try a few examples, write the benchmark, and execute. The LPCXpresso took 2 days. The documentation of the LPCXpresso is more confusing than the others; for example, it is unclear how to access the example programs.

The benchmark executes 100,000 iterations of [5x5] single-precision (32-bit) floating-point matrix multiplications. My application requires a real-time simulation of a state-space system, so this is a tough benchmark.

I try to minimize other CPU load to maximize the matrix multiplication speed. Therefore, a blinking LED signals when each run of 100,000 iterations is complete (then it runs again). I also tried benchmarking this program using debug.h and fprint statements to output the results to the debug screen. I was afraid that using debug.h was slowing the CPU, so I wanted to try the LED signal.

#include "LPC17xx.h"
#include "leds.h"
#include <NXP/crp.h>

__CRP const unsigned int CRP_WORD = CRP_NO_CRP ;

const float m1[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };
const float m2[5][5] = { {0.0001, 0.001, 0.01, 0.1, 1},{0.001, 0.01, 0.1, 1, 10},{0.01, 0.1, 1, 10, 100},{0.1, 1.0, 10, 100, 1000},{1, 10, 100, 1000, 10000} };

int main(void) {
    
    led2_init();
    led2_on();
    
    int j, m, n, p;
    volatile float m3[5][5];
    
    while(1) {
        // Start 100,000 iterations of [5x5] matrix multiplication
        for(j = 0; j < 100000; j++) {
            for(m = 0; m < 5; m++) {
                for(p = 0; p < 5; p++) {
                    m3[m][p] = 0;
                    for(n = 0; n < 5; n++) {
                        m3[m][p] += m1[m][n] * m2[n][p];
                    }
                }
            }
        }
        // End benchmark and indicate with LED2
        led2_invert();
        // Loop and repeat forever
    }
    return 0 ;
}

Interestingly, the results are as follows:

STM32F4 (168MHz and FPU)   12.2 seconds
mbed (96MHz)               16.2 seconds
LPCXpresso 1769 (120MHz)   26.7 seconds
TI Piccolo (80MHz and FPU) 43.7 seconds
Coldfire 5270 (150MHz)     60.2 seconds

Why does the mbed (LPC1768 96MHz) execute so much faster than the LPCXpresso (LPC1769 120MHz)? And why is the STM32F4 with FPU not twice as fast as the mbed (i.e. 8 seconds)? I can propose a few reasons:

1. The mbed compiler is more efficient
2. The include files of the LPCXpresso consume CPU throughput
3. The LPCXpresso debugger consumes CPU throughput

Any suggestions?

Content originally posted in LPCWare by Rob65 on Thu Mar 08 23:44:42 MST 2012

Quote: swmspam
I downloaded all of the zip files from the NXP website

After installing the LPCXpresso tools you actually have to tick a box not to get the Getting Started Manual on your screen :eek:

What might be confusing is that the manual tells you to check the web by clicking on the "more examples" button where it is so much easier to just browse your own PC :confused:

As written by Zero, check out the LPCXpresso176x_cmsis2.zip project archive in the examples folder (you have to browse to the proper directory).
It would be a good idea if NXP/Code Red changed this in the Getting Started manual.
They point you to the website, where you'll find more advanced examples, but these are most likely not the first ones you'd like to try.

Unfortunately the example given by js-nxp is not usable on the lpc17xx. Both the pin connect block (IOCON) and the GPIO block are different. Still, the way of accessing registers on the LPC17xx is the same.

Be sure to get a copy of the user manual (download from the web) and the schematics for your board. From the board schematics you'll see where the LED is connected and the user manual tells you how to control the LED.

If you look in the user manual it will show you there is a register called FIOSET to set bits on the GPIO port and FIOCLR to clear bits. The way this works in CMSIS2 is that you have defined pointers to structures that let you do the same. LPC_GPIO0->FIOSET gives you access to FIO0SET, LPC_GPIO2->FIODIR to FIO2DIR etc.
This works similarly for all other peripherals.
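
A minimal sketch of the register accesses described above. It assumes LED2 sits on P0.22, which should be checked against the board schematic, and the helper names are hypothetical. The helpers take plain register pointers so they can be tried off-target; on the board, LPC_GPIO0 comes from LPC17xx.h.

```c
#include <stdint.h>

#define LED2_PIN 22u   /* assumption: LED2 on P0.22, check the schematic */

/* Configure a pin as an output by setting its bit in FIODIR. */
void gpio_make_output(volatile uint32_t *fiodir, unsigned pin)
{
    *fiodir |= (1u << pin);
}

/* Write a single 1 bit to FIOSET (drives the pin high) or to FIOCLR
 * (drives the pin low). Zero bits leave the other pins untouched. */
void gpio_write_pin(volatile uint32_t *fioset_or_fioclr, unsigned pin)
{
    *fioset_or_fioclr = (1u << pin);
}

/* On the target, with #include "LPC17xx.h":
 *
 *   gpio_make_output(&LPC_GPIO0->FIODIR, LED2_PIN);
 *   gpio_write_pin(&LPC_GPIO0->FIOSET, LED2_PIN);   // pin high
 *   gpio_write_pin(&LPC_GPIO0->FIOCLR, LED2_PIN);   // pin low
 *   while (1) { }
 */
```

Whether "pin high" means "LED on" depends on how the LED is wired on your board revision.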

Hey Zero, we should write a manual on this: "The morning after getting started manual" :D

Regards,
Rob

Content originally posted in LPCWare by Ex-Zero on Thu Mar 08 23:14:46 MST 2012

Quote: swmspam
I downloaded all of the zip files from the NXP website

These samples are included in LPCXpresso :) Did you install LPCXpresso?

http://support.code-red-tech.com/CodeRedWiki/SuppliedExamples