
LPC Microcontrollers


Built into the LPC55S69 is a powerful coprocessor called the “PowerQuad”. In this article we are going to introduce the PowerQuad and some interesting use cases. Over the next several weeks we will look at the different processing elements in the PowerQuad using the “Mini-Monkey” board.

Figure 1:  NXP PowerQuad Signal Processing Engine

 

The PowerQuad is a dedicated hardware unit that runs in parallel with the main Cortex-M33 cores inside the LPC55S69. By letting the PowerQuad work in parallel with the main CPU, it is possible to implement sophisticated signal processing algorithms while leaving your main CPU(s) available for other tasks such as communication and IO. This is a very important use case in distributed sensor systems and the Industrial Internet of Things (IIOT). Over the next several weeks, I am going to show some practical aspects of using the PowerQuad in various applications. I feel it is a very good fit for the many tightly embedded applications that need a combination of general-purpose processing, IO, and dedicated signal processing while maintaining a very low active power profile.

 

Embedded Systems, Sensors and Signal Processing

 

Before we get started, I think it is helpful to review some concepts and explain why some of the functions of the PowerQuad are useful. Even though many engineers may have learned about Digital Signal Processing (DSP) in college or university, there is often little connection to real hardware and code. Many introductions to DSP begin with formal explanations (i.e. heavy math!). While this formalism is important for developing the underlying algorithms, it is easy to get lost when trying to make something work. As an example, one of the core algorithms in many DSP applications is the Fast Fourier Transform. It can be difficult to understand how to use FFT software at a black-box level if all you have ever worked with is the mathematical formalism. Being able to link the formalism with a real application is where the real magic can happen! In these upcoming articles, I will break down what is actually happening in the code so it is a bit easier to use the PowerQuad hardware.

 

For an overwhelming majority of sensor and industrial IOT applications, we encounter “time series” data. By time series, all we mean is that we take some sort of measurement at a constant interval and put the recorded data into a bucket. We might process this data one sample at a time as it comes in, or wait until our bucket fills up to a certain level before working with the information. A key feature here is that we have some measurement (temperature, pressure, voltage level) that is captured at a fixed rate. What we end up with is a data set that spans some amount of “time”. We do not have infinite resolution in the measurement “amplitude”, nor can we take measurements infinitely fast. For example, if we take voltage readings over time, our “step” size might be 1 millisecond with 1 millivolt resolution in our amplitude. The details of how fast and with how much precision are application dependent.

 

Figure 2:   A Time Series Cartoon

 

In Figure 2, notice that the "dots" are not connected, to indicate that we have a discrete set of data. Many times we fill in the space between the dots on a chart to get a better visualization of the signal, but what we have to work with is a discrete bucket of data.

 

Let’s take a look at an example using the LPC55S69 on the “Mini-Monkey”. The Mini-Monkey circuit has a digital microphone connected to the MCU via an I2S interface and a 240x240 pixel display connected via SPI. Using the display, we can visualize the time series (my voice). As a demonstration, I grabbed a bucket of 256 samples from the microphone via the I2S interface and rendered the raw time series data on the display. The microphone on the Mini-Monkey (Knowles Acoustic SPH0645LM4H-B) was set up to output data at a rate of 32 kHz. The resolution in amplitude from this device is 18 bits. Since my screen is 240 pixels high, I divided down the amplitude of the samples so they would fit.
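As a rough illustration of "dividing down" the amplitude, the sketch below maps a sample onto one of 240 pixel rows. It assumes the samples are signed 18-bit values and that row 0 is the top of the screen; the actual demo code may scale differently.

```python
DISPLAY_HEIGHT = 240   # pixel rows on the display
SAMPLE_BITS = 18       # assumed: signed two's complement samples

def sample_to_row(sample):
    """Map a signed 18-bit sample to a pixel row (0 = top of the screen)."""
    half_scale = 1 << (SAMPLE_BITS - 1)       # 131072
    unsigned = sample + half_scale            # shift range to 0..262143
    # Scale 0..262143 down to 0..239 using integer math only.
    row_from_bottom = (unsigned * DISPLAY_HEIGHT) >> SAMPLE_BITS
    return (DISPLAY_HEIGHT - 1) - row_from_bottom

print(sample_to_row(-131072), sample_to_row(0), sample_to_row(131071))
```

The most negative sample lands on the bottom row, the most positive on the top row, and silence sits near the middle of the screen.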

 

Here is an animated .gif of the result:

 

TimeSeriesGIF

 

A video with corresponding audio:

 

 

All I am doing is collecting data into a "buffer" and then continually displaying the information on the screen. It is an easy way to visualize what is going on. Now, instead of using a microphone to measure acoustic pressure, you could sample something else: a velocity measurement, a voltage signal, etc. The time series data set is your starting point. Now it is time to start doing something with the numbers, and that is where the PowerQuad can help. Most signal processing algorithms boil down to simple, repetitive operations over arrays of data. Just about everything can be boiled down to a multiply and an add. This is why you may have heard quite a bit about multiply and accumulate (MAC) units in DSP engines. It is an ideal use case for a coprocessor.
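The multiply and accumulate pattern is small enough to write out in full. This Python sketch is only meant to show the shape of the loop that a MAC unit runs in hardware; the buffer contents are made up.

```python
def multiply_accumulate(a, b):
    """Sum of element-wise products: the basic MAC loop."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y          # one multiply and one add per sample
    return acc

buffer = [3, -1, 4, -1, 5, -9, 2, 6]   # made-up sample values

# The same loop gives different results depending on the second array:
energy = multiply_accumulate(buffer, buffer)             # sum of squares
total = multiply_accumulate(buffer, [1] * len(buffer))   # plain sum
print(energy, total)
```

Filters, correlations and transforms mostly differ in which second array they feed into this loop and how often they run it.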

 

The PowerQuad at its core has the logic to handle the most common “building blocks”. Sometimes when you have a time series, you process the data in a manner that preserves all of the “time information”. Meaning, the information you get out of the “signal processing black box” is still a set of data points correlated to some block of time. They just might be filtered or modified in some way. For example, maybe you have a signal where you want to remove 60 Hz noise. You might consider a digital FIR or IIR filter. Other times you “transform” your data into information that is “correlated” to something else, such as a rate or “frequency”. We will be exploring both of these applications in future articles, and the PowerQuad helps with both of these use cases.
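To give the 60 Hz example some shape, here is a minimal FIR sketch in Python. It uses about the crudest filter possible, a moving average whose length is exactly one 60 Hz cycle, and it assumes a 960 Hz sample rate chosen so that a cycle is a whole number of samples. A real design would use properly designed coefficients; this only shows that an FIR filter is a MAC loop over recent samples.

```python
import math

def fir_filter(samples, taps):
    """Direct-form FIR: each output is a MAC over the most recent inputs."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

SAMPLE_RATE_HZ = 960.0                         # assumed sample rate
TAPS_PER_CYCLE = int(SAMPLE_RATE_HZ / 60.0)    # 16 samples per 60 Hz cycle
taps = [1.0 / TAPS_PER_CYCLE] * TAPS_PER_CYCLE

# Pure 60 Hz hum as the input.
hum = [math.sin(2 * math.pi * 60.0 * n / SAMPLE_RATE_HZ) for n in range(64)]
filtered = fir_filter(hum, taps)

# Once the filter has a full cycle of history, the hum averages out.
residual = max(abs(v) for v in filtered[TAPS_PER_CYCLE:])
print(residual)
```

The output is still a time series with one value per input sample, which is the "time information preserved" case described above. The moving average also attenuates other frequencies, so it is a teaching aid rather than a notch filter.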

 

LPC55S69 PowerQuad Application - Machine Condition Monitoring

 

The LPC55S69 can bring in time series data via several interfaces. In this article I measured acoustic pressure with a digital MEMS microphone over a digital audio port (I2S). You could also take measurements with the analog to digital converter. For example, I have a little breakout board for an ADXL1001BCPZ accelerometer that I built last year:

 

Figure 4: ADXL1001BCPZ Accelerometer Board (Left)

 

This ADXL1001BCPZ is a high bandwidth accelerometer useful for machine monitoring and vibration analysis applications. Many common MEMS accelerometers do not have a high enough bandwidth to capture all the dynamic information in a vibrating system. The -3dB bandwidth of the ADXL1001 stretches to 11 kHz, making it ideal for vibration problems. Low-cost accelerometers used for simple motion detection and orientation have a very low bandwidth and may not be able to capture the dynamics you are looking for in a vibration application. Furthermore, many of the MEMS devices that can measure in multiple axes do not have the same bandwidth and noise performance on all axes. We can use the internal ADC in the LPC55S69 to sample the accelerometer over time and build up a time series to understand how something is vibrating. While microphones can pick up sound traveling in air, accelerometers can be used to understand sound traveling through a physical structure. Using signal processing techniques, we can even combine information from multiple sensors (measuring the same thing in different ways) to better understand a problem.

 

In the neck of the woods where I grew up, there were lots of experienced auto mechanics who could quickly identify problems without even opening the hood. The first method to debug a problem was to take the car for a drive or start the motor and “listen”. Many of these individuals were so well trained that they could know exactly what an issue was simply by listening. All mechanical systems vibrate. *How* they vibrate is dependent on their size, shape, material properties, and operating conditions. These mechanical vibrations couple to the air and we can “hear” what is going on. If you have some situational awareness of the mechanical system, you know how something *should* sound when the system is operating normally. If a component starts failing, the mechanical system changes and it will vibrate differently. Because the “boundary conditions” of the system changed, the nature of the sound produced changes. We can instrument the machine with sensors, say an accelerometer, and capture the time series. Using some math (DSP) and our a-priori knowledge of how the system is supposed to behave, it is possible to predict failure before it occurs.

 

Our global industry is driven by large and expensive electro-mechanical machines. All the things we consider essential for life, say Oreo cookies and toilet paper, are produced in large factories with large, high dollar value processes. It makes absolute sense to automate the measurement and analysis of high value machines, as the money saved by avoiding unplanned downtime is incredible. The LPC55S69 can be a good fit for many “smart sensor” applications as it can be packed into tight spaces, consumes little power and is able to do a 1st level data reduction at the sensor. Instead of transmitting large amounts of data from a system, the LPC55S69 can allow for significant signal processing to reduce a complex time series into other metrics that can be analyzed at an enterprise level to determine if a failure will occur. The LPC55S69 with the PowerQuad is a great fit for the Industrial IOT.
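As one possible picture of "1st level data reduction", the Python sketch below boils a block of vibration samples down to three numbers. RMS, peak and crest factor are common condition monitoring metrics, but choosing them here is my assumption; the article does not say which metrics a given system would report.

```python
import math

def reduce_vibration_block(samples):
    """Reduce a block of samples to a few summary metrics."""
    n = len(samples)
    mean = sum(samples) / n
    ac = [s - mean for s in samples]              # remove the DC offset
    rms = math.sqrt(sum(v * v for v in ac) / n)
    peak = max(abs(v) for v in ac)
    crest_factor = peak / rms if rms > 0 else 0.0
    return {"rms": rms, "peak": peak, "crest_factor": crest_factor}

# Hypothetical block: 256 samples holding 8 full cycles of a sine wave.
block = [math.sin(2 * math.pi * 8 * n / 256) for n in range(256)]
metrics = reduce_vibration_block(block)
print(metrics)
```

Sending three numbers per block instead of 256 raw samples is the kind of reduction that makes a small, low power sensor node practical.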

 

LPC55S69 PowerQuad Application - Power Line Communications and Metering

 

A completely different but interesting use-case for the LPC55S69 PowerQuad is Power Line Communications (PLC). There are many sensor applications where you need to transmit and receive data, but you only have access to DC or AC power lines. Many new smart meters attached to your home employ this technology. PLC uses sophisticated techniques such as Orthogonal Frequency Division Multiplexing (OFDM) to transmit data on a power line. OFDM is an interesting technique as it allows you to send data bits down a communications channel *in parallel* across several frequency bands. It is tolerant to noise as you can achieve high bit rates by using many parallel channels/bands where each band contains slowly moving data.

 

A core requirement of any OFDM solution is being able to compute Fast Fourier Transforms (FFT) in real time on an incoming time series. If you can efficiently compute an FFT, it is straightforward to encode/decode data on both the transmitting and receiving ends of the system. Using bins of the FFT, data is encoded using the real and imaginary components (amplitude and phase) to make up bits of a data "word". Once you encode data in the "bins", you can use an inverse FFT to get a time signal to output to a digital to analog converter. Decoding is essentially figuring out when your signal starts and then using an FFT to get the "bins". Once you have your frequency bins, you look at amplitude/phase information to reconstruct your data word.
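The encode/decode round trip described above can be sketched in a few lines of Python. This is a toy under several assumptions: a plain O(N²) DFT stands in for the FFT, each bin carries two bits using a QPSK-style mapping that I picked for illustration, the channel is perfect, and the time samples are left complex. A real transmitter would arrange the bins so the output to the DAC is real, and would add synchronization.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

# Assumed mapping: two bits select the quadrant of one frequency bin.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def ofdm_encode(bits):
    """Bits -> frequency bins -> time samples (inverse transform)."""
    pairs = [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    bins = [QPSK[p] for p in pairs]
    return dft(bins, inverse=True)

def ofdm_decode(time_samples):
    """Time samples -> frequency bins (forward transform) -> bits."""
    bits = []
    for b in dft(time_samples):
        bits.append(1 if b.imag < 0 else 0)   # first bit: sign of imaginary part
        bits.append(1 if b.real < 0 else 0)   # second bit: sign of real part
    return bits

word = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0]
symbol = ofdm_encode(word)       # 8 bins, so 8 time samples
recovered = ofdm_decode(symbol)
print(recovered == word)
```

All sixteen bits travel in parallel inside one eight-sample symbol, and both directions are dominated by the transform. That transform is the part you would want an accelerator to handle.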

 

Figure 5:   OFDM Time Series,  Frequency Domain Symbol Spectrum and QAM Symbols.

 

This is a gross simplification of the OFDM process, but accelerators such as the PowerQuad are a key element in making it work. The LPC55S69 is well suited to this particular application as most of the complexities of the algorithm could be implemented using the PowerQuad, leaving your computational resources (such as the Cortex-M33) free to implement your metering and measurement application. All of this can be done while consuming very little active energy in a small package. At one time, you would have needed a power-hungry IC to perform this process.

 

Moving Forward with the PowerQuad

 

I hope you are now interested in some of the use cases of the LPC55S69 and the PowerQuad engine. In the coming articles we are going to dive into some of the different aspects of the PowerQuad engine and demonstrate some processing on the Mini-Monkey platform. Stay tuned and feel free to check out the LPC55S69. And in case you missed it, here are some other LPC55S69 blogs/videos:

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/01/22/lpc55-mcu-series-there-s-a-lot-under-the-hood-part-1-of-3

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/02/05/lpc5500-series-theres-a-lot-under-the-hood-part-2-of-3

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/02/20/lpc5500-series-theres-a-lot-under-the-hood-part-2-of-3-programmable-logic-and-rom-boot

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/03/13/mini-monkey-part-1-how-to-design-with-the-lpc55s69-in-the-vfbga98-package

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/03/29/mini-monkey-part-2-using-mcuxpresso-to-accelerate-the-pcb-design-process

 

https://community.nxp.com/community/general-purpose-mcus/lpc/blog/2020/04/19/lpc55s69-mini-monkey-build-update-off-to-fabrication

 

https://community.nxp.com/videos/9003

 

https://community.nxp.com/videos/8998

In this blog, Mark Dunnett of embeddedpro puts the LPC55S16-EVK to the test ... How fast does it go? How much current does it take?

 

Read more here > LPC55S16-EVK: how fast does it go? How much current does it take? | MCU on Eclipse 

Mark Dunnett of embeddedpro shares his first impressions about the new LPC551x/S1x MCU family - our latest addition to the LPC5500 MCU series - complete with an unboxing experience. 

 

Check it out here > NXP LPC55S16-EVK: unboxing and first impressions | MCU on Eclipse 

The Mini-Monkey is now officially “out the door”. I just sent the files to Macrofab and can’t wait to see the result. Before I talk a bit about Macrofab, we will look at what is going to get built. A few weeks ago, I introduced a design based upon the LPC55S69 in the 7mm VFBGA98. The goal was to show that this compact package can be used with a low cost PCB/assembly service without having to use the more expensive build specifications. The Mini-Monkey board will also be used to show off some of the neat capabilities of the PowerQuad DSP engine in future design blogs. Here is what we ended up with for the first version:

Figure 1.  Mini-Monkey Revision A

 

Highlights

  • Lithium-Polymer battery power with micro-USB Charging
  • High-speed USB 2.0 Interface
  • SWD debug via standard ARM .050” and tag-connect interface
  • Digital MEMs microphone with I2S Interface
  • 240x240 1.54” IPS Display with HS-SPI interface
  • Op-amp buffer for one of the 1MSPS ADC channels
  • 3 push buttons.  One can be used to start the USB ROM bootloader
  • External Power Input
  • 16MHz Crystal
  • 11 dedicated IO pins connected to the LPC55S69.   Functions available:
    • GPIO
    • Dedicated Frequency Measurement Block
    • I2C
    • UART
    • State Configurable Timers (Both input and output)
    • Additional ADC Channels
    • CTIMERs
  • The HS-SPI used for the IPS display is also brought to IO pins

 

I am a firm believer in not trying to get anything perfect on the 1st try. It is incredibly inexpensive to prototype ideas quickly so I decided to try to get 90% of what I wanted in the first version. As we will see, it is inexpensive to iterate on this design to work in improvements. Without too much trouble, I was able to get everything I wanted on 2 signal layers, with a power reference fill on the top and bottom sides. If this were a production design, I would probably elect to spend a bit more to get two solid inner reference planes by using a 4-layer design. Once a design hits QTY 100 or more, the added cost of using a 4-layer stack-up can be negligible. A 4-layer stack-up makes the design much easier to execute and to make compliant with EMI/RFI requirements. For most of my “industrial” designs where I know that it won’t be high quantity, I always start at 4 layers unless it is a simple connector board.

 

For this 1st run, I wasn’t trying to push the envelope with how much I could get done with low cost design rules and a 2-layer stack-up. The VFBGA leaves quite a bit of space for fanning out IO. Quite a bit can be done on the top layer without vias. I had a few IO that ended up in more difficult locations, but routing was completed quickly.

 

Figure 2.  Mini-Monkey VFBGA Fanout

 

As you can see, I did not make use of all the IO. If I had used a 4-layer board, it would be simpler to get quite a bit more of the IO fanned out. Moving to smaller vias, traces and a 4-layer stack-up would probably allow one to get all IOs connected. For this design, I was trying to move quickly as well as use the standard “prototype” class specs from Macrofab. This means 5 mil traces, 10 mil drills with a 4 mil annular ring. If you can push to 3.5 mil trace/space, NXP AN12581 has some suggestions.
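As a quick sanity check on those numbers, the short Python sketch below works out the finished via pad size from the drill and annular ring and puts it next to the 0.5mm ball pitch. It is plain arithmetic on the figures quoted in this article, not a routability analysis, and the function name is my own.

```python
MIL_PER_MM = 1000.0 / 25.4   # 1 mil = 0.001 inch

def via_pad_diameter_mil(drill_mil, annular_ring_mil):
    """Pad diameter = drill plus an annular ring on each side."""
    return drill_mil + 2 * annular_ring_mil

pad_mil = via_pad_diameter_mil(drill_mil=10, annular_ring_mil=4)
pitch_mil = 0.5 * MIL_PER_MM   # VFBGA98 ball pitch from the article

print(f"via pad: {pad_mil} mil ({pad_mil / MIL_PER_MM:.3f} mm)")
print(f"ball pitch: {pitch_mil:.2f} mil")
```

An 18 mil via pad is almost as wide as the 19.7 mil ball pitch. That gives a feel for why smaller vias help when more of the IO needs to be fanned out.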

 

I did want to take a minute to talk about Macrofab. I normally employ the services of a local contract manufacturer, but this time I elected to give this online service a try. After going through the order process, I must say I was thoroughly impressed! The 1st step is to upload your PCB design files. I use the Altium Designer PCB package and Macrofab recommends uploading in ODB++ format. Since this format has quite a bit more meta-data baked in than standard Gerbers, the online software can infer quite a bit about your design.

 

Figure 3.  Macrofab PCB Upload

 

The Macrofab software gives you a cool preview of your PCB with a paste mask out of the gate. Note that this design is using red solder mask as that is what is included in the prototype class service. Once you have the PCB imported, you can upload a Bill of Materials (BOM).

Figure 3.  Macrofab BOM Upload

 

Macrofab provides clear guidance on how to get your BOM formatted for maximum success. Once the BOM is uploaded, the online tool searches distributors and you can select what parts you want to use. The tool also allows one to leave items as Do Not Place (DNP). I was impressed that it found almost everything I wanted out of the box. Pricing and lead time are transparent.

 

Next up is part placement:

 

Figure 4.  Macrofab Part Placement

 

Using the ODB++ data, the Macrofab software was able to figure out my placements.   I was thoroughly impressed with this step as it was completely automatic.      The tool allows you to nudge components if needed.    Once placements are approved, the tool will give you a snapshot of the costs.

 

 

 Figure 5.  Cost Analysis and Ordering

 

What I liked here was how transparent the process was. Using the prototype class service, a single board was $152. This is an absolute steal when you consider that all of the setup costs, parts and PCBs are baked in. If you consider the value of your time, this is an absolute no brainer. I also like that it gives you a cost curve for low volume production. In the future, I am going to have a hard time using another service that can’t give me this much data with so little work.

 

I ended up ordering 3 prototype units.  Total cost plus 2-day UPS shipping was $465.67.      Note, I did end up leaving one part off the board for now:  the 1.54” IPS display.     This part requires some extra “monkeying” around as it is hot bar soldered and needs some 2-sided tape.    I decided to solder the 1st three prototypes on my bench to get a better feel for the process of using this display.  However, I am more than happy to push the BGA and SMT assembly off to someone else.

 

It looks like the boards are going to ship on the 1st of May. I’ll post a video and update when they come in. So far, the experience with Macrofab has been quite positive and I am eager to see the results. Once I get the design up and running, I’ll post documentation to bitbucket.

In part two of this series on designing with the LPC55S69 VFBGA98 package, I am going to show you how to use the NXP MCUXpresso SDK tools to help with the physical design process. Combining some features in MCUXpresso with my PCB tool of choice, Altium Designer, I can significantly reduce the time spent in the CAD process.

 

The first step in designing a PCB with a new MCU is to add the part into your component libraries. Component library management can be a source of passionate disagreements between design engineers. My own view on library management is rooted in many years of making mistakes! These simple mistakes ultimately caused delays and made projects more difficult than they needed to be. Oftentimes these mistakes were also driven by a desire to "save time". Given my experience, there are a few overarching principles I adhere to.

 

  1. The individual making the component should also be the one who has to stay the weekend and cut traces if a mistake is made. This obviously conflicts with the “librarian/drafter” model, but I literally have seen projects where the librarian made a mistake on a 1000+ pin BGA that cost >$5k. This model was put in a library and marked as “verified”. The person making the parts needs some skin in the game! In this case, the drafting team claimed they had a process that included a double check, but *no one in that process knew the context of how the part was going to be used*.
  2. Pulling models from the internet or external libraries is OK as a starting point, but it is just that: a starting point. You must treat every pin as if it were wrong and verify. Since many organizations have specific rules on how a part should look, you will need to massage the model to meet your own needs. Software engineers shake their heads at this rule. "Why not build on somebody else's libraries? It is what we do!" Well, a mistake in a hardware library can take weeks if not months to really solve. The cost, time and frustration impact can be huge. We hardware engineers can't simply "re-compile".
  3. I don’t trust any footprint unless I know it has been used in a successful design. The context of how a part is used is very important (which leads to #4).
  4. I believe that design re-use is best done at a schematic snippet level, not at the level of an individual part. After all, once I get this Mini-Monkey board complete, I will never again start with just the LPC55S69. I want all the “stuff” surrounding the chip that makes it work!

 

To the casual observer, these principles seem onerous and time consuming, but I have found that they *save me time over the course of the project*. Making your own parts may seem time consuming but it *does not have to be*. There are tools that can make your life simpler and the task less arduous. Making your own CAD part is also useful for a few other reasons:

 

  1. You have to go through a mental exercise when looking at each of the pins. It forces your brain to think about functionality in a slightly different way. When starting with a new part/family, repeated exposure is a very good way to learn.
  2. Looking at the footprint early on gets your brain in a planning mode for when you do get started.

 

One could argue that this is “lost” time as compared to getting someone else to do the CAD library management, but I feel strongly that it saves time in the long run. I have witnessed too many projects sink time into unnecessary debugging due to bad CAD part creation. I feel the architect of the design needs to be intimately involved and take ownership of the process.

 

The LPC55S69 in the VFBGA package has only 98 pins. With no automation or tools, it would not take all that long to build a part right from the datasheet. However, it is on the edge of being a time consuming endeavor. Also, when I build schematic symbols, I tend to label the pins with all possible IO capabilities allowed by the MCU pin mux. This can make the part quite large, but it also helps me see what else is available on a pin if I am in a debug pinch. Creating pins with all this detail can be quite time consuming. I use Altium Designer for all of my PCB design and it has some useful automation to make parts more quickly. NXP’s MCUXpresso tool also has a unique feature that can really help board designers get work done quickly.

 

Creating the Pin List

 

Built into MCUXpresso is a pins tool that is *very* useful in large projects for setting up the pin muxes and doing some advanced planning. While it is primarily a tool for bootstrapping pin setup for the firmware, it can also be useful to drive the CAD part creation process. Simply create a new project and start the pins tool:

 

 

The pins tool gives you a tabular and physical view of pin assignments, which is very useful when planning your PCB routing. We will use the export feature to get a list of all the pins, numbers and labels.

 

 

The pins tool generates a CSV file that you can bring into your favorite editor. Not only do I get the pin/ball numbers,   I get all of the IO options available via the MCU pin mux. 

 

 

 

Using the Pin List To Generate Component Pins

 

 With just a few modifications, I can get the spreadsheet into a format useful for the Altium Smart Grid Paste Tool.

 

 

Altium Designer requires a few extra columns of meta-data to be able to import the data into a grouping of pins in the schematic library editor. At this point you could group the pins to your personal preference. I personally like to see all pin functions on the schematic, but this does create rather large symbols. The good news here is that by using MCUXpresso and Altium you can make this a 10-minute job, not a 3-hour one. Imagine going through the reference manual line by line!
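If you would rather script the spreadsheet massaging than do it by hand, something along these lines could work. Treat it as a sketch: the input column names (`Ball`, `Functions`), the `;` separator, the output column names and the sample rows are all placeholders I made up, so check them against your actual MCUXpresso export and what the Altium paste tool expects.

```python
import csv
import io

def build_pin_rows(csv_text):
    """Turn an exported pin list into rows ready to paste into a CAD tool.

    Assumed (hypothetical) input columns: Ball, Functions.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        functions = [f.strip() for f in rec["Functions"].split(";") if f.strip()]
        rows.append({
            "Designator": rec["Ball"].strip(),
            "Name": "/".join(functions),   # every mux option on the pin label
            "Electrical Type": "IO",       # extra meta-data column, assumed
        })
    return rows

# Placeholder data, not the real LPC55S69 pin mux.
example_csv = (
    "Ball,Functions\n"
    "A1,PIO0_0;ALT_FUNC_1;ALT_FUNC_2\n"
    "B2,PIO0_1;ALT_FUNC_3\n"
)

for row in build_pin_rows(example_csv):
    print(row["Designator"], row["Name"], row["Electrical Type"], sep="\t")
```

Printing tab-separated rows makes it easy to copy the result into a spreadsheet for a final visual check before pasting it into the library editor.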

 

 

 

 

 

Voilà! A complete symbol. It just took a few minutes of massaging to get what I wanted. Like I stated previously, a 98 pin package is not that bad to do manually, but you can imagine the work for a 200 or 300 pin part (such as the i.MX RT!).

 

The VFBGA package is 7mm x 7mm with a 0.5mm pitch. There are balls removed from the grid for easier route escaping when using this part with lower cost fabrication processes.

 

 

Once again,   with a quick look at NXP documentation and using the Altium IPC footprint generator,   we can make quick work of getting an accurate footprint.

 

 

 

The IPC footprint generator steps you through the entire process.  All you need is the reference drawing.   

 

A quick note about the IPC footprint tool in this use case. The NXP VFBGA has quite a few balls removed to allow for easier escaping. The IPC footprint generator can automatically remove certain regions, but I found that this particular arrangement needed a few minutes of hand work to delete the unneeded pads given the unique pattern.
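For anyone who prefers to generate or cross-check pad positions with a script, here is a small Python sketch of a depopulated BGA grid. The 13 x 13 grid size and the two removed balls in the example are assumptions for illustration only; the real pattern must come from the NXP package drawing.

```python
# JEDEC-style row letters skip I, O, Q, S, X and Z.
ROW_LETTERS = "ABCDEFGHJKLMNPRTUVWY"

def bga_pads(rows, cols, pitch_mm, depopulated=()):
    """Return {ball name: (x_mm, y_mm)} centered on the package origin."""
    skip = set(depopulated)
    pads = {}
    for r in range(rows):
        for c in range(cols):
            name = f"{ROW_LETTERS[r]}{c + 1}"
            if name in skip:
                continue                       # ball removed from the grid
            x = (c - (cols - 1) / 2) * pitch_mm
            y = ((rows - 1) / 2 - r) * pitch_mm
            pads[name] = (x, y)
    return pads

# Hypothetical example: 13 x 13 grid, 0.5 mm pitch, two balls removed.
pads = bga_pads(13, 13, 0.5, depopulated=["G7", "F6"])
print(len(pads), pads["A1"], pads["N13"])
```

Comparing a list like this against the generated footprint is one way to catch a pad that was deleted, or left in, by mistake.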

 

By using Altium and NXP’s MCUXpresso tool together, I was able to get my CAD library work done very quickly. And because I spent some time with the design tools, I became more familiar with the IOs and the physical package. This really helps get the brain primed for the real design work.

 

 

 

At this point in the process I have a head start on the schematic entry and PCB layout. Next time we are going to dive in a bit to see what connections we need to bootstrap the LPC55S69 to get it up and running. We will take a look at some of the core components needed to get the MCU to boot and some peripheral functions that will help the Mini-Monkey come alive!

Now that we have discussed the LPC5500 series at a high level and investigated some of the cool features, it is time to roll up our sleeves and work on some real hardware. In this next series of articles, I want to step through a simple hardware design using the LPC55S69. We are going to step a bit beyond the application notes and go through a simple project using Altium Designer.

 

Many new projects start with development boards (such as the LPC55S69-EVK) to evaluate a platform and to take a 1st cut at some of the software development work. Getting to a form-factor compliant state quickly can be just as important as the firmware efforts. Getting a design into a manufacturable form is a very important step in the development process. With new hardware, I like to address all of my “known unknowns” early in the process, so I almost always make my own test PCBs right away. The LPC5500 series devices are offered in some easy to use QFP100 and QFP64 packages. Designers also have the option of a very small VFBGA98 package. Many engineers flinch when you mention BGA, let alone a “fine pitch” BGA. I hope to show you that it is not as bad as you may think and that one can even route this chip on 2 layers.

 


Figure 1.  The LPC55S69 VFBGA98 Package. QFP100 comparison on the bottom.

 

The LPC55S69 is offered at an attractive price but packs a ton of functionality and processing power into a very small form-factor that uses little energy in both the active and sleep cases.     Having all of this processing horsepower in a small form-factor can open new opportunities.  Let’s see what we can get done with this new MCU.

 

The “Mini-Monkey” Board

 

In this series of “how to” articles, I want to step through a design with the LPC55S69 in the VFBGA and *actually build something*.   The scope of this design will be limited to some basic design elements of bringing up a LPC55S69 while offering some interesting IO for visualizing signal processing with the PowerQuad hardware.      Several years ago, I posted some projects on the NXP community using the Kinetis FRDM platform.   One of the projects showcased some simple DSP processing on an incoming audio signal.

 

https://www.youtube.com/watch?v=Nn7DweR--Po&list=PLWM8NW5LEukhCAvE7voge_-L8waDyQSgo&index=3&t=1s

 

The “Monkey Listen” project used an NXP K20D50 FRDM board with a custom “shield” that included a microphone and a simple OLED display. For this effort I wanted to do something similar, except using the LPC55S69 in the VFBGA98 package with some beefed-up visualization capabilities. There is so much more horsepower in the LPC55S69 and we now have the potential to do neat applications such as real time feature detection in an audio signal, etc. Also, given the copious amounts of RAM in the LPC55S69, I wanted to step up the game a bit in the display. The small VFBGA98 package presents an opportunity to pack quite a bit into a small space. So much has happened since the K20D50 hit the street!

 

I recently found some absolutely gorgeous IPS displays with a 240x240 pixel resolution from buydisplay.com. They are only a few dollars and have a simple SPI interface. I wired a display to an LPC55S69-EVK for a quick demonstration:

 

   Figure 2:  The LPC55S69EVK driving the 240x240 Pixel 1.54” IPS display.

 

It was difficult for me to capture how beautiful this little 1.54” display is with my camera.  You must see it to believe it!    Given the price I figured I would get a boxful to experiment with for this design project!

 

Figure 3:   240x240 Pixel 1.54” IPS display from buydisplay.com

 

The overarching design concept with the “mini-monkey” is to fit a circuit under the 1.54” display that uses LPC55S69 with some interesting IO:

 

  • USB interface
  • LIPO Battery and Charger circuitry
  • Digital MEMs microphone
  • SWD debugging
  • Buttons
  • Access to the on-chip ADC

 

I want to pack some neat features beneath the screen that can do everything the “Monkey Listen” project can, just better. With access to the PowerQuad, the sky is the limit on what kinds of audio processing can be implemented. The plan is to see how much we can fit underneath the display to make an interesting development platform. I started a project in Altium Designer and put together a concept view of the new “Mini-Monkey” board to communicate some of the design intent:

 

Figure 4:  The “Mini-Monkey” Concept PCB based upon the LPC55S69 in the VFBGA98 package

 

While this is not the final product, I wanted to give you an idea of where I was going. The “Mini-Monkey” will be a compact form factor board that can be used for some future articles on how to make use of the LPC5500 series PowerQuad feature. There will be some extra IO made available to enable some cool new projects to showcase the awesome capabilities of the LPC55S69. Got some ideas for the "Mini-Monkey"? Leave a comment below!

 

In the next article we will be looking at the schematic capture phase and how we can use NXP’s MCUXpresso SDK to help automate some of the work required in Altium Designer. I will be showing some of the basic elements of getting an LPC55S69 design up and running from scratch. We will then look at designing with the VFBGA98 package and get some boards built. I hope I now have you interested, so stay tuned. In the meantime, check out this application note on using the VFBGA package on a 2-layer board:

 

https://www.nxp.com/docs/en/application-note/AN12581.pdf

I recently wrote about the ample processing capabilities built into the LPC55S69 MCU  in addition to the Dual USB capabilities and large banks of RAM.  Now it is time to explore some peripherals and features that are often overlooked in the LPC family but are very beneficial to many embedded system designs.

 

The State Configurable Timer

 

An absolute gem in the LPC family is the “State Configurable Timer” (SCT). It has been implemented in many LPC products and I feel it is one of the most under-rated and often misunderstood peripherals. When I first encountered the SCT, I wrote it off as a “fancy PWM” unit. This was a mistake on my part, as the SCT is an extremely powerful peripheral that can solve many logic and timing challenges. I have personally been involved in several design efforts where I could remove the need for an additional programmable logic device on a PCB by taking advantage of the SCT in an LPC part. At its core, the SCT is an up/down counter that can be sequenced with up to 16 events. The events can be triggered by IO or by one of 16 possible counter matches. An event can then update a state variable, generate IO activity (set, clear, toggle), or start/stop/reverse the counter.

 

Consider an example which is similar to a design problem I previously used the SCT for.

 

Given a 1 cycle wide Start input signal:


i.) Assert a PowerCtrl signal on the 3rd Clk cycle after the start.
ii.) 2 Clk cycles after the assertion of PowerCtrl, output exactly 2 pulses on the Tx output pin at a programmable period.
iii.) 5 Clk cycles after ii.), de-assert PowerCtrl.
iv.) 2 Clk cycles after the de-assertion of PowerCtrl, output a 1 cycle pulse on the Complete pin.
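
To make the timing in the four steps above concrete, here is a small software model of the sequence written in plain C. It is only a sketch of the required behavior, not SCT register programming. The function name `seq_outputs_at` and the struct are hypothetical, and the model assumes that Tx pulses are 1 Clk wide, that the programmable period is the distance between the two rising edges, and that step iii.) is counted from the second rising edge. The requirements do not pin these details down.

```c
#include <assert.h>
#include <stdbool.h>

/* Output pin levels during one Clk cycle. */
typedef struct {
    bool power_ctrl;
    bool tx;
    bool complete;
} seq_outputs_t;

/*
 * Software model of the sequence. Cycle 0 is the Clk cycle in which
 * Start is sampled high. Assumptions (not stated in the requirements):
 * Tx pulses are 1 Clk wide, tx_period is the distance between their
 * rising edges, and step iii counts from the second rising edge.
 */
seq_outputs_t seq_outputs_at(unsigned cycle, unsigned tx_period)
{
    assert(tx_period >= 2u); /* the two pulses must not merge */

    const unsigned power_on  = 3u;                   /* step i   */
    const unsigned tx_first  = power_on + 2u;        /* step ii  */
    const unsigned tx_second = tx_first + tx_period; /* step ii  */
    const unsigned power_off = tx_second + 5u;       /* step iii */
    const unsigned done      = power_off + 2u;       /* step iv  */

    seq_outputs_t out;
    out.power_ctrl = (cycle >= power_on) && (cycle < power_off);
    out.tx         = (cycle == tx_first) || (cycle == tx_second);
    out.complete   = (cycle == done);
    return out;
}
```

Stepping `cycle` from 0 upward with a fixed `tx_period` gives the waveform a test bench would expect. In the SCT, each of these cycle counts would roughly correspond to a counter match event that sets or clears an output.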

 

 

 

This task could be done in pure software if the incoming CLK was slow enough. Most timer/counter units in competing MCUs would not be able to implement this particular set of requirements. In my use case (an acoustic transmitter), I was able to implement this completely in the SCT with minimal CPU intervention and no external circuitry. This is a scenario where I might otherwise consider an external CPLD or FPGA, but the SCT is more than capable of implementing the behavior. I highly recommend grabbing the manual for the LPC55 family and reading chapter 24. If you have never used a peripheral like the SCT, I highly recommend learning about it.

  

Programmable Logic Unit

 

In addition to the SCT, there is a small amount of programmable logic in the LPC55 family. The Programmable Logic Unit (PLU) is an array of 26 5-input look-up tables (LUTs) and four flip-flops. From the external pins of the LPC55xx, there are 6 inputs to the PLU fabric and 8 outputs. While this is not a large amount of logic, it is certainly enough to replace some external glue logic you might have in your design. There is even a free tool to draw your logic schematically or describe it using the Verilog HDL.
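
For readers who have not worked with LUT-based logic, the following C sketch models what a single 5-input LUT does: the five inputs form an index into a 32-entry truth table, and the selected bit is the output. The function names and the example gate are hypothetical, and the bit ordering of the truth table is an assumption made for illustration. It is not taken from the PLU register description.

```c
#include <stdbool.h>
#include <stdint.h>

/* A logic function of up to five inputs, packed into bits 0..4. */
typedef bool (*logic_fn_t)(uint8_t inputs);

/* Evaluate a 5-input LUT: the inputs select one bit of the 32-bit table. */
bool lut5_eval(uint32_t truth_table, uint8_t inputs)
{
    return ((truth_table >> (inputs & 0x1Fu)) & 1u) != 0u;
}

/* Build a truth table by evaluating fn for all 32 input combinations. */
uint32_t lut5_build(logic_fn_t fn)
{
    uint32_t table = 0u;
    for (uint8_t i = 0u; i < 32u; i++) {
        if (fn(i)) {
            table |= (uint32_t)1u << i;
        }
    }
    return table;
}

/* Example glue logic: out = (in0 AND in1) OR in2. Inputs 3 and 4 are unused. */
bool glue_logic(uint8_t inputs)
{
    bool in0 = (inputs & 0x01u) != 0u;
    bool in1 = (inputs & 0x02u) != 0u;
    bool in2 = (inputs & 0x04u) != 0u;
    return (in0 && in1) || in2;
}
```

Calling `lut5_build(glue_logic)` yields the constant that would describe this gate in a single LUT. A schematic or Verilog tool performs essentially this step when it maps logic onto LUTs.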

 

 

I often find I need just a handful of gates in a design to glue a few things together, and the PLU is the perfect peripheral for this need.

 

LPC Boot ROM

 

Another indispensable feature that has been in the LPC series since the beginning is a bootloader in ROM. For me, it is a must-have as it means I can program/recover code via one of many interfaces without a JTAG/SWD connection. For factory/production programming and test, it saves quite a bit of hassle. The boot ROM allows device programming over SPI, UART, I2C or USB. I typically use the UART or USB interface with FlashMagic. This feature has certainly benefited me on *every* embedded project, especially when it comes to production programming and test. There have even been times when it was handy for recovering a firmware image in the field. Many designs include some sort of bootloader, and having an option that is hard coded in ROM is a great benefit that you get for free in the LPC family.

 

It is difficult to capture all the benefits of the new LPC55 family, but we hope you are interested. The LPC55 family is offered in many convenient IC packages, is low power (both active and sleep) and is packed with useful peripherals. The LPC55S69 development board is available at low cost. Combining the low cost hardware tools with the MCUXpresso SDK, you can start LPC55 development today. From here we are going to start looking at some interesting how-to’s and application examples with the LPC55 family. Stay tuned and visit www.nxp.com/LPC55S6x to learn more.

I recently wrote about the ample processing capabilities built into the LPC55S69 MCU. In this article I am going to highlight some very useful IO interfaces and memory.

 

Dual USB

 

One killer feature in some of the other LPC parts (for example the LPC4300 series and the LPC54000 series) is the *dual* USB interface. Dual USB enables some very interesting use cases, and it is something that sets the LPC portfolio apart from its competitors. For the LPC5500 MCU series, High-Speed USB and Full-Speed USB with on-chip PHY features are fully supported, providing up to 480Mbit/s of speed. Let’s examine a scenario I commonly encounter.

 

In my projects, I like to have both USB device and USB host capabilities on separate connectors. Instead of using USB On-the-Go (OTG) with a single connector, it has been my experience that many deeply embedded and industrial projects benefit from separate connectors. Consider the arrangement in figure 1.

 

 

 

Figure 1:   Dual USB with FAT File System, SDIO and CDC.

 

On the device side, I almost always implement a mass storage class device along with a communications class device. The mass storage interface is connected to the SDIO port through the FatFS IO layer so a PC can access sectors on the SD card. FatFS is my go-to library for embedded FAT file systems. It is open source and battle tested. While I choose to always pull the files from the author’s site, the MCUXpresso SDK has FatFS built in. With this, files can be easily copied between a PC and the LPC5500 system. Data logging and configuration storage is now built into your application. The CDC interface can provide a virtual COM port interface to implement a basic shell.

 

I use the USB host port for mass storage as well. Like the SDIO interface, I connect the host drivers (examples in the MCUXpresso SDK) through the FatFS IO layer so my system can read and write files on a thumb drive. One very useful application in my projects is a secondary bootloader. There have been several products I have worked on that required field updatability, but the users do not necessarily have access to a PC.

  

To update the system, data files and new firmware can be placed on a thumb drive and inserted into the LPC5500 system. A bootloader can then perform the necessary programming to update the internal flash. In addition to firmware updates, the host port could also be used to copy device configuration information. A technician would just carry a USB “key” to update units. Having both USB device and host using the two LPC55S69 USB interfaces can unlock many benefits.

 

With the SDIO interface and USB host, one is not limited to the more common SD cards and thumb drives. There are other options for more robust physical interfaces. Instead of a removable SD card, a soldered-down eMMC can be used. For the USB host interface, there are rugged “DataKey” options available. Also note that the DataKeys come with an SDIO interface as well.

 

 

 

Figure 2:   Rugged Memory Options.   DataKey (Left) and eMMC (Right)

 

One last tidbit is that the SDIO interface can also be used to connect to many high-speed Wi-Fi chipsets. It is an option that is easy to forget about.

 

Copious amounts of RAM

 

While I certainly came up in a time when RAM was sparse, I love having access to a large amount of it. At 320KB of RAM, there is no shortage of RAM in the LPC55S69! Relating to the USB and file storage application, large RAM buffers can be important for optimizing transfer speeds. It is common to write SD cards and thumb drives in 512-byte blocks. This transfer size, however, is not always the optimum case for overall speed. The controller in the memory card has to erase internal NAND flash in much larger sector sizes, resulting in slow write performance. It has been my experience that queueing up data until I have at least 16KB can improve overall transfer speeds by up to an order of magnitude. In most of my use cases, I implement a software cache of at least 16KB to speed transfer of large files. Larger caches can yield better results. These file system caches can consume quite a bit of memory, so it is very helpful that the LPC5500 series has quite a bit of RAM available.
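
The queueing idea described above can be sketched as a simple write-coalescing cache in C. This is a minimal illustration under stated assumptions, not the code used in my projects. The `storage_write_fn` callback is a hypothetical stand-in for whatever actually writes to the card or thumb drive, and `counting_sink` exists only so the behavior can be observed on a PC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE (16u * 1024u) /* 16KB, the minimum size suggested above */

/* Hypothetical storage hook: writes len bytes, returns 0 on success. */
typedef int (*storage_write_fn)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    uint8_t          buf[CACHE_SIZE];
    size_t           used;
    storage_write_fn write;
    void            *ctx;
} write_cache_t;

void cache_init(write_cache_t *c, storage_write_fn write, void *ctx)
{
    c->used  = 0u;
    c->write = write;
    c->ctx   = ctx;
}

/* Push whatever is queued to storage in one large write. */
int cache_flush(write_cache_t *c)
{
    if (c->used == 0u) {
        return 0;
    }
    int rc = c->write(c->buf, c->used, c->ctx);
    if (rc == 0) {
        c->used = 0u;
    }
    return rc;
}

/* Queue data; storage is only touched when a full 16KB has accumulated. */
int cache_write(write_cache_t *c, const uint8_t *data, size_t len)
{
    while (len > 0u) {
        size_t room = CACHE_SIZE - c->used;
        size_t n    = (len < room) ? len : room;
        memcpy(&c->buf[c->used], data, n);
        c->used += n;
        data    += n;
        len     -= n;
        if (c->used == CACHE_SIZE) {
            int rc = cache_flush(c);
            if (rc != 0) {
                return rc;
            }
        }
    }
    return 0;
}

/* Illustration-only sink that records how storage would have been used. */
typedef struct {
    unsigned calls;
    size_t   bytes;
} sink_stats_t;

int counting_sink(const uint8_t *data, size_t len, void *ctx)
{
    sink_stats_t *stats = (sink_stats_t *)ctx;
    (void)data;
    stats->calls++;
    stats->bytes += len;
    return 0;
}
```

Feeding forty 512-byte blocks through `cache_write` results in a single 16KB storage write plus one final `cache_flush` for the remainder, instead of forty small writes. In a real system the callback would wrap the file system or block driver write call.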

 

Given the security features of the LPC55S69, the extra RAM can make integration of SSL stacks for IOT applications much simpler. One example is the use of WolfSSL for implementation of SSL/TLS. While it targets the embedded space, SSL processing can be complicated and require a significant amount of stack and heap. In one particular use case I had with an embedded IOT product, I needed 35KB of stack and about 40KB of heap to handle all of the edge cases when dealing with connections to the internet over TLS. The large reserve of RAM in the LPC55S69 easily allows for these larger security and encryption stacks.

 

Another use for the large memory capability is a graphics back-buffer. It would be simple to hook a high-resolution IPS display to the LPC55S69 and store a complete image back buffer in memory. For example, a 240x240 IPS display with 16-bit color depth would require 112.5KiBytes of RAM (240 x 240 pixels x 2 bytes = 115,200 bytes). There is plenty of RAM left in the LPC55S69 for your other tasks. In fact, you could dedicate one of the CPUs in the LPC55S69 to handling all the graphics rendering. The copious amount of RAM enables neat applications such as wearables, industrial displays and compact user interfaces.
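
The back-buffer arithmetic is easy to check in code. The sketch below declares a full 240x240 back buffer with 16 bits per pixel and lets the compiler report its size. The RGB565 packing and all of the names are assumptions for illustration, since the text only specifies a 16-bit color depth.

```c
#include <stddef.h>
#include <stdint.h>

#define LCD_WIDTH  240u
#define LCD_HEIGHT 240u

/* One pixel at 16-bit color depth (assumed here to be RGB565). */
typedef uint16_t pixel_t;

/* Complete image back buffer held in internal SRAM. */
static pixel_t back_buffer[LCD_HEIGHT][LCD_WIDTH];

/* Size of the back buffer in bytes: 240 * 240 * 2. */
size_t back_buffer_bytes(void)
{
    return sizeof back_buffer;
}

/* Pack 8-bit red, green and blue components into one RGB565 pixel. */
pixel_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (pixel_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Write one pixel; out-of-range coordinates are ignored. */
void back_buffer_set(unsigned x, unsigned y, pixel_t color)
{
    if ((x < LCD_WIDTH) && (y < LCD_HEIGHT)) {
        back_buffer[y][x] = color;
    }
}
```

`back_buffer_bytes()` evaluates to 115,200 bytes, which is the 112.5 KiB quoted above.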

 

 

Figure 3.   A 240x240 IPS Display with SPI Interface from BuyDisplay.com

 

One other important aspect of the RAM in the LPC55S69 is its organization. It is intelligently segmented (with 272KB contiguous in the memory map) via a bus matrix to allow the Arm Cortex-M33 cores, PowerQuad, CASPER and DMA engine access to memory with minimal contention between bus masters.

 

 

 Figure 4.   LPC55S69 Memory Architecture.

 

The LPC5500 Series offers a lot in a small, low power package. The large amount of internal SRAM and dual USB interface enable many applications and make development simpler. Stay tuned for part 3 of the LPC5500 series overview. I will be further examining some interesting peripherals in the LPC5500 series that set it apart from its competition.

 

For more information, visit: www.nxp.com/LPC55S6x.

For most of my life, programming embedded microcontrollers has been a passion of mine. Over the course of my career I have gained experience on many different architectures, including some that are very specialized for specific applications. Even with the current diverse market of specialized devices, I continue to find the general-purpose microcontroller market the most interesting. I believe this stems from how I first fell in love with computing. It can be traced back to the 7th grade when we were learning “Computer Literacy” with the Apple IIe computer. During the course, students learned how to code programs in the BASIC language. Projects spanned everything from simple graphics to printing and games. Simultaneous to that experience, I learned that my other 7th grade passion, playing the Nintendo, was connected to the activities in computer literacy. Through a popular gaming magazine, I discovered that the chip that powered the Nintendo was the device that powered the computers at school, the venerable “6502”. That was the real moment of epiphany. If a CPU could be both a gaming system and a word processor, it could really *do anything* I wanted. It wasn’t long before I was digging into the intricate details of the 6502 to power my creations. The 6502 was my 1st general purpose CPU.

 

Fast forward 30 years … The exact same principle applies today. We have an incredible amount of power in small packages. There is a lot you can accomplish with seemingly little. I am always on the lookout for new parts that may appear to be “vanilla” on the surface but have some hidden gems that really help me accomplish cool projects. The NXP LPC5500 series really appealed to my sensibilities as I immediately saw features that make it relevant to today’s design challenges. In the coming weeks I want to highlight some features of the LPC5500 series. This is not intended to be an all-encompassing review of the LPC5500 series, but I hope to hit on some highlights that could be beneficial to your design challenges. In this article we are going to focus a bit on the LPC55S69 device and its core platform. There is a lot under the hood!

 

First – It is actually 4 processors in 1!

 

From the block diagram in figure 1, one can see that there are two Arm Cortex-M33 cores. This by itself is an extremely useful feature given the low cost and low active power aspects of this device. I have made good use of the other LPC families with asymmetric cores (such as the LPC43xx device with a Cortex-M4 and -M0). Having a 2nd core is very useful in offloading common tasks. In my experience with the LPC43xx, I used the Cortex-M0 as a dedicated graphics co-processor to offload UI tasks from the Cortex-M4 while it was doing other time-critical DSP operations.

In the case of the LPC55S69, both cores are Cortex-M33. The Cortex-M33 is a new offering from Arm based upon the Armv8-M instruction set architecture. Like the Cortex-M4, it has hardware floating point and DSP instructions but also includes TrustZone. TrustZone enables new security states to ensure your critical code can be protected. Another notable new feature is a co-processor interface for streamlining integration with dedicated co-processors. This feature is germane to the LPC5500 series as there are 2 coprocessors that we are about to talk about. You can learn more about the Cortex-M33 here.

 

I can’t count the number of design scenarios where I wished I had an extra programmable CPU that could handle a task that might be extremely time critical but not actually need a lot of code space. For example, I have used OLED displays that have a non-standard I/O interface that needs to be bit-banged. It became a great opportunity to have the 2nd core do the work. You could even turn that 2nd core into a small graphics co-processor.

 

Figure 1.  The LPC55S6x MCU Family Block Diagram

 

I mentioned four processors. So, where are the 3rd and 4th processors? Number three is hidden in the “DSP accelerator” block. The Cortex-M4 core, upon which many other LPC microcontrollers are built, has DSP-specific instructions that can accelerate certain math functions. I have given seminars at the Embedded Systems Conference on using the DSP instructions in a general-purpose CPU scenario. The LPC55S69 DSP accelerator (a.k.a. PowerQuad) is a separate core whose sole purpose is to accelerate DSP-specific tasks. While PowerQuad is not a pure general-purpose CPU, it can perform tasks that would significantly burden one of the Cortex-M33 cores. In many cases you can get a 10x improvement over conventional software implementations of certain algorithms. PowerQuad covers all the common use cases such as Fast Fourier Transforms (FFTs), IIR filters, convolution, trigonometric functions and matrix math. It has enough “brains” to do almost all the work so your main general-purpose CPU(s) are free for other tasks. The PowerQuad is enabled by a very specific new feature in the Cortex-M33 (ARMv8‑M specifically) that allows coprocessors to be connected to the CPU through a simple interface. Data transfer to the coprocessor is low latency and can sustain a bandwidth of up to twice that of the memory interface to the processor.

 

Lastly, the 4th processor is another specialized core called “CASPER”. CASPER is a high-performance accelerator that is optimized for cryptographic computations. At its core, CASPER is a dual multiply-accumulate-shift engine that can operate on large blocks of data. CASPER has special access to 2 blocks of RAM so data can be accessed in parallel. Applications of CASPER include accelerating cryptographic functions such as public key verification (e.g. TLS/SSL), hash computations or even blockchain. As CASPER is a general math engine, it is also possible to perform DSP operations in parallel with the PowerQuad. With a little bit of imagination, one could achieve quite a bit with minimal intervention from the general-purpose Cortex-M33 cores.

 

Figure 2.  PowerQuad (Left) and CASPER (right) Accelerators

 

While the PowerQuad and CASPER processing engines are not technically a 3rd and 4th general-purpose core, they can easily do the work that you might normally require of an entire CPU. We will be talking much more about these features in the future, but here is the key take-away:

 

The PowerQuad DSP and CASPER accelerators are powerful math engines that allow you to number crunch at a rate similar to dedicated DSPs, all while still reserving your general-purpose processors to handle other system tasks.

 

All of this functionality is delivered on a low power 40nm process technology packaged in approachable footprints at a low price point. Interested yet?  I know I am!

 

For more information, visit: www.nxp.com/LPC55S6x.

EmSA recently released some updates to FAIM support on LPC84x devices in their popular Flash Magic tool. If you are using this unique feature of the LPC84x device series be sure to update to version 12.65 or later to get access to command line support and the latest fixes for some previous bogus errors/warnings that were appearing.

Embedded Artists are having a Winter Sale, offering the LPC54018 IoT module for only 5 Euros:

LPC54018 IoT powered by Amazon Web Services (AWS) - Embedded Artists 

 

The baseboard to accompany the module is also reduced to only 20 Euros!

LPC845 Breakout Board now has an SDK package available! We are working on updating our getting started information to show how to use this rather than starting from the LPC845 chip SDK. The board is called LPC845BREAKOUT in the SDK builder.

We used the board to teach a class on how to create a custom SDK for your own board. The class got several thumbs up at our recent Seattle Tech Day and Santa Clara Connects events. The materials are here:

https://community.nxp.com/docs/DOC-343310

Amazon Web Services has released a preconfigured FreeRTOS example for Armv8-M and the NXP LPCXpresso55S69 board. With the addition of board- and device-specific examples, it is even easier to start and use the Arm® TrustZone® features combined with MPU (Memory Protection Unit) on the NXP LPC55xx MCU.

 

LPCXpresso55S69 Board

 

The LPCXpresso55S69 is an ideal development board for evaluating the Arm Cortex®-M33 architecture and security features. The core platform features two Arm Cortex-M33 cores running up to 100 MHz.

 

  LPC55X6x Block Diagram

 

FreeRTOS is the de facto real time operating system for small and low-power devices. Since 2017, FreeRTOS has been an AWS open source project.  AWS has released a FreeRTOS port to support Arm Cortex-M33 devices: AWS Makes It Easier for Embedded Developers to Build IoT Applications with Additional Preconfigured Examples for FreeRTO… 

With the Arm TrustZone approach to divide into a 'secure/trusted' and 'unsecure/not-trusted' world, it is possible to effectively protect sensitive code and data, such as secure bootloaders, key and encryption management and trusted applications on the 'secure' side, with the ability to run other functionality (for example third-party applications or middleware) at a lesser security level.

 

FreeRTOS with NXP MCUXpresso IDE and SDK

 

FreeRTOS can be configured at compile time to run either on the secure side or on the non-secure side.  When FreeRTOS is run on the non-secure side the tasks (or threads) can call secure-side trusted functions that, in turn, can call back to non-secure functions, all without breaching the kernel’s prioritized scheduling policy. That flexibility makes it possible for application writers to create non-secure FreeRTOS tasks that interact with trusted secure-side firmware.

Setting up security adds some extra complexity and having these examples available in the FreeRTOS mainline release will help you to add security and TrustZone features to the next LPC55xx MCU design.

 

Happy Securing!

 

 

The ARM TrustZone is an optional security feature for the Cortex-M33 which is intended to improve the security of embedded applications running on a microcontroller such as the NXP LPC55S69 (dual-core M33) on the LPC55S69-EVK.

NXP LPC55S69-EVK Board

 

As with anything, using and learning the TrustZone feature takes some time. ARM provides documentation on TrustZone, but it is not easy to apply it for an actual board or toolchain. The NXP MCUXpresso SDK comes with three examples for TrustZone on the LPC55S69-EVK, so I have investigated these examples to find out how it works and how I can use it in my application.

Software and Tools

I’m using the same setup as in my earlier article (“First Steps with the LPC55S69-EVK (Dual-Core ARM Cortex-M33 with Trustzone)“):

  • Windows 10 with MCUXpresso IDE 10.3.1 (Eclipse based with GNU toolchain for ARM Embedded)
  • MCUXpresso SDK V2.5.1 for LPC55S69

Most of the things presented in this article are applicable to any other Cortex-M33 environment with TrustZone.

TrustZone on ARMv8-M

As in ARMv7-M, there are two basic modes the processor can be in:

  • Thread Mode: this mode is entered by reset and is the usual mode in which the application runs. Code in Thread Mode can be executed privileged (full access) or non-privileged (with restrictions imposed, e.g. by an MPU (Memory Protection Unit)).
  • Interrupt or Handler Mode: this mode is executed with privileged level and this is where the interrupts are running.

TrustZone keeps that model and extends it. The basic concept of TrustZone on ARMv8-M is to separate the ‘untrusted’ from the ‘trusted’ parts on a microcontroller. With this division, IP inside the trusted side can be protected while still allowing ‘untrusted’ software to run on the ‘untrusted’ side of the world. The trusted and untrusted parts can each have different privileges. For example, some hardware (GPIO ports, etc.) could be accessible only from the trusted side, but not from the untrusted one.

I recommend reading the ARM document about TrustZone.

Secure and Non-Secure World

While it is already possible without TrustZone to restrict memory access with an MPU, the TrustZone concept of a ‘secure world’ and a ‘non-secure world’ extends this to ‘secure’ or ‘trusted’ hardware and peripheral access. A non-secure function can only access secure hardware through an API in the secure world, which verifies whether the access is allowed. So there are ways that the secure and non-secure parts can work together.

Similar to using an MPU, it means that there are several things to consider:

  • Setting security permissions for memory areas and accessing peripherals
  • Using secure and non-secure API and transfer functions
  • Ability to protect the secure world from debugging or memory read-out (reverse engineering)

MPU

The other important change in the ARMv8-M architecture is that the size of an MPU region now has a granularity of 32 bytes. In ARMv7-M the size had to be 2^N, which I never understood and which makes it not usable at all in real world applications (this is probably the reason the MPU is rarely used?).
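
To illustrate the difference in practice, the following C sketch checks whether a region described by a base address and a size could be covered by a single MPU region under each architecture. The function names are hypothetical. The rules encoded are the architectural ones as I understand them: ARMv7-M needs a power-of-two size of at least 32 bytes with the base aligned to that size, while ARMv8-M only needs the start and end to fall on 32-byte boundaries. The 4GB region that ARMv7-M also allows is left out because it does not fit in a 32-bit size.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M: size must be a power of two (>= 32 bytes), base aligned to size. */
bool mpu_v7m_region_ok(uint32_t base, uint32_t size)
{
    bool is_pow2 = (size != 0u) && ((size & (size - 1u)) == 0u);
    return is_pow2 && (size >= 32u) && ((base & (size - 1u)) == 0u);
}

/* ARMv8-M: base and size only need to be multiples of 32 bytes. */
bool mpu_v8m_region_ok(uint32_t base, uint32_t size)
{
    return (size != 0u) && ((base % 32u) == 0u) && ((size % 32u) == 0u);
}
```

A 12KB buffer at 0x20000000, for example, fits one ARMv8-M region but would have to be split into several regions (or padded to 16KB) on ARMv7-M.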

SAU and IDAU

Because this all cannot be only implemented in the core (provided by ARM), there are extra settings needed on the implementation side by the vendor implementing the ARM core.

  • Secure Attribution Unit (SAU): this is inside the core/processor
  • Implementation Defined Attribution Unit (IDAU): this one is outside the processor

The SAU and IDAU work together and are used to grant/deny access to the system (peripherals, memory). Using the SAU+IDAU, the memory space gets separated into three kinds:

  • Secure: Code, stack, data, … of the secure world
  • Non-Secure: Code, stack, data, … of the non-secure world
  • Non-Secure Callable: Entry to secure code with a secure gateway vector table

The important (and somewhat confusing) thing is that the SAU settings are first, and IDAU is used to make things ‘unsecure’:

  • SAU(secure) + IDAU(not secure) => secure
  • SAU(not secure) + IDAU(not secure) => not secure
  • SAU(non-secure callable) + IDAU(not secure) => non-secure callable

Or in other words: by default things are secure, and with the IDAU the security level is set to a lower one.
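
The three combinations listed above are consistent with a simple rule: the more secure of the two attributions wins, with Secure ranked above Non-Secure Callable and Non-Secure Callable ranked above Non-Secure. The C sketch below encodes that ordering. The enum and function names are hypothetical, and the general rule is my reading of the Armv8-M documentation rather than something taken from the SDK example.

```c
/* Security attributes ordered from least to most secure. */
typedef enum {
    ATTR_NON_SECURE  = 0,
    ATTR_NS_CALLABLE = 1,
    ATTR_SECURE      = 2
} sec_attr_t;

/* Final attribute of an address: the more secure of SAU and IDAU. */
sec_attr_t combine_attribution(sec_attr_t sau, sec_attr_t idau)
{
    return (sau > idau) ? sau : idau;
}
```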

Projects

Time to have a look at an example! The NXP MCUXpresso SDK already comes with an example showing how to call the non-secure land from the secure one. From the ‘Import SDK example(s)’ dialog I can select examples demonstrating TrustZone.

TrustZone Examples

The ‘hello_world’ TrustZone example executes some code on the secure side and finally passes control to the non-secure side to execute the non-secure application. The example follows the pattern of a secure bootloader then calling the non-secure application to start.

I have tweaked and replicated the projects discussed in this article. You can find them on GitHub: https://github.com/ErichStyger/mcuoneclipse/tree/master/Examples/MCUXpresso/LPC55S69-EVK

The ‘ns’ (non-secure) and ‘s’ (secure) projects work together. Using secure and non-secure application parts does not make things simpler, and there does not seem to be a lot of documentation about this topic. So I investigated that ‘hello world’ example to better understand how it works.

I have configured both to use the newlib (nano) semihost library:

Semihost settings

For both projects, set the SDK Debug Console to ‘Semihost console’:

Setting Semihost Console

I have both the secure and non-secure projects configured for using the semihost console, but a real UART could be used too.

Both projects are configured to use the Cortex-M33 (this is a setting in the compiler and Linker):

M33 Architecture setting

Non-Secure Side

The non-secure project is configured in the compiler and linker settings as ‘Non-Secure’:

TrustZone Project Settings

There is a setting to prevent debugging:

Prevent Debugging

The non-secure application links in an object file which is part of the secure application:

Linking CMSE Lib Object File

 This means that the ‘secure’ project has to be built first.

This is for the ‘secure gateway library’ which is built in the secure project using the --cmse-implib and --out-implib linker options:

From https://sourceware.org/binutils/docs/ld/ARM.html:

The ‘--cmse-implib’ option requests that the import libraries specified by the ‘--out-implib’ and ‘--in-implib’ options are secure gateway import libraries, suitable for linking a non-secure executable against secure code as per ARMv8-M Security Extensions.

Secure gateway library linker command

The ‘hello_world_ns’ program is linked to address 0x10000: the vector table and code get placed at this address:

Non-Secure Memory Settings

Secure Application

On the secure side the compiler and linker settings for TrustZone are set to ‘secure’:

Secure Linker and Compiler Settings

The program and vector table are loaded at 0x1000’0000 with a ‘veneer’ table loaded at 0x1000’fe00. More about this later…

Secure Memory Allocation

Debug

The non-secure application can be flashed to the device like this:

Program to Flash

This basically is as if the new (non-secure) application has been programmed using a bootloader or similar way to update the application.

To be able to debug the second (non-secure) from the secure application, I have to load the symbols for it in the debugger. The secure one can now be debugged as usual:

Debug secure application

In order to debug the non-secure application code when debugging the secure one, I have to add the symbols to the debugger. I can do this by editing the debug/launch configuration. Double-click on the .launch file or open the debug configuration with Run > Debug Configurations, then use the ‘Edit Scripts’ in the Debugger tab:

Edit Scripts

Add the following to load the symbols of the other project using the add-symbol-file gdb command. Adapt the path as needed, I have the other project at the same directory level.

add-symbol-file ../LPC55S69_hello_world_ns/Debug/LPC55S69_hello_world_ns.axf 0x10000

to tell the debugger that the symbols of that application are loaded at the address 0x10000. Insert that after the ${load} command:

Adding Symbols after Load

Running the application produces the following output:

Console Output

Security State Transitions

The ARMv8-M architecture has added instructions to transition between the security states. For example, the BLXNS instruction is used to call a non-secure function from the secure world:

Security State Transition (Source: ARM, Trustzone technology for ARMv8-M Architecture)

Calling a Non-Secure Function from the Secure World

The main() of the secure application is like below. It could be the base of a bootloader which jumps to the non-secure loaded application at address 0x1’0000:

#define NON_SECURE_START          0x00010000
 
/* typedef for non-secure callback functions */
typedef void (*funcptr_ns) (void) __attribute__((cmse_nonsecure_call));
 
int main(void)
{
    funcptr_ns ResetHandler_ns;
 
    /* Init board hardware. */
    /* attach main clock divide to FLEXCOMM0 (debug console) */
    CLOCK_AttachClk(BOARD_DEBUG_UART_CLK_ATTACH);
 
    BOARD_InitPins();
    BOARD_BootClockFROHF96M();
    BOARD_InitDebugConsole();
 
    PRINTF("Hello from secure world!\r\n");
  
    /* Set non-secure main stack (MSP_NS) */
    __TZ_set_MSP_NS(*((uint32_t *)(NON_SECURE_START)));
  
    /* Set non-secure vector table */
    SCB_NS->VTOR = NON_SECURE_START;
     
    /* Get non-secure reset handler */
    ResetHandler_ns = (funcptr_ns)(*((uint32_t *)((NON_SECURE_START) + 4U)));
      
    /* Call non-secure application */
    PRINTF("Entering normal world.\r\n");
    /* Jump to normal world */
    ResetHandler_ns();
    while (1)
    {
        /* This point should never be reached */
    }
}
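
The two reads from NON_SECURE_START in the listing above rely on the Cortex-M vector table layout: word 0 holds the initial main stack pointer and word 1 the address of the reset handler. The following is a minimal host-side sketch of that lookup, not SDK code. The names vector_head_t and read_vector_head are made up for illustration, and the table is passed in as an array so the logic can be tried on a PC.

```c
#include <stdint.h>

/* First two entries of a Cortex-M vector table (illustrative type, not from the SDK). */
typedef struct {
    uint32_t initial_msp;   /* word 0: initial value of the main stack pointer */
    uint32_t reset_handler; /* word 1: reset handler address (bit 0 set for Thumb) */
} vector_head_t;

/* Read the stack pointer and reset handler from the start of an image.
 * 'table' stands in for the memory at NON_SECURE_START. */
static vector_head_t read_vector_head(const uint32_t *table)
{
    vector_head_t head;
    head.initial_msp   = table[0];
    head.reset_handler = table[1];
    return head;
}
```

On the target, the secure main() does the same with the table located at 0x00010000: the first value goes to __TZ_set_MSP_NS() and the second becomes the function pointer that is called.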

The line with

__TZ_set_MSP_NS(*((uint32_t *)(NON_SECURE_START)));

loads the non-secure MSP (Main Stack Pointer). The debugger nicely shows both the secure and non-secure registers which are ‘banked’:

non-secure MSP

The following will call the non-secure world from the secure one:

ResetHandler_ns();

This is a function pointer with the cmse_nonsecure_call attribute:

/* typedef for non-secure callback functions */
typedef void (*funcptr_ns) (void) __attribute__((cmse_nonsecure_call));

Non-secure functions can only be called from the secure world using function pointers.

Behind that function call, several assembly instructions are executed. The code clears the LSB of the function address and clears the FPU single-precision registers, along with any other registers which could contain ‘secret’ information. At the end it calls the library function __gnu_cmse_nonsecure_call:

non-secure call sequence

The __gnu_cmse_nonsecure_call function pushes the registers, does more register cleaning and finally uses the BLXNS assembly instruction to enter the non-secure world:

__gnu_cmse_nonsecure_call

So there are quite a few instructions to be executed to make that transition.
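
One detail of the sequence above can be shown in plain C: the target address handed to BLXNS has bit 0 cleared, which is how the instruction knows that the destination is in the non-secure state. The sketch below only models that bit manipulation on the host, and the helper names are hypothetical. On the target, the arm_cmse.h intrinsic cmse_nsfptr_create() performs this clearing for a function pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clear bit 0 of a function address, marking it as a non-secure target. */
static uint32_t to_ns_target(uint32_t func_addr)
{
    return func_addr & ~UINT32_C(1);
}

/* True if the address would be treated as a non-secure target (bit 0 clear). */
static bool is_ns_target(uint32_t func_addr)
{
    return (func_addr & UINT32_C(1)) == 0u;
}
```

For a reset handler entry of 0x00010135 read from the vector table (an invented value), to_ns_target() returns 0x00010134.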

Calling the Secure World from the non-secure World

Calling a secure function from the non-secure side uses an intermediate step (Non-secure Callable):

Calling secure Function from non-secure side (Source: ARM, Trustzone technology for ARMv8-M Architecture)

In the example the non-secure world is calling a printf function (DbgConsole_Printf_NSE) which is located in the secure world:

Calling printf from the non-secure world

The secure functions which are callable from the non-secure world have to be marked with the cmse_nonsecure_entry attribute:

CMSE stands for Cortex-M (ARMv8-M) Security Extension.

Function with cmse_nonsecure_entry attribute
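
As a text version of the idea, here is a minimal sketch of what such an entry function can look like. Counter_Add_NSE and the NSE_ENTRY macro are invented for this example and are not part of the SDK. The attribute only takes effect when the secure project is built with -mcmse, so the macro expands to nothing elsewhere, which allows the logic to be compiled and tried on a PC.

```c
#include <stdint.h>

/* __ARM_FEATURE_CMSE is 3 when compiling secure code with -mcmse. */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE == 3U)
#define NSE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define NSE_ENTRY /* not building secure code: plain function */
#endif

static uint32_t secure_counter; /* state that stays on the secure side */

/* Hypothetical secure service callable from the non-secure world. */
NSE_ENTRY uint32_t Counter_Add_NSE(uint32_t increment)
{
    secure_counter += increment;
    return secure_counter;
}
```

The non-secure project would only see the prototype of Counter_Add_NSE in a header; the counter itself is never directly accessible from the non-secure world.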

So how does the non-secure world know how to call this function? The answer is that the linker prepares everything to make it possible. For this the non-secure application has to link an object file (or ‘library’) with the ‘veneer’ functions:

Linking CMSE Lib Object File

This object file (or library) is created with the following linker setting on the secure side:

--cmse-implib --out-implib=hello_world_s_CMSE_lib.o

Secure gateway library linker command

So let’s follow the code from the non-secure to the secure world: The assembly calls a ‘veneer’ function:

calling printf veneer

The veneer is simply a ‘trampoline’ function which loads the address of the ‘non-secure callable’ entry and does a BX to that address:

BX to non-secure callable

The ‘non-secure callable’ area is in the ‘secure world’. The first instruction executed there is an SG instruction, followed by a branch:

SG Instruction in non-secure callable region

The SG (Secure Gateway) instruction switches to the secure state followed by the B (Branch) instruction to the secure function itself:

Executing Secure Function

Compared to calling the non-secure side from the secure world, this was rather fast. Because the registers can contain secret information, they are all cleared just before the BXNS returns to the non-secure state:

Clearing registers on return to non-secure state

SAU Setup

So how is the protection configured? This is done through the SAU (Security Attribution Unit), which can only be configured from the secure side.

The example uses the following secure and non-secure code and data areas:

#define CODE_FLASH_START_NS         0x00010000 
#define CODE_FLASH_SIZE_NS          0x00062000
#define CODE_FLASH_START_NSC        0x1000FE00
#define CODE_FLASH_SIZE_NSC         0x200
#define DATA_RAM_START_NS           0x20008000
#define DATA_RAM_SIZE_NS            0x0002B000
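
To make the map easier to read, the end address of each area can be computed from its start and size. The sketch below does that arithmetic on the host and also checks the 32-byte granularity that SAU regions require. The helper names region_end and is_sau_aligned are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last byte address covered by a region (start and size in bytes). */
static uint32_t region_end(uint32_t start, uint32_t size)
{
    return start + size - 1u;
}

/* SAU regions have a 32-byte granularity: the start must be a multiple of 32
 * and the size must be a non-zero multiple of 32. */
static bool is_sau_aligned(uint32_t start, uint32_t size)
{
    return (start % 32u == 0u) && (size != 0u) && (size % 32u == 0u);
}
```

With the values above this gives 0x00071FFF for the end of the non-secure flash, 0x1000FFFF for the non-secure-callable area and 0x20032FFF for the non-secure RAM. The non-secure-callable area therefore ends directly below 0x10010000, which is the secure alias of the non-secure flash start if one assumes that address bit 28 selects the secure alias on this device.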

In the example this is configured in BOARD_InitTrustZone(). The following setting configures a region for the non-secure FLASH execution:

/* Configure SAU region 0 - Non-secure FLASH for CODE execution */
/* Set SAU region number */
SAU->RNR = 0;
/* Region base address */
SAU->RBAR = (CODE_FLASH_START_NS & SAU_RBAR_BADDR_Msk);
/* Region end address */
SAU->RLAR = ((CODE_FLASH_START_NS + CODE_FLASH_SIZE_NS - 1) & SAU_RLAR_LADDR_Msk) |
             /* NSC = 0: region is non-secure, not non-secure callable */
             ((0U << SAU_RLAR_NSC_Pos) & SAU_RLAR_NSC_Msk) |
             /* Enable region */
             ((1U << SAU_RLAR_ENABLE_Pos) & SAU_RLAR_ENABLE_Msk);
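
The non-secure-callable area from the memory map needs a region too, this time with the NSC bit set. As a sketch of how the register values come out, the functions below build the RBAR and RLAR values as plain numbers so they can be checked on a PC. The EX_ constants mirror the CMSIS bit layout (address in bits 31:5, NSC in bit 1, ENABLE in bit 0) and, like the function names, exist only in this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the SAU field layout (same values as the CMSIS definitions). */
#define EX_SAU_ADDR_Msk    0xFFFFFFE0u /* RBAR.BADDR and RLAR.LADDR, bits 31:5 */
#define EX_SAU_NSC_Pos     1u          /* RLAR.NSC */
#define EX_SAU_ENABLE_Pos  0u          /* RLAR.ENABLE */

/* Value for SAU->RBAR: region base address. */
static uint32_t sau_rbar_value(uint32_t start)
{
    return start & EX_SAU_ADDR_Msk;
}

/* Value for SAU->RLAR: region limit address, NSC flag and enable bit. */
static uint32_t sau_rlar_value(uint32_t start, uint32_t size, bool nsc)
{
    uint32_t limit = (start + size - 1u) & EX_SAU_ADDR_Msk;
    return limit | ((nsc ? 1u : 0u) << EX_SAU_NSC_Pos) | (1u << EX_SAU_ENABLE_Pos);
}
```

For CODE_FLASH_START_NSC and CODE_FLASH_SIZE_NSC with nsc set to true, sau_rlar_value() yields 0x1000FFE3: the limit address 0x1000FFE0 plus the NSC and ENABLE bits.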

The IDAU (Implementation Defined Attribution Unit) is optional and is intended to provide a default memory map (secure, non-secure and non-secure-callable) which can be overridden by the SAU.

Summary

It will probably take me some more time to understand the details of the ARMv8-M security extensions. There are more details to explore, such as secure peripheral access or how to protect memory areas. In a nutshell, TrustZone allows the device to be partitioned into ‘secure’/trusted and ‘unsecure’/not-trusted parts and divides the memory map into secure, non-secure and non-secure-callable areas, with the addition of an MPU and controlled access to peripherals. Plus there is the ability to control the level of debugging to prevent reverse engineering.

With the NXP MCUXpresso SDK and IDE plus the LPC55S69 board I have a working environment I can use for my experiments. I like the approach that the non-secure application basically does not need to know that it is running in a secure environment, unless it wants to call functions of the secure world.

I now have FreeRTOS working on the LPC55xx with the FreeRTOS port for M33, but I’m using it in the ‘non-secure’ world. My goal is to get the RTOS running on the secure side. I’m not sure yet exactly what this will look like, but that’s a good use case I want to explore in the next week if time permits.

 

Happy Securing :-)

- - -

Originally published on April 27, 2019 by Erich Styger

This article covers the NXP LPC55S69-EVK board: a dual ARM Cortex-M33 running at 100 MHz with ARM TrustZone:

LPC55S69 Microcontroller

 

The LPC55S69 is of special interest because it is one of the new ARM Cortex-M33 devices which implement the new ARM TrustZone security features: with them it is possible to run ‘trusted’ and ‘untrusted’ code on the same microcontroller.

LPC55S6x Block Diagram (Source: NXP LPC55S6x Datasheet)

 

The following table from ARM (https://developer.arm.com/ip-products/processors/cortex-m/cortex-m33) gives an overview of the Cortex-M33 (Armv8-M) architecture:

| Feature | Cortex-M0 | Cortex-M0+ | Cortex-M1 | Cortex-M23 | Cortex-M3 | Cortex-M4 | Cortex-M33 | Cortex-M35P | Cortex-M7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Instruction set architecture | Armv6-M | Armv6-M | Armv6-M | Armv8-M Baseline | Armv7-M | Armv7-M | Armv8-M Mainline | Armv8-M Mainline | Armv7-M |
| | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 | Thumb, Thumb-2 |
| DMIPS/MHz range* | 0.87-1.27 | 0.95-1.36 | 0.8 | 0.99 | 1.25-1.89 | 1.25-1.95 | 1.5 | 1.5 | 2.14-3.23 |
| CoreMark®/MHz** | 2.33 | 2.46 | 1.85 | 2.5 | 3.34 | 3.42 | 4.02 | 4.02 | 5.01 |
| Pipeline stages | 3 | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 6 |
| Memory Protection Unit (MPU) | No | Yes (option) | No | Yes (option) (2 x) | Yes (option) | Yes (option) | Yes (option) (2 x) | Yes (option) (2 x) | Yes (option) |
| Maximum MPU regions | 0 | 8 | 0 | 16 | 8 | 8 | 16 | 16 | 16 |
| Trace (ETM or MTB) | No | MTB (option) | No | MTB (option) or ETMv3 (option) | ETMv3 (option) | ETMv3 (option) | MTB (option) and/or ETMv4 (option) | MTB (option) and/or ETMv4 (option) | ETMv4 (option) |
| DSP | No | No | No | No | No | Yes | Yes (option) | Yes (option) | Yes |
| Floating point hardware | No | No | No | No | No | Yes (option SP) | Yes (option SP) | Yes (option SP) | Yes (option SP + DP) |
| Systick Timer | Yes (option) | Yes (option) | Yes (option) | Yes (2 x) | Yes | Yes | Yes (2 x) | Yes (2 x) | Yes |
| Built-in Caches | No | No | No | No | No | No | No | Yes (option 2-16kB I-cache) | Yes (option 4-64kB I-cache, D-cache) |
| Tightly Coupled Memory | No | No | Yes | No | No | No | No | No | Yes (option 0-16MB I-TCM/D-TCM) |
| TrustZone for Armv8-M | No | No | No | Yes (option) | No | No | Yes (option) | Yes (option) | No |
| Co-processor interface | No | No | No | No | No | No | Yes (option) | Yes (option) | No |
| Bus protocol | AHB Lite | AHB Lite, Fast I/O | AHB Lite | AHB5, Fast I/O | AHB Lite, APB | AHB Lite, APB | AHB5 | AHB5 | AXI4, AHB Lite, APB, TCM |
| Wake-up interrupt controller support | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Integrated interrupt controller | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Maximum # external interrupts | 32 | 32 | 32 | 240 | 240 | 240 | 480 | 480 | 240 |
| Hardware divide | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Single cycle multiply | Yes (option) | Yes (option) | No | Yes | Yes | Yes | Yes | Yes | Yes |
| CMSIS Support | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

(ARM Cortex-M Comparison Table: Source ARM).

Unboxing

I ordered my board from Mouser for CHF 43. The board came in a nice cardboard box:

LPC55S69-EVK Box

The content (apart from some stuffing material) is the board itself plus a small bag with 4 jumpers:

LPC55S69-EVK Board (Top Side)

LPC55S69-EVK Board Bottom Side

The board includes an LPC4322 (Link2) based debug probe:

LPC55S69-EVK Board Components (Source: Board Manual)

Software and Tools

The MCUXpresso SDK for the board is available for download on https://mcuxpresso.nxp.com:

MCUXpresso SDK for 55S69

I have downloaded the latest version 2.5.1 (released in mid-April 2019):

SDK 2.5.1

As the IDE I’m using the NXP MCUXpresso IDE 10.3.1. The SDK gets installed by drag & drop into the ‘Installed SDKs’ view:

Installed SDK in MCUXpresso IDE

With the SDK installed, I can quickly create a new project or import example projects:

Quickstart Panel

SDK Wizard

FreeRTOS

The SDK V2.5.1 comes with a FreeRTOS V10.0.1 port which runs out of the box, using the M4 port.

Debugging FreeRTOS on LPC55S69

In the McuOnEclipse FreeRTOS port I’m already using FreeRTOS 10.2.0, so this is something I have to update soon too.

Configuration Tools

The IDE comes with the NXP MCUXpresso Configuration Tools integrated.

With the graphical configuration tools I can create pin muxing and clock configurations:

Pins Tool

Clocks Tool

Secure and Non-Secure

The SDK comes with demos using secure + non-secure application parts. To make it easy, the projects have TrustZone settings for the compiler and linker:

TrustZone Project Settings

 

I have started playing with TrustZone, but this is the subject of a follow-up article.

Erase Flash

Dealing with a multicore ARM Cortex-M33 device is for sure a bit more complex than just using an old-fashioned single-core M0+. Because of the secure and non-secure features, it might be necessary to get things back into a clean state. So this is what worked best for me:

  1. Have a non-secure and simple project present in the workspace. I’m using the ‘led_blinky’ from the SDK examples.

     

    LED Blinky

  2. Power the board through the P5 USB connector (cable with the yellow dot) and debug it through the onboard LPC-Link2 connector (P6).

     

    LPC55S69 Power and Debug

  3. With that project selected, erase the flash using the action in the Quickstart Panel.

     

    Erase Flash Using Linkserver

  4. Select core 0 for the erase operation:

     

    Select core for Flash Erase

  5. This should work without problems. Press OK in the dialog:

     

    Operation Successful

  6. At this point I recommend disconnecting and reconnecting the P6 (Debug) cable.
  7. Now I can program the normal application again:

     

    Programming Blinky

With this I have a working and known state for my experiments.

Summary

The Easter break is coming to an end and has been interesting, to say the least. The NXP LPC55S69-EVK is very appealing: the board is reasonably priced and with all the connectors it is a good way to evaluate the microcontroller. The most interesting thing is that it has a dual-core ARM Cortex-M33 with the ARM TrustZone implementation. To be able to run ‘trusted’ and ‘untrusted’ (e.g. user) code on the same device could be one of the standard models of microcontrollers going forward, especially in the ‘internet of things’ area. So I think I have to explore this device and board and its capabilities in at least one follow-up article.

 

Happy Trusting :-)

 

- - -

Originally published on April 22, 2019 by Erich Styger