In some of my past articles on the PowerQuad, we examined some common signal processing operations (IIR BiQuad and the Fast Fourier Transform) and then showed how to use the PowerQuad DSP engine to accelerate the computations. The matrix engine in the PowerQuad can be used to perform common matrix and vector operations, freeing the M33 CPU(s) to perform other tasks in parallel. In general, the matrix engine is limited to a maximum matrix size of 16x16 (256 elements).
Figure 1: PowerQuad Matrix Operation Maximum Sizes
A simple, but useful, operation that is common in a processing pipeline is the Hadamard (elementwise) product. Think of it as multiplying two signals together. Let us say we have two input vectors/signals that are 1x32 in size:
Figure 2. Hadamard Product
A quick note: because the Hadamard product only needs a single element from each of the inputs to produce each element in the output, the actual shape of the matrix/vector is inconsequential. For example, a 16x16 matrix and a 1x256 vector would yield the same result if the input data is organized the same way in memory.
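To make the point about shape concrete, here is a minimal sketch in plain Python (it is not PowerQuad code). The helper names `hadamard_flat`, `hadamard_matrix`, `reshape` and `flatten` are my own, and the test data is arbitrary. It multiplies the same 256 values once as a flat vector and once as a 16x16 matrix stored row by row.

```python
def hadamard_flat(a, b):
    """Elementwise product of two flat sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return [x * y for x, y in zip(a, b)]


def hadamard_matrix(a, b):
    """Elementwise product of two matrices stored as lists of rows."""
    return [hadamard_flat(row_a, row_b) for row_a, row_b in zip(a, b)]


def reshape(flat, rows, cols):
    """View a flat, row-major buffer as a rows x cols matrix."""
    if len(flat) != rows * cols:
        raise ValueError("buffer size does not match the requested shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def flatten(matrix):
    """Return the row-major flat buffer of a matrix."""
    return [x for row in matrix for x in row]


# Arbitrary illustration data: 256 elements per input.
a_flat = [float(i) for i in range(256)]
b_flat = [0.5 * i - 3.0 for i in range(256)]

as_vector = hadamard_flat(a_flat, b_flat)
as_matrix = hadamard_matrix(reshape(a_flat, 16, 16), reshape(b_flat, 16, 16))

# Same memory layout in, same memory layout out.
same_result = flatten(as_matrix) == as_vector
```

Because every output element depends only on the two input elements at the same position, the 1x256 and 16x16 views produce identical buffers.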
The cartoon in Figure 2 illustrates a common application of the Hadamard product: windowing of a time domain signal. In my last article, we looked at how Discrete Fourier Transforms are constructed from basic mathematical operations. There was one assumption I made about the nature of the signal we were comparing to our cosine/sine references. Consider the cartoon in Figure 3.
Figure 3: The Rectangular Window as a “Default”.
Let us say we captured 32 samples of a signal via an Analog to Digital Converter (ADC). In the “real” world, that signal existed before and after the 32 point “window” of time. Here is a philosophical question to consider:
Is there any difference between our 32 samples and an infinitely long input multiplied by a “rectangular” window of 1’s around our region of interest?
In fact there is not! The simple act of “grabbing” 32 samples yields a mathematical entity that is the “product” of some infinitely long input signal and a 32-point rectangle of 1’s (with zeros elsewhere) around our signal. When we consider operations such as the Discrete Fourier Transform, what we are transforming is the input signal multiplied by a window function. Using mathematical properties of the Fourier Transform, it can be shown that this multiplication in the time domain is a convolution with the window’s Fourier Transform in the frequency domain (for a pure tone, this is a copy of the window’s transform “shifted” to the tone’s frequency). There is a lot of theory available to explain this effect, but the takeaway is that the rectangular window exists by the simple act of grabbing a finite number of samples. One of my pet peeves is seeing literature that refers to “windowed vs non-windowed transforms”. The Rush song “Free Will” has a memorable lyric:
“If you choose not to decide, you still have made a choice”
By doing nothing, we have selected a rectangular window (which shows up as a sin(x)/x artifact around the frequency bins). While we cannot eliminate the effects of the window, we do have some choice in how the window artifacts are shaped. By multiplying the input signal by a known shape, we can control artifacts in the frequency domain caused by the window. Figure 2 shows a common window called a “Hanning” window.
In the context of the LPC55S69 and the PowerQuad, the matrix engine can be used to apply a different “window” to an input signal. Since applying a window before computing a Fast Fourier Transform is a common operation, consider using the Hadamard product in the PowerQuad to do the work.
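As a rough sketch of what the windowing step computes, the Python below multiplies 32 samples by a Hann (“Hanning”) window, element by element. This is only a model of the math, not PowerQuad driver code. The function names, the symmetric form of the window formula and the test signal are my own assumptions for illustration.

```python
import math


def hann_window(n):
    """n-point symmetric Hann window: 0.5 - 0.5*cos(2*pi*i/(n-1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def hadamard(a, b):
    """Elementwise (Hadamard) product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return [x * y for x, y in zip(a, b)]


N = 32

# Hypothetical captured signal: 3 cycles of a cosine over the 32 samples.
signal = [math.cos(2.0 * math.pi * 3 * i / N) for i in range(N)]

window = hann_window(N)
windowed = hadamard(signal, window)  # this is what would feed the FFT
```

The window tapers to zero at both ends, so the windowed signal no longer has the abrupt edges that the implicit rectangular window leaves behind.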
Vector Dot Product
In my last article, I showed that the Discrete Fourier Transform is the dot product between a signal and a Cosine/Sine reference applied to many different frequency bins. I wanted to point this out here as the PowerQuad matrix engine can compute a dot product. While the FFT is certainly the “workhorse” of frequency domain processing, it is not always the best choice. There are use cases where you may only need to perform frequency domain analysis at a single frequency bin (or just a few). In this case, directly computing the transform via the dot product may be a better choice.
One constraint of using an FFT is that the bins of the resultant spectrum are spaced at the sample rate divided by the number of samples. This means the bins may not align to frequencies important to your application. The only knobs you have to adjust are the sample rate and the number of samples (which must be a power of two). There are cases where you may need to align your analysis to an exact frequency which may not have a convenient relationship to your sample rate. In this case, you could use the dot product operation with the exact frequencies of interest. I have worked on applications that required frequency bins that were logarithmically spaced. In these cases, directly computing the DFT was the best approach to achieve the results we needed.
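Here is a minimal sketch of evaluating a single “bin” at an exact frequency with two dot products. It is plain Python, not PowerQuad code, and the names (`single_bin_dft`, `dot`) and the numbers are assumptions I picked for illustration: an 8 kHz sample rate, 400 samples (not a power of two) and a 1240 Hz tone, which would not land on a bin of a 256-point FFT (31.25 Hz spacing).

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def single_bin_dft(samples, freq_hz, sample_rate_hz):
    """Evaluate sum(x[k] * exp(-j*w*k)) at one exact frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    n = len(samples)
    cos_ref = [math.cos(w * k) for k in range(n)]
    sin_ref = [math.sin(w * k) for k in range(n)]
    # Real part: dot product with the cosine reference.
    # Imaginary part: negative dot product with the sine reference.
    return complex(dot(samples, cos_ref), -dot(samples, sin_ref))


SAMPLE_RATE_HZ = 8000.0
NUM_SAMPLES = 400
TONE_HZ = 1240.0

# Hypothetical input: a unit-amplitude cosine at the frequency of interest.
samples = [math.cos(2.0 * math.pi * TONE_HZ * k / SAMPLE_RATE_HZ)
           for k in range(NUM_SAMPLES)]

at_tone = single_bin_dft(samples, TONE_HZ, SAMPLE_RATE_HZ)
off_tone = single_bin_dft(samples, 2000.0, SAMPLE_RATE_HZ)
```

For a unit-amplitude tone that completes a whole number of cycles in the capture, the magnitude at the tone's frequency comes out at half the number of samples, and the response at the other frequency is essentially zero.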
The FFT certainly has computational advantages for many applications but it is NOT the only method for frequency domain analysis. Speed is not always the primary requirement for an application, so don't automatically think you need an FFT to solve a problem. I wanted to point this out in the context of matrix processing as the PowerQuad could still be used in these scenarios to do the work while keeping the main CPU free for general purpose operations.
Also, I do want to mention that in these special cases there are alternate approaches besides the direct computation of the DFT with the dot product, such as Goertzel’s method. Even in these cases, you can use features in the PowerQuad to compute the result. In the case of Goertzel’s method, the IIR BiQuad engine would be a great fit.
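To show why a biquad-style structure fits, here is a minimal sketch of Goertzel’s method in plain Python. The core is a two-pole recursion run over the samples, followed by one closing calculation. The function name and the test tone (1240 Hz at an 8 kHz sample rate, 400 samples) are my own assumptions, and nothing here touches the PowerQuad registers.

```python
import math


def goertzel_power(samples, freq_hz, sample_rate_hz):
    """Squared magnitude of sum(x[k] * exp(-j*w*k)) via Goertzel's recursion."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # state delayed by one sample
    s2 = 0.0  # state delayed by two samples
    for x in samples:
        # Second-order recursion with two poles on the unit circle.
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    # Closing step: |s1 - exp(-j*w) * s2|^2 expanded into real arithmetic.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


SAMPLE_RATE_HZ = 8000.0
NUM_SAMPLES = 400
TONE_HZ = 1240.0

# Hypothetical input: a unit-amplitude cosine at the frequency of interest.
samples = [math.cos(2.0 * math.pi * TONE_HZ * k / SAMPLE_RATE_HZ)
           for k in range(NUM_SAMPLES)]

power_at_tone = goertzel_power(samples, TONE_HZ, SAMPLE_RATE_HZ)
power_off_tone = goertzel_power(samples, 2000.0, SAMPLE_RATE_HZ)
```

The result at the tone matches the squared magnitude from the direct dot product approach (200 squared for this input), but the loop only needs one multiply and two adds per sample.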
There are literally hundreds of applications where you need to efficiently perform matrix multiplication, scaling, inversion, etc. Just keep in mind the PowerQuad can do this work efficiently if the matrix dimensions are 16x16 or smaller (9x9 in the case of inversion). One possible application that came to mind was Field Oriented Control (FOC). FOC applications use special matrix transformations to simplify analysis and transform motor currents into a direct/quadrature reference frame:
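As a rough illustration of those transformations, the sketch below uses the textbook Clarke and Park transforms written as small matrix-vector products. The names, the amplitude-invariant scaling of the Clarke matrix and the test currents are my own assumptions; the original figure may have used a different convention. This is plain Python, not PowerQuad code.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Clarke transform (amplitude-invariant form): three phase currents
# (a, b, c) to the stationary two-axis (alpha, beta) frame.
CLARKE = [
    [2.0 / 3.0, -1.0 / 3.0, -1.0 / 3.0],
    [0.0, 1.0 / math.sqrt(3.0), -1.0 / math.sqrt(3.0)],
]


def park_matrix(theta):
    """Park transform: rotate (alpha, beta) into the rotor (d, q) frame."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def abc_to_dq(i_abc, theta):
    """Phase currents to direct/quadrature currents at rotor angle theta."""
    alpha_beta = mat_vec(CLARKE, i_abc)
    return mat_vec(park_matrix(theta), alpha_beta)


# Hypothetical balanced three-phase currents aligned with the rotor angle.
theta = 0.7
amplitude = 2.0
i_abc = [
    amplitude * math.cos(theta),
    amplitude * math.cos(theta - 2.0 * math.pi / 3.0),
    amplitude * math.cos(theta + 2.0 * math.pi / 3.0),
]

i_d, i_q = abc_to_dq(i_abc, theta)
```

With balanced currents aligned to the rotor angle, the sinusoidal phase currents collapse into a constant direct component equal to the amplitude and a quadrature component of zero, which is what makes the d/q frame convenient for control loops.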
Another neat application would be to accelerate an embedded graphics application. I was thinking that the PowerQuad Matrix Engine could handle 2D and 3D coordinate transformations that could form the basis for a “mini” vector graphics and polygon rendering capability. When I got started with computing, video games drove my interest in “how things work”. I remember the awe I felt when I first saw games that could rotate shapes on a screen. It "connected" when I found a computer graphics text that showed this matrix equation:
Figure 5: 2D Vector Rotation Matrix.
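For reference, here is a minimal sketch of the standard counterclockwise 2D rotation, x' = x·cos(θ) − y·sin(θ) and y' = x·sin(θ) + y·cos(θ), applied to the corners of a shape. I am assuming this is the convention the figure used. The function names and the example shape are mine, and this is plain Python rather than PowerQuad code.

```python
import math


def rotation_matrix(theta):
    """2x2 matrix that rotates a point counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def rotate(points, theta):
    """Apply the rotation matrix to every (x, y) point in a list."""
    r = rotation_matrix(theta)
    return [(r[0][0] * x + r[0][1] * y,
             r[1][0] * x + r[1][1] * y) for x, y in points]


# Hypothetical shape: a diamond with corners on the unit circle.
diamond = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

# A quarter turn moves each corner onto the next one.
rotated = rotate(diamond, math.pi / 2.0)
```

Every corner is transformed by the same 2x2 matrix, so a whole list of vertices can be handled as one matrix multiplication, which is the kind of work a matrix engine is built for.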
This opened my mind to many other applications as the magic was now accessible to me. Maybe I am just dreaming a bit but having a hardware co-processor such as the PowerQuad can yield some interesting work!
Getting Started with PowerQuad Matrix Math
Built into the SDK for the LPC55S69 are plenty of PowerQuad examples. “powerquad_matrix” has plenty of examples that exercise the PowerQuad matrix engine.
Figure 6: PowerQuad Matrix Examples in the SDK.
Let us take a quick peek at the vector dot product example:
Figure 7: PowerQuad Vector Dot Product.
As you can see, there is actually very little setup required for a PowerQuad matrix/vector computation. There are a handful of registers on the AHB bus that need to be configured and then the PowerQuad will do the work. I hope this article got you thinking of some neat applications with the LPC55S69 and the PowerQuad. Next time we are going to wrap up the PowerQuad articles with a neat application demonstration. After that we are going to look at some interesting graphics and IOT applications with the LPC55S69. Stay tuned!
In the meantime, here are all my previous LPC55 articles just in case you missed them.
The first step in designing a PCB with a new MCU is to add the part into your component libraries. Component library management can be a source of passionate disagreements between design engineers. My own view on library management is rooted in many years of making mistakes! These simple mistakes ultimately caused delays and made projects more difficult than they needed to be. Oftentimes these mistakes were also driven by a desire to "save time". Given my experience, there are a few overarching principles I adhere to.
1. The individual making the component should also be the one who has to stay the weekend and cut traces if a mistake is made. This obviously conflicts with the “librarian/drafter” model but I literally have seen projects where the librarian made a mistake on a 1000+ pin BGA that cost >$5k. This model was put in a library and marked as “verified”. The person making the parts needs some skin in the game! In this case, the drafting team claimed they had a process that included a double check but *no one in that process knew the context of how the part was going to be used*.
2. Pulling models from the internet or external libraries is OK as a starting point but it is just that, a starting point. You must treat every pin as if it was wrong and verify. Since many organizations have specific rules on how a part should look, you will need to massage the model to meet your own needs. Software engineers shake their head at this rule. "Why not build on somebody else's libraries? It is what we do!". Well, a mistake in a hardware library can take weeks if not months to really solve. The cost, time and frustration impact can be huge. We hardware engineers can't simply "re-compile".
3. I don’t trust any footprint unless I know it has been used in a successful design. The context of how a part is used is very important (which leads to #4).
4. I believe design reuse is best done at a schematic snippet level, not an individual part. After all, once I get this Mini-Monkey board complete, I will never again start with just the LPC55S69. I want all the “stuff” surrounding the chip that makes it work!
To the casual observer, these principles seem onerous and time consuming but I have found that they *save me time over the course of the project*. Making your own parts may seem time consuming but it *does not have to be*. There are tools that can make your life simpler and the task less arduous. Also making your own CAD part is useful for a few other reasons:
You have to go through a mental exercise when looking at each of the pins. It forces your brain to think about functionality in a slightly different way. When starting with a new part/family, repeated exposure is a very good way to learn.
Looking at the footprint early on gets your brain in a planning mode for when you do get started.
One could argue that this is “lost” time as compared to getting someone else to do the CAD library management but I feel strongly that it saves time in the long run. I have witnessed too many projects sink time into unnecessary debugging due to bad CAD part creation. I feel the architect of the design needs to be intimately involved and take ownership of the process.
The LPC55S69 in the VFBGA package has only 98 pins. With no automation or tools, it would not take all that long to build a part right from the datasheet. However, it is on the edge of being a time consuming endeavor. Also, when I build schematic symbols, I tend to label the pins with all possible IO capabilities allowed by the MCU pin mux. This can make the part quite large but it also helps me see what else is available on a pin if I am in a debug pinch. Creating pins with all this detail can be quite time consuming. I use Altium Designer for all of my PCB design and it has some useful automation to make parts more quickly. NXP’s MCUXpresso tool also has a unique feature that can really help board designers get work done quickly.
Creating the Pin List
Built into MCUXpresso is a pins tool that is *very* useful in large projects for setting up the pin muxes and doing some advanced planning. While it is primarily a tool for bootstrapping pin setup for the firmware, it can also be useful to drive the CAD part creation process. Simply create a new project and start the pins tool:
The pins tool gives you a tabular and physical view of pin assignments. Very useful when planning your PCB routing. We will use the export feature to get a list of all the pins, numbers and labels.
The pins tool generates a CSV file that you can bring into your favorite editor. Not only do I get the pin/ball numbers, I get all of the IO options available via the MCU pin mux.
Altium Designer requires a few extra columns of metadata to be able to import the data into a grouping of pins in the schematic library editor. At this point you could group the pins to your personal preference. I personally like to see all pin functions on the schematic symbol, but this does create rather large symbols. The good news here is that by using MCUXpresso and Altium you can make this a 10-minute job, not a 3 hour one. Imagine going through the reference manual line by line!
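If you would rather script the “massaging” step than do it in a spreadsheet, something along these lines could work. This is only a sketch. The input column names (`Ball`, `Name`, `Functions`), the `;` separator, the output column names and the sample rows are all hypothetical placeholders I made up; check the header row of your actual export and the columns your CAD tool expects before using anything like this.

```python
import csv
import io

# Hypothetical export content; real files will have different columns and data.
SAMPLE_EXPORT = """Ball,Name,Functions
A1,PIN_X,FUNC_A;FUNC_B
B2,PIN_Y,FUNC_C
"""


def build_symbol_rows(csv_text, electrical_type="I/O"):
    """Turn exported pin records into rows carrying the extra metadata columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        functions = [f for f in record["Functions"].split(";") if f]
        # Label the pin with its name plus every mux option, e.g. "PIN_X/FUNC_A/FUNC_B".
        display_name = "/".join([record["Name"]] + functions)
        rows.append({
            "Designator": record["Ball"],
            "Display Name": display_name,
            "Electrical Type": electrical_type,  # placeholder default for every pin
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows back to CSV text for pasting or importing."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Designator", "Display Name", "Electrical Type"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


symbol_rows = build_symbol_rows(SAMPLE_EXPORT)
symbol_csv = rows_to_csv(symbol_rows)
```

Even with a script like this, every pin still deserves a manual check against the datasheet before the symbol goes into a library.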
Voilà! A complete symbol. It just took a few minutes of massaging to get what I wanted. Like I stated previously, a 98 pin package is not that bad to do manually but you can imagine a 200 or 300 pin part (such as the i.MX RT!)
The VFBGA package is 7mmx7mm with a 0.5mm pitch. There are balls removed from the grid for easier route escaping when using this part with lower cost fabrication processes.
Once again, with a quick look at NXP documentation and using the Altium IPC footprint generator, we can make quick work of getting an accurate footprint.
The IPC footprint generator steps you through the entire process. All you need is the reference drawing.
A quick note about the IPC footprint tool in this use case. The NXP VFBGA has quite a few balls removed to allow for easier escaping. The IPC footprint generator can automatically remove certain regions, but I found that this particular arrangement needed a few minutes of hand work to delete the unneeded pads given the unique pattern.
By using Altium and NXP’s MCUXpresso tool together, I was able to get my CAD library work done very quickly. And because I spent some time with the design tools, I became more familiar with the IO’s and physical package. This really helps get the brain primed for the real design work.
At this point in the process I have a head start on the schematic entry and PCB layout. Next time we are going to dive in a bit to see what connections we need to bootstrap the LPC55S69 to get it up and running. We will take a look at some of the core components to get the MCU to boot and some peripheral functions that will help the Mini-Monkey come alive!
EmSA recently released some updates to FAIM support on LPC84x devices in their popular Flash Magic tool. If you are using this unique feature of the LPC84x device series be sure to update to version 12.65 or later to get access to command line support and the latest fixes for some previous bogus errors/warnings that were appearing.
The LPC55S69 is of special interest because it is one of the new ARM Cortex-M33 devices, which implement the new ARM TrustZone security features: with this feature it is possible to run ‘trusted’ and ‘untrusted’ code on the same microcontroller.
With the SDK installed, I can quickly create a new project or import example projects:
The SDK V2.5.1 comes with a FreeRTOS V10.0.1 port which runs out of the box, using the M4 port.
Debugging FreeRTOS on LPC55S69
In the McuOnEclipse FreeRTOS port I’m already using FreeRTOS 10.2.0, so this is something I have to look into soon too.
The IDE comes with the NXP MCUXpresso Configuration Tools integrated.
With the graphical configuration tools I can create pin muxing and clock configurations:
Secure and Non-Secure
The SDK comes with demos using secure + non-secure application parts. To make it easy, the projects have TrustZone settings for the compiler and linker:
TrustZone Project Settings
I have started playing with TrustZone, but this is subject of a follow-up article.
Dealing with an ARM Cortex-M33 multicore device is for sure a bit more complex than just using an old-fashioned single-core M0+. Because of the secure and non-secure features, it might be necessary to get things back into a clean state. So this is what worked best for me:
Have a non-secure and simple project present in the workspace. I’m using the ‘led_blinky’ from the SDK examples.
Power the board with the P5 USB connector (P5: cable with the yellow dot) and debug it with the onboard LPC-Link2 connector (P6).
LPC55S69 Power and Debug
With that project selected, erase the flash using the action in the Quickstart Panel.
Erase Flash Using Linkserver
Select core 0 for the erase operation:
Select core for Flash Erase
This should work without problems. Press OK in the dialog:
At this point I recommend disconnecting and re-connecting the P6 (Debug) cable.
Now I can program the normal application again:
With this I have a working and known state for my experiments.
The Easter break is coming to an end and has been interesting, to say the least. The NXP LPC55S69-EVK is very appealing: the board is reasonably priced and with all the connectors it is a good way to evaluate the microcontroller. The most interesting thing is that it has a dual-core ARM Cortex-M33 with the ARM TrustZone implementation. To be able to run ‘trusted’ and ‘untrusted’ (e.g. user) code on the same device could be one of the standard models of microcontroller going forward, especially in the ‘internet of things’ area. So I think I have to explore this device and board and its capabilities in at least one follow-up article.
I really love tiny and breadboard friendly boards, especially if they are very affordable and can be used with Eclipse based tools. So I was excited to see the NXP LPC845-BRK board available at Mouser, and I ended up ordering multiple boards right away. Why multiple? Because they only cost CHF 5.95 (around $6)!
NXP LPC845-BRK Board
The boards arrived yesterday, so it is perfect timing to have them (and more of them) integrated into the next semester's university course material. So you will probably see a few more tutorials for this board.
The kit comes in a solid card box with:
the LPC845-BRK board
two 10pin headers
Micro USB cable
a small screwdriver
two 2pin jumpers and headers
getting started reference card
The board works out of the box and does not need any soldering, and the headers are provided in case I want to customize the board. I like the fact that the headers are supplied, and I’m free to choose what I want to solder to the board. Plus I can use different headers if I want to. I was puzzled by the screwdriver (what for?) until I realized that there is a small potentiometer on the board :-).
The main MCU on the board is the LPC845 in a QFN48 package (LPC845M301JBD4), an ARM Cortex-M0+ running at 30 MHz with 64 KB FLASH and 16 KB SRAM:
The board has a ‘break-apart’ touch area: if I don’t need it, I can make the board smaller. It includes a potentiometer, an RGB LED and three push buttons (Reset, User and ISP). Plus, most important: the LPC11U35 acting as a debug probe:
I can use the LPC845 with an external debug probe: for this I have to solder a jumper plus the 2×5 header. All three buttons can be used as user buttons, so technically there are three of them. There is also a jumper for an ammeter to measure the current used.
Software and Tools
There is no dedicated MCUXpresso SDK for that board (yet?), so I have downloaded the one for the device from http://mcuxpresso.nxp.com/:
On the LPC845-BRK web site there is a zip file with examples which I have imported into the MCUXpresso IDE:
When plugged in, the board enumerates with a virtual COM port which is a gateway to the LPC845 UART:
I was able to debug the board out of the box, the board is recognized as CMSIS-DAP debug probe:
And voilà: I’m debugging it
I really like that board. It is of good quality with a lot of value. It has an on-board debugger and even the possibility to use it directly with a J-Link or P&E Multilink if I wish. The board is small, can be hooked onto a breadboard and can be made even smaller by removing the touch pad. The Cortex-M0+ is not the fastest and biggest MCU on the planet, but provides enough processing power for many smaller applications. I plan to follow up with more tutorials in the next days and weeks. Until then, see the tutorials listed in the Links section below.
You may be interested to know that we recently released a new set of debug firmware and Windows 7 drivers for our boards that feature the LPC11U3x MCU as a debug probe (so all the "MAX" boards). The new firmware can be found under the Software & Tools tab of the board page you are using, http://www.nxp.com/demoboard/omxxxxx (where xxxxx is the board part number, such as om13071, om13097, etc.).
The intention is for this firmware to be used instead of the Mbed-based firmware and driver that has been used up until now, if you are not going to use Mbed (you can continue to use the Mbed version if you so wish however). Some reasons to consider the new firmware & driver:
The CMSIS-DAP implementation is newer, so a little more robust and faster
The VCOM / serial port driver supports autobaud, with speeds up to 115200
The VCOM driver has a cleaner installation (mbed serial port driver needs board to be plugged in to install, which is a little unusual)
The firmware auto-detects if a target serial port connection is present and enumerates a driver if one is.
The new firmware gives a unique ID per board, allowing multiple board connections at once.
Downloading the package will give you a driver for Windows 7 & 8 (not needed for Windows 10, MacOS or Linux), plus the debug probe firmware image. Follow the firmware update instructions for your board to update; it's a simple delete then drag and drop operation.
LPCXpert V3.4 is the latest release of a freeware expert tool for the CORTEX-M based LPC families of microcontrollers. This tool simplifies the selection of a MCU device, speeds up the creation of application code and initialization code and supports generation of an application specific schematic Symbol. This version supports more than 410 different CORTEX-M based micro controllers from NXP.
LPCXpert supports all phases of a development. During the MCU selection phase LPCXpert supports selection of a target MCU by providing selection features in the "MCU Select" tab. During the software implementation phase LPCXpert provides a graphical user-interface to configure the pinout (Pin-MUX) and the peripheral interfaces of the target device. LPCXpert then also generates projects providing a framework of reference applications. These applications configure the Clock Generation Unit (CGU) and the on-chip peripheral interfaces of the device to test and demonstrate the setup.
New and enhanced features include support for LPCopen software package from NXP. Features also include generation of a Schematic Symbol for the ALTIUM Designer and the CADSOFT EAGLE V6.2 and generation of projects for the NXP LPCXpresso and MCUxpresso IDE, IAR Embedded Workbench (EWARM), Keil µVision and GNU C-Compilers, as well as links to Internet Sites for additional information.
Using LPCXpert it is possible to set the pins of each peripheral (e.g. for SPI, CAN, I2C, EMC, ETH, ...) and to configure the features of each pin (Pull-Up, Pull-Down, ...). In addition LPCXpert V3.4 also supports configuration of pre-built demo code for the LPC8xx and LPC54xxx Families of MCUs.
Based on the configuration LPCXpert may generate a C-Code Project or a Schematic Symbol. In addition LPCXpert can save up to 8 different pin-mux configurations and restore from up to 10 different configurations. Additional information and the download are available from the following Web-Site:
(Below is the detail information about the demo, same as in "readme.txt")
Overview
========
The Multicore blinky demo application demonstrates how to set up projects for individual cores on the LPC5411x/10x dual-core system. In this demo, the M4 (master) releases the M0+ (slave) from reset. The M0+ and M4 share a global variable which is interpreted as the LED control: bit 0 for LED 1, bit 1 for LED 2, bit 2 for LED 3.

M4 side (Background):
Initialize the board and application logic, and boot the M0+ by setting the M0+'s main stack and reset handler and releasing the M0+'s reset flag. Then enter the main loop. In the main loop, the M4 does the following every 20000 cycles:
1. Turn on the green LED
2. Try to lock the hardware mutex
3. Toggle the red LED control bit, then delay for some time
4. Set the M0+'s mailbox to the address of the LED control variable; this triggers the M0+'s mailbox IRQ
5. Deliberately delay for a long time to simulate complex software execution
6. Release the mutex

M4 side (IRQ context):
In the M4's mailbox IRQ handler (the M0+ triggers it by writing a non-zero value to the M4's mailbox), update the LED states according to the LED control variable value.
M0+ side (Background):
1. After the first POR or pin reset, the M0+ runs the M4/M0+ shared reset handler, finds that the startup condition is not yet set, and goes to sleep
2. When the M0+ is rebooted by the M4 some time later, the M4 has already prepared the M0+'s startup condition, so the shared reset handler detects it and jumps to the M0+'s app reset handler according to the M4's settings
3. The M0+ app's reset handler does basic initialization and jumps to the M0+'s main()
4. In main(), the M0+ just enables the mailbox IRQ and then enters the main loop; the main loop is empty

M0+ side (IRQ context):
In parallel, once the M4 writes a non-zero value (in our case, the address of the LED control variable) to the M0+'s mailbox, the M0+'s mailbox IRQ is triggered. In the mailbox IRQ handler:
1. Get the address of the LED control variable
2. Try to lock the hardware mutex. Note that since the M4 deliberately delays before releasing the mutex, this try loop will cycle many times before it successfully locks it
3. Toggle the blue LED control bit, and write a non-zero value to the M4's mailbox to trigger the M4's mailbox IRQ
Manual control:
Press and hold down the SW1 button to hold the M0 in the reset state; release SW1 to release the M0 from reset (the M0 will restart). Press and hold down the SW2 button to prevent the M4 from releasing the hardware mutex, which pauses the blinking.

Toolchain supported
===================
- (Coming later) IAR embedded Workbench 7.80.2
- Keil MDK 5.21a
Hardware requirements
=====================
- Mini/micro USB cable
- LPCXpresso54114 board
- Personal Computer

Board settings
==============
No special settings are required.
Prepare the Demo
================
0. How to build: Open the workspace (for KEIL, "boards\lpcxpresso54114\multicore_examples\blinky\mdk\blinky.uvmpw"). First build the M0+ project; this will generate "core1_image.bin", which is the M0+'s image bin file. The M4 includes it as one assembly data section. Then build the M4 project and download it to flash. Note: Do NOT try downloading the M0+ build to flash; the M0+ image is managed by the M4.
1. Connect a micro USB cable between the PC host and the CMSIS DAP USB port (J7) on the board
2. (Optional) Open a serial terminal with the following settings (see Appendix A in the Getting started guide for a description of how to determine the serial port number):
   - 115200 baud rate
   - 8 data bits
   - No parity
   - One stop bit
   - No flow control
3. Download the M4 project to the target board.
4. Either press the reset button on your board or launch the debugger in your IDE to begin running the demo.
Running the demo
================
After reset, the red and blue LEDs alternately turn on and off, and the green LED blinks for a short time before each switch between red and blue. The green LED shows the period during which the M4 holds the h/w mutex. While the demo runs:
1. If you hold the SW1 button down (M0 held in reset), the switching pauses, the green LED keeps blinking, and either the red or the blue LED is always on; after you release SW1, the red and blue LEDs may turn on and off alternately or together.
2. If you hold the SW2 button down (M4 does not unlock the mutex), all blinking is paused; after you release SW2, the blinking resumes as before.
If you connect a serial terminal, trace logs will be printed when you have button actions and when the M4 takes/gives the h/w mutex.
LPCXpert V3.3 is the latest release of a freeware expert tool for the NXP CORTEX-M based LPC families of microcontrollers. This tool simplifies the selection of a MCU device, speeds up the creation of application code and initialization code and it supports generation of an application specific schematic symbol.
This version supports about 400 different CORTEX-M based micro controllers from NXP.
LPCXpert supports all phases of a development. During the MCU selection phase LPCXpert supports selection of a target MCU by providing selection features in the "MCU Select" tab. During the software implementation phase LPCXpert provides a graphical user-interface to configure the pinout (Pin-MUX) and the peripheral interfaces of the target device. LPCXpert also generates a framework of executable code that configures the Clock Generation Unit (CGU) and the peripheral interfaces of the device.
New and enhanced features include support for LPCopen software package from NXP. Features also include generation of a Schematic Symbol for the ALTIUM Designer and the CADSOFT EAGLE V6.2 and generation of projects for IAR Embedded Workbench (EWARM), Keil µVision and GNU C-Compilers, as well as links to Internet Sites for additional information.
Using LPCXpert it is possible to set the pins of each peripheral (e.g. for SPI, CAN, I2C, EMC, ETH, ...) and to configure the features of each pin (Pull-Up, Pull-Down, ...). In addition LPCXpert V3.2 also supports configuration of pre-built demo code for the LPC8xx and LPC54xxx Families of MCUs.
Based on the configuration LPCXpert may generate a C-Code Project or a Schematic Symbol. In addition LPCXpert can save up to 8 different pin-mux configurations and restore from up to 10 different configurations.
Additional information and the download are available from the following Web-Site: