Skip navigation

I had some design updates for “Rev B” of my Mini-Monkey design that I wanted to get in the "queue" for testing.  For the next revision, I wanted to try PCB:NG for the board fabrication and assembly.  PCB:NG is an “on-demand” PCB assembly service focused on turnkey prototypes via simple a web interface.    The pricing looked attractive and it appeared that the Mini-Monkey fit within their standard design rules.    The Mini-Monkey design uses an NXP LPC55S69 microcontroller that is in a 0.5mm pitch VFBGA98 package.   NXP offers guidance on how to use this device with low-cost design rules and I thought this would be a great test for PCB:NG.   I had success with Rev A at Macrofab and thought I would give PCB:NG a shot.


Getting your design uploaded is straightforward with PCB:NG.   You can upload your Gerber files and get a preview of the PCB.  As you move through the process, the web interface will give you an updated price:


Figure 1: PCB:NG Gerber Upload


The online PCB:NG interface includes a Design For Manufacture (DFM) check.  The check is exhaustive and includes all the common DFM rules such as trace width, clearance, drill hits etc.  In my case, I had some features that violated minimum solder mask slivers and copper to board outline clearances.  The online tool allows you to “ignore” DFM violations that may not be an issue.   I was able to look through all the violations and mark which ones were of no concern.


Once the Gerber files are uploaded, you can add your parts and as well as the pick/place data.  The PCB:NG interface will show you part pricing and availability as soon your Bill of Materials (BOM) is uploaded.   You have the option to mark parts as Do Not Place (DNP) if you do not want them populated.   In my case, I had 2 components on the Mini-Monkey BOM (a battery and a display) that I did not include as they required some manual assembly steps that I was going to perform once I had the units in hand.



Figure 2 : PCB:NG BOM Upload


Along with the BOM, you must  upload XYRS placement data.    The XYRS data can be combined in the spreadsheet file used for the BOM.  The PCB:NG viewer will also show you where it thinks all the placements are and can make manual adjustments if necessary.



Figure 3 : PCB:NG Part Placement Interface




I had placed my order on 2020-06-10.   Throughout the process, PCB:NG sent email updates when materials were in house,  when production started, etc.    I did have to send in a note that one of the parts (a MEMs microphone) was sensitive to cleaning  processes.    I received a response the same day noting the exception (PCB:NG uses a no-clean process) and they would add the part to their internal database of exceptions.


I had placed the order when they were in the middle of some equipment upgrades.   When I checked the price a few day ago I found that it was lower ($ 380 vs $496) after the new process upgrades.    I consider the service a huge value given that they handle some potentially difficult parts.   Getting the BGA packages microcontroller and the LGA packaged MEMs soldered professionally was well worth the price.   The boards shipped out 2020-06-29.  It was a bit longer than the published lead time but communication during the process was good.  I think I caught the team in the middle of some equipment upgrades which may have delayed things a bit.   PCB:NG took some extra time to get me photos from the X-tay inspection of the BGA and LGA parts.     Getting these photos was well worth the wait!



Figure 4: LPC55S69 VFBGA98 Post Assembly X-ray - View 1



Figure 5: LPC55S69 VFBGA98 Post Assembly X-ray – View 2



Figure 6: MEMS Microphone (LGA) Post Assembly X-ray


As you can see of the X-ray images, the solder joints were good.   It was also cool seeing the via structures in the PCB and bond wires in the IC packages.   You can even see tiny little via structures in the VFBGA98 package itself.   How did the build turn out?   Here is a video of the Mini-Monkey Rev B:


Mini-Monkey Rev B w/ PCB:NG 


PCB:NG was also kind enough to show the Mini-Monkeys getting setup for placement:


Mini-Monkey Fabrication at PCB:NG 


Final Thoughts


The experience with PCB:NG was excellent.   The boards turned out a great and I was able to test all my changes quickly.    Having someone else handle part procurement and assembly is a huge value to me as it allows me to focus on other aspects of the design such as firmware develop for the board bring-up.    One possible improvement with the online PCB:NG interface would be to be able to submit ODB++ or IPC-2581 data.  These formats bake in more information and could really streamline design upload.       I will certainly be using PCB:NG in the future for my prototypes.    The on-demand model is helpful, especially when you are busy and need to get some help accelerating your development efforts.   


Onward to Revision C!    I think I may add eMMC storage and improve the battery circuit.   If you want to see the current raw design files,   they are available on BitBucket in Altium Designer format.




Test Software: 


I'll be posting more updates on the Mini-Monkey as new revisions are complete.   In the mean time, here are other articles and videos related to the LPC55S69. Cheers!

In my last article, we examined a common time domain filter called the Biquad and how it could be computed using the LPC55S69 PowerQuad engine.        We will now turn our attention to another powerful component of the PowerQuad, the “Transform Engine”.      The PowerQuad transform engine can compute a Fast Fourier Transform (FFT) in both a power and time efficient manner leaving your main CPU cores to handle other tasks.  


Before we look at the implementation on the LPC55S69, I want to illustrate what exactly an FFT does to a signal.   The meaning of the data is often glossed over or even worse yet, explained in purely mathematical terms without a description of *context*.    I often hear descriptions like “transform a signal from the time domain to the frequency domain”.   While these types of descriptions are accurate, I think that many do not get an intuitive feel for what the numbers are *mean*.    I remember my 1st course in ordinary differential equations.  The professor was explaining the Laplace transform (which is more generalize case of the Fourier Transform) and I asked the question “What does s actually mean in a practical use case?”.


Figure 1. Laplace Transform. What is s?


My professor was a brilliant mathematician and a specialist in complex analysis.  He could explain the transform from 3 different perspectives with complete mathematical rigor.       Eventually we both got frustrated and he said “s will be a large number complex number in engineering applications”.   It turned out the answer was simple in terms of the electrical engineering problems we were solving.  After many years and using Laplace again in Acoustic Grad School it made sense but at the time it was magical.       I hope to approach the FFT a bit differently and you will see it is simpler than you may think.     While I cannot address all the aspect of using FFT’s in this article, I hope it gives you a different perspective from a “getting started” perspective.


Rulers, Protractors, and Gauges


One of my favorite activities is wood working.   I am not particularly skilled in the art, but I enjoy using the tools, building useful things, and admiring the beauty of the natural product.  I often tell people that “getting good” at wood working is all about learning how to measure, gauge and build the fixtures to carry out an operation.      When you have a chunk of wood, one of the most fundamental operations is to measure its length against some fixed standard.      Let us begin with a beautiful chunk of Eastern Hemlock:

Figure 2.  A 12” x 3” x 26” Piece of rough sawn eastern hemlock. 


One of the first things we might want to do to with this specimen is use some sort of standard gauge to compare it to:


Figure 3.  Comparing our wood against a reference.


We can pick a standard unit and compare our specimen to a scale based upon that unit.   In my case the unit was “inches” of length, but it could be anything that helps up solve our problem at hand.  Often you want to pick a unit and coordinate system that scales well to the problem at hand.     If we want to measure circular “things”, we might use a protractor as it makes understanding the measurement easier.    The idea is to work in a system that is in the “coordinate system” of your problem.     It makes sense to use a ruler to measure a “rectangular” piece of wood.


What does this have to do with DSP and Fourier transforms?   I hope to show you that a Fourier Transform (and its efficient discrete implementation, the FFT) is simplly just a set of gauges that can be used to understand a time domain signal.   We can then use the PowerQuad hardware to carryout out the “gauging”.  For the sake of this discussion, let us consider a time domain signal such as this:


Figure 4.   An example time domain signal


This particular signal is a bit more complex than the simple sine wave used in previous articles.     How exactly would we “gauge” this signal?   Amplitude?  Frequency?  Compute statistics such as variance?    Most real-world signals can have quite a bit of complexity, especially when they are tied to some physical process.    For example, if we are a looking at the result of some sort of vibration measurement, the signal could look very complicated as there are many physical processes contributing to the shape.   In vibration analysis, the physical “things” we are examining with move and vibrate according to well understood physics.     The physics show that the systems can be modeled with even order differential equations.       This means always be we can write the behavior of the system over time as the sum of sinusoidal oscillations.        So, what would be a good gauge to use to examine our signal?    Well, we could start with a cosine wave at some frequency of interest:


Figure 5.   Gauging our signal against a cosine wave.


Choosing a cosine signal as a reference gauge can simplify the problem as we can easily identify the properties of our unit of measure,  i.e. its frequency and amplitude.      We can fix the amplitude and frequency of our reference and then compare it to our signal.     If we do our math correctly, we can get a number that indicates how well correlated our input signal is to a cosine wave of a particular frequency and a unit amplitude.     So, how exactly do we perform this correlation?     It turns out to be a simple operation.    If we think about the input signal and our reference gauge as discrete arrays of numbers (i.e. vectors),  we compute the dot-product between them:


Figure 6.   Computing the correlation between a test signal and our feference gauge.


The operation is straightforward.    Both your signal input and the “gauge” has the same number of samples.   Multiply the elements of each array  together and add up the results.    Using an array like notation, the input is represented by “x[n]” and the gauge is represented by “re[n]” where n is an index in the array:


Output = x[0]*re[0]  +  x[1]*re[1]  + x [2]*re[2] + .  .  .


What we end up with is a single number (scalar).   Its magnitude is proportional to how well correlated out signal is to the particular gauge we are using.      As a test, you could write some code and use a cosine wave as your input signal.  The test code could adjust the frequency of input and as the frequency of the input gets closer to the frequency of the gauge, the magnitude of the output would go up.


As you can see the math here is just a bunch of multiplies and adds, just like the IIR filter from our last article.     There is one flaw however with this approach.      There is special case of the input where the output will be zero.   If the signal input is a cosine wave of the *exact* frequency as the gauge and  is 90 degrees phase shifted with respect to the reference gauge, we would get a zero output.    


Figure 7.  A special case of our reference gauge that would render zero output.


This is not desirable as we can see that input is correlated our reference gauge, it just is shifted a in time.  There is a simple fix and we can even use our piece of hemlock lumber to illustrate.


Figure 8.  Gauging along a different side of the lumber.


In Figure 3, I showed a ruler along the longest length of the wood.     We can also rotate the ruler and measure along the shorter side. It is the same gauge, just used a different way. Imagine that board was only 1” wide but 24” long.  I could ask an assistant to use a ruler and measure the board. Which of those two numbers is “correct”?    The assistant could report to me either of those numbers and be technically correct.   We humans generally assume length to be the longer side of a rectangular object but there is nothing special about that convention.  In figure 6, we were only measuring along 1 “side” of the signal.   It is possible to get a measurement that is zero (or very small) while have a signal that looks very similar to the gauge (like in figure 7).   We can fix this by “rotating” our ruler similar to figure 8 and measure along the both ”sides” of the signal.


Figure 9.  Using two reference gauges.  One is “rotated” 90 degrees.


In figure 9, I added another “gauge” labeled “A” in purple.  The original gauge is labeled “B”.    The only difference between the two gauges is that B is phase shifted by 90 degrees.   This is equivalent to rotating my ruler in figure 8 and measuring the “width” of my board.     In figure 9, I am showing 3 of the necessary multiply/add operations but you would carry out the multiple/add for all points in the signal. Writing it out:


B = x[0]*Re[0]  +  x[1]*Re[1]  + x [2]*Re[2] +  .  .  .

A = x[0]*Im[0]  +  x[1]*Im[1]  + x [2]*Im[2] +  .  .  .


In this new formulation we get a pair of numbers A,B for our output.   Keep in mind that we are gauging our input against a *single* frequency of reference signals at a unit amplitude.     This is analogous to measuring the length and width of our block of wood.      Another way of thinking about it is that we now have a measuring tool that evaluates along 2 axes which are “orthogonal”.     It is almost like a triangle square.

Figure 10.  A two-axis gauge.


Once we have our values A & B, it is typical to consider them as a single complex number


Output = B + iA


The complex output gives us a relative measure of how we are correlated to our reference gauges. To get a relative amplitude, simply compute the magnitude:


||Output|| = sqrt(A^2 + B^2)


You could even extract the phase:


Phase = arctan(B/A)


It common to think about the output in “polar” form (magnitude/phase).  In vibration applications you typical want understand the magnitude of the energy at different frequency components of a signal. There are applications in communications, such as orthogonal frequency domain multiplexing (OFDM), where you work directly the with real and imaginary components. 


I previously stated that the correlation we were performing is essentially a vector dot product operation.    The dot product shows up in many applications. One of which is dealing with vectors of length 2 where we use the following relationship:



The interesting point here is that the dot product is a simple way of getting a relationship of the angle and magnitude between two vectors and b.    It is easy to think about a and b as vectors on a 2d plane, but the relationship extends to vectors of any length.  For digital data, we work with discrete samples, so we define everything in terms of the dot product.   We are effectively using this operation to compute magnitudes and find angles between “signals”.   In the continuous time world, there is the concept of the inner-product space.   It is the “analog” equivalent of the dot product and underpins the mathematical models for many physical systems.  


At this point we could stop and have a brute force technique of comparing a signal against a single frequency reference.    If we want to determine if a signal had a large component of a particular frequency, we could tailor our reference gauges to the *exact* frequency we are looking for.  The next logical step is to compare our signal against a *range* of reference gauges of different frequencies:


Figure 11: Using a range of reference gauges at different frequencies.


In Figure 11, I show four different reference gauges at frequencies that have an integer multiple relationship. There is no limit to the number of frequencies you could use.     With this technique, we can now generate a “spectrum” of outputs at all the frequencies of interest for a problem.    This operation has a name:  the Discrete Fourier Transform (DFT).    One way of writing the operation is:

Figure 12. The Discrete Fourier Transform (DFT)


N is the number of samples in the input signal.


k is the frequency of the cosine/sine reference gauges.     We can generate a “frequency” spectrum by computing DFT over a range of “k” values.  It is common to use a linear spacing in when selecting the frequencies.   For example, if your sample rate is 48KHz and you are using N=64 samples, it is common (we will see why later) to use 64 reference gauges spaced at (48000/64)Hz apart. 

The “Fast Fourier Transform”


The Fast Fourier Transform is a numerically efficient method of computing the DFT.  It was developed by J. W. Cooley and John Tukey in 1965 as a method of performing the computation with a fewer adds and multiplies as compared to the direct implementation shown in Figure 11.    The development of the FFT was significant as we can do our number crunching much more efficiently by imposing a few restrictions on the input.  There are a few practical constraints that need to be considered when using an FFT implementation


  1. The length of your input must be a power of 2.   i.e. 32, 64, 128, 256.
  2. The “bins” of the output are spaced in frequency by the sample rate of your signal divided by the number of samples in the input.     As an example, if you have a 256-point signal sampled at 48Khz, the array of outputs corresponds to frequencies spaced at 187Hz.      In this case the “bins” would correlate to 0Hz, 187.5Hz, 375 Hz, etc.  You cannot have arbitrary input lengths or arbitrary frequency spacing in the output.
  3. When the input the FFT/DFT are “real numbers” (i.e. samples from an ADC), the array of results exhibits a special symmetry.   Consider an input array of 256 samples.    The FFT result will be 256 complex numbers.   The 2nd half of the output are a “mirror” (complex conjugates) of the 1st half.      This means that for a 256-sample input, you get 128 usable “bins” of information. Each bin has a real and imaginary component.  Using our example in #2, the bins would be aligned to 0Hz, 187.5Hz, 375Hz, all the way up to one half of our sample rate (24KHz).


You can read more details about how the FFT works as well as find plenty of instructional videos on the web. Fundamentally, the algorithm expresses the DFT of signal length N recursively in terms of two DFTs of size N/2.  This process is repeated until you cannot divide the intermediate results any further.   This means you must start with a power of 2 length.  This particular formulation is called the Radix-2 Decimation in Time (DIT) Fast Fourier Transform.  The algorithm gains its speed by re-using the results of intermediate computations to compute multiple DFT outputs.  The PowerQuad uses a formulation called “Radix-8” but the same principles apply.


Using the PowerQuad FFT Engine


The underlying math to a DFT/FFT boils down to multiplies and adds along with some buffer management.   The implementation can be pure software, but this algorithm is a perfect use case for a dedicated coprocessor.    The good news is that once you understand the inputs and outputs of a DFT/FFT, using the PowerQuad is quite simple and you can really accelerate your particular processing task. The best way to get started with using the PowerQuad FFT is to look at the examples in the SDK.     There is an example project called “powerquad_transform” which has examples that test the PowerQuad hardware.


Figure 13.  PowerQuad Transform examples in the MCUXpresso SDK for the LPC55S69


In the file powerquad_transform.c, there are several functions that will test the PowerQuad engine in its different modes.     For now, we are going to focus on the function PQ_RFFTFixed16Example(void).



This example will set up the PowerQuad to accept data in a 16-bit fixed point format.      To test the PowerQuad, a known sequence of input and output data is used to verify results.     The first thing I would like to point out is that the PowerQuad transform engine is used fixed point/integer processing only.   If you need floating point, you will need to convert beforehand.  This is possible with the matrix engine in the PowerQuad.      I personally only every use FFTs with fixed point data most of my source data comes right from analog to digital converter data.       Because of the processing gain of the FFT, I have never seen any benefit of using a floating-point format for FFTs other than some ease of use for the programmer.   Let us look at the buffers used in the example:



Notice that the input data length FILTER_INPUT_LEN (which is 32 samples).   The arrays used to store the outputs are twice the length.     Remember that an FFT will produce the same number of *complex* samples in the output as there are samples for the input.     Since our input sample are real values (scalars) and the outputs have real/imaginary components, it follows that we 2x the length to storage the result.     I stated before that one of implications of the FFT with real valued inputs is that we have a mirror spectrum with complex conjugate pairs.   Focusing on the reference for testing the FFT output in the code:



The 1st pair 100,0 corresponds to the 1st bin which is a “DC” or 0Hz component.  It should always have a “zero” for the imaginary component.    The next bins can be paired up with bins from the opposite end of the data:


76,-50  <->   77,49

29,-62   <->   29, 61

-1, -34   <->  -1,33


These are the complex conjugate pairs exhibiting mirror symmetry.   You can see that they are not quite equal.  We will see why in a moment.  After all the test data in initialized, there is a data structure used to initialize the PowerQuad:


One of the side effects of computing an FFT is that you get gain at every stage of the process.   When using integers, it is possible to get clipping/saturation and the input needs to be downscaled to ensure the signal down not numerically overflow during the FFT process.     The macro FILTER_INPUTA_PRESCALER is set to “5”.    This comes from the length of the input being 32 samples or 2^5.     The core function of the Radix-2 FFT is to keep splitting the input signal in half until you get to a 2-point DFT.  It follows that we need to downscale by 2^5 as we can possible double the intermediate results at each stage in the FFT.       The PowerQuad uses a Radix-8 algorithm, but the need for downscaling is effectively the same.      I believe that some of the inaccuracy we saw in the complex conjugates pairs the test data was from the combination of an input array values that are numerical small and the pre-scale setting.   Note that the pre-scaling is a built in hardware function of the PowerQuad.


The PowerQuad needs an intermediate area to work from.  There is a special 16KB region starting at address 0xe0000000 dedicated to the PowerQuad.   The PowerQuad has a 128-bit interface to this region so it is optimal to use this region for the FFT temporary working area.  You can find more details about this private RAM in AN12292 and   AN12383


Once you configure the PowerQuad, the next step is to tell the PowerQuad the input and result data is stored with the function PQ_transformRFFT().



Notice in the implementation of the function, all that is happening is setting some more configuration registers over the AHB bus and kicking off the PowerQuad with a write to the CONTROL register. In the example code, the CPU blocks until the PowerQuad is finished and then checks the results.    It is important to point out that in your own application,  you do not have to block until the PowerQuad is finished.      You could setup an interrupt handler to flag completion and do other work with the general purpose M33 core.  Like I stated in my article on IIR filtering with the PowerQuad,   the example code is a good place to start but there are many opportunities to optimize your particular algorithm.   Example code tends to include additional logic to check function arguments to make the initial experience better.    Always take the time look through the code to see where you can remove boilerplate that might not be useful.


Parting Thoughts


  • The PowerQuad includes a special engine for computing Fast Fourier Transforms.
  • The FFT is an efficient implementations of the Discrete Fourier Transform.   This process just compares a signal against a known set of reference gauges (Sines and Cosines)
  • The PowerQuad has a private region to do its intermediate work.  Use it for best throughput.
  • Also consider the memory layout and AHB connections of where your input and output data lives.  There may be additional performance gains by making sure you input DSP data is in a RAM block that is on a different port than RAM used in your application for general purpose task.  This can help with contention when different processes are accessing data.   For example, SRAM0–3 are all on different AHB ports.   You might consider locating you input/output data in SRAM3 and having your general-purpose data in SRAM0-2.   Note:   You still need to use 0xE0000000 for the PowerQuad TEMP configuration for its intermediate working area.


At this point you can begin looking through the example transform code.  Also make sure to read through  AN12292 and   AN12383 for more details.       While there are more nuances and details to FFT and “frequency domain” processing, I will save those for future articles.    Next time I hope to show some demos of the PowerQuad FFT performance on the Mini-Monkey and illustrate some other aspect of the PowerQuad.      Until then, check out some of the additional resources below on the LPC55S69.


Additional LPC55S69 resources:

In my last article,   we starting discussing the PowerQuad engine in the LPC55S69 as well as the concept of data in the “time domain”.    Using the Mini-Monkey board, we showed the function of collecting a bucket of data over time.     I chose to use a microphone as a data source as it is easy to visualize and understand.      You can now easily imagine replacing the microphone with *anything* that changes over time.     In this article we are going to look at some common algorithms for processing data in the time domain.  In particular, we will look at the “Dual Biquad IIR” engine in the LPC55S69 PowerQuad. An IIR biquad is a commonly used building block as it is possible to configure the filter for many common filtering use cases.   This article is not intended to review all of the DSP theory behind IIR filter implementations but I do want to highlight some key points and the PowerQuad implementation.


Digital Filtering with Embedded Microcontrollers


When sampling data “live”, one can imagine data being continuously recorded at a known rate.      A time domain filter will accept this input data and output a new signal that is modified in some way.


Figure 1.   Filtering In the Time Domain


The concept here is that the output of the filter is just another time domain signal.   You may choose to do further processing on this new signal or output to a Digital Analog Converter (DAC).      If we are thinking in terms of “sine waves”, a digital filter adjusts the amplitude and phase of the input signal.     As we apply different frequency inputs (or a sum of different frequencies), the filter attenuates or gains to the sinusoidal components.   So, how does one compute a digital filter?   It is quite simple.   Let us start with a simple case.  :

Figure 2.   Sample by Sample Filter Processing using a History of the Input


One operation we perform is to *mix* the most recent input sample with samples we have previously recorded.    The result of this operation is our next *output* sample. The name of this filter configuration is an FIR or Finite Impulse Response filter.      One way to write this algorithm is to use a “c array style” notation and difference equations.


x[n]       The current input

x[n-1]     Our previous input

y[n-2]     An input from 2 sample ago 

y[n]       Our next output


Figure 2 could be written as


y[n] = b0*x[n] + b1*x[n-1]  + b2*x[n-2]


All we are doing is multiplying our input sample and its history by constant coefficients and then adding them up.    We are multiplying then accumulating! The constants b0, b1 and b2 control the frequency response of the filter.    By choosing these numbers correctly, we can attenuate “high” frequencies (low pass filter), attenuate low frequencies (high pass filter), or perform some combination of the two (band pass filter).     We can also use more samples from the input history.    For example, instead of just using the previous 3 samples, one could use 128 samples.      A filter of this type (FIR) can require quite a bit of time history to get precise control over its frequency response.         The code to implement this structure is simple but can be very CPU intensive as you need to do the multiply and adds for *every* sample at your signal sample rate.  


There is an adjustment we can make to figure 2 that can allow for tighter control over our frequency response without having to use a long time history.


Figure 3.   Sample by Sample Filter Processing using a History of the Input and Output


The key difference between figure 2 and figure 3 is that we can also mix in previous filter *outputs* to generate the output signal.     Adding this “feedback” can yield some interesting properties and is the root of another class of digital filters called IIR (Infinite Impulse Response filters).    


y[n] = b0*x[n] + b1*x[n-1]  + b2*x[n-2]  + a1*y[n-1] + a2*y[n-2]


One of the primary advantages of this approach that you need fewer coefficients than an FIR filter structure to get a desired frequency response.   There are always trade-offs when using IIR filters vs. FIR filters so be sure to read up on the differences.    The example I showed in figure 3 is called a “biquad”.     A biquad filter is a common filter building block that can be easily cascaded to construct larger filters.       There are several reasons to use a biquad structure, one of which being that there are many design tools that can generate the coefficients for all of the common use cases.      Several years ago, I built a tool around a set of design equations that were useful for audio filtering.


Figure 4.  An IIR Biquad Filter Design Tool.


At the time I made the tool shown in figure 4, I was using biquad filter structures for tone controls on a guitar effects processor.     The frequency and phase response plots where designed to show frequencies of interest of an electric guitar pickup.  There are lots of options for coming up with coefficients and numerous libraries to help.  For example, you could use Python:


In my guitar effects project, I embedded the filter design equations in my C code so I could recompute coefficients dynamically!


Using the PowerQuad IIR Biquad Engines


The PowerQuad in the LPC55S69 has dedicated hardware to compute IIR biquad filters.      Like an FIR filter, the actual code to implement a biquad filter is straightforward.  An IIR filter may be simple to code but can use quite a bit of CPU time to crunch through all the multiply and accumulate operations.     The PowerQuad is available to free up the CPU from performing the core computational component of the biquad computation.       A good starting point for using the PowerQuad IIR biquad engine is to use the MCUXpresso SDK.     It is important to note that the SDK will be a starting point.  The SDK code is written to cover as many use cases as possible and to demonstrate the different functions of the PowerQuad.  It can be helpful to read through the source code and decide which pieces you need to extract for your own application.   DSP code often requires some hand tuning and optimization for a particular use case.    The PowerQuad is connected via the AHB bus and the Cortex-M33 co-processor interface.  Let’s take a look at the SDK source code to see how you the IIR engine works.


Using the “Import SDK Examples” wizard in MCUXpresso, you will find PowerQuad examples under driver_examples PowerQuad


Figure 5.  Selecting the PowerQuad Digital Filter Example


The powerquad_filter project has quite a few examples of the different filter configurations.  We are going to focus on a floating point biquad example as a starting point.  In the file powerquad_filter.c, there are several test functions that will demonstrate a basic filter setup.   I am using LPC55S69 SDK 2.7.1 and there is function around line 455 (Note the spelling mistake PQ_VectorBiqaudFloatExample).


Figure 6.  Vectorized Floating Point IIR Filter Function


The 1st important point to note is that PowerQuad computes IIR filters using “Direct Form II”.        In the previous figures I showed the filter using “Direct Form I”.      When one is 1st introduced to IIR filters, “Direct Form I” is the natural starting point as it is the clearest and most straightforward implementation.    It is possible however to re-arrange the flow of multiplies and adds and get the same arithmetic result.


Figure 7.  IIR Direct Form II



When using "Direct Form II", we do not need to store history of both inputs and outputs.  Instead, we store an intermediate computation which is labeled v[n].         During the computation of the filter, the intermediate history v[n] must be saved.    We will refer these intermediate values as the filter “state”.  To setup the PowerQuad for IIR filter operation, there are handful of registers on the AHB bus where the state and coefficients are stored. In the SDK examples, the state of the filter is initialized with PQ_BiquadRestoreInternalState().       


Figure 8.   Restoring/Initializing Filter State


Once the PowerQuad IIR engine is initialized,  data samples can be processed through the filter.   Let us take a look at the function PQ_VectorBiqaudDf2F32() in fsl_powerquad_filter.c


Figure 9.   Vectorized IIR Filter Implementation.


This function is designed to process longer blocks of input samples, ideally in multiples 8.       Note that many of the SDK examples are designed make it simple to get started but could be easily tuned to remove operations that may be not applicable in your application code.  For example, the modulo operation to determine if the input block is a multiple of 8 is something that could be easily removed to save CPU time.         In your application, you have complete control over buffer sizes and can easily optimize and remove unnecessary operations.  The actual computation of the filter can be observed in the code block that processes the 1st block of samples.


Figure 10.  Transfering Data to the IIR Engine with the ARM MCR Coprocessor Instruction


Data is transferred to the PowerQuad with the MCR instruction.   This instruction transfers data from an CPU register to an attached co-processor (the PowerQuad in this case).  The PowerQuad does the work of crunching through the Direct Form II IIR structure.    While it take some CPU intervention to move data into the PowerQuad,   the PowerQuad is much more efficient at the multiply and adds for the filter implementation.


To get the result, the MRC instruction is used.   MRC moves data from a co-processor to a CPU register.


Figure 11.  Retrieving the IIR Filter result with the MRC instruction.


Further down in PQ_VectorBiquadDf2F32(), there is assembly code tuned to inject data in blocks of 8 samples.    Looking at PQ_Vector8BiquadDf2F32():


Figure 12.  Vectorized Data Insertion into the PowerQuad.


Notice all the MCR/MRC functions to transfer data in and out of the biquad engine.    All the other instructions are “standard” ARM instructions to get data into the registers that feed coprocessor.  Take some time to run the examples in the SDK.  They are structured to inject a known sequence to verify correct filter operation.    Now that you have seen some the of the internals,  you can use the pieces you need from the SDK to implement your signal processing chain.


Some take-aways


  • The PowerQuad can help accelerate biquad filters. There are 2 separate biquad engines built into the PowerQuad.
  • The PowerQuad IIR functions are configured through registers on the AHB bus and the actual input/output samples transferred through the Cortex M33 coprocessor interface.
  • The SDK samples are a good starting point to see how configure and transfer data to the PowerQuad.  There are optimization opportunities for your particular application so be sure to inspect all of the code.
  • If you need more than two biquad filters, you will need to preserve the “state” of the filter.  This can be a potentially expensive operation if you are constantly saving/restoring state.  In this case you will want to consider processing longer blocks of data.
  • You may not need to save the entire “state” of the filter.   For example, if the filter coefficients are the same for all of the your filters,  all you need to save and restore is v[n].
  • While the PowerQuad can speed up (6x) the core IIR filter processing,  you still need the CPU to setup the PowerQuad and feed in samples.   Consider using one the extra Cortex M33 cores in the LPC55S69 to do your data shuffling.


You now have a head start on performing time domain filtering with the LPC55S69 PowerQuad.    We examined IIR filters, which have lots of applications in audio and sensor signal processing, but the PowerQuad can also accelerate FIR filters.  Next time we are going to dive a litter deeper with some frequency domain processing with the PowerQuad transform engine.      The embedded transform engine can accelerate processing of Fast Fourier Transforms *significantly*. Stay tuned for more embedded signal processing goodness!


Additional LPC55S69 resources: