In my last article, we started discussing the PowerQuad engine in the LPC55S69 as well as the concept of data in the “time domain”. Using the Mini-Monkey board, we showed the function of collecting a bucket of data over time. I chose to use a microphone as a data source as it is easy to visualize and understand. You can now easily imagine replacing the microphone with *anything* that changes over time. In this article we are going to look at some common algorithms for processing data in the time domain. In particular, we will look at the “Dual Biquad IIR” engine in the LPC55S69 PowerQuad. An IIR biquad is a commonly used building block because it can be configured for many common filtering use cases. This article is not intended to review all of the DSP theory behind IIR filter implementations, but I do want to highlight some key points and the PowerQuad implementation.
Digital Filtering with Embedded Microcontrollers
When sampling data “live”, one can imagine data being continuously recorded at a known rate. A time domain filter will accept this input data and output a new signal that is modified in some way.
Figure 1. Filtering In the Time Domain
The concept here is that the output of the filter is just another time domain signal. You may choose to do further processing on this new signal or output it to a Digital-to-Analog Converter (DAC). If we are thinking in terms of “sine waves”, a digital filter adjusts the amplitude and phase of the input signal. As we apply different frequency inputs (or a sum of different frequencies), the filter attenuates or amplifies the individual sinusoidal components. So, how does one compute a digital filter? It is quite simple. Let us start with a simple case:
Figure 2. Sample by Sample Filter Processing using a History of the Input
One operation we perform is to *mix* the most recent input sample with samples we have previously recorded. The result of this operation is our next *output* sample. This filter configuration is called an FIR, or Finite Impulse Response, filter. One way to write this algorithm is to use a “C array style” notation and difference equations.
x[n] The current input
x[n-1] Our previous input
x[n-2] An input from 2 samples ago
y[n] Our next output
Figure 2 could be written as
y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
All we are doing is multiplying our input sample and its history by constant coefficients and then adding them up. We are multiplying then accumulating! The constants b0, b1 and b2 control the frequency response of the filter. By choosing these numbers correctly, we can attenuate “high” frequencies (low pass filter), attenuate low frequencies (high pass filter), or perform some combination of the two (band pass filter). We can also use more samples from the input history. For example, instead of just using the previous 3 samples, one could use 128 samples. A filter of this type (FIR) can require quite a bit of time history to get precise control over its frequency response. The code to implement this structure is simple but can be very CPU intensive as you need to do the multiply and adds for *every* sample at your signal sample rate.
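To make the multiply and accumulate concrete, here is a minimal C sketch of the three-coefficient difference equation above. The names fir3_state_t and fir3_process are my own for illustration (they are not from the SDK), and the history is assumed to start at zero.

```c
#include <stddef.h>

/* History of the two previous inputs: x[n-1] and x[n-2]. */
typedef struct {
    float x_n1;
    float x_n2;
} fir3_state_t;

/* Process a block through y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2].
   b[0], b[1] and b[2] hold b0, b1 and b2. */
void fir3_process(fir3_state_t *s, const float b[3],
                  const float *in, float *out, size_t count)
{
    for (size_t n = 0; n < count; n++) {
        float x0 = in[n];

        /* Multiply the input and its history by the coefficients, then add. */
        out[n] = b[0] * x0 + b[1] * s->x_n1 + b[2] * s->x_n2;

        /* Shift the history so it is ready for the next sample. */
        s->x_n2 = s->x_n1;
        s->x_n1 = x0;
    }
}
```

Feeding in a single 1.0 followed by zeros (an impulse) returns the coefficients themselves, one per sample, which is a handy sanity check. Note that the loop body runs once for every sample, which is where the CPU load comes from as the number of coefficients grows.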
There is an adjustment we can make to figure 2 that can allow for tighter control over our frequency response without having to use a long time history.
Figure 3. Sample by Sample Filter Processing using a History of the Input and Output
The key difference between figure 2 and figure 3 is that we can also mix in previous filter *outputs* to generate the output signal. Adding this “feedback” can yield some interesting properties and is the root of another class of digital filters called IIR (Infinite Impulse Response) filters.
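As a rough sketch of the figure 3 structure, assume two previous inputs and two previous outputs are kept, and that the feedback terms are subtracted. That sign convention is an assumption on my part (some tools fold the minus sign into the coefficients), and the type and function names are illustrative only.

```c
/* Coefficients for
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
   (assumed sign convention for the feedback terms). */
typedef struct {
    float b0, b1, b2, a1, a2;
} biquad_coeffs_t;

/* History of both the inputs and the outputs. */
typedef struct {
    float x_n1, x_n2;
    float y_n1, y_n2;
} biquad_df1_state_t;

/* Compute one output sample from one input sample. */
float biquad_df1_step(const biquad_coeffs_t *c, biquad_df1_state_t *s, float x0)
{
    float y0 = c->b0 * x0 + c->b1 * s->x_n1 + c->b2 * s->x_n2
             - c->a1 * s->y_n1 - c->a2 * s->y_n2;

    /* Shift both histories. */
    s->x_n2 = s->x_n1;
    s->x_n1 = x0;
    s->y_n2 = s->y_n1;
    s->y_n1 = y0;
    return y0;
}
```

With b0 = 1, a1 = -0.5 and everything else zero, an impulse produces 1, 0.5, 0.25, 0.125 and so on. The output never quite reaches zero, which is where the “infinite” in the name comes from.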
One of the primary advantages of this approach is that you need fewer coefficients than an FIR filter structure to get a desired frequency response. There are always trade-offs when using IIR filters vs. FIR filters, so be sure to read up on the differences. The example I showed in figure 3 is called a “biquad”. A biquad filter is a common filter building block that can be easily cascaded to construct larger filters. There are several reasons to use a biquad structure, one of which is that there are many design tools that can generate the coefficients for all of the common use cases. Several years ago, I built a tool around a set of design equations that were useful for audio filtering.
At the time I made the tool shown in figure 4, I was using biquad filter structures for tone controls on a guitar effects processor. The frequency and phase response plots were designed to show frequencies of interest of an electric guitar pickup. There are lots of options for coming up with coefficients and numerous libraries to help. For example, you could use Python, where a library such as SciPy provides filter design functions.
In my guitar effects project, I embedded the filter design equations in my C code so I could recompute coefficients dynamically!
Using the PowerQuad IIR Biquad Engines
The PowerQuad in the LPC55S69 has dedicated hardware to compute IIR biquad filters. Like an FIR filter, the actual code to implement a biquad filter is straightforward. An IIR filter may be simple to code but can use quite a bit of CPU time to crunch through all the multiply and accumulate operations. The PowerQuad is available to free up the CPU from performing the core computational component of the biquad computation. A good starting point for using the PowerQuad IIR biquad engine is the MCUXpresso SDK. It is important to note that the SDK is only a starting point. The SDK code is written to cover as many use cases as possible and to demonstrate the different functions of the PowerQuad. It can be helpful to read through the source code and decide which pieces you need to extract for your own application. DSP code often requires some hand tuning and optimization for a particular use case. The PowerQuad is connected via the AHB bus and the Cortex-M33 co-processor interface. Let’s take a look at the SDK source code to see how the IIR engine works.
Using the “Import SDK Examples” wizard in MCUXpresso, you will find PowerQuad examples under driver_examples > PowerQuad.
Figure 5. Selecting the PowerQuad Digital Filter Example
The powerquad_filter project has quite a few examples of the different filter configurations. We are going to focus on a floating point biquad example as a starting point. In the file powerquad_filter.c, there are several test functions that demonstrate a basic filter setup. I am using LPC55S69 SDK 2.7.1, and there is a function around line 455 named PQ_VectorBiqaudFloatExample (note the spelling mistake in the name).
Figure 6. Vectorized Floating Point IIR Filter Function
The first important point to note is that the PowerQuad computes IIR filters using “Direct Form II”. In the previous figures I showed the filter using “Direct Form I”. When one is first introduced to IIR filters, “Direct Form I” is the natural starting point as it is the clearest and most straightforward implementation. It is possible, however, to rearrange the flow of multiplies and adds and get the same arithmetic result.
When using "Direct Form II", we do not need to store a history of both inputs and outputs. Instead, we store an intermediate computation, which is labeled v[n]. During the computation of the filter, the intermediate history v[n] must be saved. We will refer to these intermediate values as the filter “state”. To set up the PowerQuad for IIR filter operation, there are a handful of registers on the AHB bus where the state and coefficients are stored. In the SDK examples, the state of the filter is initialized with PQ_BiquadRestoreInternalState().
Figure 8. Restoring/Initializing Filter State
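To make the “state” idea concrete, here is a plain C model of a Direct Form II biquad. This is a software sketch only, not the PowerQuad register layout or the SDK implementation. The type names, the function name and the sign convention for a1 and a2 are assumptions for illustration.

```c
/* Coefficients, assuming the feedback terms are subtracted. */
typedef struct {
    float b0, b1, b2, a1, a2;
} biquad_coeffs_t;

/* Direct Form II state: only v[n-1] and v[n-2] are stored. */
typedef struct {
    float v_n1;
    float v_n2;
} biquad_df2_state_t;

/* Compute one output sample:
   v[n] = x[n] - a1*v[n-1] - a2*v[n-2]
   y[n] = b0*v[n] + b1*v[n-1] + b2*v[n-2] */
float biquad_df2_step(const biquad_coeffs_t *c, biquad_df2_state_t *s, float x0)
{
    float v0 = x0 - c->a1 * s->v_n1 - c->a2 * s->v_n2;
    float y0 = c->b0 * v0 + c->b1 * s->v_n1 + c->b2 * s->v_n2;

    /* Shift the intermediate history. */
    s->v_n2 = s->v_n1;
    s->v_n1 = v0;
    return y0;
}
```

The state is two values per filter instead of the four needed by Direct Form I, and zeroing both values gives the filter a clean start.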
Once the PowerQuad IIR engine is initialized, data samples can be processed through the filter. Let us take a look at the function PQ_VectorBiquadDf2F32() in fsl_powerquad_filter.c.
Figure 9. Vectorized IIR Filter Implementation.
This function is designed to process longer blocks of input samples, ideally in multiples of 8. Note that many of the SDK examples are designed to make it simple to get started but could be easily tuned to remove operations that may not be applicable in your application code. For example, the modulo operation to determine if the input block is a multiple of 8 is something that could be easily removed to save CPU time. In your application, you have complete control over buffer sizes and can easily optimize and remove unnecessary operations. The actual computation of the filter can be observed in the code block that processes the first block of samples.
Figure 10. Transferring Data to the IIR Engine with the ARM MCR Coprocessor Instruction
Data is transferred to the PowerQuad with the MCR instruction. This instruction transfers data from a CPU register to an attached co-processor (the PowerQuad in this case). The PowerQuad does the work of crunching through the Direct Form II IIR structure. While it takes some CPU intervention to move data into the PowerQuad, the PowerQuad is much more efficient at the multiplies and adds for the filter implementation.
To get the result, the MRC instruction is used. MRC moves data from a co-processor to a CPU register.
Figure 11. Retrieving the IIR Filter result with the MRC instruction.
Further down in PQ_VectorBiquadDf2F32(), there is assembly code tuned to inject data in blocks of 8 samples. Looking at PQ_Vector8BiquadDf2F32():
Figure 12. Vectorized Data Insertion into the PowerQuad.
Notice all the MCR/MRC instructions that transfer data in and out of the biquad engine. All the other instructions are “standard” ARM instructions to get data into the registers that feed the coprocessor. Take some time to run the examples in the SDK. They are structured to inject a known sequence to verify correct filter operation. Now that you have seen some of the internals, you can use the pieces you need from the SDK to implement your signal processing chain.
The PowerQuad can help accelerate biquad filters. There are 2 separate biquad engines built into the PowerQuad.
The PowerQuad IIR functions are configured through registers on the AHB bus, and the actual input/output samples are transferred through the Cortex-M33 coprocessor interface.
The SDK samples are a good starting point to see how to configure and transfer data to the PowerQuad. There are optimization opportunities for your particular application, so be sure to inspect all of the code.
If you need more than two biquad filters, you will need to preserve the “state” of the filter. This can be a potentially expensive operation if you are constantly saving/restoring state. In this case you will want to consider processing longer blocks of data.
You may not need to save the entire “state” of the filter. For example, if the filter coefficients are the same for all of your filters, all you need to save and restore is v[n].
While the PowerQuad can speed up (6x) the core IIR filter processing, you still need the CPU to set up the PowerQuad and feed in samples. Consider using the second Cortex-M33 core in the LPC55S69 to do your data shuffling.
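The save/restore point above can be sketched in plain C. The biquad_engine_t below is a software stand-in for a single biquad engine, not the PowerQuad registers or an SDK API, and all names are hypothetical. It assumes every channel shares the same coefficients, so only the v[n] history is swapped between channels.

```c
#include <stddef.h>

typedef struct { float b0, b1, b2, a1, a2; } biquad_coeffs_t;
typedef struct { float v_n1, v_n2; } biquad_state_t;

/* Software stand-in for one biquad engine: coefficients plus live state. */
typedef struct {
    biquad_coeffs_t coeffs;
    biquad_state_t state;
} biquad_engine_t;

/* Run a block of samples through the engine using Direct Form II. */
static void engine_process(biquad_engine_t *e, const float *in,
                           float *out, size_t count)
{
    const biquad_coeffs_t *c = &e->coeffs;
    for (size_t n = 0; n < count; n++) {
        float v0 = in[n] - c->a1 * e->state.v_n1 - c->a2 * e->state.v_n2;
        out[n] = c->b0 * v0 + c->b1 * e->state.v_n1 + c->b2 * e->state.v_n2;
        e->state.v_n2 = e->state.v_n1;
        e->state.v_n1 = v0;
    }
}

/* Filter several channels with one engine and shared coefficients.
   in and out hold `channels` blocks of `count` samples, one channel
   after another. saved[] keeps each channel's state between calls. */
void filter_channels(biquad_engine_t *e, biquad_state_t *saved,
                     const float *in, float *out,
                     size_t channels, size_t count)
{
    for (size_t ch = 0; ch < channels; ch++) {
        e->state = saved[ch];                 /* restore this channel */
        engine_process(e, in + ch * count, out + ch * count, count);
        saved[ch] = e->state;                 /* save for the next block */
    }
}
```

The cost of the restore and save is paid once per channel per block, so the longer the block, the smaller that overhead becomes relative to the filtering itself.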
You now have a head start on performing time domain filtering with the LPC55S69 PowerQuad. We examined IIR filters, which have lots of applications in audio and sensor signal processing, but the PowerQuad can also accelerate FIR filters. Next time we are going to dive a little deeper with some frequency domain processing with the PowerQuad transform engine. The embedded transform engine can accelerate processing of Fast Fourier Transforms *significantly*. Stay tuned for more embedded signal processing goodness!