"Crossing Over" with the i.MX RT600 - Part 1 of 2


Eli_H
NXP Pro Support

It was October of 2017 when the concept of the “Crossover” MCU was first introduced with the NXP i.MX RT1050. For a microcontroller and DSP enthusiast such as myself, the idea of a high-clock-rate MCU that could tackle problems previously relegated to application processors was very intriguing. I generally approach problems from the perspective of “simplicity”, and for many real-world processing challenges, a microcontroller can be the simplest solution. Some applications, however, demand more real-time processing capability on continuous streams of data. Whether it be high-channel-count audio or complicated sensor fusion, there are situations where you need more than a traditional microcontroller can offer. In some cases, one may choose to go down the path of an application processor, but this approach is not always optimal when low-latency, real-time response is a critical requirement.

The typical way to approach this problem was to use a “Digital Signal Processor” (DSP). DSPs generally have an internal processing pipeline highly tuned to real-time, sample-by-sample data processing. DSPs are great solutions, but you often must give up many of the general-purpose features found in a microcontroller. I encounter this scenario quite often, and it is not uncommon to pair a general-purpose microcontroller with a dedicated DSP in an application architecture to get the best of both worlds. My first experience with a DSP architecture was with the Motorola 56K series. It was a powerful tool for crunching numbers, but I found that most of my designs with a DSP also required a general-purpose microcontroller for the traditional IO and connectivity.

This is where the i.MX RT600 steps in. The RT600 crossover MCU is a newer member of the i.MX RT crossover family, focused on real-time number-crunching applications such as audio, sensor fusion and machine learning.

 


Figure 1.  The i.MX RT600.

The NXP RT600 addresses the audio and sensor fusion challenge by integrating a 300 MHz general-purpose microcontroller core with a 600 MHz DSP core, a large 4.5 MB SRAM bank and a plethora of traditional peripherals and IO.

The RT600 General Purpose CPU Platform

The CPU platform in the RT600 is based upon the Arm® Cortex®-M33 core (CM33). By itself, the CM33 is capable of sophisticated audio applications. The CM33 is built on the Armv8-M architecture, which includes Single Instruction Multiple Data (SIMD) as well as Multiply-Accumulate (MAC) instructions. There is quite a bit that can be accomplished with the CM33 running at 300 MHz before even considering the additional DSP core. Last year I wrote quite a bit about the LPC5500 series MCUs, focusing on the LPC55S69 and its ample processing capabilities. The LPC55S69 is also based upon the CM33 core (running at 150 MHz vs. 300 MHz). I think of the RT600 as a serious upgrade to the LPC55 when you need more of everything.


 Figure 2.  The RT600 General Purpose CPU Platform.

One feature of the CM33 is a co-processor interface which can be accessed with special assembly language instructions. There are two “co-processors” attached to the CM33 in the RT600:  The PowerQuad (labeled as the DSP Accelerator) and the CASPER (labeled as the Crypto Engine).

The PowerQuad Co-Processor

The PowerQuad is a dedicated hardware unit that runs in parallel with the CM33 core inside the RT600. By offloading work to the PowerQuad, it is possible to implement sophisticated signal processing algorithms while leaving the general-purpose CM33 core available for other tasks such as communication and IO.


Figure 3.  The RT600 PowerQuad Co-Processor.

For the LPC55S69 release, I wrote four articles on the PowerQuad and detailed some of its use cases:

The key takeaway here is that before we have even considered the additional DSP core, the RT600 offers a powerful hardware co-processor that can do very useful operations “out of the box”.     We can be crunching Fast Fourier Transforms (FFTs) at a high rate before even tapping into any of the resources of the DSP or CM33 cores!

CASPER

CASPER is an accelerator attached to the CM33 co-processor interface that is optimized for cryptographic computations. At its core, CASPER is a dual multiply-accumulate-shift engine that can operate on large blocks of data. Applications of CASPER include accelerating public-key cryptographic functions such as the RSA and elliptic-curve operations used for signature verification in TLS/SSL. Once again, before we have even considered using the Tensilica® HiFi4 DSP in the RT600, there is another accelerator in the RT600 that can offload complicated operations. Many connected products require multiple cryptographic operations, and CASPER is a great way of implementing these functions without taxing your other processing pipelines. There are plenty of examples included in the MCUXpresso SDK for utilizing the CASPER accelerator. This feature makes the RT600 well suited to IoT applications.


Figure 4.  The RT600 CASPER Cryptographic Accelerator.

Cadence Tensilica® HiFi 4 DSP

What places the RT600 in a class of its own is the inclusion of a Cadence Tensilica® HiFi4 DSP core. I previously mentioned that it is certainly possible to implement DSP algorithms on the general-purpose CM33. There are situations, however, where a dedicated DSP processing pipeline is needed to achieve the required throughput. One of the limitations of the Cortex™-M core is that several cycles can be spent just initializing registers before the SIMD/DSP instructions can be used. In many cases the SIMD/MAC instructions themselves execute in a single cycle, but several CPU cycles are required to load the general-purpose registers with input data. Dedicated DSP processors are optimized for continuous single-cycle MAC operations using features such as circular addressing modes and zero-overhead loops. The HiFi4 DSP supports four 32x32-bit MACs and can issue two 64-bit loads per cycle. There is also a vector floating-point unit providing up to four single-precision IEEE floating-point MACs per cycle. All HiFi4 operations can be used as intrinsics in standard C code.


 Figure 5.  Cadence Tensilica® HiFi 4 DSP in the RT600.

The HiFi4 Audio DSP was designed specifically for audio and sensor fusion processing pipelines. It is supported by a large third-party ecosystem that covers applications such as sensor fusion, real-time audio, noise reduction, sound enhancement and voice processing. There are more than 300 DSP software packages already ported and optimized for the HiFi4 DSP architecture. This means you can get up and running very quickly and can easily port your own proprietary software, completely in C, while maintaining or surpassing the performance of assembly on other DSPs. Cadence even offers an optimized version of TensorFlow Lite to enable machine learning and AI applications at the edge.

RT600 Memory Architecture

 Another unique aspect of the RT600 is its large memory availability and architecture.   An important component of an audio processing architecture is the availability of large blocks of memory for time/sample history buffers and fast memories for critical code execution.


 Figure 6.  RT600 Memory Architecture.

The availability of 4.5 MB of internal SRAM should immediately catch your attention! I tend to think in terms of real-time audio applications for musical performance. 4.5 MB of RAM allows for deep buffers to implement delay-based effects, including loopers and reverbs with long time constants. A 4.5 MB pool of fast SRAM removes the need for slower external SDRAM (which is common in many audio DSP architectures). The 4.5 MB is shared between the CM33 and the HiFi4 DSP, and it is partitioned finely enough to allow a large amount of flexibility in the processing architecture: there are 30 partitions across 9 AHB ports. This means the processing system can be designed to minimize contention between the CPUs for RAM, allowing for maximum throughput.

The HiFi4 DSP has dedicated local Tightly Coupled Memories (TCMs) for data and code. Each TCM is 64 KB and is accessed through a 128-bit port. The code and data TCMs can also be accessed by the Cortex-M33 and by the DMA controllers through a slave port on the AHB matrix. These connections allow the CM33 to bootstrap the HiFi4 with executable code. In addition to the TCMs, there is a dedicated 4-way data cache of 64 KB with 256 bytes per line and a dedicated 4-way instruction cache of 32 KB with 256 bytes per line. This local memory architecture enables the highest level of processing capability, as the DSP engine can access code and data with minimal bottleneck.


Figure 7.  HiFi4 Local Memory Architecture.

Like many of the other i.MX RT crossover parts, the RT600 is a flash-less component. This allows the RT600 to be built on an efficient 28 nm FD-SOI semiconductor process, taking advantage of its power savings and clock frequency improvements. Code can be stored in low-cost external Quad/Octal SPI NOR flash memory. Non-time-critical routines can execute in place (XIP) from external memory, while code requiring better performance can execute from internal SRAM. This approach gives the system architect maximum flexibility in balancing cost and performance. It was almost 10 years ago when the NXP LPC4357 was introduced with a QSPI XIP flash interface. The XIP external flash approach has proven to be an effective way to lower the total cost of a solution while providing large amounts of flash for applications, as well as offering flexibility when architecting the MCU solution.

The Next Steps

In part 2 of this article series, we will dive into some more of the interesting peripherals in the RT600 that uniquely position it as a high-performance audio crossover MCU. I hope this article sparked your interest in the product, and I encourage you to visit the RT600 website to learn more.

Cheers!

 

Links:

Crossing Over With the i.MX RT600 Part 1 of 2

Crossing Over With the i.MX RT600 Part 2 of 2

i.MX RT685 Hardware Design 1 of 3 - Power and Package

i.MX RT685 Hardware Design 2 of 3 - Flash Memory and Boot Configuration

i.MX RT685 Hardware Design 3 of 3 - The SuperMonkey Design

i.MX RT685 SuperMonkey QSPI Bringup with MCUXpresso and Segger J-Link

Creating a Custom Zephyr Board for the i.MX RT685 SuperMonkey