The MCXN947 is the flagship part in the MCX N family of microcontrollers. This superset part includes 2MB of internal flash memory and 512KB of internal SRAM. Additional memory can be added to an MCX N system design using the FlexSPI controller, a peripheral that enables access to SPI (Serial Peripheral Interface) memories via the internal AHB bus. Though the term SPI often implies a single-bit, synchronous serial data link, the concept has been extended to quad and octal data paths.
The MCX FlexSPI Peripheral
A key point about FlexSPI is that it enables the SPI-connected memories to appear in the system memory map. The details of the SPI transactions are managed by the FlexSPI controller. This feature enables Execute-in-Place (XIP) capability, where the CPU executes directly from the external memory.
The FlexSPI controller has two ports, each of which can be further subdivided into two separate interfaces, allowing a maximum of four devices if required. SPI-based memories are typically accessed in a burst mode, where a controller will read/write data in 512-byte blocks. The block access nature of these memories can present a challenge for typical MCU use cases. The MCX N integrates a 16KB cache with a 64-bit data path in front of the FlexSPI controller, known as CACHE64.
The MCX CACHE64 Module
The CACHE64 provides spatial and temporal locality of FlexSPI data to the system, smoothing the block transactional nature of the external SPI memories. The cache makes SPI-based memories suitable for direct execution or data storage without necessarily incurring a large performance penalty.
Many FlexSPI use cases focus on extending flash memory for non-volatile storage of code and data. There are, however, SPI-based PSRAM (Pseudo-Static RAM) devices in the marketplace, allowing volatile RAM to be added as well. PSRAM is an interesting blend of technologies. The core memory technology is dynamic RAM with built-in control/refresh logic. The external interface looks like static RAM with Quad or Octal SPI data lines.
For the most timing-critical operations, low-latency internal memory is the preferred storage method. However, using external PSRAM on the FlexSPI port can enable a large degree of application flexibility. Possible use cases for PSRAM on the MCX include:
The device used for this paper was the 8MB AP Memory APS6408L-3OBM-BA.
The APS6408L PSRAM in a 6mmx8mm BGA Package
The APS6408L-3OBM-BA is packaged in a small 6mmx8mm, 24-ball, 1mm pitch BGA. This footprint is commonly used for Octal SPI memories and there are several vendors who produce pin compatible devices.
Octal SPI memories are DDR capable, supporting data transfers on both edges of the clock. Higher-speed octal devices can support up to 400MB/s transfers with a 200MHz serial clock but generally use a lower-voltage (1.8V) interface. 3.3V Quad/Octal devices are available but are generally limited to 133MHz serial clock rates. For this paper, we will use the APS6408L-3OBM-BA variant as it supports a 3.3V interface with a maximum clock speed of 133MHz, equating to a maximum data transfer rate of 266MB/s. Note that the MCX N also supports 1.8V I/O for the higher-speed devices.
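The peak rates quoted above follow directly from the clock rate, the number of data lines, and whether data moves on both clock edges. A minimal sketch of that arithmetic is shown here; the function name is hypothetical and the result is a theoretical ceiling that ignores command, address, and latency overhead.

```c
#include <stdint.h>

/* Theoretical peak transfer rate, in bytes per second, of a serial memory bus.
 * clock_hz   : serial clock frequency in Hz
 * data_lines : number of data lines (1, 2, 4, or 8)
 * ddr        : nonzero when data is transferred on both clock edges
 * Hypothetical helper for illustration; protocol overhead is not modeled. */
static uint64_t spi_peak_bytes_per_sec(uint32_t clock_hz, uint32_t data_lines, int ddr)
{
    uint64_t bits_per_clock = (uint64_t)data_lines * (ddr ? 2u : 1u);
    return ((uint64_t)clock_hz * bits_per_clock) / 8u;
}
```

With 8 data lines in DDR mode, each clock cycle moves 16 bits, so a 133MHz clock yields 266MB/s and a 200MHz clock yields 400MB/s.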
The FRDM-MCXN947 development board comes with a W25Q64 Quad SPI flash memory part installed. This part is packaged in a wide SOIC8 package.
The FRDM-MCXN947 Quad SPI Flash Configuration Default
However, the PCB design has pads for attaching either Quad SPI devices in a wide SOIC8 form factor or Octal SPI devices in the 6mmx8mm BGA24 form factor.
The FRDM-MCXN947 Supports Wide SOIC8 and 6mmx8mm BGA24 Device Packages
I chose the 3.3V variant of the AP Memory APS6408L as it was the simplest configuration to use with the FRDM-MCXN947. Removing the W25Q64 is straightforward with a hot air rework tool and exposes the BGA24 pads underneath.
Revealing the BGA24 pads on the FRDM-MCXN947
The APS6408L-3OBM-BA can be soldered with a hot air rework tool after adding some solder paste to the exposed pads.
The APS6408L-3OBM-BA soldered to the FRDM-MCXN947
There is currently no sample in the MCUXpresso SDK for using octal PSRAM with the MCXN947. However, the FlexSPI controller is very similar to that in the MIMXRT685. Using a PSRAM sample from the i.MXRT685 SDK, I developed an example for the MCXN947.
https://github.com/wavenumber-eng/mcxn947_octal_psram
Inside the repository is a project named “bunny_octal_psram_test” which can be imported into MCUXpresso IDE v11.9.0 [Build 2144] [2024-01-05] or later. This sample performs a basic memory test of the entire array and implements some basic PSRAM transfer tests. The “bunny” naming convention comes from its origin in the bring-up code for the MCXN947 “BunnyBoard”, which also uses the APS6408L-3OBM-BA.
The MCXN947 BunnyBoard is an M.2 22mm x 22mm form factor module
The FlexSPI memory controller was designed to be a future-proof interface that enables the MCX N to interface with virtually any external SPI-based memory. Using a programmable look-up table (LUT) approach, the FlexSPI controller can be adapted to single-bit, dual, quad, or octal/DDR memories as needed. The sample includes a simple LUT configured for the APS6408L-3OBM-BA.
LUT configuration for the APS6408L-3OBM-BA
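To illustrate the LUT idea, the sketch below packs one read sequence by hand. It assumes the instruction layout used by the SDK's LUT macros as I understand it (a 6-bit opcode, a 2-bit pad count, and an 8-bit operand per 16-bit instruction, two instructions per 32-bit word). The opcode constants, the command byte, and the dummy cycle count are illustrative assumptions and should be checked against fsl_flexspi.h, the PSRAM datasheet, and the LUT in the repository. All function and constant names here are hypothetical.

```c
#include <stdint.h>

/* Illustrative opcode and pad values; assumptions to verify against fsl_flexspi.h. */
enum {
    LUT_OP_STOP           = 0x00,
    LUT_OP_CMD_DDR        = 0x21, /* send a command byte, DDR mode */
    LUT_OP_RADDR_DDR      = 0x22, /* send row address, operand = address bits */
    LUT_OP_READ_DDR       = 0x29, /* read data, DDR mode */
    LUT_OP_DUMMY_RWDS_DDR = 0x2D, /* dummy cycles, latency signaled by the device */
    LUT_PAD_8             = 0x03  /* 8 data lines */
};

/* Pack one 16-bit LUT instruction: opcode[15:10], pads[9:8], operand[7:0]. */
static uint16_t lut_instr(uint8_t opcode, uint8_t pads, uint8_t operand)
{
    return (uint16_t)(((uint32_t)(opcode & 0x3Fu) << 10) |
                      ((uint32_t)(pads & 0x3u) << 8) |
                      operand);
}

/* Two instructions share one 32-bit LUT word; the first sits in the low half. */
static uint32_t lut_word(uint16_t instr0, uint16_t instr1)
{
    return (uint32_t)instr0 | ((uint32_t)instr1 << 16);
}

/* Build a four-word octal DDR read sequence: command, 32-bit address,
 * dummy cycles, then data. Unused slots are left as STOP (zero). */
static void build_read_seq(uint32_t seq[4], uint8_t read_cmd, uint8_t dummy_cycles)
{
    seq[0] = lut_word(lut_instr(LUT_OP_CMD_DDR, LUT_PAD_8, read_cmd),
                      lut_instr(LUT_OP_RADDR_DDR, LUT_PAD_8, 32u));
    seq[1] = lut_word(lut_instr(LUT_OP_DUMMY_RWDS_DDR, LUT_PAD_8, dummy_cycles),
                      lut_instr(LUT_OP_READ_DDR, LUT_PAD_8, 0x04u));
    seq[2] = lut_word(lut_instr(LUT_OP_STOP, 0u, 0u), lut_instr(LUT_OP_STOP, 0u, 0u));
    seq[3] = seq[2];
}
```

In a real project the SDK macros would normally be used instead of hand packing; the point of the sketch is only that each memory operation is described as a short program of opcode, pad count, and operand triples, which is what lets one controller serve very different devices.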
PSRAM devices typically require additional configuration for parameters such as read latencies and burst sizes. This sample provides some defaults that will work with the APS6408L-3OBM-BA. Control register access is performed with the FlexSPI SDK API and the custom LUT.
Accessing PSRAM internal registers over FlexSPI.
The APS6408L-3OBM-BA is specified for 3.3V operation and a maximum 133MHz clock. PLL1 was used to generate the 133MHz clock for the FlexSPI controller.
Configuring PLL1 to generate a 133MHz clock for the FlexSPI interface
Once the PSRAM is configured, it can be accessed using normal memory access patterns, such as through a pointer.
volatile uint32_t *psram = (volatile uint32_t *)(BUNNY_FLEXSPI_BASE_ADDRESS);
psram[0] = 0xAA551122;
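Because the PSRAM is memory mapped, a basic array test needs nothing more than pointer accesses. The following is a minimal sketch of such a test, not the implementation used in the repository; the function names and the pattern are hypothetical. It writes an index-derived pattern, verifies it, then repeats with the complement so that every bit is exercised as both 0 and 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Index-derived pattern so that each word holds a different value. */
static uint32_t test_pattern(size_t index, int invert)
{
    uint32_t value = (uint32_t)index ^ 0xA5A5A5A5u;
    return invert ? ~value : value;
}

/* Write then verify a pattern and its complement over word_count words.
 * Returns the number of mismatched words (0 means the test passed). */
static size_t mem_test_words(volatile uint32_t *base, size_t word_count)
{
    size_t errors = 0;

    for (int invert = 0; invert <= 1; invert++) {
        /* Fill the whole region first so that reads come after all writes. */
        for (size_t i = 0; i < word_count; i++) {
            base[i] = test_pattern(i, invert);
        }
        /* Read back and count mismatches. */
        for (size_t i = 0; i < word_count; i++) {
            if (base[i] != test_pattern(i, invert)) {
                errors++;
            }
        }
    }
    return errors;
}
```

On the target, the base pointer would be the FlexSPI-mapped address and word_count the device size in 32-bit words. Note that with the cache enabled, a region smaller than the cache may be verified from cache rather than from the PSRAM itself.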
It is possible to configure the build system such that the PSRAM can be used by the linker. Care must be taken to ensure that the PSRAM/FlexSPI is initialized before the standard C initialization and copy down routines. This is beyond the scope of this paper but will be a topic for a future paper.
When using FlexSPI-connected PSRAM, it is important to understand how the cache and the burst access nature of the PSRAM impact system performance. Real-world scenarios often have a variety of memory access patterns, making it difficult to develop a single test to characterize performance. However, I typically run tests that operate the memories at various limits, with the understanding that real-world performance will fall somewhere in between.
For this work, there were two limiting cases that were evaluated:
For the block transaction test, the code would read/write block sizes from 1KB to 32KB using both DMA and memcpy. The intent of this test case was to show the limiting behavior of the CPU interacting with the CACHE64 (best case scenario).
To achieve good memcpy performance, I linked this application against the newlib library. The newlib build used with MCUXpresso has a hand-tuned, assembly language implementation of memcpy that performs better than redlib or newlib-nano. You can learn more about the newlib implementation via Memfault’s excellent analysis.
The random access test performed 32-bit reads and writes to a predetermined range of randomized addresses. This test evaluates the limiting case where the transactions would almost always fall outside of the cache. The randomized access test cases included read-only, write-only, and write-then-read transactions.
The CPU cycle counts for the core code paths performing the transfers were gauged using the ARM SysTick timer. Tests were iterated 256 times and transfer rates were computed using an average cycle count value over the iterations. The clock cycle counting method was calibrated and the results are within a reasonable margin of error (+/-2 cycles). The transfer rates were computed using the CPU cycle count and the system clock rate.
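The conversion from cycle counts to a transfer rate can be sketched as follows. This assumes the SysTick counter is used as a free-running 24-bit down-counter with its reload value set to the maximum (0xFFFFFF) and that the timed region wraps at most once; the function names are hypothetical and this is not the repository's measurement code.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu /* SysTick is a 24-bit down-counter */

/* Elapsed cycles between two SysTick samples. The counter counts down,
 * so the start value is larger unless the counter wrapped in between. */
static uint32_t systick_elapsed(uint32_t start, uint32_t end)
{
    return (start - end) & SYSTICK_MASK;
}

/* Average transfer rate in bytes per second.
 * bytes_per_iter : bytes moved in one timed iteration
 * total_cycles   : sum of the cycle counts over all iterations
 * iterations     : number of timed iterations
 * cpu_hz         : CPU clock rate driving the cycle counter
 * Returns 0 if total_cycles is 0 to avoid dividing by zero. */
static uint64_t transfer_rate_bytes_per_sec(uint64_t bytes_per_iter,
                                            uint64_t total_cycles,
                                            uint32_t iterations,
                                            uint32_t cpu_hz)
{
    if (total_cycles == 0u) {
        return 0u;
    }
    /* rate = bytes / (average cycles / cpu_hz), kept in integer arithmetic. */
    return (bytes_per_iter * cpu_hz * iterations) / total_cycles;
}
```

As an example of the arithmetic, a 1KB transfer that averages 1500 cycles on a core clocked at 150MHz takes 10 microseconds, which corresponds to 102.4MB/s.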
Before measuring timings on the PSRAM, a control test was executed using a block of internal memory as a reference. Since the code paths for operating the DMA transfer, performing memcpy, and implementing the random reads/writes have their own performance characteristics, the control test establishes a baseline for comparison.
As stated previously, the memcpy implementation uses hand-tuned assembly from the newlib library. The DMA transfer code has minimal overhead, as the timed code path initiates a DMA transfer using the SDK API and polls for the result. The random read/write code is a straightforward C for loop with double-indexed array access. The test code was compiled with the -O2 optimization flag. The code used for the memory transfers doesn’t represent any particular optimization or use case but is typical of what might be found in a real-world application.
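A double-indexed random access loop of the kind described here might look like the sketch below. It assumes a table of randomized word indices is generated ahead of time so that index generation is kept out of the timed path; the generator (a simple xorshift) and all names are hypothetical rather than taken from the test firmware.

```c
#include <stddef.h>
#include <stdint.h>

/* Small xorshift generator, used only to build the index table up front. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill idx[] with n pseudo-random word indices in [0, range_words). */
static void fill_random_indices(uint32_t *idx, size_t n, uint32_t range_words, uint32_t seed)
{
    uint32_t state = (seed != 0u) ? seed : 1u; /* xorshift must not start at 0 */
    for (size_t i = 0; i < n; i++) {
        idx[i] = xorshift32(&state) % range_words;
    }
}

/* Timed path, read-only case: mem[idx[i]] is the double-indexed access. */
static uint32_t random_read(const volatile uint32_t *mem, const uint32_t *idx, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += mem[idx[i]];
    }
    return sum; /* returned so the reads have an observable result */
}

/* Timed path, write-only case. */
static void random_write(volatile uint32_t *mem, const uint32_t *idx, size_t n, uint32_t value)
{
    for (size_t i = 0; i < n; i++) {
        mem[idx[i]] = value;
    }
}
```

Keeping the index table in internal SRAM means the only external-memory traffic in the timed loop is the randomized access itself, which is what makes this a worst case for the cache.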
The results shown are copies from a serial debug terminal. Data was recorded, formatted, and printed by the MCXN947 PSRAM test firmware.
The memory control test using internal SRAM transfers
This test represents a control case, as the source and destination buffers are both in internal SRAM, in RAM banks on different AHB ports. There are many interesting features in the control data, but for now we will consider this a baseline for how the test algorithms perform when using PSRAM.
Test Run #1 : 133MHz FlexSPI Octal PSRAM – No Cache
There were a few notable features in test run #1, which has the CACHE64 disabled.
Test Run #2 : 150MHz (overclocked) FlexSPI Octal PSRAM – No Cache
Test run #2 was identical to #1 except that the FlexSPI clock rate was increased to 150MHz. This overclocks the APS6408L-3OBM-BA PSRAM, which is not recommended in a production use case over the published temperature range. As expected, there was a slight increase in performance due to the faster clock.
Test Run #3 : 133MHz FlexSPI Octal PSRAM – Cache Enabled
Test Run #3 returns to the 133MHz clock rate and enables the CACHE64 module.
A few notable features in this dataset:
Test Run #4 : 150MHz (Overclocked) FlexSPI Octal PSRAM – Cache Enabled
Test Run #4 is a repeat of #3 with the FlexSPI running at 150MHz.
From this initial data we can observe the behavior of the FlexSPI controller coupled to an Octal PSRAM through the CACHE64 using some limiting test cases. Real world performance will vary, but this dataset and code can provide a starting point to assess suitability for a specific requirement. These test cases show some of the performance boundaries, so it is to be expected that real world performance will fall between these limits.
While it was out of scope for this paper, it is possible to execute code from FlexSPI/PSRAM. There is some precedent available with the LPC5536 microcontroller, which uses the same FlexSPI controller and a smaller 8KB CACHE64 module.
NXP Application Note AN13591 provides data on XIP performance as compared to code executing from internal Flash:
https://www.nxp.com/docs/en/application-note/AN13591.pdf
Interestingly, CoreMark scores for the LPC5536 are nearly identical whether the code runs from internal flash, Octal SPI flash, or Octal SPI HyperRAM/PSRAM.
150MHz LPC5536 w/ MX25UM51345GXDI00 Octal Flash CoreMark Score vs Internal Flash (From AN13591)
150MHz LPC5536 w/ W956D8MBYA5I Octal HyperRAM/PSRAM CoreMark Score vs Internal Flash (From AN13591)
For the most timing-critical operations, low-latency internal memory is the preferred storage method. However, using external PSRAM on the FlexSPI interface can enable a large degree of flexibility in potential applications. Adding a large amount of volatile memory is simple from the PCB design point of view and does not add significantly to the system BOM.
Using the FRDM-MCXN947 is a simple, low-cost way to evaluate a FlexSPI/PSRAM-based design. You can find more information about the FRDM-MCXN947 and the MCXN947 microcontroller at the following links.
Code reference used for tests in this paper
https://github.com/wavenumber-eng/mcxn947_octal_psram.git
Understanding newlib memcpy performance
https://interrupt.memfault.com/blog/memcpy-newlib-nano
FlexSPI CoreMark Performance on LPC553x/LPC55S3x
https://www.nxp.com/docs/en/application-note/AN13591.pdf
FRDM-MCXN947 Product Page
MCXN947 Product Page
AP Memory APS6408L-3OBM-BA PSRAM Datasheet
https://www.apmemory.com/products/psram-iot-ram/