i.MX RT Crossover MCUs Knowledge Base

Discussions

RT10xx SAI basic and SDCard wave file play 1. Introduction NXP RT10xx's audio modules are SAI, SPDIF, and MQS. The SAI module is a synchronous serial interface for audio data transmission. SPDIF is a stereo transceiver that can receive and send digital audio, MQS is used to convert I2S audio data from SAI3 to PWM, and then can drive external speakers, but in practical usage, it still need to add the amplifier drive circuit. When we use the SAI module, it will be related to the audio file play and the data obtained. This article will be based on the MIMXRT1060-EVK board, give the RT10xx SAI module basic knowledge, PCM waveform format, the audio file cut, and conversion tool, use the MCUXpresso IDE CFG peripheral tool to create the SAI project, play the audio data, it will also provide the SDcard with fatfs system to read the wave file and play it. 2. Basic Knowledge and the tools Before entering the project details and testing, just provide some SAI module knowledge, wave file format information, audio convert tools. 2.1 SAI module basic RT10xx SAI module can support I2S, AC97, TDM, and codec/DSP interface. SAI module contains Transmitter and Receiver, the related signals: SAI_MCLK: master clock, used to generate the bit clock, master output, slave input. SAI_TX_BCLK: Transmit bit clock, master output, slave input SAI_TX_SYNC: Transmit Frame sync, master output, slave input, L/R channel select SAI_TX_DATA[4]:Transmit data line, 1-3 share with RX_DATA[1-3] SAI_RX_BCLK: receiver bit clock SAI_RX_SYNC: receiver frame sync SAI_RX_DATA[4]: receiver data line SAI module clocks: audio master clock, bus clock, bit clock SAI module Frame sync has 3 modes: 1）Transmit and receive using its own BCLK and SYNC 2）Transmit async, receive sync: use transmit BCLK and SYNC, transmit enable at first, disable at last. 3）Transmit sync, receive async: use receive BCLK and SYNC, receiver enable at first, disable at last. Valid frame sync is also ignored (slave mode) or not generated (master mode) for the first four-bit clock cycles after enabling the transmitter or receiver. Pic 1 SAI module clock structure: Pic 2 SAI module 3 clock sources: PLL3_PFD3, PLL5, PLL4 In the above picture, SAI1_CLK_ROOT, which can be used as the MCLK, the BCLK is: BCLK= master clock/(TCR2[DIV]+1)*2 Sample rate = Bitclockfreq /（bitwidth*channel） 2.2 waveform audio file format WAVE file is used to save the PCM encode data, WAVE is using the RIFF format, the smallest unit in the RIFF file is the CK struct, CKID is the data type, the value can be: “RIFF”,“LIST”,“fmt”, “data” etc. RIFF file is little-endian. RIFF structure： typedef unsigned long DWORD;//4B typedef unsigned char BYTE;//1B typedef DWORD FOURCC; // 4B typedef struct { FOURCC ckID; //4B DWORD ckSize; //4B union { FOURCC fccType; // RIFF form type 4B BYTE ckData[ckSize]; //ckSize*1B } ckData; } RIFFCK; Pic 3 Take a 16Khz 2 channel wave file as the example: Pic 4 Yellow: CKID Green: data length Purple: data The detailed analysis as follows: Pic 5 We can find, the real audio data, except the wave header, the data size is 1279860bytes. 2.3 Audio file convert In practical usage, the audio file may not the required channel and the sample rate configuration, or the format is not the wave, or the time is too long, then we can use some tool to convert it to your desired format. We can use the ffmpeg tool: https://ffmpeg.org/ About the details, check the ffmpeg document, normally we use these command: mp3 file converts to 16k, 16bit, 2 channel wave file: ffmpeg -i test.mp3 -acodec pcm_s16le -ar 16000 -ac 2 test.wav or： ffmpeg -i test.mp3 -aq 16 -ar 16000 -ac 2 test.wav test.wav, cut 35s from 00:00:00, and can convert save to test1.wav: ffmpeg -ss 00:00:00 -i test.wav -t 35.0 -c copy test1.wav Pic 6 Pic 7 2.4 Obtain wave L/R channel audio data Just like the SDK code, save the L/R audio data directly in the RT RAM array, so here, we need to obtain the audio data from the wav file. We can use the python readout the wav header, then get the audio data size, and save the audio data to one array in the .h files. The related Python code can be: import sys import wave def wav2hex(strWav, strHex): with wave.open(strWav, "rb") as fWav: wavChannels = fWav.getnchannels() wavSampleWidth = fWav.getsampwidth() wavFrameRate = fWav.getframerate() wavFrameNum = fWav.getnframes() wavFrames = fWav.readframes(wavFrameNum) wavDuration = wavFrameNum / wavFrameRate wafFramebytes = wavFrameNum * wavChannels * wavSampleWidth print("Channels: {}".format(wavChannels)) print("Sample width: {}bits".format(wavSampleWidth * 8)) print("Sample rate: {}kHz".format(wavFrameRate/1000)) print("Frames number: {}".format(wavFrameNum)) print("Duration: {}s".format(wavDuration)) print("Frames bytes: {}".format(wafFramebytes)) fWav.close() pass with open(strHex, "w") as fHex: # Print WAV parameters fHex.write("/*\n"); fHex.write(" Channels: {}\n".format(wavChannels)) fHex.write(" Sample width: {}bits\n".format(wavSampleWidth * 8)) fHex.write(" Sample rate: {}kHz\n".format(wavFrameRate/1000)) fHex.write(" Frames number: {}\n".format(wavFrameNum)) fHex.write(" Duration: {}s\n".format(wavDuration)) fHex.write(" Frames bytes: {}\n".format(wafFramebytes)) fHex.write("*/\n\n") # Print WAV frames fHex.write("uint8_t music[] = {\n") print("Transferring...") i = 0 while wafFramebytes > 0: if(wafFramebytes < 16): BytesToPrint = wafFramebytes else: BytesToPrint = 16 fHex.write(" ") for j in range(0, BytesToPrint): if j != 0: fHex.write(' ') fHex.write("0x{:0>2x},".format(wavFrames[i])) i+=1 j+=1 fHex.write("\n") wafFramebytes -= BytesToPrint fHex.write("};\n") fHex.close() print("Done!") wav2hex(sys.argv[1], sys.argv[2]) Take the music1.wave as an example: Pic 8 2.4 Audio data relationship with audio wave 16bit data range is: -32768 to 32767, the goldwave related value range is(-1~1).Use goldwave tool to open the example music1.wav, check the data in 1s position, the left channel relative data is -0.08227, right channel relative data is -0.2257. Pic 9 pic 10 Now, calculate the L/R real data, and find the position in the music1.h. Pic 11 From pic 8, we can know, the real wave R/L data from line 11, each line contains 16 bytes of data. So, from music1.wav related value, we can calculate the related data, and compare it with the real data in the array, we can find, it is totally the same. 3. SAI MCUXpresso project creation Based on SDK_2.9.2_EVK-MIMXRT1060, create one SAI DMA audio play project. The audio data can use the above music1.h. Create one bare-metal project: Drivers check： clock, common, dmamux, edma,gpio,i2c,iomuxc,lpuart,sai,sai_edma,xip_device Utilities check： Debug_console,lpuart_adapter,serial_manager,serial_manager_uart Board components check： Xip_board Abstraction Layer check： Codec, codec_wm8960_adapter,lpi2c_adapter Software Components check： Codec_i2c,lists,wm8960 After the creation of the project, open the clocks, configure the clock, core, flexSPI can use the default one, we mainly configure the SAI1 related clocks: Pic 12 Select the SAI1 clock source as PLL4, PLL4_MAIN_CLK configure as 786.48MHz. SAI1 clock configure as 6.144375MHz. After the configuration, update the code. Open Pins tool, configure the SAI1 related pins, as the codec also need the I2C, so it contains the I2C pin configuration. Pic 13 Update the code. Open peripherals, configure DMA, SAI, NVIC. Pic 14 Pic 15 DMA配置如下： pic16 After configuration, generate the code. In the above configuration, we have finished the SAI DMA transfer configuration, SAI master mode, 16bits, the sample rate is 16kHz, 2channel, DMA transfer, bit clock is 512Khz, the master clock is 6.1443Mhz. void callback(I2S_Type *base, sai_edma_handle_t *handle, status_t status, void *userData) { if (kStatus_SAI_RxError == status) { } else { finishIndex++; emptyBlock++; /* Judge whether the music array is completely transfered. */ if (MUSIC_LEN / BUFFER_SIZE == finishIndex) { isFinished = true; finishIndex = 0; emptyBlock = BUFFER_NUM; tx_index = 0; cpy_index = 0; } } } int main(void) { sai_transfer_t xfer; /* Init board hardware. */ BOARD_ConfigMPU(); BOARD_InitBootPins(); BOARD_InitBootClocks(); BOARD_InitBootPeripherals(); #ifndef BOARD_INIT_DEBUG_CONSOLE_PERIPHERAL /* Init FSL debug console. */ BOARD_InitDebugConsole(); #endif PRINTF(" SAI wav module test!\n\r"); /* Use default setting to init codec */ if (CODEC_Init(&codecHandle, &boardCodecConfig) != kStatus_Success) { assert(false); } /* delay for codec output stable */ DelayMS(DEMO_CODEC_INIT_DELAY_MS); CODEC_SetVolume(&codecHandle,2U,50); // set 50% volume EnableIRQ(DEMO_SAI_IRQ); SAI_TxEnableInterrupts(DEMO_SAI, kSAI_FIFOErrorInterruptEnable); PRINTF(" MUSIC PLAY Start!\n\r"); while (1) { PRINTF(" MUSIC PLAY Again\n\r"); isFinished = false; while (!isFinished) { if ((emptyBlock > 0U) && (cpy_index < MUSIC_LEN / BUFFER_SIZE)) { /* Fill in the buffers. */ memcpy((uint8_t *)&buffer[BUFFER_SIZE * (cpy_index % BUFFER_NUM)], (uint8_t *)&music[cpy_index * BUFFER_SIZE], sizeof(uint8_t) * BUFFER_SIZE); emptyBlock--; cpy_index++; } if (emptyBlock < BUFFER_NUM) { /* xfer structure */ xfer.data = (uint8_t *)&buffer[BUFFER_SIZE * (tx_index % BUFFER_NUM)]; xfer.dataSize = BUFFER_SIZE; /* Wait for available queue. */ if (kStatus_Success == SAI_TransferSendEDMA(DEMO_SAI, &SAI1_SAI_Tx_eDMA_Handle, &xfer)) { tx_index++; } } } } } 4. SAI test result To check the real L/R data sendout situation, we modify the music array first 16 bytes data as: 0x55,0xaa,0x01,0x00,0x02,0x00,0x03,0x00,0x04,0x00,0x05,0x00,0x06,0x00,0x07,0x00 Then test SAI_MCLK,SAI_TX_BCLK,SAI_TX_SYNC,SAI_TXD pin wave, and compare with the defined data, because the polarity is configured as active low, it is falling edge output, sample at rising edge. The test point on the MIMXRT1060-EVK board is using the codec pin position: Pic 17 4.1 Logic Analyzer tool wave Pic 18 MCLK clock frequency is 6.144375Mhz, BCLK is 512KHz, SYNC is 16KHz. Pic 19 The first frame data is:1010101001010101 0000000000000001 0XAA55 0X0001 It is the same as the array defined L/R data. SYNC low is Left 16 bit, High is right 16 bit. 4.2 Oscilloscope test wave Just like the logic analyzer, the oscilloscope wave is the same： Pic 20 Add the music.h to the project, and let the main code play the music array data in loop, we will hear the music clear when insert the headphone to on board J12 or add a speaker. 5. SAI SDcard wave music play This part will add the sd card, fatfs system, to read out the 16bit 16K 2ch wave file in the sd card, and play it in loop. 5.1 driver add Code is based on SDK_2.9.2_EVK-MIMXRT1060, just on the previous project, add the sdcard, sd fatfs driver, now the bare-metal driver situation is: Drivers check： cache, clock, common, dmamux, edma,gpio,i2c,iomuxc,lpuart,sai,sai_edma,sdhc, xip_device Utilities check： Debug_console,lpuart_adapter,serial_manager,serial_manager_uart Middleware check： File System->FAT File System->fatfs+sd, Memories Board components check： Xip_board Abstraction Layer check： Codec, codec_wm8960_adapter,lpi2c_adapter Software Components check： Codec_i2c,lists,wm8960 5.2 WAVE header analyzer with code From previous content, we can know the wav header structure, we need to play the wave file from the sd card, then we need to analyze the wave header to get the audio format, audio data-related information. The header analysis code is: uint8_t Fun_Wave_Header_Analyzer(void) { char * datap; uint8_t ErrFlag = 0; datap = strstr((char*)Wav_HDBuffer,"RIFF"); if(datap != NULL) { wav_header.chunk_size = ((uint32_t)*(Wav_HDBuffer+4)) + (((uint32_t)*(Wav_HDBuffer + 5)) << + (((uint32_t)*(Wav_HDBuffer + 6)) << 16) +(((uint32_t)*(Wav_HDBuffer + 7)) << 24); movecnt += 8; } else { ErrFlag = 1; return ErrFlag; } datap = strstr((char*)(Wav_HDBuffer+movecnt),"WAVEfmt"); if(datap != NULL) { movecnt += 8; wav_header.fmtchunk_size = ((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); wav_header.audio_format = ((uint16_t)*(Wav_HDBuffer+movecnt+4) + (uint16_t)*(Wav_HDBuffer+movecnt+5)); wav_header.num_channels = ((uint16_t)*(Wav_HDBuffer+movecnt+6) + (uint16_t)*(Wav_HDBuffer+movecnt+7)); wav_header.sample_rate = ((uint32_t)*(Wav_HDBuffer+movecnt+8)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 9)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 10)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 11)) << 24); wav_header.byte_rate = ((uint32_t)*(Wav_HDBuffer+movecnt+12)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 13)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 14)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 15)) << 24); wav_header.block_align = ((uint16_t)*(Wav_HDBuffer+movecnt+16) + (uint16_t)*(Wav_HDBuffer+movecnt+17)); wav_header.bps = ((uint16_t)*(Wav_HDBuffer+movecnt+18) + (uint16_t)*(Wav_HDBuffer+movecnt+19)); movecnt +=(4+wav_header.fmtchunk_size); } else { ErrFlag = 1; return ErrFlag; } datap = strstr((char*)(Wav_HDBuffer+movecnt),"LIST"); if(datap != NULL) { movecnt += 4; wav_header.list_size = ((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); movecnt +=(4+wav_header.list_size); } //LIST not Must datap = strstr((char*)(Wav_HDBuffer+movecnt),"data"); if(datap != NULL) { movecnt += 4; wav_header.datachunk_size = ((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); movecnt += 4; ErrFlag = 0; } else { ErrFlag = 1; return ErrFlag; } PRINTF("Wave audio format is %d\r\n",wav_header.audio_format); PRINTF("Wave audio channel number is %d\r\n",wav_header.num_channels); PRINTF("Wave audio sample rate is %d\r\n",wav_header.sample_rate); PRINTF("Wave audio byte rate is %d\r\n",wav_header.byte_rate); PRINTF("Wave audio block align is %d\r\n",wav_header.block_align); PRINTF("Wave audio bit per sample is %d\r\n",wav_header.bps); PRINTF("Wave audio data size is %d\r\n",wav_header.datachunk_size); return ErrFlag; } Mainly divide RIFF to 4 parts: “RIFF”,“fmt”,“LIST”,“data”. The 4 bytes data follows the “data” is the whole audio data size, it can be used to the fatfs to read the audio data. The above code also recodes the data position, then when using the fatfs read the wave, we can jump to the data area directly. 5.3 SD card wave data play Define the array audioBuff[4* 512], used to read out the sd card wave file, and use these data send to the SAI EDMA and transfer it to the I2S interface until all the data is transmitted to the I2S interface. Callback record each 512 bytes data send out finished, and judge the transmit data size is reached the whole wave audio data size. 5.4 sd card wave play result Prepare one wave file, 16bit 16k sample rate, 2 channel file, named as music.wav, put in the sd card which already does the fat32 format, insert it to the MIMXRT1060-EVK J39, run the code, will get the printf information: Please insert a card into the board. Card inserted. Make file system......The time may be long if the card capacity is big. SAI wav module test! MUSIC PLAY Start! Wave audio format is 1 Wave audio channel number is 2 Wave audio sample rate is 16000 Wave audio byte rate is 64000 Wave audio block align is 4 Wave audio bit per sample is 16 Wave audio data size is 2728440 Playback is begin! Playback is finished! At the same time, after inserting the headphone or the speaker into the J12, we can hear the music. Attachment is the mcuxpresso10.3.0 and the wave samples.

There is an issue with the DCD file used in the SDK 2.9.0 release for the i.MX RT1170 processor. When the included DCD file is used in a project to configure the SDRAM memory on the EVK, the refresh for the memory is not enabled. This can lead to corruption/data loss over time. To fix the problem, replace the dcd.c file in your project with the attached file instead. We are working on a fix, and a new revision of the SDK will be released soon.

Symptoms Some of us may have experienced the issue that when we put the heap to DTCM, everything is OK. That’s the default settings for MCUXpresso SDK demos. But when we put the heap on cached memory like OCRAM or SDRAM, much of the middleware does not function correctly. This issue happens on USB stack, LwIP and SDcard. USB enumeration failed, ethernet drop packets, the application no longer writes to SD card, system hanging indefinitely on uninitialized semaphores… Diagnosis To understanding this issue, we need to understand the i.MXRT L1 Cache. AN12042 describes the technology of the i.MXRT cache system. The i.MXRT series implement a CPU core platform described in Figure1. The L1 I/D-Cache is embedded in the core platform. The data cache is 4-way set-associate and instruction cache is 2-way set-associate with cache line size of 32 bytes. It connects with the SIM_M7 bus fabric master port by AXI bus. The subsystem of internal/external memory like OCRAM(FlexRAM banks configured as OCRAM), FlexSPI (Serial NOR, NAND Flash and Hyper Flash/RAM etc) and SEMC(SDRAM, PNOR Flash, NAND Flash etc.) are connected to the bus fabric slave port. CPU core access the subsystem through this bus fabric by L1 cache. The ITCM/DTCM is accessed directly by CPU core, bypass the L1 cache. OCRAM and SDRAM is cacheable by default. The cache brings a great performance boost, but the user must pay attention to the cache maintenance for data coherency. To avoid data coherency issue, the easiest way is to use non-cacheable buffers. DTCM/ITCM is Tightly-Coupled Memories, core can access it directly (cache is not involved). That can explain why all SDK demos work correctly by default. Solution Put critical code and data into TCM, it is non-cacheable, which is the fastest way for CPU to access the code and data. But forcing all global data into 128KB DTCM is constraining in many cases. Users can split a non-cache memory region from OCRAM or SDRAM, and put the buffers into this region by the linker of toolchain. Next I will take evkmimxrt1060_host_msd_command_freertos demo for example to illustrate how to make USB HOST stack to run on OCRAM. MCUxpresso IDE 11.2.1 is used for this demo. 1 Buffer definition In USB stack, some important data structures are defined with macros USB_GLOBAL, USB_DMA_DATA_NONINIT_SUB, USB_DMA_DATA_INIT_SUB and USB_CONTROLLER_DATA; These structures are defined in the usb stack by default. We can see these structures in usb_device_ehci.c and usb_host_ehci.c (take usb host as an example). In usb_device_ehci.c /* Apply for QH buffer, 2048-byte alignment */ USB_RAM_ADDRESS_ALIGNMENT(2048) USB_CONTROLLER_DATA static uint8_t qh_buffer[(USB_DEVICE_CONFIG_EHCI - 1) * 2048 + 2 * USB_DEVICE_CONFIG_ENDPOINTS * 2 * sizeof(usb_device_ehci_qh_struct_t)]; /* Apply for DTD buffer, 32-byte alignment */ USB_RAM_ADDRESS_ALIGNMENT(32) USB_CONTROLLER_DATA static usb_device_ehci_dtd_struct_t s_UsbDeviceEhciDtd[USB_DEVICE_CONFIG_EHCI] [USB_DEVICE_CONFIG_EHCI_MAX_DTD]; In usb_host_ehci.c /* EHCI controller driver instances. */ #if (USB_HOST_CONFIG_EHCI == 1U) USB_RAM_ADDRESS_ALIGNMENT(4096) USB_CONTROLLER_DATA static uint8_t s_UsbHostEhciFrameList1[USB_HOST_CONFIG_EHCI_FRAME_LIST_SIZE * 4]; static uint8_t usbHostEhciFramListStatus[1] = {0}; USB_RAM_ADDRESS_ALIGNMENT(64) USB_CONTROLLER_DATA static usb_host_ehci_data_t s_UsbHostEhciData1; #elif (USB_HOST_CONFIG_EHCI == 2U) USB_RAM_ADDRESS_ALIGNMENT(4096) USB_CONTROLLER_DATA static uint8_t s_UsbHostEhciFrameList1[USB_HOST_CONFIG_EHCI_FRAME_LIST_SIZE * 4]; USB_RAM_ADDRESS_ALIGNMENT(4096) USB_CONTROLLER_DATA static uint8_t s_UsbHostEhciFrameList2[USB_HOST_CONFIG_EHCI_FRAME_LIST_SIZE * 4]; static uint8_t usbHostEhciFramListStatus[2] = {0, 0}; USB_RAM_ADDRESS_ALIGNMENT(64) USB_CONTROLLER_DATA static usb_host_ehci_data_t s_UsbHostEhciData1; USB_RAM_ADDRESS_ALIGNMENT(64) USB_CONTROLLER_DATA static usb_host_ehci_data_t s_UsbHostEhciData2; #else #error "Please increase the instance count." #endif 2 Linker file : partition a RAM block from OCRAM for non-cacheable buffers Using managed linker script to configure memory RAM2 as a non-cacheable area. 3 MPU configuratins ( board.c ) MPU divides the memory map into a few regions, and defines the memory attributes of each region. In this step, we need to configure the SRAM_OC_NCACHE_128(RAM2) as non-cacheable /* Region 13 setting: Memory with non-cacheable */ MPU->RBAR = ARM_MPU_RBAR(13, 0x202a0000); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_128KB); Now, SRAM_OC_NCACHE_128 (RAM2) is a non-cacheable section. Variables in *(NonCacheable.init) and *( NonCacheable) will be put to SRAM_OC_NCACHE_128. 4 Put USB variables into SRAM_OC_NCACHE_128(RAM2) This is done by the following macros. #define USB_LINK_NONCACHE_NONINIT_DATA _Pragma("location = \"NonCacheable\"") Relative source code is in file usb_misc.h #if (defined(DATA_SECTION_IS_CACHEABLE) && (DATA_SECTION_IS_CACHEABLE)) #define USB_GLOBAL USB_LINK_NONCACHE_NONINIT_DATA #define USB_BDT USB_LINK_NONCACHE_NONINIT_DATA #define USB_DMA_DATA_NONINIT_SUB USB_LINK_NONCACHE_NONINIT_DATA #define USB_DMA_DATA_INIT_SUB USB_LINK_DMA_INIT_DATA(NonCacheable.init) #define USB_CONTROLLER_DATA USB_LINK_NONCACHE_NONINIT_DATA #else #define USB_GLOBAL USB_LINK_USB_GLOBAL_BSS #define USB_BDT USB_LINK_USB_BDT_BSS #define USB_DMA_DATA_NONINIT_SUB #define USB_DMA_DATA_INIT_SUB #define USB_CONTROLLER_DATA #endif Please put macro “DATA_SECTION_IS_CACHEABLE=1” in the preprocessor define. 5 build and run project evkmimxrt1060_host_msd_command_freertos, success! Reference: Using the i.MXRT L1 Cache https://www.nxp.com.cn/docs/en/application-note/AN12042.pdf

In the tutorial, I'd like to show the steps of deploying an image classification model on i.MX RT1060 to enabling you to classify fashion images and categories. In the first part of this tutorial, we will review the Fashion MNIST dataset, including how to download it to your system. From there we’ll define a simple CNN network using the TensorFlow platform. Next, we’ll train our CNN model on the Fashion MNIST dataset, train it, and review the results. Finally, we'll optimize the model, after that, the model will be smaller and increase inferencing speed, which is valuable for source-limited devices such as MCU. Let’s go ahead and get started! Fashion MNIST dataset The Fashion MNIST dataset was created by the e-commerce company, Zalando. Fig 1 Fashion MNIST dataset As they note on their official GitHub repo for the Fashion MNIST dataset, there are a few problems with the standard MNIST digit recognition dataset: It’s far too easy for standard machine learning algorithms to obtain 97%+ accuracy. It’s even easier for deep learning models to achieve 99%+ accuracy. The dataset is overused. MNIST cannot represent modern computer vision tasks. Zalando, therefore, created the Fashion MNIST dataset as a drop-in replacement for MNIST. 60,000 training examples 10,000 testing examples 10 classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot 28×28 grayscale images The code below loads the Fashion-MNIST dataset using the TensorFlow and creates a plot of the first 25 images in the training dataset. import tensorflow as tf import numpy as np # For easy reset of notebook state. tf.keras.backend.clear_session() # load dataset fashion_mnist = tf.keras.datasets.fashion_mnist (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() lass_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'] plt.figure(figsize=(8,8)) for i in range(25): plt.subplot(5,5,i+1,) plt.tight_layout() plt.imshow(train_images[i]) plt.xlabel(lass_names[train_labels[i]]) plt.xticks([]) plt.yticks([]) plt.grid(False) plt.show() Fig 2 Running the code loads the Fashion-MNIST train and test dataset and prints their shape. Fig 3 We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset and that images are indeed square with 28×28 pixels. Creating model We need to define a neural network model for the image classify purpose, and the model should have two main parts: the feature extraction and the classifier that makes a prediction. Defining a simple Convolutional Neural Network (CNN) For the convolutional front-end, we build 3 layers of convolution layer with a small filter size (3,3) and a modest number of filters followed by a max-pooling layer. The last filter map is flattened to provide features to the classifier. As we know, it's a multi-class classification task, so we will require an output layer with 10 nodes in order to predict the probability distribution of an image belonging to each of the 10 classes. In this case, we will require the use of a softmax activation function. And between the feature extractor and the output layer, we can add a dense layer to interpret the features. All layers will use the ReLU activation function and the He weight initialization scheme, both best practices. We will use the Adam optimizer to optimize the sparse_categorical_crossentropy loss function, suitable for multi-class classification, and we will monitor the classification accuracy metric, which is appropriate given we have the same number of examples in each of the 10 classes. The below code will define and run it will show the struct of the model. # Define a Model model = tf.keras.models.Sequential() # First Convolution ，Kernel:16*3*3 model.add( tf.keras.layers.Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_uniform',input_shape=(28, 28, 1))) model.add( tf.keras.layers.MaxPooling2D((2, 2))) # Second Convolution ，Kernel:32*3*3 model.add( tf.keras.layers.Conv2D(32, (3, 3), activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.MaxPooling2D((2, 2))) # Third Convolution ，Kernel:32*3*3 model.add( tf.keras.layers.Conv2D(32, (3, 3), activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.Flatten()) model.add( tf.keras.layers.Dense(32, activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.Dense(10, activation='softmax')) Fig 4 Training Model After the model is defined, we need to train it. The model will be trained using 5-fold cross-validation. The value of k=5 was chosen to provide a baseline for both repeated evaluation and to not be too large as to require a long running time. Each validation set will be 20% of the training dataset or about 12,000 examples. The training dataset is shuffled prior to being split and the sample shuffling is performed each time so that any model we train will have the same train and validation datasets in each fold, providing an apples-to-apples comparison. We will train the baseline model for a modest 20 training epochs with a default batch size of 32 examples. The validation set for each fold will be used to validate the model during each epoch of the training run, so we can later create learning curves, and at the end of the run, we use the test dataset to estimate the performance of the model. As such, we will keep track of the resulting history from each run, as well as the classification accuracy of the fold. The train_model() function below implements these behaviors, taking the training dataset and test dataset as arguments, and returning a list of accuracy scores and training histories that can be later summarized. from sklearn.model_selection import KFold # train a model using k-fold cross-validation def train_model(dataX, dataY, n_folds=5): scores, histories = list(), list() # prepare cross validation kfold = KFold(n_folds, shuffle=True, random_state=1) for train_ix, validate_ix in kfold.split(dataX): # select rows for train and test trainX, trainY, validate_X, validate_Y = dataX[train_ix], dataY[train_ix], dataX[validate_ix], dataY[validate_ix] # fit model history = model.fit(trainX, trainY, epochs=20, batch_size=32, validation_data=(validate_X, validate_Y), verbose=0) # evaluate model _, acc = model.evaluate(validate_X, validate_Y, verbose=0) print("Accurary: {:.4f}，Total number of figures is {:0>2d}".format(acc * 100.0, len(testY))) # append scores scores.append(acc) histories.append(history) return scores, histories Module Summary After the model has been trained, we can present the results. There are two key aspects to present: the diagnostics of the learning behavior of the model during training and the estimation of the model performance. These can be implemented using separate functions. First, the diagnostics involve creating a line plot showing model performance on the train and validate set during each fold of the k-fold cross-validation. These plots are valuable for getting an idea of whether a model is overfitting, underfitting, or has a good fit for the dataset. We will create a single figure with two subplots, one for loss and one for accuracy. Blue lines will indicate model performance on the training dataset and orange lines will indicate performance on the hold-out validate dataset. The summarize_diagnostics() function below creates and shows this plot given the collected training histories. # plot diagnostic learning curves def summarize_diagnostics(histories): for i in range(len(histories)): # plot loss plt.subplot(2,1,1) plt.title('Cross Entropy Loss') plt.plot(histories[i].history['loss'], color='blue', label='train') plt.plot(histories[i].history['val_loss'], color='orange', label='test') # plot accuracy plt.subplot(2,1,2) plt.title('Classification Accuracy') plt.plot(histories[i].history['accuracy'], color='blue', label='train') plt.plot(histories[i].history['val_accuracy'], color='orange', label='test') plt.show() Fig 5 Next, the classification accuracy scores collected during each fold can be summarized by calculating the mean and standard deviation. This provides an estimate of the average expected performance of the model trained on the test dataset, with an estimate of the average variance in the mean. We will also summarize the distribution of scores by creating and showing a box and whisker plot. The summarize_performance() function below implements this for a given list of scores collected during model training. # summarize model performance def summarize_performance(scores): # print summary print('Accuracy: mean={:.4f} std={:.4f}, n={:0>2d}'.format(np.mean(trained_scores)*100, np.std(trained_scores)*100, len(scores))) # box and whisker plots of results plt.boxplot(scores) plt.show() Fig 6 Verifying predictions According to the above figure, we see that the final trained model can get up to around 87.6% accuracy when predicting the test dataset. And with the trained model, running the below code will demonstrate the result of predictions about some images. def plot_image(i, predictions_array, true_label, img): true_label, img = true_label[i], img[i] plt.grid(False) plt.xticks([]) plt.yticks([]) plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary) predicted_label = np.argmax(predictions_array) if predicted_label == true_label: color = 'blue' else: color = 'red' plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label], 100*np.max(predictions_array), class_names[true_label]), color=color) def plot_value_array(i, predictions_array, true_label): true_label = true_label[i] plt.grid(False) plt.xticks(range(10)) plt.yticks([]) thisplot = plt.bar(range(10), predictions_array, color="#777777") plt.ylim([0, 1]) predicted_label = np.argmax(predictions_array) thisplot[predicted_label].set_color('red') thisplot[true_label].set_color('blue') predictions = model.predict(test_images) # Plot the first X test images, their predicted labels, and the true labels. # Color correct predictions in blue and incorrect predictions in red. num_rows = 5 num_cols = 3 num_images = num_rows*num_cols plt.figure(figsize=(2*2*num_cols, 2*num_rows)) for i in range(num_images): plt.subplot(num_rows, 2*num_cols, 2*i+1) plot_image(i, predictions[i], test_labels, test_images) plt.subplot(num_rows, 2*num_cols, 2*i+2) plot_value_array(i, predictions[i], test_labels) plt.tight_layout() plt.show() Fig 7 Model quantization Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy, especially it's crucial to embedded platforms, as it lacks the compute-intensive performance, the Flash and RAM memory is also very limited. TensorFlow Lite is able to be used to convert an already-trained float TensorFlow model to the TensorFlow Lite format. In addition, the TensorFlow Lite provides several approaches to optimize the mode, among these ways, Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and increased inferencing speed, which is very valuable for low-power devices such as microcontrollers. The below codes show how to implement the Integer quantization of the trained model, and after running these codes, we can find that the size of Tensorflow Lite mode reduces almost 64.9 KB versus the original model, becomes about 32% of the original size(Fig 8). import os # Convert using integer-only quantization def representative_data_gen(): for input_value in tf.data.Dataset.from_tensor_slices(tf.cast(train_images,tf.float32)).shuffle(500).batch(1).take(150): yield [input_value] # Convert using dynamic range quantization converter = tf.lite.TFLiteConverter.from_keras_model(model) converter.optimizations = [tf.lite.Optimize.DEFAULT] tflite_model_quant = converter.convert() # Save the model to disk open("model_dynamic_range_quantization.tflite", "wb").write(tflite_model_quant) ## Size difference Dynamic_range_quantization_model_size = os.path.getsize("model_dynamic_range_quantization.tflite") print("Dynamic range quantization model is %d bytes" % Dynamic_range_quantization_model_size) converter = tf.lite.TFLiteConverter.from_keras_model(model) converter.optimizations = [tf.lite.Optimize.DEFAULT] converter.representative_dataset = representative_data_gen # Ensure that if any ops can't be quantized, the converter throws an error converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] # Set the input and output tensors to uint8 (APIs added in r2.3) converter.inference_input_type = tf.uint8 converter.inference_output_type = tf.uint8 tflite_model_advanced_quant = converter.convert() # Save the model to disk open("model_integer_only_quantization.tflite", "wb").write(tflite_model_advanced_quant) Integer_only_quantization_model_size = os.path.getsize("model_integer_only_quantization.tflite") print("Integer_only_quantization_model is %d bytes" % Integer_only_quantization_model_size) difference = Dynamic_range_quantization_model_size - Integer_only_quantization_model_size print("Difference is %d bytes" % difference) Fig 8 Evaluating the TensorFlow Lite model Now we'll run inferences using the TensorFlow Lite Interpreter to compare the model accuracies. First, we need a function that runs inference with a given model and images, and then returns the predictions: # Helper function to run inference on a TFLite model def run_tflite_model(tflite_file, test_image_indices): # Initialize the interpreter interpreter = tf.lite.Interpreter(model_path=str(tflite_file)) interpreter.allocate_tensors() input_details = interpreter.get_input_details()[0] output_details = interpreter.get_output_details()[0] predictions = np.zeros((len(test_image_indices),), dtype=int) for i, test_image_index in enumerate(test_image_indices): test_image = test_images[test_image_index] test_label = test_labels[test_image_index] # Check if the input type is quantized, then rescale input data to uint8 if input_details['dtype'] == np.uint8: input_scale, input_zero_point = input_details["quantization"] test_image = test_image / input_scale + input_zero_point test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"]) interpreter.set_tensor(input_details["index"], test_image) interpreter.invoke() output = interpreter.get_tensor(output_details["index"])[0] predictions[i] = output.argmax() return predictions Next, we'll compare the performance of the original model and the quantized model on one image. model_basic_quantization.tflite is the original TensorFlow Lite model with floating-point data. model_integer_only_quantization.tflite is the last model we converted using integer-only quantization (it uses uint8 data for input and output). Let's create another function to print our predictions and run it for testing. import matplotlib.pylab as plt # Change this to test a different image test_image_index = 1 ## Helper function to test the models on one image def test_model(tflite_file, test_image_index, model_type): global test_labels predictions = run_tflite_model(tflite_file, [test_image_index]) plt.imshow(test_images[test_image_index].reshape(28,28)) template = model_type + " Model \n True:{true}, Predicted:{predict}" _ = plt.title(template.format(true= str(test_labels[test_image_index]), predict=str(predictions[0]))) plt.grid(False) Fig 9 Fig 10 Then evaluate the quantized model by using all the test images we loaded at the beginning of this tutorial. After summarizing the prediction result of the test dataset, we can see that the prediction accuracy of the quantized model decrease 7% less than the original model, it's not bad. # Helper function to evaluate a TFLite model on all images def evaluate_model(tflite_file, model_type): test_image_indices = range(test_images.shape[0]) predictions = run_tflite_model(tflite_file, test_image_indices) accuracy = (np.sum(test_labels== predictions) * 100) / len(test_images) print('%s model accuracy is %.4f%% (Number of test samples=%d)' % ( model_type, accuracy, len(test_images))) Deploying model Converting TensorFlow Lite model to C file The following code runs xxd on the quantized model, writes the output to a file called model_quantized.cc, in the file, the model is defined as an array of bytes, and prints it to the screen. The output is very long, so we won’t reproduce it all here, but here’s a snippet that includes just the beginning and end. # Save the file as a C source file xxd -i model_integer_only_quantization.tflite > model_quantized.cc # Print the source file cat model_quantized.cc Fig 11 Deploying the C file to project We use the tensorflow_lite_cifar10 demo as a prototype, then replace the original model and do some code modification, below is the code in the modified main file. #include "board.h" #include "fsl_debug_console.h" #include "pin_mux.h" #include "timer.h" #include <iomanip> #include <iostream> #include <string> #include <vector> #include "tensorflow/lite/kernels/register.h" #include "tensorflow/lite/model.h" #include "tensorflow/lite/optional_debug_tools.h" #include "tensorflow/lite/string_util.h" #include "get_top_n.h" #include "model.h" #define LOG(x) std::cout // ---------------------------- Application ----------------------------- // Lenet Mnist model input data size (bytes). #define LENET_MNIST_INPUT_SIZE 28*28*sizeof(char) // Lenet Mnist model number of output classes. #define LENET_MNIST_OUTPUT_CLASS 10 // Allocate buffer for input data. This buffer contains the input image // pre-processed and serialized as text to include here. uint8_t imageData[LENET_MNIST_INPUT_SIZE] = { #include "clothes_select.inc" }; /* Tresholds */ #define DETECTION_TRESHOLD 60 /*! * @brief Initialize parameters for inference * * @param reference to flat buffer * @param reference to interpreter * @param pointer to storing input tensor address * @param verbose mode flag. Set true for verbose mode */ void InferenceInit(std::unique_ptr<tflite::FlatBufferModel> &model, std::unique_ptr<tflite::Interpreter> &interpreter, TfLiteTensor** input_tensor, bool isVerbose) { model = tflite::FlatBufferModel::BuildFromBuffer(Fashion_MNIST_model, Fashion_MNIST_model_len); if (!model) { LOG(FATAL) << "Failed to load model\r\n"; return; } tflite::ops::builtin::BuiltinOpResolver resolver; tflite::InterpreterBuilder(*model, resolver)(&interpreter); if (!interpreter) { LOG(FATAL) << "Failed to construct interpreter\r\n"; return; } int input = interpreter->inputs()[0]; const std::vector<int> inputs = interpreter->inputs(); const std::vector<int> outputs = interpreter->outputs(); if (interpreter->AllocateTensors() != kTfLiteOk) { LOG(FATAL) << "Failed to allocate tensors!"; return; } /* Get input dimension from the input tensor metadata assuming one input only */ *input_tensor = interpreter->tensor(input); auto data_type = (*input_tensor)->type; if (isVerbose) { const std::vector<int> inputs = interpreter->inputs(); const std::vector<int> outputs = interpreter->outputs(); LOG(INFO) << "input: " << inputs[0] << "\r\n"; LOG(INFO) << "number of inputs: " << inputs.size() << "\r\n"; LOG(INFO) << "number of outputs: " << outputs.size() << "\r\n"; LOG(INFO) << "tensors size: " << interpreter->tensors_size() << "\r\n"; LOG(INFO) << "nodes size: " << interpreter->nodes_size() << "\r\n"; LOG(INFO) << "inputs: " << interpreter->inputs().size() << "\r\n"; LOG(INFO) << "input(0) name: " << interpreter->GetInputName(0) << "\r\n"; int t_size = interpreter->tensors_size(); for (int i = 0; i < t_size; i++) { if (interpreter->tensor(i)->name) { LOG(INFO) << i << ": " << interpreter->tensor(i)->name << ", " << interpreter->tensor(i)->bytes << ", " << interpreter->tensor(i)->type << ", " << interpreter->tensor(i)->params.scale << ", " << interpreter->tensor(i)->params.zero_point << "\r\n"; } } LOG(INFO) << "\r\n"; } } /*! * @brief Runs inference input buffer and print result to console * * @param pointer to image data * @param image data length * @param pointer to labels string array * @param reference to flat buffer model * @param reference to interpreter * @param pointer to input tensor */ void RunInference(const uint8_t* image, size_t image_len, const std::string* labels, std::unique_ptr<tflite::FlatBufferModel> &model, std::unique_ptr<tflite::Interpreter> &interpreter, TfLiteTensor* input_tensor) { /* Copy image to tensor. */ memcpy(input_tensor->data.uint8, image, image_len); /* Do inference on static image in first loop. */ auto start = GetTimeInUS(); if (interpreter->Invoke() != kTfLiteOk) { LOG(FATAL) << "Failed to invoke tflite!\r\n"; return; } auto end = GetTimeInUS(); const float threshold = (float)DETECTION_TRESHOLD /100; std::vector<std::pair<float, int>> top_results; int output = interpreter->outputs()[0]; TfLiteTensor *output_tensor = interpreter->tensor(output); TfLiteIntArray* output_dims = output_tensor->dims; // assume output dims to be something like (1, 1, ... , size) auto output_size = output_dims->data[output_dims->size - 1]; /* Find best image candidates. */ GetTopN<uint8_t>(interpreter->typed_output_tensor<uint8_t>(0), output_size, 1, threshold, &top_results, false); if (!top_results.empty()) { auto result = top_results.front(); const float confidence = result.first; const int index = result.second; if (confidence * 100 > DETECTION_TRESHOLD) { LOG(INFO) << "----------------------------------------\r\n"; LOG(INFO) << " Inference time: " << (end - start) / 1000 << " ms\r\n"; LOG(INFO) << " Detected: " << std::setw(10) << labels[index] << " (" << (int)(confidence * 100) << "%)\r\n"; LOG(INFO) << "----------------------------------------\r\n\r\n"; } } } /*! * @brief Main function */ int main(void) { const std::string labels[] = {"T-shirt/top", "Trouser","Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"}; /* Init board hardware. */ BOARD_ConfigMPU(); BOARD_InitPins(); BOARD_BootClockRUN(); BOARD_InitDebugConsole(); InitTimer(); std::unique_ptr<tflite::FlatBufferModel> model; std::unique_ptr<tflite::Interpreter> interpreter; TfLiteTensor* input_tensor = 0; InferenceInit(model, interpreter, &input_tensor, false); LOG(INFO) << "Fashion MNIST object recognition example using a TensorFlow Lite model.\r\n"; LOG(INFO) << "Detection threshold: " << DETECTION_TRESHOLD << "%\r\n"; /* Run inference on static ship image. */ LOG(INFO) << "\r\nStatic data processing:\r\n"; RunInference((uint8_t*)imageData, (size_t)LENET_MNIST_INPUT_SIZE, labels, model, interpreter, input_tensor); while(1) {} } Testing result After deploying the model in the demo project, then we'll run this demo on the MIMXRT1060 (Fig 12) board for testing. Fig 12 Run the below code to covert the Fashion MNIST image to text The process_image() function can convert a Fashion MNIST image to an include file as static data, then include this file in the demo project. def process_image(image, output_path, num_batch=1): img_data = np.transpose(image, (2, 0, 1)) # Repeat image for batch processing (resulting tensor is NCHW or NHWC) img_data = np.reshape(img_data, (num_batch, img_data.shape[0], img_data.shape[1], img_data.shape[2])) img_data = np.repeat(img_data, num_batch, axis=0) img_data = np.reshape(img_data, (num_batch, img_data.shape[1], img_data.shape[2], img_data.shape[3])) # Serialize image batch img_data_bytes = bytearray(img_data.tobytes(order='C')) image_bytes_per_line = 20 with open(output_path, 'wt') as f: idx = 0 for byte in img_data_bytes: f.write('0X%02X, ' % byte) if idx % image_bytes_per_line == (image_bytes_per_line - 1): f.write('\n') idx = idx + 1 # Return serialized image size return len(img_data_bytes) 2. Run the demo project on board.

This document describes the different source clocks and the main modules that manage which clock source is used to derive the system clocks that exists on the i.MX RT’s devices. It’s important to know the different clock sources available on our devices, modifying the default clock configuration may have different purposes since increasing the processor performance, achieving specific baud rates for serial communications, power saving, or simply getting a known base reference for a clock timer. The hardware used for this document is the following: i.MX RT: EVK-MIMXRT1060 Keep in mind that the described hardware and management clock modules in this document are a general overview of the different platforms and the devices listed above are used as a reference example, some terms and hardware modules functionality may vary between devices of the same platform. For more detailed information about the device hardware modules, please refer to your specific device Reference Manual. RT platforms The Clock Controller Module(CCM) facilitates the clock generation in the RT platforms, many clocking variations are possible and the maximum clock frequency for the i.MX RT1060 device is @600MHz.The following image shows a block diagram of the CCM, the three marked sub-modules are important to understand all the clock path from the clock generation(oscillators or crystals) to the clock management for all the peripherals of the board. Figure 1. Clock Controller Module(CCM) Block Diagram CCM Analog Submodule This submodule contains all the oscillators and several PLL’s that provide a clock source to the principal CMM module. For example, the i.MX RT1060 device supports 2 internal oscillators that combined with suitable external quartz crystal and external load capacitors provide an accurate clock source, another 2 internal oscillators are available for low power modes and as a backup when the system detects a loss of clock. These oscillators provide a fixed frequency for the several PLL’s inside this module. Internal Clock Sources with external components Crystal Oscillator @24MHz Many of the serial IO modules depend on the fixed frequency of 24 MHz. The reference clock that generates this crystal oscillator provides an accurate clock source for all the PLL inputs. Crystal Oscillator @32KHz Generally, RTC oscillators are either implemented with 32 kHz or 32.768 kHz crystals. This Oscillator should always be active when the chip is powered on. Internal Clock sources RC Oscillator @24MHz A lower-power RC oscillator module is available on-chip as a possible alternative to the 24 MHz crystal oscillator after a successful power-up sequence. The 24 MHz RC oscillator is a self-tuning circuit that will output the programmed frequency value by using the RTC clock as its reference. While the power consumption of this RC oscillator is much lower than the 24MHz crystal oscillator, one limitation of this RC oscillator module is that its clock frequency is not as accurate. Oscillator @32KHz The internal oscillator is automatically multiplexed in the clocking system when the system detects a loss of clock. The internal oscillator will provide clocks to the same on-chip modules as the external 32kHz oscillator. Also is used to be useful for quicker startup times and tampering prevention. Note. An external 32KHz clock source must be used since the internal oscillator is not precise enough for long term timekeeping. PLLs There are 7 PLLs in the i.MXRT1060 platform, some with specific functions, for example, create a reference clock for the ARM Core, USB peripherals, etc. Below these PLLs are listed. PLL1 - ARM PLL (functional frequency @600 MHz) PLL2 - System PLL (functional frequency @528 MHz)* PLL3 - USB1 PLL (functional frequency @480 MHz)* PLL4 - Audio PLL PLL5 - Video PLL PLL6 - ENET PLL PLL7 - USB2 PLL (functional frequency @480 MHz) * Two of these PLLs are each equipped with four Phase Fractional Dividers (PFDs) in order to generate additional frequencies for many clock roots. Each PLLs configuration and control functions like Bypass, Output Enable/Disable, and Power Down modes are accessible individually through its PFDs and global configuration and status registers found at the CCM internal memory registers. Clock Control Module(CCM) The Clock Control Module (CCM) generates and controls clocks to the various modules in the design and manages low power modes. This module uses the available clock sources(PLL reference clocks and PFDs) to generate the clock roots. There are two important sub-blocks inside the CCM listed below. Clock Switcher This sub-block provides the registers that control which PLLs and PFDs outputs are selected as the reference clock for the Clock Root Generator. Clock Root Generator This sub-block provides the registers that control most of the secondary clock source programming, including both the primary clock source selection and the clock dividers. The clock roots are each individual clocks to the core, system buses, and all other SoC peripherals, among those, are serial clocks, baud clocks, and special function blocks. All of these clock references are delivered to the Low Power Clock Gating unit(LPCG). Low Power Clock Gating unit(LPCG) The LPCG block receives the root clocks from CCM and splits them to clock branches for each peripheral. The clock branches are individually gated clocks. The following image shows a detailed block diagram of the CMM with the previously described submodules and how they link together. Figure 2. Clock Management System Example: Configure The ARM Core Clock (PLL1) to a different frequency. The Clock tools available in MCUXpresso IDE, allows you to understand and configure the clock source for the peripherals in the platform. The following diagram shows the default PLL1 mode configured @600MHz, the yellow path shows all the internal modules involved in the clock configuration. Figure 3. Default PLL configuration after reset. From the previous image notice that PLL1 is attached from the 24MHz oscillator, then the PLL1 is configured with a pre-scaler of 50 to achieve a frequency @1.2GHz, finally, a frequency divider by 2 let a final frequency @600MHz. 1.1 Modify the PLL1 frequency For example, you can use the Clock tools to configure the PLL pre-scaler to 30, select the PLL1 block and then edit the pre-scaler value, therefore, the final clock frequency is @360MHz, these modifications are shown in the following figure. Figure 4. PLL1 @720MHz, final frequency @360MHz 1.2 Export clock configuration to the project After you complete the clock configuration, the Clock Tool will update the source code in clock_config.c and clock_config.h, including all the clock functional groups that we created with the tool. This will include the clock source for specific peripherals. In the previous example, we configured the PLL1 (ARM PLL) to a functional frequency @360MHz; this is translated to the following structure in source code: “armPllConfig_BOARD_BootClockRUN” and it’s used by “CLOCK_InitArmPll();” inside the “BOARD_BootClockPLL150MRUN();” function. Figure 5. PLL1 configuration struct Figure 6. PLL configuration function Example: The next steps describe how to select a clock source for a specific peripheral. 1.1 Configure clock for specific peripheral For example, using the GPT(General Purpose Timer) the available clock sources are the following: Clock Source Off Peripheral Clock High-Frequency Reference Clock Clock Source from an external pin Low-Frequency Reference Clock Crystal Oscillator Figure 7. General Purpose Timer Clocks Diagram Using the available SDK example project “evkmimxrt1060_gpt_timer” a configuration struct for the peripheral “gptConfig” is called from the main initialization function inside the gpt_timer.c source file, the default configuration function with the configuration struct as a parameter, is shown in the following figure. Figure 8. Function that returns a GPT default configuration parameters The function loads several parameters to the configuration struct(gptConfig), one of the fields is the Clock Source configuration, modifying this field will let us select an appropriate clock source for our application, the following figure shows the default configuration parameters inside the “GPT_GetDefaultConfig();” function. Figure 9. Configuration struct In the default GPT configuration struct, the Peripheral Clock(kGPT_CLockSource_Periph) is selected, the SDK comes with several macros located at “fsl_gpt.h” header file, that helps to select an appropriate clock source. The next figure shows an enumerated type of data that contains the possible clock sources for the GPT. Figure 10. Available clock sources of the GPT. For example, to select the Low-Frequency Reference Clock the source code looks like the following figure. Figure 11. Low-Frequency Reference Clock attached to GPT Notice that all the peripherals come with a specific configuration struct and from that struct fields the default clocking parameters can be modified to fit with our timing requirements. 1.2 Modify the Peripheral Clock frequency from Clock Tools One of the GPT clock sources is the “Peripheral Clock Source” this clock line can be modified from the Clock Tools, the following figure shows the default frequency configuration from Clock Tools view. Figure 12. GPT Clock Root inside CMM In the previous figure, the GPT clock line is @75MHz, notice that this is sourced from the primary peripheral clock line that is @600MHz attached to the ARM core clocks. For example, modify the PERCLK_PODF divider selecting it and changing the divider value to 4, the resulting frequency is @37.5Mhz, the following figure illustrates these changes. Figure 13. GPT & PIT clock line @37.5MHz 1.3 Export clock configuration to the project After you complete the clock configuration, the Clock Tool will update the source code in clock_config.c and clock_config.h, including all the clock functional groups that we created with the tool. This will include the clock source for specific peripherals. In the previous example, we configured the GPT clock root divider by a dividing factor of 4 to achieve a 37.5MHz frequency; this is translated to the following instruction in source code: “CLOCK_SetDiv(kCLOCK_PerclkDiv,3);” inside the “BOARD_BootClockRUN();” function. Figure 14. Frequency divider function References i.MX RT1060 Processor Reference Manual Also visit LPC's System Clocks Kinetis System Clocks

Introduction A common need for GUI applications is to implement a clock function. Whether it be to create a clock interface for the end user's benefit, or just to time animations or other actions, implementing an accurate clock is a useful and important feature for GUI applications. The aim of this document is to help you implement clock functions in your AppWizard project. Methods When implementing a real-time clock, there are a couple of general methods to do so. Use an independent timer in your MCU Using animation objects Each of these methods have their advantages and disadvantages. If you just need a timer that doesn't require extra code and you don't require control or assurance of precision, or maybe you can't spare another timer, using an animation object (method #2) may be a good option in that application. If your application requires an assurance of precision or requires other real-time actions to be performed that AppWizard can't control, it is best to implement an independent timer in your MCU (method #1). Method 1: Independent MCU Timer Implementing a timer via an independent MCU timer allows better control and guarantees the precision because it isn't a shared clock and the developer can adjust the interrupt priorities such that the timer interrupt has the highest priority. AppWizard timing uses a common timer and then time slices activities similar to how an operating system works. It is for this reason that implementing an independent MCU timer is best when you need control over the precision of the timer or you need other real-time actions to be triggered by this timer. When implementing a timer using an independent MCU timer (like the RTC module), an understanding of how to interact with Text widgets is needed. Let's look at this first. Interacting with Text Widgets Editing Text widgets occurs through the use of the emWin library API (the emWin library is the underlying code that AppWizard builds upon). The Text widget API functions are documented in the emWin Graphic Library User Guide and Reference Manual, UM3001. Most of the Text widget API functions require a Text widget handle. Be sure to not confuse this handle for the AppWizard ID. Imagine a clock example where there are two Text widgets in the interface: one for the minutes and one for the seconds. The AppWizard IDs of these objects might be ID_TEXT_MINS and ID_TEXT_SECONDS respectively (again, these are not to be confused with the handle to the Text widget for use by emWin library functions). The first action software should take is to obtain the handle for the Text widgets. This can be done using the WM_GetDialogItem function. The code to get the active window handle and the handle for the two Text widgets is shown below: activeWin = WM_GetActiveWindow(); textBoxMins = WM_GetDialogItem(activeWin, ID_TEXT_MINS); textBoxSecs = WM_GetDialogItem(activeWin, ID_TEXT_SECONDS);‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Note that this function requires the handle to the parent window of the Text widget. If your application has multiple windows or screens, you may need to be creative in how you acquire this handle, but for this example, the software can simply call the WM_GetActiveWindow function (since there is only one screen). When to call these functions can be a bit tricky as well. They can be called before the MainTask() function of the application is called and the application will not crash. However, the handles won't be correct and the Text widgets will not be updated as expected. It's recommended that these handles be initialized when the screen is initialized. An example of how this would be done is shown below: void cbID_SCREEN_CLOCK(WM_MESSAGE * pMsg) { extern WM_HWIN activeWin; extern WM_HWIN textBoxMins; extern WM_HWIN textBoxSecs; extern WM_HWIN textBoxDbg; if(pMsg->MsgId == WM_INIT_DIALOG) { activeWin = WM_GetActiveWindow(); textBoxMins = WM_GetDialogItem(activeWin, ID_TEXT_MINS); textBoxSecs = WM_GetDialogItem(activeWin, ID_TEXT_SECONDS); textBoxDbg = WM_GetDialogItem(activeWin, ID_TEXT_DBG); } GUI_USE_PARA(pMsg); }‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Once the Text widget handles have been acquired, the text can be updated using the TEXT_SetText() function or the TEXT_SetDec() function in this case, because the Text widgets are configured for decimal mode, since we want to display numbers. An example of the code to do this is shown below. /* TEXT_SetDec(Text Widget Handle, Value as Int, Length, Shift, Sign, Leading Spaces) */ if(TEXT_SetDec(textBoxSecs, (int)gSecs, 2, 0, 0, 0)) { /* Perform action here if necessary */ } if(TEXT_SetDec(textBoxMins, (int)gMins, 2, 0, 0, 0)) { /* Perform action here if necessary */ } ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Method 2: Animation Objects When implementing a real-time clock using animation objects, it is necessary to implement a loop. This could be done outside of the AppWizard GUI (in your code) but because the timing precision can't be guaranteed, it's just as easy to implement a loop in the AppWizard GUI if you know how (it isn't very intuitive as to how to do this). Before examining the interactions to do this, let's look at the variables and objects needed to do this. ID_VAR_SECS - This variable holds the current seconds value. ID_VAR_SECS_1 - This variable holds the next second value. ID_TEXT_SECONDS - Text box that displays the current seconds value. ID_END_CNT - Variable that holds the value at which the seconds rolls over and increments the minute count ID_TEXT_MINS - Text box that holds the current minute count. ID_MIN_END_CNT - Variable that holds the value at which the minutes rolls over (which would also increment the hour count if the hours were implemented). ID_BUTTON_SECS - This is a hidden button that initiates actions when the seconds variable has reached the end count. Now, here are the interactions used to implement the clock feature using animation interactions. The heart of the loop are the interactions triggered by ID_VAR_SECS. ID_VAR_SECS -> ID_VAR_SECS_1: When ID_VAR_SECS changes, it needs to add one to ID_VAR_SECS_1 so that the animation will animate to one second from the current time. ID_VAR_SECS -> ID_TEXT_SECONDS: When ID_VAR_SECS changes, it also needs to start the animation from the current value to the next second (ID_VAR_SECS_1). A very essential part of the loop is ensuring the animation restarts every time. So ID_TEXT_SECONDS needs to change the value of ID_VAR_SECS when the animation ends. ID_VAR_SECS is changed to the current time value, ID_VAR_SECS_1. When the ID_TEXT_SECONDS animation ends, it must also decrement the ID_VAR_END_CNT variable. This is analogous to the control variable of a "For" loop being updated. This is done using the ADDVALUE job, adding '-1' to the variable, ID_VAR_END_CNT. When ID_VAR_END_CNT changes, it updates the hidden button, ID_BUTTON_SECS, with the new value. This is analogous to a "For" loop checking whether its control variable is still within its limits. The interactions in group 5 are interactions that restart the loop when the seconds reach the count that we desire. When the loop is restarted, the following actions must be taken: Set ID_VAR_SECS and ID_VAR_SECS_1 to the initial value for the next loop ('0' in this case). Note that ID_VAR_SECS_1 MUST be set before ID_VAR_SECS. Additionally, if the loop is to continue, ID_VAR_SECS and ID_VAR_SECS_1 must be set to the same value. ID_TEXT_SECONDS is set to the initial value. If this isn't done, then the text box will try to animate from the final value to the initial value and then will look "weird". ID_VAR_END_CNT is reset to its initial value (60 in this case). ID_BUTTON_SECS is also responsible for updating the minutes values. In this case, it's incrementing the ID_TEXT_MINS value (counting up in minutes) and decrementing the ID_VAR_MIN_END_CNT Adjusting the time of an animation object The animation object (as well as other emWin objects) use the GUI_X_DELAY function for timing. It is up to the host software to implement this function. In the i.MX RT examples, the General Purpose Timer (GPT) is used for this timer. So how the GPT is configured will affect the timing of the application and the how fast or slow the animations run. The GPT is configured in the function BOARD_InitGPT() which resides in the main source file. The recommended way to adjust the speed of the timer is by changing the divider value to the GPT. Conclusion So we have seen two different methods of implementing a real-time clock in an AppWizard GUI application. Those methods are: Use an independent timer in your MCU Using animation objects Using an independent timer in your MCU may be preferred as it allows for better control over the timing, can allow for real-time actions to be performed that AppWizard can't control, and provides some assurance of precision. Using animation objects may be preferred if you just need a quick timer implementation that doesn't require you to manually add code to your project or use a second timer.

RT1050 HAB Encrypted Image Generation and Analysis 1, Introduction The NXP RT series can support multiple boot modes, it incluses: unsigned image mode, HAB signed image mode, HAB encryption image mode, and BEE encryption image mode. In order to understand the specific structure of the HAB encryption app, this article will generate a non-XIP app image, then generate the relevant burning file through the elftosb.exe tool in the flashloader i.MX-RT1050, and use MFGTOOL to enter the serial download mode to download the .sb file. This article will focus on the download steps of RT1050 HAB encryption related operations, and analyze the structure of the HAB encrypted app image. 2, RT1050 HAB Encypted Operation Procedure At first, we analyze the steps of MFGtool burning, which files are needed, so as to give specific preparation, open the ucl2.xml file in the following path of the flashloader: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\mfgtools-rel\Profiles\MXRT105X\OS Firmware Because we need to use the HAB encrypated boot mode, then we will use MXRT105X-SecureBoot, from the ucl2.xml file, we will find the following related code: Fig 1. MXRT1050-SecureBoot structure As you can see from the above, to implement the secure boot of RT1050, you need to prepare these three files: ivt_flashlloader_signed.bin: it is the signed flashloader binary file enable_hab.sb: it is used to modify the SRK and HABmode in the fuse map boot_image.sb: HAB encrypted app program file Here is a flow chart of the overall HAB encryption operation step, after checking this figure, then we will follow it step by step. Fig 2. MXRT1050 HAB encrypted image flow chart The app image we used in this article is the RAM app, so, at first, we need to prepare one RAM based app image. In this document, we are directly use the prepared RAM based app image: evkbimxrt1050_led_softwarereset_0xa000.s19, this app code function is: After download the code to the MIMXRT1050-EVKB(qspi flash) board, the on board led D18 will blinky and printf the information, after pressing the WAKEUP button SW8, the code will implement software reset and printf the related information. The unsigned code test print result are as follows: BOARD RESET start. Helloworld. WAKEUP key pressed, will do software system reset. BOARD RESET start. Helloworld. 2.1 CST tool preparation Because the contains a lot of steps, then customer can refer to the following document do the related configuration, this document, we won’t give the CST configuration detail steps. Please check these documents: https://www.cnblogs.com/henjay724/p/10219459.html https://community.nxp.com/docs/DOC-340904 Security Application Note AN12079 After the CST tool configuration, please copy the cst.exe, crts filder, key folder from cst folder to the same folder that holds elftosb executable files: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\elftosb\win Please also copy SRK_1_2_3_4_fuse.bin and SRK_1_2_3_4_table.bin to the above folder. 2.2 Sign flashloader Please refer to application note AN12079 chapter 3.3.1, copy flashloader.elf from folder path: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Flashloader And the imx-flexspinor-normal-signed.bd from folder path: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\bd_file\imx10xx to the folder: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\elftosb\win Please open commander window under the elftosb folder, then input this commander: elftosb.exe -f imx -V -c imx-flexspinor-flashloader-signed.bd -o ivt_flashloader_signed.bin flashloader.elf Fig 3. Sign flashloader This steps will generate the ivt_flashlaoder_signed.bin, which is needed to put under the MFGtool OS Firmware folder, just used for enter the signed flashloader mode. 2.3 SRK and HAB mode fuse modification files Please refer to AN12079 chapter 4.3, copy the enable_hab.bd file from folder path: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\bd_file\imx10xx to this folder path: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\elftosb\win Please refer to the chapter 2.1 generated SRK_1_2_3_4_fuse.bin, modify the enable_hab.bd like the following picture: Fig 4. enable_hab.bd SRK and HAB mode fuse modification Then, in the elftosb window, please input the following command, just used to generate the enable_hab.sb program file: elftosb.exe -f kinetis -V -c enable_hab.bd -o enable_hab.sb Fig 5. SRK and HAB mode program files generation 2.4 APP Encrypted Image If you want to do the HAB encrypted image download, you need to prepare one non-XIP app image, here we prepared one RAM based APP srec files. Because the app file is the RAM files, then we also need the related RAM encrypted .bd files, please copy imx-itcm-encrypted.bd from the folder path: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\bd_file\imx10xx to this folder path： Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\elftosb\win Open imx-itcm-encrypted.bd, then modify the following content: options { flags = 0x0c; # Note: This is an example address, it can be any non-zero address in ITCM region startAddress = 0x8000; ivtOffset = 0x1000; initialLoadSize = 0x2000; # Note: This is required if the cst and elftsb are not in the same folder # Note: This is required if the default entrypoint is not the Reset_Handler # Please set the entryPointAddress to Reset_Handler address entryPointAddress = 0x0000a2dd; } Here, we need to note these two points: (1) ivtOffset = 0x1000; If the external flash is flexspi flash, then we need to modify ivtOffset as 0X1000, if it is the nandflash, we need to use the 0X400. (2) entryPointAddress = 0x0000a2dd; The entryPointsAddress should be the app code reset handlder, it is the app start address+4 data, the entry address is also OK, but we suggest you to use the app Reset_Handler address. Fig 6. App reset handler address Then input the following commander in the elftosb windows: elftosb.exe -f imx -V -c imx-itcm-encrypted.bd -o ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin evkbimxrt1050_led_softwarereset_0xa000.s19 Fig 7. App HAB Encrypted file generation Please note, we need to record the generated key blob offset address, it is 0XA00, just like the above data in the red frame, this address will be used in the next chapter’s .bd file. After this step, it will generate 7 files: (1) ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin, this file includes the FDCB which is filled with 0, IVT, BD, DCD, APP HAB encrypted image data, CSF data (2) ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted_nopadding.bin, compare with ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin, this file deletes the 0s which is above IVT range. (3) Csf.bin, it is the HAB data area, you can find the data contains the csf data, it is from 0X8000 to 0X8F80 in the generated ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin. Fig 8. Csf data and the encrypted app relationship (4) dek.bin Fig 9. Dek data DEK data is the AES-128 bits key, it is not defined by the customer, it is random generated automatically by the HAB encrypted tool. (5) input.csf Open it, you can find the following content: Fig10. Input csf file content (6) rawbytes.bin, this is the app image plaintext data, it doesn’t contains the FDCB,IVT,BOOTDATA, DCD, csf etc. (7) temp.bin, it is the temperate file, compare with ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin, no csf files. 2.5 HAB Encrypted QSPI program file Here we need to prepare one program_flexspinor_image_qspinor_keyblob.bd file, and put it under the same folder as elftosb, this file is used to generate the HAB encrypted program .sb file. Because the flashloader package didn’t contains it, then we paste all the related content, and I will also attach it in the attachment. # The source block assign file name to identifiers sources { myBinFile = extern (0); dekFile = extern (1); } constants { kAbsAddr_Start= 0x60000000; kAbsAddr_Ivt = 0x60001000; kAbsAddr_App = 0x60002000; } # The section block specifies the sequence of boot commands to be written to the SB file section (0) { #1. Prepare Flash option # 0xc0000006 is the tag for Serial NOR parameter selection # bit [31:28] Tag fixed to 0x0C # bit [27:24] Option size fixed to 0 # bit [23:20] Flash type option # 0 - QuadSPI SDR NOR # 1 - QUadSPI DDR NOR # 2 - HyperFLASH 1V8 # 3 - HyperFLASH 3V # 4 - Macronix Octal DDR # 6 - Micron Octal DDR # 8 - Adesto EcoXIP DDR # bit [19:16] Query pads (Pads used for query Flash Parameters) # 0 - 1 # 2 - 4 # 3 - 8 # bit [15:12] CMD pads (Pads used for query Flash Parameters) # 0 - 1 # 2 - 4 # 3 - 8 # bit [11: 08] Quad Mode Entry Setting # 0 - Not Configured, apply to devices: # - With Quad Mode enabled by default or # - Compliant with JESD216A/B or later revision # 1 - Set bit 6 in Status Register 1 # 2 - Set bit 1 in Status Register 2 # 3 - Set bit 7 in Status Register 2 # 4 - Set bit 1 in Status Register 2 by 0x31 command # bit [07: 04] Misc. control field # 3 - Data Order swapped, used for Macronix OctaFLASH devcies only (except MX25UM51345G) # 4 - Second QSPI NOR Pinmux # bit [03: 00] Flash Frequency, device specific load 0xc0000006 > 0x2000; # Configure QSPI NOR FLASH using option a address 0x2000 enable flexspinor 0x2000; #2 Erase flash as needed. erase 0x60000000..0x60020000; #3. Program config block # 0xf000000f is the tag to notify Flashloader to program FlexSPI NOR config block to the start of device load 0xf000000f > 0x3000; # Notify Flashloader to response the option at address 0x3000 enable flexspinor 0x3000; #5. Program image load myBinFile > kAbsAddr_Ivt; #6. Generate KeyBlob and program it to flexspinor # Load DEK to RAM load dekFile > 0x10100; # Construct KeyBlob Option #--------------------------------------------------------------------------- # bit [31:28] tag, fixed to 0x0b # bit [27:24] type, 0 - Update KeyBlob context, 1 Program Keyblob to flexspinor # bit [23:20] keyblob option block size, must equal to 3 if type =0, # reserved if type = 1 # bit [19:08] Reserved # bit [07:04] DEK size, 0-128bit 1-192bit 2-256 bit, only applicable if type=0 # bit [03:00] Firmware Index, only applicable if type = 1 # if type = 0, next words indicate the address that holds dek # the 3rd word #---------------------------------------------------------------------------- # tag = 0x0b, type=0, block size=3, DEK size=128bit load 0xb0300000 > 0x10200; # dek address = 0x10100 load 0x00010100 > 0x10204; # keyblob offset in boot image # Note: this is only an example bd file, the value must be replaced with actual # value in users project load 0x0000a000 > 0x10208; enable flexspinor 0x10200; #7. Program KeyBlob to firmware0 region load 0xb1000000 > 0x10300; enable flexspinor 0x10300; }‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Please note, in the above chapter, fig 7, we mentioned the keyblob offset address, we need to modify it in the following code: load 0x0000a000 > 0x10208; Now, combine program_flexspinor_image_qspinor_keyblob.bd, ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted_nopadding.bin and dek.bin file together, we use the following commander to generate the boot_image.sb: elftosb.exe -f kinetis -V -c program_flexspinor_image_qspinor_keyblob.bd -o boot_image.sb ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted_nopadding.bin dek.bin Fig 11. App HAB encrypted program file generation Until now, we will find, all the related HAB encrypted files is prepared. 2.6 MFG Tool program HAB Encrypted files to RT1050-EVKB Before we program it, please copy the following 3 files which is in the elftosb folder: ivt_flashloader_signed.bin enable_hab.sb boot_image.sb to this folder: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\mfgtools-rel\Profiles\MXRT105X\OS Firmware Please modify cfg.ini, the file path is: Flashloader_i.MXRT1050_GA\Flashloader_RT1050_1.1\Tools\mfgtools-rel Modify the content as: [profiles] chip = MXRT105X [platform] board = [LIST] name = MXRT105X-SecureBoot Choose MXRT105X-SecureBoot program mode. Then open the tool MfgTool2.exe, the board MIMXRT1050-EVKB(need to modify the on board resistor, use the qspi flash) mode should be serial download mode, just modify SW7:1-OFF,2-OFF,3-OFF, 4-ON, connect two usb cable between PC and the board J28 and J9. After the connection, you will find the MfgTool2.exe can detect the HID device: Fig 12. MFG tool program After the program is finished, power off the board, modify the boot mode to internal boot, it is SW7:1-OFF,2-OFF,3-ON, 4-OFF，connect the COM terminal, power on the EVKB board, after reset, you will find the D18 led is blinking, after you press the SW8, you will find the following printf information: BOARD RESET start. Helloworld. WAKEUP key pressed, will do software system reset. ? BOARD RESET start. Helloworld.‍‍‍‍‍‍‍‍‍‍‍‍‍ So, the HAB encrypted image works OK now. 3. App HAB encrypted image structure analysis 3.1 MCUBootUtility Configuration to check the RT Encrypted image Here, we can also use MCUBootUtility tool to check the RT chip encrypted image and the fuse data. If the cst is your own configured, please do the following configuration at first: （1）Copy the configured cst folder to folder: NXP-MCUBootUtility-2.0.0\tools Delete the original cst folder. （2）Copy SRK_1_2_3_4_fuse.bin and SRK_1_2_3_4_table.bin to folder: NXP-MCUBootUtility-2.0.0\gen\hab_cert Now, you can use the new MCUBootutility to connect your board which already done the HAB encrypted method. 3.1 RT1050 fuse map comparation Before do the HAB encrypted image program, I have read out the whole fuse map as follows: Fig 13. MIMXRT1050-EVKB fuse map before HAB encrypted image Fig 14. MIMXRT1050-EVKB fuse map after HAB encrypted image Compare the fuse map between do the HAB encrypted image and no HAB encrypted image, we can find two difference: HAB mode, 0X460 bit1：0 open， 1 close SRK area We can find, after program the HAB encrypted image, the SRK fuse data is the same as the SRK data which is defined in the enable_hab.bd. 3.2 Readout HAB encrypted QSPI APP image structure analysis From MCUBootUtility tool, we can find the HAB Encypted image structure should be like this: Fig 15. HAB Encrypted image structure What about the real example image case? Now, we use the MCUbootUtility tool to read out our HAB encrypted image, from address 0X60000000, the readout size is 0XB000. The detail image structure is like following: Fig 16. HAB Encypted image example structure 1): IVT: hdr, IVT header, more details, check hab_hdr 2): IVT: entry, the app entrypointAddress, it should be the reset_handler address, in this document example, it is the address 0xa004 data, the plaintext is 0X00A2DD, but after the HAB encrypted, we can find the address -x60002004 data is the encrypted data 3): IVT: reserved 4): IVT: DCD, it is used for the DRAM SEMC configuration, in this example, we didn’t use the SDDRAM, so the data is 0. 5): IVT: BOOT_DATA, used to indicate the BOOT_DATA RAM start address 0X9020. 6): IVT: self, ivt self RAM start address is 0X9000 7): IVT:CSF, it is used to indicate the CST start address, this example csf ram address is 0X00010000. 8): IVT:reserved 9): BOOT_DATA: RAM image start, the whole image RAM start address, this RAM example BOOT_DATA is 0X8000,0XA000-0X2000=0X8000 10): BOOT_DATA: size, APP while size, it is 0X0000A200, after checking the while generated HAB encrypted app image size, you can find the image end size is really 0XA200, just lke the fig 16. 11): HAB Encypted app data, please check ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin file, the address 0X2000-0X7250 data, you will find it is the same. 12): HAB data, it incluses the csf, certificate etc data, you can compare the file ivt_evkbimxrt1050_led_softwarereset_0xa000_encrypted.bin address 0X8000-0x8f70 data, it is the same. 13):DEK blob, it is the DEK key blob related data, the offset address is 0XA000, the same as fig 7. FDCB,IVT,BOOT DATA are all plaintext, but app image area is the HAB encrypted data, HAB and the DEK blocb is the generated data put in the related memory. Conclusion This document we mainly use the elftosb and the MFGTool to generate the HAB encrypted image, and download it to the RT1050 EVKB board, document give the whole detail steps, and us ethe MCUBootutility tool to read out the HAB encrypted image, and analysis the HAB encrypted image structure with the examples. After compare with the generated mid files, we can find all the data is consist, and all the encrypted data range is the same. The test result also demonstrate the HAB encrypted code function works, the HAB encrypted boot has no problems. All the related files is in the attachment.

Introduction The RT1064 is the only Crossover MCU from the RT family that comes with an on-chip flash, it has a 4MB Winbond W25Q32JV memory. It's important to remark that it's not an internal memory but an on-chip flash connected to the RT1064 through the FlexSPI2 interface. Having this flash eliminates the need of an external memory. Although the intended use of the on-chip flash is to store your application and do XiP, there might be cases where you would like to use some space of this memory as NVM. This document will explain how to modify one of the SDK example projects to achieve this. Prerequisites RT1064-EVK The latest SDK which you can download from the following link: Welcome | MCUXpresso SDK Builder. In this document, I will use MCUXpresso IDE but you can use the IDE of your preference. The modifications that you need to make are the same despite the IDE that you are using. Importing the SDK example The RT1064-EVK comes with two external flash memories, 512 Mb Hyper Flash and 64 Mb QSPI Flash. The RT1064-SDK includes two example projects that demonstrate how to use any of these two external flash memories as NVM. In this document, we will take as a base one of these examples to modify it to use the on-chip flash as NVM. Once you downloaded and imported the SDK into MCUXpresso IDE, you will need to import the example flexspi_nor_polling_transfer into your workspace. Modifying the SDK example The external NOR Flash memory of the EVK is connected to the RT1064 through the FlexSPI1 interface. Due to this, the example project that we just imported initializes the FlexSPI1 interface pins. In our case, we want to use the on-chip flash that is connected through the FlexSPI2 instance, since we will boot from this memory, the ROM bootloader will configure the pins of this FlexSPI interface. So, in the function BOARD_InitPins we can delete all the pins that were related to the FlexSPI1 interface. At the end your function should look like the following: void BOARD_InitPins(void) { CLOCK_EnableClock(kCLOCK_Iomuxc); /* iomuxc clock (iomuxc_clk_enable): 0x03u */ IOMUXC_SetPinMux( IOMUXC_GPIO_AD_B0_12_LPUART1_TX, /* GPIO_AD_B0_12 is configured as LPUART1_TX */ 0U); /* Software Input On Field: Input Path is determined by functionality */ IOMUXC_SetPinMux( IOMUXC_GPIO_AD_B0_13_LPUART1_RX, /* GPIO_AD_B0_13 is configured as LPUART1_RX */ 0U); /* Software Input On Field: Input Path is determined by functionality */ /* Software Input On Field: Force input path of pad GPIO_SD_B1_11 */ IOMUXC_SetPinConfig( IOMUXC_GPIO_AD_B0_12_LPUART1_TX, /* GPIO_AD_B0_12 PAD functional properties : */ 0x10B0u); /* Slew Rate Field: Slow Slew Rate Drive Strength Field: R0/6 Speed Field: medium(100MHz) Open Drain Enable Field: Open Drain Disabled Pull / Keep Enable Field: Pull/Keeper Enabled Pull / Keep Select Field: Keeper Pull Up / Down Config. Field: 100K Ohm Pull Down Hyst. Enable Field: Hysteresis Disabled */ IOMUXC_SetPinConfig( IOMUXC_GPIO_AD_B0_13_LPUART1_RX, /* GPIO_AD_B0_13 PAD functional properties : */ 0x10B0u); /* Slew Rate Field: Slow Slew Rate Drive Strength Field: R0/6 Speed Field: medium(100MHz) Open Drain Enable Field: Open Drain Disabled Pull / Keep Enable Field: Pull/Keeper Enabled Pull / Keep Select Field: Keeper Pull Up / Down Config. Field: 100K Ohm Pull Down Hyst. Enable Field: Hysteresis Disabled */ } ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Function flexspi_nor_flash_init initializes the FlexSPI interface but in our case, the ROM bootloader already did this for us. So we need to make some modifications to this function as well. The only things that we will need from this function are to update the LUT and to do a software reset of the FlexSPI2 interface. At the end your function should look like the following: void flexspi_nor_flash_init(FLEXSPI_Type *base) { /* Update LUT table. */ FLEXSPI_UpdateLUT(base, 0, customLUT, CUSTOM_LUT_LENGTH); /* Do software reset. */ FLEXSPI_SoftwareReset(base); } ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Inside the file app.h, we need to change some macros since now we will use a different memory and a different FlexSPI interface. The following macros are the ones that you need to modify: Macro Before After Observations EXAMPLE_FLEXPI FLEXSPI FLEXSPI2 On-chip flash is connected through FlexSPI2 interface FLASH_SIZE 0x2000 0x200 The size of the on-chip flash is different from the external NOR flash EXAMPLE_FLEXSPI_AMBA_BASE FlexSPI_AMBA_BASE FlexSPI2_AMBA_BASE On-chip flash is connected through FlexSPI2 interface EXAMPLE_SECTOR 4 100 The size of the on-chip flash is less than the external NOR flash, hence we will decrease the sector size to avoid erasing our application Everything else in this file remains the same. Finally, there are two changes that we need to make in the customLUT. To understand these changes you need to analyze the datasheets of the memories. But in the end, the two modifications are the followings: Erase sector command of the on-chip flash is 0x20 instead of 0xD7. Exchange the FLEXSPI command index for NOR_CMD_LUT_SEQ_IDX_READ_FAST_QUAD and NOR_CMD_LUT_SEQ_IDX_READ_NORMAL to align with XIP settings. At the end your customLUT should look like the following: const uint32_t customLUT[CUSTOM_LUT_LENGTH] = { /* Fast read quad mode - SDR */ [4 * NOR_CMD_LUT_SEQ_IDX_READ_FAST_QUAD] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x6B, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), [4 * NOR_CMD_LUT_SEQ_IDX_READ_FAST_QUAD + 1] = FLEXSPI_LUT_SEQ( kFLEXSPI_Command_DUMMY_SDR, kFLEXSPI_4PAD, 0x08, kFLEXSPI_Command_READ_SDR, kFLEXSPI_4PAD, 0x04), /* Fast read mode - SDR */ [4 * NOR_CMD_LUT_SEQ_IDX_READ_FAST] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x0B, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), [4 * NOR_CMD_LUT_SEQ_IDX_READ_FAST + 1] = FLEXSPI_LUT_SEQ( kFLEXSPI_Command_DUMMY_SDR, kFLEXSPI_1PAD, 0x08, kFLEXSPI_Command_READ_SDR, kFLEXSPI_1PAD, 0x04), /* Normal read mode -SDR */ [4 * NOR_CMD_LUT_SEQ_IDX_READ_NORMAL] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x03, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), [4 * NOR_CMD_LUT_SEQ_IDX_READ_NORMAL + 1] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_READ_SDR, kFLEXSPI_1PAD, 0x04, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Read extend parameters */ [4 * NOR_CMD_LUT_SEQ_IDX_READSTATUS] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x81, kFLEXSPI_Command_READ_SDR, kFLEXSPI_1PAD, 0x04), /* Write Enable */ [4 * NOR_CMD_LUT_SEQ_IDX_WRITEENABLE] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x06, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Erase Sector */ [4 * NOR_CMD_LUT_SEQ_IDX_ERASESECTOR] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x20, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), /* Page Program - single mode */ [4 * NOR_CMD_LUT_SEQ_IDX_PAGEPROGRAM_SINGLE] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x02, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), [4 * NOR_CMD_LUT_SEQ_IDX_PAGEPROGRAM_SINGLE + 1] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_WRITE_SDR, kFLEXSPI_1PAD, 0x04, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Page Program - quad mode */ [4 * NOR_CMD_LUT_SEQ_IDX_PAGEPROGRAM_QUAD] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x32, kFLEXSPI_Command_RADDR_SDR, kFLEXSPI_1PAD, 0x18), [4 * NOR_CMD_LUT_SEQ_IDX_PAGEPROGRAM_QUAD + 1] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_WRITE_SDR, kFLEXSPI_4PAD, 0x04, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Read ID */ [4 * NOR_CMD_LUT_SEQ_IDX_READID] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x9F, kFLEXSPI_Command_READ_SDR, kFLEXSPI_1PAD, 0x04), /* Enable Quad mode */ [4 * NOR_CMD_LUT_SEQ_IDX_WRITESTATUSREG] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x01, kFLEXSPI_Command_WRITE_SDR, kFLEXSPI_1PAD, 0x04), /* Enter QPI mode */ [4 * NOR_CMD_LUT_SEQ_IDX_ENTERQPI] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x35, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Exit QPI mode */ [4 * NOR_CMD_LUT_SEQ_IDX_EXITQPI] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_4PAD, 0xF5, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), /* Read status register */ [4 * NOR_CMD_LUT_SEQ_IDX_READSTATUSREG] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0x05, kFLEXSPI_Command_READ_SDR, kFLEXSPI_1PAD, 0x04), /* Erase whole chip */ [4 * NOR_CMD_LUT_SEQ_IDX_ERASECHIP] = FLEXSPI_LUT_SEQ(kFLEXSPI_Command_SDR, kFLEXSPI_1PAD, 0xC7, kFLEXSPI_Command_STOP, kFLEXSPI_1PAD, 0), }; ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ These are all the changes that you need to make to the flexspi_nor_polling_transfer to use the on-chip flash as NVM instead of the external NOR flash. Important notes It's important to mention that you cannot write, erase or read the on-chip flash while making XiP from this memory. So, you need to reallocate all the instructions that read, write and erase the flash to the internal FlexRAM. Fortunately, the example that we use as a base (flexspi_nor_polling_transfer) already does this. This is accomplished thanks to the files located in the linkscritps folder. To learn more about how this works, please refer to section 17.15 of the MCUXpresso IDE User Guide. Additional resources Datasheet of the on-chip memory (Winbond W25Q32JV). MCUXpresso IDE User Guide. I hope that you find this document useful! Regards, Victor

Goal Our goal is to train a model that can take a value, x, and predict its sine, y. In a real-world application, if you needed the sine of x, you could just calculate it directly. However, by training a model to approximate the result, we can demonstrate the basics of machine learning. TensorFlow and Keras TensorFlow is a set of tools for building, training, evaluating, and deploying machine learning models. Originally developed at Google, TensorFlow is now an open-source project built and maintained by thousands of contributors across the world. It is the most popular and widely used framework for machine learning. Most developers interact with TensorFlow via its Python library. TensorFlow does many different things. In this post, we’ll use Keras, TensorFlow’s high-level API that makes it easy to build and train deep learning networks. To enable TensorFlow on mobile and embedded devices, Google developed the TensorFlow Lite framework. It gives these computationally restricted devices the ability to run inference on pre-trained TensorFlow models that were converted to TensorFlow Lite. These converted models cannot be trained any further but can be optimized through techniques like quantization and pruning. Building the Model To building the Model, we should follow the below steps. Obtain a simple dataset. Train a deep learning model. Evaluate the model’s performance. Convert the model to run on-device. Please navigate to the URL in your browser to open the notebook directly in Colab, this notebook is designed to demonstrate the process of creating a TensorFlow model and converting it to use with TensorFlow Lite. Deploy the mode to the RT MCU Hardware Board: MIMXRT1050 EVK Board Fig 1 MIMXRT1050 EVK Board Template demo code: evkbimxrt1050_tensorflow_lite_cifar10 Code /* Copyright 2017 The TensorFlow Authors. All Rights Reserved. Copyright 2018 NXP. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ==============================================================================*/ #include "board.h" #include "pin_mux.h" #include "clock_config.h" #include "fsl_debug_console.h" #include <iostream> #include <string> #include <vector> #include "timer.h" #include "tensorflow/lite/kernels/register.h" #include "tensorflow/lite/model.h" #include "tensorflow/lite/optional_debug_tools.h" #include "tensorflow/lite/string_util.h" #include "Sine_mode.h" int inference_count = 0; // This is a small number so that it's easy to read the logs const int kInferencesPerCycle = 30; const float kXrange = 2.f * 3.14159265359f; #define LOG(x) std::cout void RunInference() { std::unique_ptr<tflite::FlatBufferModel> model; std::unique_ptr<tflite::Interpreter> interpreter; model = tflite::FlatBufferModel::BuildFromBuffer(sine_model_quantized_tflite, sine_model_quantized_tflite_len); if (!model) { LOG(FATAL) << "Failed to load model\r\n"; exit(-1); } model->error_reporter(); tflite::ops::builtin::BuiltinOpResolver resolver; tflite::InterpreterBuilder(*model, resolver)(&interpreter); if (!interpreter) { LOG(FATAL) << "Failed to construct interpreter\r\n"; exit(-1); } float input = interpreter->inputs()[0]; if (interpreter->AllocateTensors() != kTfLiteOk) { LOG(FATAL) << "Failed to allocate tensors!\r\n"; } while(true) { // Calculate an x value to feed into the model. We compare the current // inference_count to the number of inferences per cycle to determine // our position within the range of possible x values the model was // trained on, and use this to calculate a value. float position = static_cast<float>(inference_count) / static_cast<float>(kInferencesPerCycle); float x_val = position * kXrange; float* input_tensor_data = interpreter->typed_tensor<float>(input); *input_tensor_data = x_val; Delay_time(1000); // Run inference, and report any error TfLiteStatus invoke_status = interpreter->Invoke(); if (invoke_status != kTfLiteOk) { LOG(FATAL) << "Failed to invoke tflite!\r\n"; return; } // Read the predicted y value from the model's output tensor float* y_val = interpreter->typed_output_tensor<float>(0); PRINTF("\r\n x_value: %f, y_value: %f \r\n", x_val, y_val[0]); // Increment the inference_counter, and reset it if we have reached // the total number per cycle inference_count += 1; if (inference_count >= kInferencesPerCycle) inference_count = 0; } } /* * @brief Application entry point. */ int main(void) { /* Init board hardware */ BOARD_ConfigMPU(); BOARD_InitPins(); BOARD_InitDEBUG_UARTPins(); BOARD_BootClockRUN(); BOARD_InitDebugConsole(); NVIC_SetPriorityGrouping(3); InitTimer(); std::cout << "The hello_world demo of TensorFlow Lite model\r\n"; RunInference(); std::flush(std::cout); for (;;) {} } ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Test result On the MIMXRT1050 EVK Board, we log the input data: x_value and the inferenced output data: y_value via the Serial Port. Fig2 Received data In a while loop function, It will run inference for a progression of x values in the range 0 to 2π and then repeat. Each time it runs, a new x value is calculated, the inference is run, and the data is output. Fig3 Test result In further, we use Excel to display the received data against our actual values as the below figure shows. Fig4 Dot Plot You can see that, for the most part, the dots representing predicted values form a smooth sine curve along the center of the distribution of actual values. In general, Our network has learned to approximate a sine curve.

1 Introduction With the quick development of science and technology, the Internet of Things(IoT) is widely used in various areas, such as industry, agriculture, environment, transportation, logistics, security, and other infrastructure. IoT usage makes our lives more colorful and intelligent. The explosive development of the IoT cannot be separated from the cloud platform. At present, there are many types of cloud services on the market, such as Amazon's AWS, Microsoft's Azure, google cloud, China's Alibaba Cloud, Baidu Cloud, OneNet, etc. Amazon AWS Cloud is a professional cloud computing service that is provided by Amazon. It provides a complete set of infrastructure and cloud solutions for customers in various countries and regions around the world. It is currently a cloud computing with a large number of users. AWS IoT is a managed cloud platform that allows connected devices to easily and securely interact with cloud applications and other devices. NXP crossover MCU RT product has launched a series of AWS sample codes. This article mainly explains the remote_control_wifi_nxp code in the official MIMXRT1060-EVK SDK as an example to realize the data interaction with AWS IoT cloud, Android mobile APP, and MQTTfx client. The cloud topology of this article is as follows: Fig.1-1 2 AWS cloud operation 2.1 Create an AWS account Prepare a credit card, and then go to the below amazon link to create an AWS account: https://console.aws.amazon.com/console/home 2.2 Create a Thing Open the AWS IOT link: https://console.aws.amazon.com/iot Choose the Things item under manage, if it is the first time usage, customer can choose “register a thing” to create the thing. If it is used in the previous time, customers can click the “create” button in the right corner to create the thing. Choose “create a single thing” to create the new thing, more details check the following picture. Fig. 2-1 Fig.2-2 Fig.2-3 2.3 Create certificate Create a certificate for the newly created thing, click the “create certificate” button under the following picture: Fig.2-4 After the certificate is built, it will have the information about the certificate created, it means the certificate is generated and can be used. Fig. 2-5 Please note, download files: certificate for this thing, public key, private key. It will be used in the mqttfx tool configuration. Click “A root CA for AWS for Download”, download the root CA for AWS IoT, the mqttfx tool setting will also use it. Open the root CA download link, can download the CA certificate. RSA 2048 bit key: VeriSign Class 3 Public Primary G5 root CA certificate Fig. 2-6 At last, we can get these files: 7abfd7a350-certificate.pem.crt 7abfd7a350-private.pem.key 7abfd7a350-public.pem.key AmazonRootCA1.pem Save it, it will be used later. Click “active” button to active the certificate, and click “Done” button. The policy will be attached later. 2.4 Create Policies Back to the iot view page: https://console.aws.amazon.com/iot/ Select the policies under Secure item, to create the new policies. Fig. 2-7 Input the policy name, in the action area, fill: iot:*, Resource ARN area fill: * Check Allow item, click the create button to finish the new policy creation. Fig. 2-8 2.5 Things attach relationship After the thing, certificate, policies creation, then will attach the policy to the certificate, and attach the certificate to the Things. Fig. 2-9 Choose the certificates under Secure item, in the related certificate item, choose “…”, you will find the down list, click “attach policy”, and choose the newly created policy. Then click attach thing, choose the newly created thing. Fig. 2-10 Fig. 2-11 Fig. 2-12 Now, open the Things under Mange item, check the detail things related information. Fig.2-13 Double click the thing, in the Interact item, we can find the Rest API Endpoint, the RT code and the mqttfx tool will use this endpoint to realize the cloud connection. Fig. 2-14 Check the security, you will find the previously created certificate, it means this thing already attach the new certificates: Fig. 2-15 Until now, we already finish the Things related configuration, then it will be used for the MQTT fx, Android app, RT EVK board connections, and testing, we also can check the communication information through the AWS shadow in the webpage directly. 3 Android related configuration 3.1 AWS cognito configuration If use the Android app to communicate with the AWS IoT clould, the AWS side still needs to use the cognito service to authorize the AWS IoT, then access the device shadows. Create a new identity pools at first from the following link: https://console.aws.amazon.com/cognitohttps://console.aws.amazon.com/cognito Fig. 3-1 Click “manage Identity pools”, after enter it, then click “create new identity pool” Fig. 3-2 Fig. 3-3 Fig. 3-4 Here, it will generate two Roles： Cognito_PoolNameAuth_Role Cognito_PoolNameUnauth_Role Click Allow, to finish the identity pool creation. Fig. 3-5 Please record the related Identity pool ID, it will be used in the Android app .properties configuration files. 3.2 Create plicies in IAM for cognito Open https://console.aws.amazon.com/iam Click the “policies” item under “access management” Fig. 3-6 Choose “create policy”, create a IAM policies, in the Policy JSON area, write the following content: Fig. 3-7 { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iot:Connect" ], "Resource": [ "*" ] }, { "Effect": "Allow", "Action": [ "iot:Publish" ], "Resource": [ "arn:aws:iot:us-east-1:965396684474:topic/$aws/things/RTAWSThing/shadow/update", "arn:aws:iot:us-east-1:965396684474:topic/$aws/things/RTAWSThing/shadow/get" ] }, { "Effect": "Allow", "Action": [ "iot:Subscribe", "iot:Receive" ], "Resource": [ "*" ] } ] }‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Please note, in the JSON content: "arn:aws:iot:<REGION>:<ACCOUNT ID>:topic/$aws/things/<THING NAME>/shadow/update", "arn:aws:iot:<REGION>:<ACCOUNT ID>:topic/$aws/things/<THING NAME>/shadow/get" Region：the us-east-1 inFig. 3-5 ACCOUNT ID, it can be found in the upper right corner my account side. Fig 3-8 Fig 3-9 After finished the IAM policy creation, then back to IAM policies page, choose Filter policies as customer managed, we can find the new created customer’s policy. Fig. 3-10 3.3 Attach policy for the cognito role in IAM In IAM, choose roles item: Fig. 3-11 Double click the cognito_PoolNameUnauth_Role which is generated when creating the pool in cognito, click attach policies, select the new created policy. Fig. 3-12 Fig. 3-13 Until now, we already finish the AWS cognito configuration. 3.4 Android properties file configuration Create a file with .properties, the content is: customer_specific_endpoint=<REST API ENDPOINT> cognito_pool_id=<COGNITO POOL ID> thing_name=<THING NAME> region=<REGION> Please fill the correct content: REST API ENDPOINT：Fig 2-14 COGNITO POOL ID：fig 3-5 THING NAME：fig 2-14，upper left corner REGION：Fig 3-5, the region data in COGNITO POOL ID Take an example, my properties file content is: customer_specific_endpoint=a215vehc5uw107-ats.iot.us-east-1.amazonaws.com cognito_pool_id=us-east-1:c5ca6d11-f069-416c-81f9-fc1ec8fd8de5 thing_name=RTAWSThing region=us-east-1 In the real usage, please use your own configured data, otherwise, it will connect to my cloud endpoint. 4. MQTTfx configuration and testing MQTT.fx is an MQTT client tool which is based on EclipsePaho and written in Java language. It supports subscribe and publish of messages through Topic. You can download this tool from the following link: http://mqttfx.jensd.de/index.php/download The new version is:1.7.1. 4.1 MQTT.fx configuration Choose connect configuration button, then enter the connection configuration page: Fig. 4-1 Profile Name: Enter the configuration name Broker Address: it is REST API ENDPOINT。 Broker Port:8883 Client ID: generate it freely CA file: it is the downloaded CA certificate file Client Certificate File: related certificate file Client key File: private key file Check PEM formatted。 Click apply and OK to finish the configuration. 4.2 Use the AWS cloud to test connection In order to test whether it can be connected to the event cloud, a preliminary connection test can be performed. Open the aws page： https://console.aws.amazon.com/iot here is a Test button under this interface, which can be tested by other clients or by itself.Both AWS cloud and MQTTfx subscribe topic: $aws/things/RTAWSThing/shadow/update MQTTfx publishes data to the topic: $aws/things/RTAWSThing/shadow/update It can be found that both the cloud test port and the MQTTfx subscribe can receive data: Fig. 4-2 Below, the Publish data is tested by the cloud, and then you can see that both the MQTTFX subscribe and the cloud subscribe can receive data: Fig. 4-3 Until now, the AWS cloud can transfer the data between the AWS iot cloud and the client. 5 RT1060 and wifi module configuration We mainly use the RT1060 SDK2.8.0 remote_control_wifi_nxp as the RT test code: SDK_2.8.0_EVK-MIMXRT1060\boards\evkmimxrt1060\aws_examples\remote_control_wifi_nxp Test platform is:MIMXRT1060-EVK Panasonic PAN9026 SDIO ADAPTER + SD to uSD adapter The project is using Panasonic PAN9026 SDIO ADAPTER in default. 5.1 WIFI and the AWS code configuration The project need the working WIFI SSID and the password, so prepare a working WIFI for it. Then add the SSID and the password in the aws_clientcredential.h #define clientcredentialWIFI_SSID "Paste WiFi SSID here." #define clientcredentialWIFI_PASSWORD "Paste WiFi password here." The connection for AWS also in file: aws_clientcredential.h #define clientcredentialMQTT_BROKER_ENDPOINT "a215vehc5uw107-ats.iot.us-east-1.amazonaws.com" #define clientcredentialIOT_THING_NAME "RTAWSThing" #define clientcredentialMQTT_BROKER_PORT 8883 5.2 certificate and the key configuration Open the SDK following link: SDK_2.8.0_EVK-MIMXRT1060\rtos\freertos\tools\certificate_configuration\CertificateConfigurator.html Fig. 5-1 Generate the new aws_clientcredential_keys.h, and replace the old one. Take the MCUXPresso IDE project as an example, the file location is: Fig. 5-2 Build the project and download it to the MIMXRT1060-EVK board. 6 Test result Androd mobile phone download and install the APK under this folder: SDK_2.8.0_EVK-MIMXRT1060\boards\evkmimxrt1060\aws_examples\remote_control_android\AwsRemoteControl.apk SDK can be downloaded from this link: Welcome | MCUXpresso SDK Builder Then, we can use the Android app to remote control the RT EVK on board LED, the test result is 6.1 APP and EVK test result MIMXRT1060-EVK printf information： Fig. 6-1 Turn on and turn off the led: Fig. 6-2 Fig. 6-3 6.2 MQTTfx subscribe result MQTTfx subscribe data Turn on the led, we can subscribe two messages: Fig. 6-4 Fig. 6-5 Turn off the led, we also can subscribe two messages: Fig. 6-6 Fig. 6-7 In the two message, the first one is used to set the led status. The second one is the EVK used to report the EVK led information. MQTTfx also can use the publish page, publish this data: {"state":{"desired":{"LEDstate":1}}} or {"state":{"desired":{"LEDstate":0}}} To topic: $aws/things/RTAWSThing/shadow/update It also can realize the on board LED turn on or off. 6.3 AWS cloud shadows display result Turn on the led: Fig. 6-8 Turn off the led: Fig. 6-9 In conclusion, after the above configuration and testing, it can finish the Android mobile phone to remote control the RT EVK on board LED and get the information. Also can use the MQTTFX client tool and the AWS shadow page to check the communication data.

In the SDK_2.7.0_EVKB-IMXRT1050, it contains some eIQ machine learning demo projects, there's the tensorflow_lite_kws among them. It's a keyword spotting example that is based on Keyword spotting for Microcontrollers and it deploys a deepwise separable convolutional neural network called MobileNet in this demo project. It can classify a one-second audio clip as either silence, an unknown word, "yes", "no", "up", "down", "left", "right", "on", "off", "stop", or "go". Figure 1 shows the components that comprise it. Fig 1 Training Our New Model The model we are using is trained with the TensorFlow script which is designed to demonstrate how to build and train a model for audio recognition using TensorFlow. The script makes it very easy to train an audio recognition model. Among other things, it allows us to do the following: Download a dataset with audio featuring 20 spoken words. Choose which subset of words to train the model on. Specify what type of preprocessing to use on the audio. Choose from several different types of the model architecture. Optimize the model for microcontrollers using quantization. When we run the script, it downloads the dataset, trains a model, and outputs a file representing the trained model. We then use some other tools to convert this file into the correct form for TensorFlow Lite. Training in virtual machine (VM) Preparation Make sure the TensorFlow has been installed, and since the script downloads over 2GB of training data, it'll need a good internet connection and enough free space on the machine. Note that: The training process itself can take several hours, be patient. Training To begin the training process, use the following commands to clone ML-KWS-for-MCU. git clone https://github.com/ARM-software/ML-KWS-for-MCU.git‍‍‍‍‍‍ The training scripts are configured via a bunch of command-line flags that control everything from the model’s architecture to the words it will be trained to classify. The following command runs the script that begins training. You can see that it has a lot of command-line arguments: python ML-KWS-for-MCU/train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \ --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine \ --dct_coefficient_count 10 --window_size_ms 40 \ --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 \ --how_many_training_steps 10000,10000,10000 \ --data_dir=./speech_dataset --summaries_dir ./retrain_logs --train_dir ./speech_commands_train ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Some of these, like --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine. By default, the selected words are yes, no, up, down, left, right, on, off, stop, go, but we can provide any combination of the following words, all of which appear in our dataset: Common commands: yes, no, up, down, left, right, on, off, stop, go, backward, forward, follow, learn Digits zero through nine: zero, one, two, three, four, five, six, seven, eight, nine Random words: bed, bird, cat, dog, happy, house, Marvin, Sheila, tree, wow Others set up the output of the script, such as --train_dir=/content/speech_commands_train, which defines where the trained model will be saved. Leave the arguments as they are, and run it. The script will start off by downloading the Speech Commands dataset (Figure 2), which consists of over 105,000 WAVE audio files of people saying thirty different words. This data was collected by Google and released under a CC BY license, and you can help improve it by contributing five minutes of your own voice. The archive is over 2GB, so this part may take a while, but you should see progress logs, and once it's been downloaded once you won't need to do this step again. You can find more information about this dataset in this Speech Commands paper. Fig 2 Once the downloading has completed, some more output will appear. There might be some warnings, which you can ignore as long as the command continues running. Later, you'll see logging information that looks like this (Figure 3). Fig 3 This shows that the initialization process is done and the training loop has begun. You'll see that it outputs information for every training step. Here's a break down of what it means: Step shows that we're on the step of the training loop. In this case, there are going to be 30,000 steps in total, so you can look at the step number to get an idea of how close it is to finishing. rate is the learning rate that's controlling the speed of the network's weight updates. Early on this is a comparatively high number (0.0005), but for later training cycles it will be reduced 5x, to 0.0001, then to 0.00002 at last. accuracy is how many classes were correctly predicted on this training step. This value will often fluctuate a lot, but should increase on average as training progresses. The model outputs an array of numbers, one for each label, and each number is the predicted likelihood of the input being that class. The predicted label is picked by choosing the entry with the highest score. The scores are always between zero and one, with higher values representing more confidence in the result. cross-entropy is the result of the loss function that we're using to guide the training process. This is a score that's obtained by comparing the vector of scores from the current training run to the correct labels, and this should trend downwards during training. checkpoint After a hundred steps, you should see a line like this: This is saving out the current trained weights to a checkpoint file (Figure 4). If your training script gets interrupted, you can look for the last saved checkpoint and then restart the script with --start_checkpoint=/tmp/speech_commands_train/best/ds_cnn_xxxx.ckpt-400 as a command line argument to start from that point . Fig 4 Confusion Matrix After four hundred steps, this information will be logged: The first section is a confusion matrix. To understand what it means, you first need to know the labels being used, which in this case are "silence", "unknown", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", and "nine". Each column represents a set of samples that were predicted to be each label, so the first column represents all the clips that were predicted to be silence, the second all those that were predicted to be unknown words, the third "zero", and so on. Each row represents clips by their correct, ground truth labels. The first row is all the clips that were silence, the second clips that were unknown words, the third "zero", etc. This matrix can be more useful than just a single accuracy score because it gives a good summary of what mistakes the network is making. In this example you can see that all of the entries in the first row are zero (Figure 5), apart from the initial one. Because the first row is all the clips that are actually silence, this means that none of them were mistakenly labeled as words, so we have no false negatives for silence. This shows the network is already getting pretty good at distinguishing silence from words. If we look down the first column though, we see a lot of non-zero values. The column represents all the clips that were predicted to be silence, so positive numbers outside of the first cell are errors. This means that some clips of real spoken words are actually being predicted to be silence, so we do have quite a few false positives. A perfect model would produce a confusion matrix where all of the entries were zero apart from a diagonal line through the center. Spotting deviations from that pattern can help you figure out how the model is most easily confused, and once you've identified the problems you can address them by adding more data or cleaning up categories. Fig 5 Validation After the confusion matrix, you should see a line like Figure 5 shows. It's good practice to separate your data set into three categories. The largest (in this case roughly 80% of the data) is used for training the network, a smaller set (10% here, known as "validation") is reserved for evaluation of the accuracy during training, and another set (the last 10%, "testing") is used to evaluate the accuracy once after the training is complete. The reason for this split is that there's always a danger that networks will start memorizing their inputs during training. By keeping the validation set separate, you can ensure that the model works with data it's never seen before. The testing set is an additional safeguard to make sure that you haven't just been tweaking your model in a way that happens to work for both the training and validation sets, but not a broader range of inputs. The training script automatically separates the data set into these three categories, and the logging line above shows the accuracy of model when run on the validation set. Ideally, this should stick fairly close to the training accuracy. If the training accuracy increases but the validation doesn't, that's a sign that overfitting is occurring, and your model is only learning things about the training clips, not broader patterns that generalize Training Finished In general, training is the process of iteratively tweaking a model’s weights and biases until it produces useful predictions. The training script writes these weights and biases to checkpoint files (Figure 6). Fig 6 A TensorFlow model consists of two main things: The weights and biases resulting from training A graph of operations that combine the model’s input with these weights and biases to produce the model’s output At this juncture, our model’s operations are defined in the Python scripts, and its trained weights and biases are in the most recent checkpoint file. We need to unite the two into a single model file with a specific format, which we can use to run inference. The process of creating this model file is called freezing—we’re creating a static representation of the graph with the weights frozen into it. To freeze our model, we run a script that is called as follows: python ML-KWS-for-MCU/freeze.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \ --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine \ --dct_coefficient_count 10 --window_size_ms 40 \ --window_stride_ms 20 --checkpoint ./speech_commands_train/best/ds_cnn_9490.ckpt-21600 \ --output_file=./ds_cnn.pb‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ To point the script toward the correct graph of operations to freeze, we pass some of the same arguments we used in training. We also pass a path to the final checkpoint file, which is the one whose filename ends with the total number of training steps. The frozen graph will be output to a file named ds_cnn.pb. This file is the fully trained TensorFlow model. It can be loaded by TensorFlow and used to run inference. That’s great, but it’s still in the format used by regular TensorFlow, not TensorFlow Lite. Convert to TensorFlow Lite Conversion is a easy step: we just need to run a single command. Now that we have a frozen graph file to work with, we’ll be using toco, the command-line interface for the TensorFlow Lite converter. toco --graph_def_file=./ds_cnn.pb --output_file=./ds_cnn.tflite \ --input_shapes=1,49,10,1 --input_arrays=Reshape_1 --output_arrays='labels_softmax' \ --inference_type=QUANTIZED_UINT8 --mean_values=227 --std_dev_values=1 \ --change_concat_input_ranges=false \ --default_ranges_min=-6 --default_ranges_max =6‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ In the arguments, we specify the model that we want to convert, the output location for the TensorFlow Lite model file, and some other values that depend on the model architecture. we also provide some arguments (inference_type, mean_values, and std_dev_values) that instruct the converter how to map its low-precision values into real numbers. The converted model will be written to ds_cnn.tflite, this a fully formed TensorFlow Lite model! Create a C array We’ll use the xxd command to convert a TensorFlow Lite model into a C array in the following. xxd -i ./ds_cnn.tflite > ./ds_cnn.h cat ./ds_cnn.h‍‍‍‍‍‍‍‍ The final part of the output is the file’s contents, which are a C array and an integer holding its length, as follows: Fig 7 Next, we’ll integrate this newly trained model with the tensorflow_lite_kws project. Using the Model in tensorflow_lite_kws Project To use the new model, we need to do two things: In source/ds_cnn_s_model.h, replace the original model data with our new model. Update the label names in source/kws.cpp with our new ''zero'', ''one'', ''two'', ''three'', ''four'', ''five'', ''six'', ''seven'', ''eight'' and ''nine'' labels. const std::string labels[] = {"Silence", "Unknown","zero", "one", "two", "three","four", "five", "six", "seven","eight", "nine"};‍‍‍ Before running the model in the EVKB-IMXRT1050 board (Figure 8), please refer to the readme.txt to do the preparation, in further, the file also demonstrates the steps of testing, please follow them. Fig 8 Figure 9 shows the testing I did, I've attached the model file, please give a try by yourself. Fig 9

Overview ======== The LPUART example for FreeRTOS demonstrates the possibility to use the LPUART driver in the RTOS with hardware flow control. The example uses two instances of LPUART IP and sends data between them. The UART signals must be jumpered together on the board. Toolchain supported =================== - MCUXpresso 11.0.0 Hardware requirements ===================== - Mini/micro USB cable - MIMXRT1050-EVKB board - Personal Computer Board settings ============== R278 and R279 must be populated, or have pads shorted. These resistors are under the display opposite side of board from uSD connector. The following pins need to be jumpered together: --------------------------------------------------------------------------------- | | UART3 (UARTA) | UART8 (UARTB) | |---|-------------------------------------|-------------------------------------| | # | Signal | Function | Jumper | Jumper | Function | Signal | |---|---------------|----------|----------|----------|----------|---------------| | 1 | GPIO_AD_B1_07 | RX | J22-pin1 | J23-pin1 | TX | GPIO_AD_B1_10 | | 2 | GPIO_AD_B1_06 | TX | J22-pin2 | J23-pin2 | RX | GPIO_AD_B1_11 | | 3 | GPIO_AD_B1_04 | CTS | J23-pin3 | J24-pin5 | RTS | GPIO_SD_B0_03 | | 4 | GPIO_AD_B1_05 | RTS | J23-pin4 | J24-pin4 | CTS | GPIO_SD_B0_02 | --------------------------------------------------------------------------------- Prepare the Demo ================ 1. Connect a USB cable between the host PC and the OpenSDA USB port on the target board. 2. Open a serial terminal with the following settings: - 115200 baud rate - 8 data bits - No parity - One stop bit - No flow control 3. Download the program to the target board. 4. Either press the reset button on your board or launch the debugger in your IDE to begin running the demo. Running the demo ================ You will see status of the example printed to the console. Customization options =====================

配置RT600开发环境 RT600开发入门培训视频。 https://www.nxp.com/document/guide/getting-started-with-i-mx-rt600-evaluation-kit:GS-MIMXRT685-EVK?&tid=vanGS-MIMXRT685-EVK#title2.1 下载I.MX RT600 SDK。下载链接: https://mcuxpresso.nxp.com/en/select?device=EVK-MIMXRT685 下载MCUXpresso IDE。注意需要安装MCUXpresso IDE 11.1.1及最新版本。https://www.nxp.com/webapp/swlicensing/sso/downloadSoftware.sp?catid=MCUXPRESSO 下载安装LPCScrypt，可以将默认板载的CMSIS-DAP固件升级改为J-LINK。通过J-LINK，可以下载调试HiFi4 DSP固件。下载链接https://www.nxp.com/design/microcontrollers-developer-resources/lpc-microcontroller-utilities/lpcscrypt-v2-1-1:LPCSCRYPT?&tab=Design_Tools_Tab 下载安装J-LINK驱动。下载链接https://www.segger.com/downloads/jlink/ 下载安装Cadence HiFi 4 DSP IDE for MIMXRT600。第一次下载，注册用户https://tensilicatools.com/register/。国内用户注册时，如果页面没有出现下面人机身份验证，说明IP被GW Firewall屏蔽了。需要通过代理或者其他特殊手段，否则用户注册将无法成功提交。下载HiFi DSP Development Tools for i.MX RT600开发工具。 https://tensilicatools.com/download/rt600-download-page/ 申请License for RT600 SDK。注意输入绑定网卡MAC地址时，需要去除中间‘：’等字符，否则提示失败。申请成功后，可以下载License文件。启动Xplorer 8.0.13后，在菜单Help -- Xplorer License Keys安装License文件。安装成功后显示如下： Xplorer下载调试器配置。将xt-ocd.exe所在目录加入到系统Path环境变量。使能”Use XOCD Manager”，指定Topology File 设置Download binary为Always，取消每次下载前都弹出提示框，节省下载时间。通过J-Link下载HiFi4 DSP固件，可以单步调试代码。

MCUXPRESSO SECURE PROVISIONING TOOL是官方今年上半年推出的一个针对安全的软件工具，操作起来非常的简单便捷而且稳定可靠，对于安全功能不熟悉的用户十分友好。但就是目前功能还不是很完善，只能支持HAB的相关操作，后续像BEE之类的需等待更新。详细的介绍信息以及用户手册请参考官方网址：MCUXpresso Secure Provisioning Tool | Software Development for NXP Microcontrollers (MCUs) | NXP | NXP 目前似乎知道这个工具的客户还不是很多，大部分用的更多的还是MCU BOOT UTILITY。那么如果已经用了MCU BOOT UTILITY烧录了FUSE，现在想用官方工具了怎么办了？其实对两者进行研究对比后，他们最原始的执行部分都是一样的，所以我们按照如下步骤进行相应的简单替换就能把新工具用起来：首先是crts可keys的替换， MCU BOOT UTILITY的路径是在： ..\NXP-MCUBootUtility-2.2.0\NXP-MCUBootUtility-2.2.0\tools\cst MCUXPRESSO SECURE PROVISIONING的对应路径是在对应workspace的根目录：另外还有一个就是encrypted模式会用到的hab_cert，需要将下面这两个文件对应替换，而且两个工具的命名不同，注意修改。 MCU BOOT UTILITY的路径是在： ..\NXP-MCUBootUtility-2.2.0\NXP-MCUBootUtility-2.2.0\gen\hab_cert MCUXPRESSO SECURE PROVISIONING的路径是workspace里： ..\secure_provisioning_RT1050\gen_hab_certs MCU BOOT UTILITY里命名为：SRK_1_2_3_4_table.bin; SRK_1_2_3_4_fuse.bin MCUXPRESSO SECURE PROVISIONING里命名为：SRK_fuses.bin; SRK_hash.bin 至此，就能够在新工具上用起来了最后提一下，就是这个新工具是可以建不同的workspace来相应存储不同秘钥的项目，能够方便用户区分。在新工具下建的项目也是可以互相替换秘钥的，参考上术步骤中的secure provisioning部分即可。

i.MX RT Crossover MCUs Knowledge Base

i.MX RT Crossover MCUs Knowledge Base

Discussions

Realization of a panoramic video layer with OpenGL

RT10xx SAI basic and SDCard wave file play

iMX RT1170 SDK 2.9.0 release doesn't enable refresh for SDRAM

Using NonCached Memory on i.MXRT

一个基于i.MXRT1050的针对计算机视觉（CV）应用的IoT边缘节点的设计

使用恩智浦微控制器轻松开发GUI

采用TensorFlow Lite模型在RT1060上实现手写体数字识别

在i.MX RT1064 EVK上运行elQ

RT1050 HAB加密映像文件的生成和分析

How to use i.MX RT1060 to classify the clothing

RT's System Clocks

AppWizard: Implementing a Real-Time Clock

RT1050 HAB Encrypted Image Generation and Analysis

How to use the RT1064 on-chip flash as NVM

The “Hello World” of TensorFlow Lite

RT cloud testing for AWS cloud remote control

Recognition model for microcontroller use

SDK UART driver example with hardware flow control

Some tips on how to setup RT600 development environment

MCU BOOT UTILITY和MCUXPRESSO SECURE PROVISIONING之间的秘钥替换