i.MX RT Crossover MCUs Knowledge Base

ディスカッション

The iMX RT1050 ROM will allow you to copy an application image from a serial NOR flash memory on the FlexSPI controller to SDRAM at boot time. If you want to run your application from SDRAM, then when debugging and developing your application you should use an initialization script for the debugger to setup the SDRAM so the application can be downloaded directly to the SDRAM for debugging. When you are ready to have your application boot without the debugger, then you'll need to use the RT flashloader tools to program the application to the flash and configure it to copy to the SDRAM. The attached document contains instructions on how to program a boot image to serial NOR flash (in this case the hyperflash that is on the EVK) that will be copied to the SDRAM at boot time.

The path of SDRAM Clock in Clock Tree According CCM clock tree in i.MXRT1050 reference manual, we can abstract part of SDRAM clock, and draw it’s diagram below. Descriptions for Diagram 1 (1) PLL2 PFD2 ① Registers related to PLL2 PFD2 ---CCM_ANALOG_PLL_SYSn (page 767, in reference manual) Address: 0x400D_8030h important bits: bit[15:14]---- select clock source. Bit[13] ----- Enable PLL output Bit[0]------- This field controls the PLL loop divider. 0 - Fout=Fref*20; 1 - Fout=Fref*22. ---CCM_ANALOG_PLL_SYS_NUM(page 768, in reference manual) Address: 0x400D_8050h important bits: bit[29:0]--- 30 bit numerator (A) of fractional loop divider (signed integer) ---CCM_ANALOG_PLL_SYS_DENOM (page 769, in reference manual) Address: 0x400D_8060h important bits: bit[29:0]---- 30 bit Denominator (B) of fractional loop divider (unsigned integer). ---CCM_ANALOG_PFD_528n (page 769, in reference manual) Address: 0x400D_8100h important bits: bit[21:16]----- This field controls the fractional divide value. The resulting frequency shall be 528*18/PFD2_FRAC where PFD2_FRAC is in the range 12-35. ② Computational formula PLL2_PFD2_OUT=(External 24MHz)*(Fout + A/B) * 18/ PFD2_FRAC ③ Example for PLL2_PFD2_OUT computation CCM_ANALOG_PLL_SYSn[0] = 1 // Fout=Fref*22 CCM_ANALOG_PLL_SYS_NUM[29:0] = 56 // A = 56 CCM_ANALOG_PLL_SYS_DENOM[29:0] = 256 // B=256 CCM_ANALOG_PFD_528n[21:16] = 29 // PFD2_FRAC=29 PLL2_PFD2_OUT = 24 * (22 + 56/256)*18/29 = 331MHz (330.98MHz) (2) Clock Select Register : CCM_CBCDR Address: 0x 400F_C014h important bits: SEMC_ALT_CLK_SEL & SEMC_CLK_SEL & SEMC_PODF bit[7] --- bit[SEMC_ALT_CLK_SEL] 0---PLL2 PFD2 will be selected as alternative clock for SEMC root clock 1---PLL3 PFD1 will be selected as alternative clock for SEMC root clock Bit[6] --- bit[SEMC_CLK_SEL] 0----Periph_clk output will be used as SEMC clock root 1----SEMC alternative clock will be used as SEMC clock root Bit[18:16] --- bit[SEMC_PODF] Post divider for SEMC clock. NOTE: Any change of this divider might involve handshake with EMI. See CDHIPR register for the handshake busy bits. 000 divide by 1 001 divide by 2 010 divide by 3 011 divide by 4 100 divide by 5 101 divide by 6 110 divide by 7 111 divide by 8 Example for configuration of SDRAM Clock Example : 166MHz SDRAM Clock ---- 0x400D8030 = 0x00002001 // wirte 0x00002001 to CCM_ANALOG_PLL_SYSn ---- 0x400D8050 = 0x00000038 // write 0x00000038 to CCM_ANALOG_PLL_SYS_NUM ---- 0x400D8060 = 0x00000100 // write 0x00000100 to CCM_ANALOG_PLL_SYS_DENOM ---- 0x400D8100 = 0x001d0000 // write 0x001d0000 to CCM_ANALOG_PFD_528n ---- 0x400FC014 = 0x00010D40 // write 0x00010D40 to CCM_CBCDR, divided by 2 NXP TIC team Weidong Sun 2018-06-01

In the SDK_2.7.0_EVKB-IMXRT1050, it contains some eIQ machine learning demo projects, there's the tensorflow_lite_kws among them. It's a keyword spotting example that is based on Keyword spotting for Microcontrollers and it deploys a deepwise separable convolutional neural network called MobileNet in this demo project. It can classify a one-second audio clip as either silence, an unknown word, "yes", "no", "up", "down", "left", "right", "on", "off", "stop", or "go". Figure 1 shows the components that comprise it. Fig 1 Training Our New Model The model we are using is trained with the TensorFlow script which is designed to demonstrate how to build and train a model for audio recognition using TensorFlow. The script makes it very easy to train an audio recognition model. Among other things, it allows us to do the following: Download a dataset with audio featuring 20 spoken words. Choose which subset of words to train the model on. Specify what type of preprocessing to use on the audio. Choose from several different types of the model architecture. Optimize the model for microcontrollers using quantization. When we run the script, it downloads the dataset, trains a model, and outputs a file representing the trained model. We then use some other tools to convert this file into the correct form for TensorFlow Lite. Training in virtual machine (VM) Preparation Make sure the TensorFlow has been installed, and since the script downloads over 2GB of training data, it'll need a good internet connection and enough free space on the machine. Note that: The training process itself can take several hours, be patient. Training To begin the training process, use the following commands to clone ML-KWS-for-MCU. git clone https://github.com/ARM-software/ML-KWS-for-MCU.git‍‍‍‍‍‍ The training scripts are configured via a bunch of command-line flags that control everything from the model’s architecture to the words it will be trained to classify. The following command runs the script that begins training. You can see that it has a lot of command-line arguments: python ML-KWS-for-MCU/train.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \ --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine \ --dct_coefficient_count 10 --window_size_ms 40 \ --window_stride_ms 20 --learning_rate 0.0005,0.0001,0.00002 \ --how_many_training_steps 10000,10000,10000 \ --data_dir=./speech_dataset --summaries_dir ./retrain_logs --train_dir ./speech_commands_train ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Some of these, like --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine. By default, the selected words are yes, no, up, down, left, right, on, off, stop, go, but we can provide any combination of the following words, all of which appear in our dataset: Common commands: yes, no, up, down, left, right, on, off, stop, go, backward, forward, follow, learn Digits zero through nine: zero, one, two, three, four, five, six, seven, eight, nine Random words: bed, bird, cat, dog, happy, house, Marvin, Sheila, tree, wow Others set up the output of the script, such as --train_dir=/content/speech_commands_train, which defines where the trained model will be saved. Leave the arguments as they are, and run it. The script will start off by downloading the Speech Commands dataset (Figure 2), which consists of over 105,000 WAVE audio files of people saying thirty different words. This data was collected by Google and released under a CC BY license, and you can help improve it by contributing five minutes of your own voice. The archive is over 2GB, so this part may take a while, but you should see progress logs, and once it's been downloaded once you won't need to do this step again. You can find more information about this dataset in this Speech Commands paper. Fig 2 Once the downloading has completed, some more output will appear. There might be some warnings, which you can ignore as long as the command continues running. Later, you'll see logging information that looks like this (Figure 3). Fig 3 This shows that the initialization process is done and the training loop has begun. You'll see that it outputs information for every training step. Here's a break down of what it means: Step shows that we're on the step of the training loop. In this case, there are going to be 30,000 steps in total, so you can look at the step number to get an idea of how close it is to finishing. rate is the learning rate that's controlling the speed of the network's weight updates. Early on this is a comparatively high number (0.0005), but for later training cycles it will be reduced 5x, to 0.0001, then to 0.00002 at last. accuracy is how many classes were correctly predicted on this training step. This value will often fluctuate a lot, but should increase on average as training progresses. The model outputs an array of numbers, one for each label, and each number is the predicted likelihood of the input being that class. The predicted label is picked by choosing the entry with the highest score. The scores are always between zero and one, with higher values representing more confidence in the result. cross-entropy is the result of the loss function that we're using to guide the training process. This is a score that's obtained by comparing the vector of scores from the current training run to the correct labels, and this should trend downwards during training. checkpoint After a hundred steps, you should see a line like this: This is saving out the current trained weights to a checkpoint file (Figure 4). If your training script gets interrupted, you can look for the last saved checkpoint and then restart the script with --start_checkpoint=/tmp/speech_commands_train/best/ds_cnn_xxxx.ckpt-400 as a command line argument to start from that point . Fig 4 Confusion Matrix After four hundred steps, this information will be logged: The first section is a confusion matrix. To understand what it means, you first need to know the labels being used, which in this case are "silence", "unknown", "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", and "nine". Each column represents a set of samples that were predicted to be each label, so the first column represents all the clips that were predicted to be silence, the second all those that were predicted to be unknown words, the third "zero", and so on. Each row represents clips by their correct, ground truth labels. The first row is all the clips that were silence, the second clips that were unknown words, the third "zero", etc. This matrix can be more useful than just a single accuracy score because it gives a good summary of what mistakes the network is making. In this example you can see that all of the entries in the first row are zero (Figure 5), apart from the initial one. Because the first row is all the clips that are actually silence, this means that none of them were mistakenly labeled as words, so we have no false negatives for silence. This shows the network is already getting pretty good at distinguishing silence from words. If we look down the first column though, we see a lot of non-zero values. The column represents all the clips that were predicted to be silence, so positive numbers outside of the first cell are errors. This means that some clips of real spoken words are actually being predicted to be silence, so we do have quite a few false positives. A perfect model would produce a confusion matrix where all of the entries were zero apart from a diagonal line through the center. Spotting deviations from that pattern can help you figure out how the model is most easily confused, and once you've identified the problems you can address them by adding more data or cleaning up categories. Fig 5 Validation After the confusion matrix, you should see a line like Figure 5 shows. It's good practice to separate your data set into three categories. The largest (in this case roughly 80% of the data) is used for training the network, a smaller set (10% here, known as "validation") is reserved for evaluation of the accuracy during training, and another set (the last 10%, "testing") is used to evaluate the accuracy once after the training is complete. The reason for this split is that there's always a danger that networks will start memorizing their inputs during training. By keeping the validation set separate, you can ensure that the model works with data it's never seen before. The testing set is an additional safeguard to make sure that you haven't just been tweaking your model in a way that happens to work for both the training and validation sets, but not a broader range of inputs. The training script automatically separates the data set into these three categories, and the logging line above shows the accuracy of model when run on the validation set. Ideally, this should stick fairly close to the training accuracy. If the training accuracy increases but the validation doesn't, that's a sign that overfitting is occurring, and your model is only learning things about the training clips, not broader patterns that generalize Training Finished In general, training is the process of iteratively tweaking a model’s weights and biases until it produces useful predictions. The training script writes these weights and biases to checkpoint files (Figure 6). Fig 6 A TensorFlow model consists of two main things: The weights and biases resulting from training A graph of operations that combine the model’s input with these weights and biases to produce the model’s output At this juncture, our model’s operations are defined in the Python scripts, and its trained weights and biases are in the most recent checkpoint file. We need to unite the two into a single model file with a specific format, which we can use to run inference. The process of creating this model file is called freezing—we’re creating a static representation of the graph with the weights frozen into it. To freeze our model, we run a script that is called as follows: python ML-KWS-for-MCU/freeze.py --model_architecture ds_cnn --model_size_info 5 64 10 4 2 2 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 64 3 3 1 1 \ --wanted_words=zero, one, two, three, four, five, six, seven, eight, nine \ --dct_coefficient_count 10 --window_size_ms 40 \ --window_stride_ms 20 --checkpoint ./speech_commands_train/best/ds_cnn_9490.ckpt-21600 \ --output_file=./ds_cnn.pb‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ To point the script toward the correct graph of operations to freeze, we pass some of the same arguments we used in training. We also pass a path to the final checkpoint file, which is the one whose filename ends with the total number of training steps. The frozen graph will be output to a file named ds_cnn.pb. This file is the fully trained TensorFlow model. It can be loaded by TensorFlow and used to run inference. That’s great, but it’s still in the format used by regular TensorFlow, not TensorFlow Lite. Convert to TensorFlow Lite Conversion is a easy step: we just need to run a single command. Now that we have a frozen graph file to work with, we’ll be using toco, the command-line interface for the TensorFlow Lite converter. toco --graph_def_file=./ds_cnn.pb --output_file=./ds_cnn.tflite \ --input_shapes=1,49,10,1 --input_arrays=Reshape_1 --output_arrays='labels_softmax' \ --inference_type=QUANTIZED_UINT8 --mean_values=227 --std_dev_values=1 \ --change_concat_input_ranges=false \ --default_ranges_min=-6 --default_ranges_max =6‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ In the arguments, we specify the model that we want to convert, the output location for the TensorFlow Lite model file, and some other values that depend on the model architecture. we also provide some arguments (inference_type, mean_values, and std_dev_values) that instruct the converter how to map its low-precision values into real numbers. The converted model will be written to ds_cnn.tflite, this a fully formed TensorFlow Lite model! Create a C array We’ll use the xxd command to convert a TensorFlow Lite model into a C array in the following. xxd -i ./ds_cnn.tflite > ./ds_cnn.h cat ./ds_cnn.h‍‍‍‍‍‍‍‍ The final part of the output is the file’s contents, which are a C array and an integer holding its length, as follows: Fig 7 Next, we’ll integrate this newly trained model with the tensorflow_lite_kws project. Using the Model in tensorflow_lite_kws Project To use the new model, we need to do two things: In source/ds_cnn_s_model.h, replace the original model data with our new model. Update the label names in source/kws.cpp with our new ''zero'', ''one'', ''two'', ''three'', ''four'', ''five'', ''six'', ''seven'', ''eight'' and ''nine'' labels. const std::string labels[] = {"Silence", "Unknown","zero", "one", "two", "three","four", "five", "six", "seven","eight", "nine"};‍‍‍ Before running the model in the EVKB-IMXRT1050 board (Figure 8), please refer to the readme.txt to do the preparation, in further, the file also demonstrates the steps of testing, please follow them. Fig 8 Figure 9 shows the testing I did, I've attached the model file, please give a try by yourself. Fig 9

Overview ======== The LPUART example for FreeRTOS demonstrates the possibility to use the LPUART driver in the RTOS with hardware flow control. The example uses two instances of LPUART IP and sends data between them. The UART signals must be jumpered together on the board. Toolchain supported =================== - MCUXpresso 11.0.0 Hardware requirements ===================== - Mini/micro USB cable - MIMXRT1050-EVKB board - Personal Computer Board settings ============== R278 and R279 must be populated, or have pads shorted. These resistors are under the display opposite side of board from uSD connector. The following pins need to be jumpered together: --------------------------------------------------------------------------------- | | UART3 (UARTA) | UART8 (UARTB) | |---|-------------------------------------|-------------------------------------| | # | Signal | Function | Jumper | Jumper | Function | Signal | |---|---------------|----------|----------|----------|----------|---------------| | 1 | GPIO_AD_B1_07 | RX | J22-pin1 | J23-pin1 | TX | GPIO_AD_B1_10 | | 2 | GPIO_AD_B1_06 | TX | J22-pin2 | J23-pin2 | RX | GPIO_AD_B1_11 | | 3 | GPIO_AD_B1_04 | CTS | J23-pin3 | J24-pin5 | RTS | GPIO_SD_B0_03 | | 4 | GPIO_AD_B1_05 | RTS | J23-pin4 | J24-pin4 | CTS | GPIO_SD_B0_02 | --------------------------------------------------------------------------------- Prepare the Demo ================ 1. Connect a USB cable between the host PC and the OpenSDA USB port on the target board. 2. Open a serial terminal with the following settings: - 115200 baud rate - 8 data bits - No parity - One stop bit - No flow control 3. Download the program to the target board. 4. Either press the reset button on your board or launch the debugger in your IDE to begin running the demo. Running the demo ================ You will see status of the example printed to the console. Customization options =====================

i.MX RT Crossover MCUs Knowledge Base

i.MX RT Crossover MCUs Knowledge Base

ディスカッション

NXP Tech Session - Bringing Low Power, High Performance Audio and Voice to Market on the i.MX RT600 Crossover MCU

i.MXRT 1050 EVK + MCUXpresso 的快速入门

RT1050 - Booting from serial NOR flash to SDRAM

Application Note (AN12762): Develop Audio Player In HiFi 4 for i.MX RT600 MCU

i.MXRT1050 SDRAM Clock Configuration

Recognition model for microcontroller use

NXP Tech Session - Hands-free Interaction and Customization through Alexa Certified Far-Field Voice Recognition.pdf

SDK UART driver example with hardware flow control