i.MX RT Knowledge Base

Face recognition

Face recognition technology is already used in many scenes of our daily life. For instance, when taking pictures with a mobile phone, the camera software automatically recognizes the faces in the lens and focuses on them; apps scan the user's face for real-name verification during registration, or for face-based payment, and so on. The basic steps of face recognition are shown in the figure below. First, the camera captures image data; then, after preprocessing such as noise elimination and image format conversion, the image data is transmitted to the processor for face detection and recognition calculations. After the face is recognized successfully, the follow-up operations continue.

Fig1 The basic steps of face recognition

i.MX RT106F MCU based solution for face recognition

The figure below is the block diagram of the i.MX RT106F MCU-based face recognition solution provided by NXP. Compared with a general processor (CPU) solution, it has advantages in cost and power consumption. Furthermore, the PCB size is smaller, and the MCU can usually boot up within a few hundred milliseconds even with an RTOS, whereas a processor (CPU) running Linux typically needs about 10 seconds to boot, so the MCU solution gives customers a better user experience.

Fig2 i.MX RT106F MCU based solution for face recognition

Of course, the i.MX RT106F MCU-based face recognition solution is not intended to replace processor (CPU) based solutions. As mentioned above, face recognition technology has many application cases and will certainly be used in more fields in the future, so the MCU-based face recognition solution provides customers and the market with another choice.

i.MX RT106F MCU

The i.MX RT106F face recognition crossover processor is an EdgeReady™ solution-specific variant of the i.MX RT1060 family of crossover processors, targeting face recognition applications. It features NXP's advanced implementation of the Arm Cortex®-M7 core, which operates at speeds up to 600 MHz to provide high CPU performance and the best real-time response. i.MX RT106F based solutions enable system designers to easily and inexpensively add face recognition capabilities to a wide variety of smart appliance, smart home, smart retail, and smart industrial devices. The i.MX RT106F is licensed to run the OASIS Lite library for face recognition (as the figure below shows), which includes:

Face detection
Anti-spoofing
Face tracking
Face alignment
Glass detection
Face recognition
Confidence measure
Face recognition quantified results, etc.

Fig3 OASIS Recognition Software Pipeline

sln_viznas_iot_elock_oobe

The sln_viznas_iot_elock_oobe project is the application running on the SLN-VIZNAS-IOT kit (as the figure below shows; the Bootstrap and Bootloader stages in the software flowchart will be introduced in a future article). The following development work is based on the sln_viznas_iot_elock_oobe project, but before starting the real development work I need to sketch its basic workflow.
Fig4 SLN-VIZNAS-IOT software flowchart

sln_viznas_iot_elock_oobe's workflow

1. In the Camera_Start() function, the task (Camera_Init_Task) completes the initialization of the RGB and IR cameras, then creates a task (Camera_Task);
2. In the Display_Start() function, after the task (Display_Init_Task) completes the initialization of the display medium (USB or LCD), it immediately creates the task (Display_Task) and sends the message queue s_DisplayReqMsg.id = QMSG_DISPLAY_FRAME_REQ to the task (Camera_Task); pDispData then points to the s_BufferLcd[0] array, which stores the image data to be displayed;
3. In the Oasis_Start() function, OASISLT_init() first completes the initialization of the OASIS library, then creates a task (Oasis_Task) that sends the message queues gFaceDetReqMsg.id = QMSG_FACEREC_FRAME_REQ and gFaceInfoMsg.id = QMSG_FACEREC_INFO_UPDATE to the task (Camera_Task), to make pDetIR and pDetRGB point to the face frames captured by the RGB and IR cameras and to update the content pointed to by infoMsgIn;
4. After the cameras are initialized, the RGB camera works first. After the image data is captured, an interrupt is triggered and the callback function Camera_Callback() sends the message queue DQMsg.id = QMSG_CAMERA_DQ to the task (Camera_Task), and DQIndex++;
5. CAMERA_RECEIVER_GetFullBuffer() extracts the image data captured by the RGB camera, sends the message queue DPxpMsg.id = QMSG_PXP_DISPLAY to the task (PXP_Task) created in the APP_PXP_Start() function and EQIndex++, meanwhile switching the camera from RGB to IR. After the APP_PXPStartCamera2Display() function in the task (PXP_Task) completes its processing, it sends the message queue s_DResMsg.id = QMSG_PXP_DISPLAY to the task (Camera_Task), and the task (Camera_Task) sends the message queue DresMsg.id = QMSG_DISPLAY_FRAME_RES to the task (Display_Task) after receiving the above message. The task (Display_Task) completes the display, then sends the message queue s_DisplayReqMsg.id = QMSG_DISPLAY_FRAME_REQ to the task (Camera_Task) to make pDispData point to the s_BufferLcd[1] array;
6. After the IR camera completes its capture, CAMERA_RECEIVER_GetFullBuffer() extracts the image data and sends the message queue DPxpMsg.id = QMSG_PXP_DISPLAY to the task (PXP_Task) created in the APP_PXP_Start() function, continues to execute EQIndex++, switches back to the RGB camera, and repeats step 5. Finally, it sends the message queue FPxpMsg.id = QMSG_PXP_FACEREC to the task (PXP_Task) and sets irReady = true. After the task (PXP_Task) receives the above message, it calls APP_PXPStartCamera2DetBuf() and, after the processing completes, sends the message queue s_FResMsg.id = QMSG_PXP_FACEREC to the task (Camera_Task);
7. CAMERA_RECEIVER_GetFullBuffer() extracts the image data collected by the RGB camera and repeats step 5. When the (pDetRGB && irReady) condition is met, it sends the message queue FPxpMsg.id = QMSG_PXP_FACEREC to the task (PXP_Task) and sets irReady = false, pDetRGB = NULL, pDetIR = NULL. After the task (PXP_Task) receives the above message, it calls APP_PXPStartCamera2DetBuf() and, after the processing completes, sends the message queue s_FResMsg.id = QMSG_PXP_FACEREC to the task (Camera_Task).
At this time, the (!pDetIR && !pDetRGB) condition is met and the Queue message FResMsg.id = QMSG_FACEREC_FRAME_RES is sent to the task (Oasis_Task), run OASISLT_run_extend to perform face recognition calculation, and send the message queue gFaceDetReqMsg.id = QMSG_FACEREC_FRAME_REQ to the task (Camera_Task) to make the pDetIR and pDetRGB point to the face block diagram captured by the RGB and IR cameras again. keep repeat steps 6 and 7; Fig5 sln_viznas_iot_elock_oobe's workflow flow Smart Coffee machine Fig 6 is the workflow of the smart coffee machine that I want to develop for, as there is no LCD board on hand, in the below development process, I will select Win10's camera (as the below figure shows) to output the captured image, further, take advantage of the Shell command to simulate the LCD's touch feature to interact with the board.   Fig6 workflow of the smart coffee machine Fig7 Camera Code modification In the commondef.h, add a new member variable 'uint16_t coffee_taste' in Union FeatureItem to stand for the favorite coffee taste; typedef union { struct { /*put char/unsigned char together to avoid padding*/ unsigned char magic; char name[FEATUREDATA_NAME_MAX_LEN]; int index; // this id identify a feature uniquely,we should use it as a handler for feature add/del/update/rename uint16_t id; uint16_t pad; // Add a new component uint16_t coffee_taste; /*put feature in the last so, we can take it as dynamic, size limitation: * (FEATUREDATA_FLASH_PAGE_SIZE * 2 - 1 - FEATUREDATA_NAME_MAX_LEN - 4 - 4 -2)/4*/ float feature[0]; }; unsigned char raw[FEATUREDATA_FLASH_PAGE_SIZE * 2]; } FeatureItem; // 1kB   In featuredb.h, add two member functions into class FeatureDB:  set_taste()  and  get_taste() , and add the definition of the above two member functions in featuredb.cpp; class FeatureDB { public: FeatureDB(); ~FeatureDB(); int add_feature(uint16_t id, const std::string name, float *feature); int update_feature(uint16_t id, const std::string name, float *feature); int del_feature(uint16_t id, std::string name); int del_feature(const std::string name); int del_feature_all(); std::vector<std::string> get_names(); int get_name(uint16_t id, std::string &name); std::vector<uint16_t> get_ids(); int ren_name(const std::string oldname, const std::string newname); int feature_count(); int get_free(int &index); int database_save(int count); int get_feature(uint16_t id, float *feature); void set_autosave(bool auto_save); bool get_autosave(); //Add two customize member functions int set_taste(const std::string username, uint16_t taste_number); int get_taste(const std::string username); private: bool auto_save; int load_feature(); int erase_feature(int index); int save_feature(int index = 0); int reassign_feature(); int get_free_mapmagic(); int get_remain_map(); }; int FeatureDB::set_taste(const std::string username, uint16_t taste_number) { int index = FEATUREDATA_MAX_COUNT; for (int i = 0; i < FEATUREDATA_MAX_COUNT; i++) { if (s_FeatureData.item[i].magic == FEATUREDATA_MAGIC_VALID) { if (!strcmp(username.c_str(), s_FeatureData.item[i].name)) { index = i; } } } if (index != FEATUREDATA_MAX_COUNT) { s_FeatureData.item[index].coffee_taste = taste_number; return 0; } else { return -1; } } int FeatureDB::get_taste(const std::string username) { int index = FEATUREDATA_MAX_COUNT; int taste_number; for (int i = 0; i < FEATUREDATA_MAX_COUNT; i++) { if (s_FeatureData.item[i].magic == FEATUREDATA_MAGIC_VALID) { if (!strcmp(username.c_str(), s_FeatureData.item[i].name)) { index = i; } } } if (index != 
FEATUREDATA_MAX_COUNT) { taste_number = s_FeatureData.item[index].coffee_taste; return taste_number; } else { return -1; } }   In database.h, add the declarations of  DB_Set_Taste()  and  DB_Get_Taste()  functions, and in database.cpp, add the related codes of the above two functions. These two functions are equivalent to encapsulating the newly added member functions set_taste() and get_taste() of the FeatureDB class; int DB_Del(uint16_t id, std::string name); int DB_Del(string name); int DB_DelAll(); int DB_Ren(const std::string oldname, const std::string newname); int DB_GetFree(int &index); int DB_GetNames(std::vector<std::string> *names); int DB_Count(int *count); int DB_Save(int count); int DB_GetFeature(uint16_t id, float *feature); int DB_Add(uint16_t id, float *feature); int DB_Add(uint16_t id, std::string name, float *feature); int DB_Update(uint16_t id, float *feature); int DB_GetIDs(std::vector<uint16_t> &ids); int DB_GetName(uint16_t id, std::string &names); int DB_GenID(uint16_t *id); int DB_SetAutoSave(bool auto_save); // Add two customize functions int DB_Set_Taste(const std::string username, const uint16_t taste); int DB_Get_Taste(const std::string username); int DB_Set_Taste(const std::string username, const uint16_t taste) { int ret = DB_MGMT_FAILED; ret = DB_Lock(); if (DB_MGMT_OK == ret) { ret = s_DB->set_taste(username, taste); DB_UnLock(); } return ret; } int DB_Get_Taste(const std::string username) { int ret = DB_MGMT_FAILED; ret = DB_Lock(); if (DB_MGMT_OK == ret) { ret = s_DB->get_taste(username); DB_UnLock(); } return ret; } In sln_api.h, add the declarations of the functions  VIZN_SetTaste() ,  VIZN_GetTaste()  and  VIZN_Is_Rec_User() , and add the codes of the above three functions in sln_api.cpp. The VIZN_SetTaste() and VIZN_GetTaste() functions are equivalent to the encapsulation of the DB_Set_Taste() and DB_Get_Taste() functions. Why is it so complicated? To follow the code layering mechanism of the elock_oobe project and reduce the difficulty of code implementation through code layered encapsulation. /** * @brief Set user's favorite coffee taste. * * @Param clientHandle The client handler which required this action * @Param userName Pointer to a buffer which contains the name of the new user. * @Param taste Coffee taste */ vizn_api_status_t VIZN_SetTaste(VIZN_api_client_t *clientHandle, char *UserName, cfg_Coffee_taste taste); /** * @brief Set user's favorite coffee taste. * * @Param clientHandle The client handler which required this action * @Param userName Pointer to a buffer which contains the name of the new user. 
* @Param taste Pointer to the Coffee taste */ vizn_api_status_t VIZN_GetTaste(VIZN_api_client_t *clientHandle, char *UserName, int *taste); vizn_api_status_t VIZN_Is_Rec_User(VIZN_api_client_t *clientHandle, char *UserName); ~~~~~~~~~ vizn_api_status_t VIZN_SetTaste(VIZN_api_client_t *clientHandle, char *UserName, cfg_Coffee_taste taste) { int32_t status; if (!IsValidUserName(UserName)) { return kStatus_API_Layer_RenameUser_InvalidUserName; } status = DB_Set_Taste(std::string(UserName), (uint16_t)taste); if (status == 0) { return kStatus_API_Layer_Success; } else if (status == -1) { return kStatus_API_Layer_SetTaste_Failed; } } vizn_api_status_t VIZN_GetTaste(VIZN_api_client_t *clientHandle, char *UserName, int *taste) { int32_t status; if (!IsValidUserName(UserName)) { return kStatus_API_Layer_RenameUser_InvalidUserName; } *taste = DB_Get_Taste(std::string(UserName)); if (*taste != -1) { return kStatus_API_Layer_Success; } else { return kStatus_API_Layer_GetTaste_Failed; } } vizn_api_status_t VIZN_Is_Rec_User(VIZN_api_client_t *clientHandle, char *UserName) { if (!IsValidUserName(UserName)) { return kStatus_API_Layer_RenameUser_InvalidUserName; } return kStatus_API_Layer_Success; } In sln_api_init.cpp, declare the variable:  std::string Current_User = "" ; which is used to store the name corresponding to the face after recognition, and add the processing function  Coffee_Rec()  after successful face recognition in the structure variable ops2; std::string Current_User = " "; //Add customize function int Coffee_Rec(VIZN_api_client_t *pClient, face_info_t face_info); client_operations_t ops2 = { .detect = NULL, .recognize = Coffee_Rec,//NULL, .enrolment = NULL, }; //Add customize function int Coffee_Rec(VIZN_api_client_t *pClient, face_info_t face_info) { Current_User = face_info.name; return 1; } In sln_timers.h, increase MS_SYSTEM_LOCKED to extend the locked status time to 25 seconds; ~~~~~~~~ #define MS_SYSTEM_LOCKED 25000 //2000 // MS in which the board is in a locked state after a reg/rec. 
~~~~~~~~ In sln_cli.cpp, add three Shell commands: order, set_taste, get_taste to stand for the operations of brewing coffee, setting coffee taste, and checking coffee taste; SHELL_COMMAND_DEFINE(set_taste, (char *)"\r\n\"set_taste username <0|1|2|3|~>\": set user's favorite taste\r\n" "0 - Cappuccino\r\n" "1 - Black Coffee\r\n" "2 - Coffee latte\r\n" "3 - Flat White\r\n" "4 - Cortado\r\n" "5 - Mocha\r\n" "6 - Con Panna\r\n" "7 - Lungo\r\n" "8 - Ristretto\r\n" "9 - Others \r\n", FFI_CLI_SetTasteCommand, SHELL_IGNORE_PARAMETER_COUNT); SHELL_COMMAND_DEFINE(get_taste, (char *)"\r\n\"get_taste username\": return user's favorite taste \r\n", FFI_CLI_GetTasteCommand, SHELL_IGNORE_PARAMETER_COUNT); SHELL_COMMAND_DEFINE(order, (char *)"\r\n\"order <0|1|2|3|~>\": order a favorite taste \r\n", FFI_CLI_OrderCommand, SHELL_IGNORE_PARAMETER_COUNT); ~~~~~~ static shell_status_t FFI_CLI_SetTasteCommand(shell_handle_t shellContextHandle, int32_t argc, char **argv) { if (argc != 3) { SHELL_Printf(shellContextHandle, "Wrong parameters\r\n"); return kStatus_SHELL_Error; } return UsbShell_QueueSendFromISR(shellContextHandle, argc, argv, SHELL_EV_FFI_CLI_SET_TASTE); } static shell_status_t FFI_CLI_GetTasteCommand(shell_handle_t shellContextHandle, int32_t argc, char **argv) { if (argc != 2) { SHELL_Printf(shellContextHandle, "Wrong parameters\r\n"); return kStatus_SHELL_Error; } return UsbShell_QueueSendFromISR(shellContextHandle, argc, argv, SHELL_EV_FFI_CLI_GET_TASTE); } shell_status_t FFI_CLI_OrderCommand(shell_handle_t shellContextHandle, int32_t argc, char **argv) { if (argc > 2) { SHELL_Printf(shellContextHandle, "Wrong parameters\r\n"); return kStatus_SHELL_Error; } return UsbShell_QueueSendFromISR(shellContextHandle, argc, argv, SHELL_EV_FFI_CLI_ORDER); } ~~~~~~ shell_status_t RegisterFFICmds(shell_handle_t shellContextHandle) { SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(list)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(add)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(del)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(rename)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(verbose)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(camera)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(version)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(save)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(updateotw)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(reset)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(emotion)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(liveness)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(detection)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(display)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(wifi)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(app_type)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(low_power)); // Add three Shell commands SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(order)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(set_taste)); SHELL_RegisterCommand(shellContextHandle, SHELL_COMMAND(get_taste)); return kStatus_SHELL_Success; } In sln_cli.cpp, it needs to add corresponding codes for handle order, set_taste, get_taste instructions in task UsbShell_CmdProcess_Task else if (queueMsg.shellCommand == SHELL_EV_FFI_CLI_SET_TASTE) { int coffee_taste = atoi(queueMsg.argv[2]); if (coffee_taste >= Cappuccino && coffee_taste <= Others) { status = 
VIZN_SetTaste(&VIZN_API_CLIENT(Shell),(char *)queueMsg.argv[1], (cfg_Coffee_taste)coffee_taste); if (status == kStatus_API_Layer_Success) { SHELL_Printf(shellContextHandle, "User: %s like coffee taste: %s \r\n", queueMsg.argv[1], Coffee_type[coffee_taste]); } else { SHELL_Printf(shellContextHandle, "Cannot set coffee taste\r\n"); } } else { SHELL_Printf(shellContextHandle, "Unsupported coffee taste\r\n"); } } else if (queueMsg.shellCommand == SHELL_EV_FFI_CLI_GET_TASTE) { int get_taste_num = 0; status = VIZN_GetTaste(&VIZN_API_CLIENT(Shell),(char *)queueMsg.argv[1], &get_taste_num); if (status == kStatus_API_Layer_Success) { SHELL_Printf(shellContextHandle, "User: %s like coffee taste: %s \r\n", queueMsg.argv[1], Coffee_type[(cfg_Coffee_taste)(get_taste_num)]); } else { SHELL_Printf(shellContextHandle, "Cannot get coffee taste\r\n"); } } else if (queueMsg.shellCommand == SHELL_EV_FFI_CLI_ORDER) { status = VIZN_Is_Rec_User(&VIZN_API_CLIENT(Shell),(char *)Current_User.c_str()); if (status == kStatus_API_Layer_Success) { if (queueMsg.argc == 1) { int get_taste_num = 0; status = VIZN_GetTaste(&VIZN_API_CLIENT(Shell),(char*)Current_User.c_str(), &get_taste_num); if (status == kStatus_API_Layer_Success) { SHELL_Printf(shellContextHandle, "User: %s order the a cup of %s \r\n", Current_User.c_str(), Coffee_type[(cfg_Coffee_taste)(get_taste_num)]); } else { SHELL_Printf(shellContextHandle, "Sorry, please order again, Current user is %s\r\n",Current_User.c_str()); } } else if(queueMsg.argc == 2) { int coffee_taste = atoi(queueMsg.argv[1]); if (coffee_taste >= Cappuccino && coffee_taste <= Others) { status = VIZN_SetTaste(&VIZN_API_CLIENT(Shell),(char*)Current_User.c_str(), (cfg_Coffee_taste)coffee_taste); if (status == kStatus_API_Layer_Success) { SHELL_Printf(shellContextHandle, "User: %s order a cup of %s \r\n", Current_User.c_str(), Coffee_type[coffee_taste]); } else { SHELL_Printf(shellContextHandle, "Cannot set coffee taste, Current user is %s\r\n",Current_User.c_str()); } } else { SHELL_Printf(shellContextHandle, "Unsupported coffee taste\r\n"); } } } } Use the cafe logo of《Friends》to replace the original Welcome_home picture, use the BmpCvt tool to convert the picture into the corresponding array, and add it to welcomehome_320x122.h. 
static const unsigned short Coffee_shop_320_122[] = { 0x59E6, 0x6227, 0x6247, 0x59C5, 0x59C5, 0x59A5, 0x4103, 0x6A67, 0x6A47, 0x6227, 0x6A47, 0x6A68, 0x7268, 0x6A67, 0x6A67, 0x6A47, 0x72A9, 0x6A68, 0x7268, 0x6A48, 0x5A06, 0x6A88, 0x6A68, 0x6247, 0x6A47, 0x7289, 0x7289, 0x6A47, 0x6A47, 0x6A47, 0x6227, 0x6A68, 0x6206, 0x6A47, 0x5A26, 0x6247, 0x6227, 0x6A27, 0x4924, 0x836D, 0x5207, 0x7BAC, 0x5247, 0x83ED, 0x4A47, 0x2923, 0x7B8C, 0x49E5, 0x49E5, 0x4A05, 0x28C1, 0x5226, 0x6267, 0x6A87, 0x72E9, 0x6267, 0x6AA9, 0x5A27, 0x6AA9, 0x6AA9, 0x5A47, 0x6A88, 0x5A06, 0x5A47, 0x6AA9, 0x5A47, 0x62A9, 0x5206, 0x6288, 0x6268, 0x5A47, 0x5A27, 0x5A47, 0x5A27, 0x49E6, 0x4A07, 0x4A07, 0x5A89, 0x49C6, 0x5A48, 0x5A28, 0x5A47, 0x5226, 0x49E6, 0x49C6, 0x41A6, 0x5208, 0x2082, 0x52A8, 0x6B6B, 0x39A5, 0x39A5, 0x3964, 0x49E7, 0x3104, 0x49C7, 0x3945, 0x41A6, 0x28A2, 0x2061, 0x3965, 0x28E3, 0x1881, 0x3944, 0x3103, 0x3103, 0x3903, 0x4145, 0x51A6, 0x51C6, 0x4985, 0x51E6, 0x51E6, 0x61E7, 0x6A48, 0x6A28, 0x6A28, 0x6A27, 0x61E6, 0x6207, 0x6A68, 0x59E7, 0x4185, 0x51E6, 0x51A6, 0x6228, 0x5A07, 0x6228, 0x5A08, 0x4184, 0x41A5, 0x4164, 0x3944, 0x3944, 0x736B, 0x83ED, 0x41A5, 0x83ED, 0x6288, 0x8BAB, 0x836A, 0x6287, 0x6B2A, 0x5267, 0x83CD, 0x5A68, 0x5228, 0x3986, 0x3985, 0x7B0A, 0x6A67, 0x7267, 0x832B, 0x49A5, 0x6206, 0x8AC9, 0x72A8, 0x82C9, 0x82E9, 0x8309, 0x6A46, 0x8B2B, 0x3860, 0x8329, 0x6A67, 0x7288, 0x7268, 0x61E6, 0x7267, 0x6A67, 0x59C5, 0x51A4, 0x6A46, 0x7AA8, 0x6A26, 0x7287, 0x7AA8, 0x72A8, 0x72A9, 0x51C5, 0x5A27, 0x5A27, 0x3923, 0x ~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~ 0x7B8C, 0x734B, 0x6B0A, 0x83CD, 0x83ED, 0x8C0E, 0x7B8C, 0x7B6C, 0x20C2, 0x5227, 0x83ED, 0x6AE9, 0x734B, 0x62A9, 0x7B6B, 0x7B8C, 0x62E9, 0x7BAC, 0x7B6B, 0x732A, 0x940D, 0x83AC, 0x732A, 0x7309, 0x8BCC, 0x7309, 0x8BCD, 0x83AC, 0x7B6B, 0x940D, 0x3943, 0x942E, 0x7B6B, 0x734A, 0x7B8B, 0x62C8, 0x7B8B, 0x7B6A, 0x7BAB, 0x732A, 0x7B6B, 0x7B6B, 0x83CC, 0x6B09, 0x6AA9, 0x6AE9, 0x7B6B, 0x7B8B, 0x83AC, 0x734B, 0x6AC9, 0x6B0A, 0x734B, 0x734A, 0x62A8, 0x732A, 0x8C0E, 0x8BCD, 0x944F, 0x734B, 0x7B8B, 0x732A, 0x942E, 0x8BCD, 0x83AD, 0x732B, 0x6B0A, 0x6AEA, 0x62C9, 0x9C90, 0x28C2, 0x8BEE, 0x93EE, 0x8BCD, 0x4183, 0x838B, 0x7B6A, 0x6287, 0x8BCB }; Programming the new project After saving the modified code and recompile the sln_viznas_iot_elock_oobe project (as shown in the figure below), then connect the MCU-LINK to J6 on the SLN-VIZNAS-IOT, just like Fig9 shows. Fig8 Recompile code Fig9 MCU-LINK (Note: it needs to reselect the Flash driver, as the below figure shows.) Fig10 Flash driver After that, it's able to program the code project to the on-board Hyperflash. Test & Summary When the new code project boot-up, please refer to Get Started with the SLN-VIZNAS-IOT to use the serial terminal to test the newly added three Shell commands: orders, set_taste, and get_taste. Once a face is successfully recognized, the cafe logo will appear up (as shown in Fig11). Fig11 Cafe logo Definitely, this smart coffee machine seems like a 'toy' demo, and there is a lot of work to improve it. Below is the list of my future work plans, Use the LCD panel instead of USB to display; Connect an external amplifier to enable voice prompt feature; Enable the Wifi feature to connect to the App; Use the GUI library to enhance UI experience; Add a voice recognition feature to control; And I'll be glad to hear any comments from you.    
View full article
As we know, the RT series MCUs support XIP (Execute In Place) mode, and because it saves pins, serial NOR flash is the most commonly used boot device: the FlexSPI module can efficiently fetch code and data from the serial NOR flash for the Cortex-M7 to execute. The fetch is implemented with the Quad IO Fast Read command, and the serial NOR flash works in SDR (Single Data transfer Rate) mode, receiving data on the SCLK rising edge and transmitting data on the SCLK falling edge. Compared to SDR mode, DDR (Dual Data transfer Rate) mode has a higher throughput capacity. Can it provide better XIP performance, and what do we have to do to make the serial NOR flash work in DDR mode?

SDR & DDR mode

SDR mode: In SDR (Single Data transfer Rate) mode, data is only clocked on one edge of the clock (either the rising or the falling edge). This means that for SDR to transmit data at X Mbps, the clock bit rate needs to be 2X Mbps.

DDR mode: In DDR (Dual Data transfer Rate) mode, also known as DTR (Dual Transfer Rate) mode, data is transferred on both the rising and falling edges of the clock. This means that transmitting data at X Mbps only requires a clock bit rate of X Mbps, hence doubling the bandwidth (as Fig 1 shows).

Fig 1

Enable DDR mode

The steps below illustrate how to make the i.MX RT1060 boot from the QSPI flash working in DDR mode.
Note: The board is MIMXRT1060-EVK, and the IDE is MCUXpresso IDE.

Open a hello_world project as the template.
Modify the FDCB (Flash Device Configuration Block):
a) Set the controllerMiscOption parameter to support the DDR read command.
b) Set the serial flash frequency to 60 MHz.
c) Parse the DDR read command into a command sequence. The following table shows a template command sequence for the DDR Quad IO FAST READ instruction; it almost matches the FRQDTR (Fast Read Quad IO DTR) sequence of the IS25WP064 (as Fig 2 shows).

Fig2 FRQDTR Sequence

d) Adjust the dummy cycles. The dummy cycles should match the specific serial clock frequency; the default number of dummy cycles of the FRQDTR command sequence is 6 (as the table below shows). However, when the serial clock frequency is 60 MHz, the dummy cycles should change to 4 (as the table below shows). So the [P6:P3] bits of the Read Register (as the table below shows) need to be configured by manually adding the SET READ PARAMETERS command sequence (as Fig 3 shows) in the FDCB.

Fig 3 SET READ PARAMETERS command sequence

Furthermore, in DDR mode the SCLK cycle is double the serial root clock cycle, so the dummy operand value should be set to 2N, 2N-1, or 2N+1, depending on how the dummy cycles are defined in the device datasheet. In the end, we get an adjusted FDCB like the one below.
// Set Dummy Cycles #define FLASH_DUMMY_CYCLES 8 // Set Read register command sequence's Index in LUT table #define CMD_LUT_SEQ_IDX_SET_READ_PARAM 7 // Read,Read Status,Write Enable command sequences' Index in LUT table #define CMD_LUT_SEQ_IDX_READ 0 #define CMD_LUT_SEQ_IDX_READSTATUS 1 #define CMD_LUT_SEQ_IDX_WRITEENABLE 3 const flexspi_nor_config_t qspiflash_config = { .memConfig = { .tag = FLEXSPI_CFG_BLK_TAG, .version = FLEXSPI_CFG_BLK_VERSION, .readSampleClksrc=kFlexSPIReadSampleClk_LoopbackFromDqsPad, .csHoldTime = 3u, .csSetupTime = 3u, // Enable DDR mode .controllerMiscOption = kFlexSpiMiscOffset_DdrModeEnable | kFlexSpiMiscOffset_SafeConfigFreqEnable, .sflashPadType = kSerialFlash_4Pads, //.serialClkFreq = kFlexSpiSerialClk_100MHz, .serialClkFreq = kFlexSpiSerialClk_60MHz, .sflashA1Size = 8u * 1024u * 1024u, // Enable Flash register configuration .configCmdEnable = 1u, .configModeType[0] = kDeviceConfigCmdType_Generic, .configCmdSeqs[0] = { .seqNum = 1, .seqId = CMD_LUT_SEQ_IDX_SET_READ_PARAM, .reserved = 0, }, .lookupTable = { // Read LUTs [4*CMD_LUT_SEQ_IDX_READ] = FLEXSPI_LUT_SEQ(CMD_SDR, FLEXSPI_1PAD, 0xED, RADDR_DDR, FLEXSPI_4PAD, 0x18), // The MODE8_DDR subsequence costs 2 cycles that is part of the whole dummy cycles [4*CMD_LUT_SEQ_IDX_READ + 1] = FLEXSPI_LUT_SEQ(MODE8_DDR, FLEXSPI_4PAD, 0x00, DUMMY_DDR, FLEXSPI_4PAD, FLASH_DUMMY_CYCLES-2), [4*CMD_LUT_SEQ_IDX_READ + 2] = FLEXSPI_LUT_SEQ(READ_DDR, FLEXSPI_4PAD, 0x04, STOP, FLEXSPI_1PAD, 0x00), // READ STATUS REGISTER [4*CMD_LUT_SEQ_IDX_READSTATUS] = FLEXSPI_LUT_SEQ(CMD_SDR, FLEXSPI_1PAD, 0x05, READ_SDR, FLEXSPI_1PAD, 0x01), [4*CMD_LUT_SEQ_IDX_READSTATUS + 1] = FLEXSPI_LUT_SEQ(STOP, FLEXSPI_1PAD, 0x00, 0, 0, 0), // WRTIE ENABLE [4*CMD_LUT_SEQ_IDX_WRITEENABLE] = FLEXSPI_LUT_SEQ(CMD_SDR,FLEXSPI_1PAD, 0x06, STOP, FLEXSPI_1PAD, 0x00), // Set Read register [4*CMD_LUT_SEQ_IDX_SET_READ_PARAM] = FLEXSPI_LUT_SEQ(CMD_SDR,FLEXSPI_1PAD, 0x63, WRITE_SDR, FLEXSPI_1PAD, 0x01), [4*CMD_LUT_SEQ_IDX_SET_READ_PARAM + 1] = FLEXSPI_LUT_SEQ(STOP,FLEXSPI_1PAD, 0x00, 0, 0, 0), }, }, .pageSize = 256u, .sectorSize = 4u * 1024u, .blockSize = 64u * 1024u, .isUniformBlockSize = false, }; Is DDR mode real better? According to the RT1060's datasheet, the below table illustrates the maximum frequency of FlexSPI operation, as the MIMXRT1060's onboard QSPI flash is IS25WP064AJBLE, it doesn't contain the MQS pin, it means set MCR0.RXCLKsrc=1 (Internal dummy read strobe and loopbacked from DQS) is the most optimized option. operation mode RXCLKsrc=0 RXCLKsrc=1 RXCLKsrc=3 SDR 60 MHz 133 MHz 166 MHz DDR 30 MHz 66 MHz 166 MHz In another word, QSPI can run up to 133 MHz in SDR mode versus 66 MHz in DDR mode. From the perspective of throughput capacity, they're almost the same. It seems like DDR mode is not a better option for IS25WP064AJBLE and the following experiment will validate the assumption. Experiment mbedtls_benchmark I use the mbedtls_benchmark as the first testing demo and I run the demo under the below conditions: 100MH, SDR mode; 133MHz, SDR mode; 66MHz, DDR mode; According to the corresponding printout information (as below shows), I make a table for comparison and I mark the worst performance of implementation items among the above three conditions, just as Fig 4 shows. SDR Mode run at 100 MHz. 
FlexSPI clock source is 3, FlexSPI Div is 6, PllPfd2Clk is 720000000 mbedTLS version 2.16.6 fsys=600000000 Using following implementations: SHA: DCP HW accelerated AES: DCP HW accelerated AES GCM: Software implementation DES: Software implementation Asymmetric cryptography: Software implementation MD5 : 18139.63 KB/s, 27.10 cycles/byte SHA-1 : 44495.64 KB/s, 12.52 cycles/byte SHA-256 : 47766.54 KB/s, 11.61 cycles/byte SHA-512 : 2190.11 KB/s, 267.88 cycles/byte 3DES : 1263.01 KB/s, 462.49 cycles/byte DES : 2962.18 KB/s, 196.33 cycles/byte AES-CBC-128 : 52883.94 KB/s, 10.45 cycles/byte AES-GCM-128 : 1755.38 KB/s, 329.33 cycles/byte AES-CCM-128 : 2081.99 KB/s, 279.72 cycles/byte CTR_DRBG (NOPR) : 5897.16 KB/s, 98.15 cycles/byte CTR_DRBG (PR) : 4489.58 KB/s, 129.72 cycles/byte HMAC_DRBG SHA-1 (NOPR) : 1297.53 KB/s, 448.03 cycles/byte HMAC_DRBG SHA-1 (PR) : 1205.51 KB/s, 486.04 cycles/byte HMAC_DRBG SHA-256 (NOPR) : 1786.18 KB/s, 327.70 cycles/byte HMAC_DRBG SHA-256 (PR) : 1779.52 KB/s, 328.93 cycles/byte RSA-1024 : 202.33 public/s RSA-1024 : 7.00 private/s DHE-2048 : 0.40 handshake/s DH-2048 : 0.40 handshake/s ECDSA-secp256r1 : 9.00 sign/s ECDSA-secp256r1 : 4.67 verify/s ECDHE-secp256r1 : 5.00 handshake/s ECDH-secp256r1 : 9.33 handshake/s   DDR Mode run at 66 MHz. FlexSPI clock source is 2, FlexSPI Div is 5, PllPfd2Clk is 396000000 mbedTLS version 2.16.6 fsys=600000000 Using following implementations: SHA: DCP HW accelerated AES: DCP HW accelerated AES GCM: Software implementation DES: Software implementation Asymmetric cryptography: Software implementation MD5 : 16047.13 KB/s, 27.12 cycles/byte SHA-1 : 44504.08 KB/s, 12.54 cycles/byte SHA-256 : 47742.88 KB/s, 11.62 cycles/byte SHA-512 : 2187.57 KB/s, 267.18 cycles/byte 3DES : 1262.66 KB/s, 462.59 cycles/byte DES : 2786.81 KB/s, 196.44 cycles/byte AES-CBC-128 : 52807.92 KB/s, 10.47 cycles/byte AES-GCM-128 : 1311.15 KB/s, 446.53 cycles/byte AES-CCM-128 : 2088.84 KB/s, 281.08 cycles/byte CTR_DRBG (NOPR) : 5966.92 KB/s, 97.55 cycles/byte CTR_DRBG (PR) : 4413.15 KB/s, 130.42 cycles/byte HMAC_DRBG SHA-1 (NOPR) : 1291.64 KB/s, 449.47 cycles/byte HMAC_DRBG SHA-1 (PR) : 1202.41 KB/s, 487.05 cycles/byte HMAC_DRBG SHA-256 (NOPR) : 1748.38 KB/s, 328.16 cycles/byte HMAC_DRBG SHA-256 (PR) : 1691.74 KB/s, 329.78 cycles/byte RSA-1024 : 201.67 public/s RSA-1024 : 7.00 private/s DHE-2048 : 0.40 handshake/s DH-2048 : 0.40 handshake/s ECDSA-secp256r1 : 8.67 sign/s ECDSA-secp256r1 : 4.67 verify/s ECDHE-secp256r1 : 4.67 handshake/s ECDH-secp256r1 : 9.00 handshake/s   Fig 4 Performance comparison We can find that most of the implementation items are achieve the worst performance when QSPI works in DDR mode with 66 MHz. Coremark demo The second demo is running the Coremark demo under the above three conditions and the result is illustrated below. SDR Mode run at 100 MHz. FlexSPI clock source is 3, FlexSPI Div is 6, PLL3 PFD0 is 720000000 2K performance run parameters for coremark. CoreMark Size : 666 Total ticks : 391889200 Total time (secs): 16.328717 Iterations/Sec : 2449.671999 Iterations : 40000 Compiler version : MCUXpresso IDE v11.3.1 Compiler flags : Optimization most (-O3) Memory location : STACK seedcrc : 0xe9f5 [0]crclist : 0xe714 [0]crcmatrix : 0x1fd7 [0]crcstate : 0x8e3a [0]crcfinal : 0x25b5 Correct operation validated. See readme.txt for run and reporting rules. CoreMark 1.0 : 2449.671999 / MCUXpresso IDE v11.3.1 Optimization most (-O3) / STACK   SDR Mode run at 133 MHz. 
FlexSPI clock source is 3, FlexSPI Div is 4, PLL3 PFD0 is 664615368 2K performance run parameters for coremark. CoreMark Size : 666 Total ticks : 391888682 Total time (secs): 16.328695 Iterations/Sec : 2449.675237 Iterations : 40000 Compiler version : MCUXpresso IDE v11.3.1 Compiler flags : Optimization most (-O3) Memory location : STACK seedcrc : 0xe9f5 [0]crclist : 0xe714 [0]crcmatrix : 0x1fd7 [0]crcstate : 0x8e3a [0]crcfinal : 0x25b5 Correct operation validated. See readme.txt for run and reporting rules. CoreMark 1.0 : 2449.675237 / MCUXpresso IDE v11.3.1 Optimization most (-O3) / STACK   DDR Mode run at 66 MHz. FlexSPI clock source is 2, FlexSPI Div is 5, PLL3 PFD0 is 396000000 2K performance run parameters for coremark. CoreMark Size : 666 Total ticks : 391890772 Total time (secs): 16.328782 Iterations/Sec : 2449.662173 Iterations : 40000 Compiler version : MCUXpresso IDE v11.3.1 Compiler flags : Optimization most (-O3) Memory location : STACK seedcrc : 0xe9f5 [0]crclist : 0xe714 [0]crcmatrix : 0x1fd7 [0]crcstate : 0x8e3a [0]crcfinal : 0x25b5 Correct operation validated. See readme.txt for run and reporting rules. CoreMark 1.0 : 2449.662173 / MCUXpresso IDE v11.3.1 Optimization most (-O3) / STACK   After comparing the CoreMark scores, it gets the lowest CoreMark score when QSPI works in DDR mode with 66 MHz. However, they're actually pretty close. Through the above two testings, we can get the DDR mode maybe not a better option, at least for the i.MX RT10xx series MCU.
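This result is consistent with a rough estimate of the Quad I/O data-phase throughput (a back-of-the-envelope check that ignores command, address, and dummy-cycle overhead):

SDR @ 133 MHz: 133 MHz x 4 bits/clock = 532 Mbit/s ≈ 66.5 MB/s
DDR @ 66 MHz:  66 MHz x 2 edges x 4 bits/edge = 528 Mbit/s ≈ 66 MB/s

So with this flash device the DDR configuration offers essentially no extra raw bandwidth, which matches the benchmark and CoreMark results above.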
View full article
RT10xx SAI basic and SDCard wave file play

1. Introduction

The NXP RT10xx audio modules are SAI, SPDIF, and MQS. The SAI module is a synchronous serial interface for audio data transmission. SPDIF is a stereo transceiver that can receive and send digital audio. MQS is used to convert I2S audio data from SAI3 to PWM and can then drive external speakers, but in practical usage it still needs an external amplifier circuit. When we use the SAI module, we also need to deal with audio file playback and how the audio data is obtained. This article is based on the MIMXRT1060-EVK board: it gives basic knowledge of the RT10xx SAI module, the PCM waveform format, and audio file cutting and conversion tools; it uses the MCUXpresso IDE CFG peripheral tool to create an SAI project and play the audio data; and it also provides an SD card example that uses the FatFs file system to read a wave file and play it.

2. Basic knowledge and the tools

Before entering the project details and testing, this section provides some SAI module knowledge, wave file format information, and audio conversion tools.

2.1 SAI module basics

The RT10xx SAI module supports I2S, AC97, TDM, and codec/DSP interfaces. The SAI module contains a transmitter and a receiver; the related signals are:

    SAI_MCLK: master clock, used to generate the bit clock; master output, slave input.
    SAI_TX_BCLK: transmit bit clock; master output, slave input.
    SAI_TX_SYNC: transmit frame sync; master output, slave input; L/R channel select.
    SAI_TX_DATA[4]: transmit data lines; lines 1-3 are shared with RX_DATA[1-3].
    SAI_RX_BCLK: receive bit clock.
    SAI_RX_SYNC: receive frame sync.
    SAI_RX_DATA[4]: receive data lines.

SAI module clocks: audio master clock, bus clock, bit clock.

The SAI frame sync has 3 modes:
     1) Transmit and receive each use their own BCLK and SYNC.
     2) Transmit async, receive sync: both use the transmit BCLK and SYNC; enable the transmitter first and disable it last.
     3) Transmit sync, receive async: both use the receive BCLK and SYNC; enable the receiver first and disable it last.
A valid frame sync is also ignored (slave mode) or not generated (master mode) for the first four bit-clock cycles after enabling the transmitter or receiver.

Pic 1

SAI module clock structure:

Pic 2

The SAI module has 3 clock sources: PLL3_PFD3, PLL5, PLL4. In the above picture, SAI1_CLK_ROOT can be used as the MCLK, and the relationships are:

BCLK = MCLK / ((TCR2[DIV] + 1) * 2)
Sample rate = BCLK / (bit width * channels)

2.2 Waveform audio file format

A WAVE file is used to save PCM-encoded data. WAVE uses the RIFF format; the smallest unit in a RIFF file is the chunk (CK) struct, where CKID is the data type and its value can be "RIFF", "LIST", "fmt ", "data", etc. A RIFF file is little-endian. The RIFF structure is:

typedef unsigned long DWORD; // 4B
typedef unsigned char BYTE;  // 1B
typedef DWORD FOURCC;        // 4B
typedef struct
{
    FOURCC ckID;   // 4B
    DWORD  ckSize; // 4B
    union
    {
        FOURCC fccType;        // RIFF form type, 4B
        BYTE   ckData[ckSize]; // ckSize * 1B
    } ckData;
} RIFFCK;

Pic 3

Take a 16 kHz, 2-channel wave file as an example:

Pic 4 Yellow: CKID  Green: data length  Purple: data

The detailed analysis is as follows:

Pic 5

We can find that the real audio data, excluding the wave header, is 1279860 bytes.

2.3 Audio file conversion

In practical usage, the audio file may not have the required channel count or sample rate configuration, its format may not be wave, or it may be too long; in these cases we can use a tool to convert it to the desired format.
We can use the ffmpeg tool: https://ffmpeg.org/ About the details, check the ffmpeg document, normally we use these command: mp3 file converts to 16k, 16bit, 2 channel wave file: ffmpeg -i test.mp3 -acodec pcm_s16le -ar 16000 -ac 2 test.wav or: ffmpeg -i test.mp3 -aq 16 -ar 16000 -ac 2 test.wav test.wav, cut 35s from 00:00:00, and can convert save to test1.wav: ffmpeg -ss 00:00:00 -i test.wav -t 35.0 -c copy test1.wav Pic 6 Pic 7 2.4 Obtain wave L/R channel audio data Just like the SDK code, save the L/R audio data directly in the RT RAM array, so here, we need to obtain the audio data from the wav file. We can use the python readout the wav header, then get the audio data size, and save the audio data to one array in the .h files. The related Python code can be: import sys import wave def wav2hex(strWav, strHex): with wave.open(strWav, "rb") as fWav: wavChannels = fWav.getnchannels() wavSampleWidth = fWav.getsampwidth() wavFrameRate = fWav.getframerate() wavFrameNum = fWav.getnframes() wavFrames = fWav.readframes(wavFrameNum) wavDuration = wavFrameNum / wavFrameRate wafFramebytes = wavFrameNum * wavChannels * wavSampleWidth print("Channels: {}".format(wavChannels)) print("Sample width: {}bits".format(wavSampleWidth * 8)) print("Sample rate: {}kHz".format(wavFrameRate/1000)) print("Frames number: {}".format(wavFrameNum)) print("Duration: {}s".format(wavDuration)) print("Frames bytes: {}".format(wafFramebytes)) fWav.close() pass with open(strHex, "w") as fHex: # Print WAV parameters fHex.write("/*\n"); fHex.write(" Channels: {}\n".format(wavChannels)) fHex.write(" Sample width: {}bits\n".format(wavSampleWidth * 8)) fHex.write(" Sample rate: {}kHz\n".format(wavFrameRate/1000)) fHex.write(" Frames number: {}\n".format(wavFrameNum)) fHex.write(" Duration: {}s\n".format(wavDuration)) fHex.write(" Frames bytes: {}\n".format(wafFramebytes)) fHex.write("*/\n\n") # Print WAV frames fHex.write("uint8_t music[] = {\n") print("Transferring...") i = 0 while wafFramebytes > 0: if(wafFramebytes < 16): BytesToPrint = wafFramebytes else: BytesToPrint = 16 fHex.write(" ") for j in range(0, BytesToPrint): if j != 0: fHex.write(' ') fHex.write("0x{:0>2x},".format(wavFrames[i])) i+=1 j+=1 fHex.write("\n") wafFramebytes -= BytesToPrint fHex.write("};\n") fHex.close() print("Done!") wav2hex(sys.argv[1], sys.argv[2]) Take the music1.wave as an example: Pic 8 2.4 Audio data relationship with audio wave 16bit data range is: -32768 to 32767, the goldwave related value range is(-1~1).Use goldwave tool to open the example music1.wav, check the data in 1s position, the left channel relative data is -0.08227, right channel relative data is -0.2257. Pic 9                                                                          pic 10 Now, calculate the L/R real data, and find the position in the music1.h. Pic 11 From pic 8, we can know, the real wave R/L data from line 11, each line contains 16 bytes of data. So, from music1.wav related value, we can calculate the related data, and compare it with the real data in the array, we can find, it is totally the same. 3. SAI MCUXpresso project creation Based on SDK_2.9.2_EVK-MIMXRT1060, create one SAI DMA audio play project. The audio data can use the above music1.h. 
Create one bare-metal project: Drivers check: clock, common, dmamux, edma,gpio,i2c,iomuxc,lpuart,sai,sai_edma,xip_device Utilities check:       Debug_console,lpuart_adapter,serial_manager,serial_manager_uart Board components check:       Xip_board Abstraction Layer check:       Codec, codec_wm8960_adapter,lpi2c_adapter Software Components check:       Codec_i2c,lists,wm8960 After the creation of the project, open the clocks, configure the clock, core, flexSPI can use the default one, we mainly configure the SAI1 related clocks: Pic 12 Select the SAI1 clock source as PLL4, PLL4_MAIN_CLK configure as 786.48MHz. SAI1 clock configure as 6.144375MHz. After the configuration, update the code. Open Pins tool, configure the SAI1 related pins, as the codec also need the I2C, so it contains the I2C pin configuration. Pic 13 Update the code. Open peripherals, configure DMA, SAI, NVIC. Pic 14 Pic 15 DMA配置如下: pic16 After configuration, generate the code. In the above configuration, we have finished the SAI DMA transfer configuration, SAI master mode, 16bits, the sample rate is 16kHz, 2channel, DMA transfer, bit clock is 512Khz, the master clock is 6.1443Mhz. void callback(I2S_Type *base, sai_edma_handle_t *handle, status_t status, void *userData) { if (kStatus_SAI_RxError == status) { } else { finishIndex++; emptyBlock++; /* Judge whether the music array is completely transfered. */ if (MUSIC_LEN / BUFFER_SIZE == finishIndex) { isFinished = true; finishIndex = 0; emptyBlock = BUFFER_NUM; tx_index = 0; cpy_index = 0; } } } int main(void) { sai_transfer_t xfer; /* Init board hardware. */ BOARD_ConfigMPU(); BOARD_InitBootPins(); BOARD_InitBootClocks(); BOARD_InitBootPeripherals(); #ifndef BOARD_INIT_DEBUG_CONSOLE_PERIPHERAL /* Init FSL debug console. */ BOARD_InitDebugConsole(); #endif PRINTF(" SAI wav module test!\n\r"); /* Use default setting to init codec */ if (CODEC_Init(&codecHandle, &boardCodecConfig) != kStatus_Success) { assert(false); } /* delay for codec output stable */ DelayMS(DEMO_CODEC_INIT_DELAY_MS); CODEC_SetVolume(&codecHandle,2U,50); // set 50% volume EnableIRQ(DEMO_SAI_IRQ); SAI_TxEnableInterrupts(DEMO_SAI, kSAI_FIFOErrorInterruptEnable); PRINTF(" MUSIC PLAY Start!\n\r"); while (1) { PRINTF(" MUSIC PLAY Again\n\r"); isFinished = false; while (!isFinished) { if ((emptyBlock > 0U) && (cpy_index < MUSIC_LEN / BUFFER_SIZE)) { /* Fill in the buffers. */ memcpy((uint8_t *)&buffer[BUFFER_SIZE * (cpy_index % BUFFER_NUM)], (uint8_t *)&music[cpy_index * BUFFER_SIZE], sizeof(uint8_t) * BUFFER_SIZE); emptyBlock--; cpy_index++; } if (emptyBlock < BUFFER_NUM) { /* xfer structure */ xfer.data = (uint8_t *)&buffer[BUFFER_SIZE * (tx_index % BUFFER_NUM)]; xfer.dataSize = BUFFER_SIZE; /* Wait for available queue. */ if (kStatus_Success == SAI_TransferSendEDMA(DEMO_SAI, &SAI1_SAI_Tx_eDMA_Handle, &xfer)) { tx_index++; } } } } }   4. SAI test result     To check the real L/R data sendout situation, we modify the music array first 16 bytes data as: 0x55,0xaa,0x01,0x00,0x02,0x00,0x03,0x00,0x04,0x00,0x05,0x00,0x06,0x00,0x07,0x00 Then test SAI_MCLK,SAI_TX_BCLK,SAI_TX_SYNC,SAI_TXD pin wave, and compare with the defined data, because the polarity is configured as active low, it is falling edge output, sample at rising edge. The test point on the MIMXRT1060-EVK board is using the codec pin position: Pic 17 4.1 Logic Analyzer tool wave Pic 18 MCLK clock frequency is 6.144375Mhz, BCLK is 512KHz, SYNC is 16KHz. 
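These measured frequencies agree with the clock formulas from section 2.1; a quick check, assuming the 6.144375 MHz SAI1 master clock configured above:

BCLK = sample rate x bit width x channels = 16 kHz x 16 x 2 = 512 kHz
TCR2[DIV] = MCLK / (2 x BCLK) - 1 = 6.144375 MHz / (2 x 512 kHz) - 1 ≈ 5

So the SAI divider generates the expected 512 kHz bit clock and 16 kHz frame sync from the configured master clock.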
Pic 19 The first frame data is:1010101001010101 0000000000000001 0XAA55  0X0001 It is the same as the array defined L/R data. SYNC low is Left 16 bit, High is right 16 bit. 4.2 Oscilloscope test wave Just like the logic analyzer, the oscilloscope wave is the same: Pic 20 Add the music.h to the project, and let the main code play the music array data in loop, we will hear the music clear when insert the headphone to on board J12 or add a speaker. 5. SAI SDcard wave music play This part will add the sd card, fatfs system, to read out the 16bit 16K 2ch wave file in the sd card, and play it in loop. 5.1 driver add     Code is based on SDK_2.9.2_EVK-MIMXRT1060, just on the previous project, add the sdcard, sd fatfs driver, now the bare-metal driver situation is: Drivers check: cache, clock, common, dmamux, edma,gpio,i2c,iomuxc,lpuart,sai,sai_edma,sdhc, xip_device Utilities check:       Debug_console,lpuart_adapter,serial_manager,serial_manager_uart Middleware check:       File System->FAT File System->fatfs+sd, Memories Board components check:       Xip_board Abstraction Layer check:       Codec, codec_wm8960_adapter,lpi2c_adapter Software Components check:       Codec_i2c,lists,wm8960 5.2 WAVE header analyzer with code    From previous content, we can know the wav header structure, we need to play the wave file from the sd card, then we need to analyze the wave header to get the audio format, audio data-related information. The header analysis code is: uint8_t Fun_Wave_Header_Analyzer(void) { char * datap; uint8_t ErrFlag = 0; datap = strstr((char*)Wav_HDBuffer,"RIFF"); if(datap != NULL) { wav_header.chunk_size = ((uint32_t)*(Wav_HDBuffer+4)) + (((uint32_t)*(Wav_HDBuffer + 5)) << + (((uint32_t)*(Wav_HDBuffer + 6)) << 16) +(((uint32_t)*(Wav_HDBuffer + 7)) << 24); movecnt += 8; } else { ErrFlag = 1; return ErrFlag; } datap = strstr((char*)(Wav_HDBuffer+movecnt),"WAVEfmt"); if(datap != NULL) { movecnt += 8; wav_header.fmtchunk_size = ((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); wav_header.audio_format = ((uint16_t)*(Wav_HDBuffer+movecnt+4) + (uint16_t)*(Wav_HDBuffer+movecnt+5)); wav_header.num_channels = ((uint16_t)*(Wav_HDBuffer+movecnt+6) + (uint16_t)*(Wav_HDBuffer+movecnt+7)); wav_header.sample_rate = ((uint32_t)*(Wav_HDBuffer+movecnt+8)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 9)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 10)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 11)) << 24); wav_header.byte_rate = ((uint32_t)*(Wav_HDBuffer+movecnt+12)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 13)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 14)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 15)) << 24); wav_header.block_align = ((uint16_t)*(Wav_HDBuffer+movecnt+16) + (uint16_t)*(Wav_HDBuffer+movecnt+17)); wav_header.bps = ((uint16_t)*(Wav_HDBuffer+movecnt+18) + (uint16_t)*(Wav_HDBuffer+movecnt+19)); movecnt +=(4+wav_header.fmtchunk_size); } else { ErrFlag = 1; return ErrFlag; } datap = strstr((char*)(Wav_HDBuffer+movecnt),"LIST"); if(datap != NULL) { movecnt += 4; wav_header.list_size = ((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); movecnt +=(4+wav_header.list_size); } //LIST not Must datap = strstr((char*)(Wav_HDBuffer+movecnt),"data"); if(datap != NULL) { movecnt += 4; wav_header.datachunk_size = 
((uint32_t)*(Wav_HDBuffer+movecnt+0)) + (((uint32_t)*(Wav_HDBuffer +movecnt+ 1)) << + (((uint32_t)*(Wav_HDBuffer +movecnt+ 2)) << 16) +(((uint32_t)*(Wav_HDBuffer +movecnt+ 3)) << 24); movecnt += 4; ErrFlag = 0; } else { ErrFlag = 1; return ErrFlag; } PRINTF("Wave audio format is %d\r\n",wav_header.audio_format); PRINTF("Wave audio channel number is %d\r\n",wav_header.num_channels); PRINTF("Wave audio sample rate is %d\r\n",wav_header.sample_rate); PRINTF("Wave audio byte rate is %d\r\n",wav_header.byte_rate); PRINTF("Wave audio block align is %d\r\n",wav_header.block_align); PRINTF("Wave audio bit per sample is %d\r\n",wav_header.bps); PRINTF("Wave audio data size is %d\r\n",wav_header.datachunk_size); return ErrFlag; } Mainly divide RIFF to 4 parts: “RIFF”,“fmt”,“LIST”,“data”. The 4 bytes data follows the “data” is the whole audio data size, it can be used to the fatfs to read the audio data. The above code also recodes the data position, then when using the fatfs read the wave, we can jump to the data area directly. 5.3 SD card wave data play     Define the array audioBuff[4* 512], used to read out the sd card wave file, and use these data send to the SAI EDMA and transfer it to the I2S interface until all the data is transmitted to the I2S interface.     Callback record each 512 bytes data send out finished, and judge the transmit data size is reached the whole wave audio data size. 5.4 sd card wave play result    Prepare one wave file, 16bit 16k sample rate, 2 channel file, named as music.wav, put in the sd card which already does the fat32 format, insert it to the MIMXRT1060-EVK J39, run the code, will get the printf information: Please insert a card into the board. Card inserted. Make file system......The time may be long if the card capacity is big. SAI wav module test! MUSIC PLAY Start! Wave audio format is 1 Wave audio channel number is 2 Wave audio sample rate is 16000 Wave audio byte rate is 64000 Wave audio block align is 4 Wave audio bit per sample is 16 Wave audio data size is 2728440 Playback is begin! Playback is finished! At the same time, after inserting the headphone or the speaker into the J12, we can hear the music. Attachment is the mcuxpresso10.3.0 and the wave samples.  
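As a quick cross-check of the wave header fields printed above, the byte rate and data chunk size are self-consistent for a 16-bit, 16 kHz, 2-channel file:

byte rate = 16000 samples/s x 2 channels x 2 bytes = 64000 B/s
duration = data size / byte rate = 2728440 / 64000 ≈ 42.6 s

which is roughly the playback time to expect before the "Playback is finished!" message appears.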
View full article
[Chinese translated version] See the attachment.   Original article: https://community.nxp.com/t5/i-MX-RT-Knowledge-Base/Design-an-IoT-edge-node-for-CV-application-base-on-the-i/ta-p/1127423 
View full article
[Chinese translated version] See the attachment.   Original article: https://community.nxp.com/t5/i-MX-Community-Articles/Effortless-GUI-Development-with-NXP-Microcontrollers/ba-p/1131179  
View full article
[Chinese translated version] See the attachment.   Original article: https://community.nxp.com/docs/DOC-345190  
View full article
[Chinese translated version] See the attachment.   Original article: https://community.nxp.com/t5/eIQ-Machine-Learning-Software/eIQ-on-i-MX-RT1064-EVK/ta-p/1123602 
View full article
[Chinese translated version] See the attachment.   Original article: https://community.nxp.com/t5/i-MX-RT-Knowledge-Base/RT1050-HAB-Encrypted-Image-Generation-and-Analysis/ta-p/1124877  
View full article
In this tutorial, I'd like to show the steps of deploying an image classification model on the i.MX RT1060, enabling you to classify fashion images and categories. In the first part of this tutorial, we will review the Fashion MNIST dataset, including how to download it to your system. From there we'll define a simple CNN network using the TensorFlow platform. Next, we'll train the CNN model on the Fashion MNIST dataset and review the results. Finally, we'll optimize the model; after that, the model will be smaller and inferencing will be faster, which is valuable for resource-limited devices such as MCUs. Let's go ahead and get started!

Fashion MNIST dataset

The Fashion MNIST dataset was created by the e-commerce company Zalando.

Fig 1 Fashion MNIST dataset

As they note on their official GitHub repo for the Fashion MNIST dataset, there are a few problems with the standard MNIST digit recognition dataset:

It's far too easy for standard machine learning algorithms to obtain 97%+ accuracy.
It's even easier for deep learning models to achieve 99%+ accuracy.
The dataset is overused.
MNIST cannot represent modern computer vision tasks.

Zalando, therefore, created the Fashion MNIST dataset as a drop-in replacement for MNIST:

60,000 training examples
10,000 testing examples
10 classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
28×28 grayscale images

The code below loads the Fashion-MNIST dataset using TensorFlow and creates a plot of the first 25 images in the training dataset.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# For easy reset of notebook state.
tf.keras.backend.clear_session()

# Load dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(8,8))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.tight_layout()
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i]])
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
plt.show()

Fig 2

Running the code loads the Fashion-MNIST train and test datasets and prints their shapes.

Fig 3

We can see that there are 60,000 examples in the training dataset and 10,000 in the test dataset, and that the images are indeed square with 28×28 pixels.
We will use the Adam optimizer to optimize the sparse_categorical_crossentropy loss function, suitable for multi-class classification, and we will monitor the classification accuracy metric, which is appropriate given we have the same number of examples in each of the 10 classes. The below code will define and run it will show the struct of the model. # Define a Model model = tf.keras.models.Sequential() # First Convolution ,Kernel:16*3*3 model.add( tf.keras.layers.Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_uniform',input_shape=(28, 28, 1))) model.add( tf.keras.layers.MaxPooling2D((2, 2))) # Second Convolution ,Kernel:32*3*3 model.add( tf.keras.layers.Conv2D(32, (3, 3), activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.MaxPooling2D((2, 2))) # Third Convolution ,Kernel:32*3*3 model.add( tf.keras.layers.Conv2D(32, (3, 3), activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.Flatten()) model.add( tf.keras.layers.Dense(32, activation='relu',kernel_initializer='he_uniform')) model.add( tf.keras.layers.Dense(10, activation='softmax')) Fig 4 Training Model After the model is defined, we need to train it. The model will be trained using 5-fold cross-validation. The value of k=5 was chosen to provide a baseline for both repeated evaluation and to not be too large as to require a long running time. Each validation set will be 20% of the training dataset or about 12,000 examples. The training dataset is shuffled prior to being split and the sample shuffling is performed each time so that any model we train will have the same train and validation datasets in each fold, providing an apples-to-apples comparison. We will train the baseline model for a modest 20 training epochs with a default batch size of 32 examples. The validation set for each fold will be used to validate the model during each epoch of the training run, so we can later create learning curves, and at the end of the run, we use the test dataset to estimate the performance of the model. As such, we will keep track of the resulting history from each run, as well as the classification accuracy of the fold. The train_model() function below implements these behaviors, taking the training dataset and test dataset as arguments, and returning a list of accuracy scores and training histories that can be later summarized. from sklearn.model_selection import KFold # train a model using k-fold cross-validation def train_model(dataX, dataY, n_folds=5): scores, histories = list(), list() # prepare cross validation kfold = KFold(n_folds, shuffle=True, random_state=1) for train_ix, validate_ix in kfold.split(dataX): # select rows for train and test trainX, trainY, validate_X, validate_Y = dataX[train_ix], dataY[train_ix], dataX[validate_ix], dataY[validate_ix] # fit model history = model.fit(trainX, trainY, epochs=20, batch_size=32, validation_data=(validate_X, validate_Y), verbose=0) # evaluate model _, acc = model.evaluate(validate_X, validate_Y, verbose=0) print("Accurary: {:.4f},Total number of figures is {:0>2d}".format(acc * 100.0, len(testY))) # append scores scores.append(acc) histories.append(history) return scores, histories Module Summary After the model has been trained, we can present the results. There are two key aspects to present: the diagnostics of the learning behavior of the model during training and the estimation of the model performance. These can be implemented using separate functions. 
First, the diagnostics involve creating a line plot showing model performance on the train and validation sets during each fold of the k-fold cross-validation. These plots are valuable for getting an idea of whether a model is overfitting, underfitting, or has a good fit for the dataset. We will create a single figure with two subplots, one for loss and one for accuracy. Blue lines will indicate model performance on the training dataset and orange lines will indicate performance on the hold-out validation dataset. The summarize_diagnostics() function below creates and shows this plot given the collected training histories.

# plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        # plot loss
        plt.subplot(2,1,1)
        plt.title('Cross Entropy Loss')
        plt.plot(histories[i].history['loss'], color='blue', label='train')
        plt.plot(histories[i].history['val_loss'], color='orange', label='validation')
        # plot accuracy
        plt.subplot(2,1,2)
        plt.title('Classification Accuracy')
        plt.plot(histories[i].history['accuracy'], color='blue', label='train')
        plt.plot(histories[i].history['val_accuracy'], color='orange', label='validation')
    plt.show()

Fig 5
Next, the classification accuracy scores collected during each fold can be summarized by calculating the mean and standard deviation. This provides an estimate of the average expected performance of the model, with an estimate of the average variance in the mean. We will also summarize the distribution of scores by creating and showing a box-and-whisker plot. The summarize_performance() function below implements this for a given list of scores collected during model training.

# summarize model performance
def summarize_performance(scores):
    # print summary
    print('Accuracy: mean={:.4f} std={:.4f}, n={:0>2d}'.format(np.mean(scores)*100, np.std(scores)*100, len(scores)))
    # box and whisker plot of results
    plt.boxplot(scores)
    plt.show()

Fig 6

Verifying predictions
According to the above figure, we can see that the final trained model reaches around 87.6% accuracy when predicting the test dataset. With the trained model, running the code below will demonstrate the result of predictions for some images.

def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]), color=color)

def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

predictions = model.predict(test_images)

# Plot the first X test images, their predicted labels, and the true labels.
# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

Fig 7

Model quantization
Post-training quantization is a conversion technique that can reduce model size while also improving CPU and hardware accelerator latency, with little degradation in model accuracy. It is especially important for embedded platforms, which lack compute-intensive performance and where Flash and RAM memory are very limited. TensorFlow Lite can be used to convert an already-trained float TensorFlow model to the TensorFlow Lite format. In addition, TensorFlow Lite provides several approaches to optimize the model; among these, integer quantization is an optimization strategy that converts 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This results in a smaller model and increased inferencing speed, which is very valuable for low-power devices such as microcontrollers. The code below shows how to implement integer quantization of the trained model; after running it, we can find that the TensorFlow Lite model is almost 64.9 KB smaller than the original model, shrinking to about 32% of the original size (Fig 8).

import os

# Representative dataset used for integer-only quantization
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(tf.cast(train_images, tf.float32)).shuffle(500).batch(1).take(150):
        yield [input_value]

# Convert using dynamic range quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model_quant = converter.convert()

# Save the model to disk
open("model_dynamic_range_quantization.tflite", "wb").write(tflite_model_quant)

## Size difference
Dynamic_range_quantization_model_size = os.path.getsize("model_dynamic_range_quantization.tflite")
print("Dynamic range quantization model is %d bytes" % Dynamic_range_quantization_model_size)

# Convert using integer-only quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_advanced_quant = converter.convert()

# Save the model to disk
open("model_integer_only_quantization.tflite", "wb").write(tflite_model_advanced_quant)

Integer_only_quantization_model_size = os.path.getsize("model_integer_only_quantization.tflite")
print("Integer_only_quantization_model is %d bytes" % Integer_only_quantization_model_size)
difference = Dynamic_range_quantization_model_size - Integer_only_quantization_model_size
print("Difference is %d bytes" % difference)

Fig 8

Evaluating the TensorFlow Lite model
Now we'll run inferences using the TensorFlow Lite Interpreter to compare the model accuracies.
First, we need a function that runs inference with a given model and images, and then returns the predictions:

# Helper function to run inference on a TFLite model
def run_tflite_model(tflite_file, test_image_indices):
    # Initialize the interpreter
    interpreter = tf.lite.Interpreter(model_path=str(tflite_file))
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]
    predictions = np.zeros((len(test_image_indices),), dtype=int)
    for i, test_image_index in enumerate(test_image_indices):
        test_image = test_images[test_image_index]
        test_label = test_labels[test_image_index]
        # Check if the input type is quantized, then rescale input data to uint8
        if input_details['dtype'] == np.uint8:
            input_scale, input_zero_point = input_details["quantization"]
            test_image = test_image / input_scale + input_zero_point
        test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])
        interpreter.set_tensor(input_details["index"], test_image)
        interpreter.invoke()
        output = interpreter.get_tensor(output_details["index"])[0]
        predictions[i] = output.argmax()
    return predictions

Next, we'll compare the performance of the original model and the quantized model on one image.
model_basic_quantization.tflite is the original TensorFlow Lite model with floating-point data.
model_integer_only_quantization.tflite is the last model we converted using integer-only quantization (it uses uint8 data for input and output).
Let's create another function to print our predictions and run it for testing.

import matplotlib.pylab as plt

# Change this to test a different image
test_image_index = 1

## Helper function to test the models on one image
def test_model(tflite_file, test_image_index, model_type):
    global test_labels
    predictions = run_tflite_model(tflite_file, [test_image_index])
    plt.imshow(test_images[test_image_index].reshape(28,28))
    template = model_type + " Model \n True:{true}, Predicted:{predict}"
    _ = plt.title(template.format(true=str(test_labels[test_image_index]), predict=str(predictions[0])))
    plt.grid(False)

Fig 9
Fig 10
Then we evaluate the quantized model by using all the test images we loaded at the beginning of this tutorial. After summarizing the prediction results for the test dataset, we can see that the prediction accuracy of the quantized model drops by less than 7% compared with the original model, which is not bad.

# Helper function to evaluate a TFLite model on all images
def evaluate_model(tflite_file, model_type):
    test_image_indices = range(test_images.shape[0])
    predictions = run_tflite_model(tflite_file, test_image_indices)
    accuracy = (np.sum(test_labels == predictions) * 100) / len(test_images)
    print('%s model accuracy is %.4f%% (Number of test samples=%d)' % (model_type, accuracy, len(test_images)))

Deploying model
Converting the TensorFlow Lite model to a C file
The following commands run xxd on the quantized model, write the output to a file called model_quantized.cc (in which the model is defined as an array of bytes), and print it to the screen. The output is very long, so we won't reproduce it all here, but here's a snippet that includes just the beginning and end.

# Save the file as a C source file
xxd -i model_integer_only_quantization.tflite > model_quantized.cc
# Print the source file
cat model_quantized.cc

Fig 11

Deploying the C file to the project
We use the tensorflow_lite_cifar10 demo as a prototype, then replace the original model and make some code modifications; below is the code in the modified main file.
#include "board.h" #include "fsl_debug_console.h" #include "pin_mux.h" #include "timer.h" #include <iomanip> #include <iostream> #include <string> #include <vector> #include "tensorflow/lite/kernels/register.h" #include "tensorflow/lite/model.h" #include "tensorflow/lite/optional_debug_tools.h" #include "tensorflow/lite/string_util.h" #include "get_top_n.h" #include "model.h" #define LOG(x) std::cout // ---------------------------- Application ----------------------------- // Lenet Mnist model input data size (bytes). #define LENET_MNIST_INPUT_SIZE 28*28*sizeof(char) // Lenet Mnist model number of output classes. #define LENET_MNIST_OUTPUT_CLASS 10 // Allocate buffer for input data. This buffer contains the input image // pre-processed and serialized as text to include here. uint8_t imageData[LENET_MNIST_INPUT_SIZE] = { #include "clothes_select.inc" }; /* Tresholds */ #define DETECTION_TRESHOLD 60 /*! * @brief Initialize parameters for inference * * @param reference to flat buffer * @param reference to interpreter * @param pointer to storing input tensor address * @param verbose mode flag. Set true for verbose mode */ void InferenceInit(std::unique_ptr<tflite::FlatBufferModel> &model, std::unique_ptr<tflite::Interpreter> &interpreter, TfLiteTensor** input_tensor, bool isVerbose) { model = tflite::FlatBufferModel::BuildFromBuffer(Fashion_MNIST_model, Fashion_MNIST_model_len); if (!model) { LOG(FATAL) << "Failed to load model\r\n"; return; } tflite::ops::builtin::BuiltinOpResolver resolver; tflite::InterpreterBuilder(*model, resolver)(&interpreter); if (!interpreter) { LOG(FATAL) << "Failed to construct interpreter\r\n"; return; } int input = interpreter->inputs()[0]; const std::vector<int> inputs = interpreter->inputs(); const std::vector<int> outputs = interpreter->outputs(); if (interpreter->AllocateTensors() != kTfLiteOk) { LOG(FATAL) << "Failed to allocate tensors!"; return; } /* Get input dimension from the input tensor metadata assuming one input only */ *input_tensor = interpreter->tensor(input); auto data_type = (*input_tensor)->type; if (isVerbose) { const std::vector<int> inputs = interpreter->inputs(); const std::vector<int> outputs = interpreter->outputs(); LOG(INFO) << "input: " << inputs[0] << "\r\n"; LOG(INFO) << "number of inputs: " << inputs.size() << "\r\n"; LOG(INFO) << "number of outputs: " << outputs.size() << "\r\n"; LOG(INFO) << "tensors size: " << interpreter->tensors_size() << "\r\n"; LOG(INFO) << "nodes size: " << interpreter->nodes_size() << "\r\n"; LOG(INFO) << "inputs: " << interpreter->inputs().size() << "\r\n"; LOG(INFO) << "input(0) name: " << interpreter->GetInputName(0) << "\r\n"; int t_size = interpreter->tensors_size(); for (int i = 0; i < t_size; i++) { if (interpreter->tensor(i)->name) { LOG(INFO) << i << ": " << interpreter->tensor(i)->name << ", " << interpreter->tensor(i)->bytes << ", " << interpreter->tensor(i)->type << ", " << interpreter->tensor(i)->params.scale << ", " << interpreter->tensor(i)->params.zero_point << "\r\n"; } } LOG(INFO) << "\r\n"; } } /*! 
* @brief Runs inference input buffer and print result to console * * @param pointer to image data * @param image data length * @param pointer to labels string array * @param reference to flat buffer model * @param reference to interpreter * @param pointer to input tensor */ void RunInference(const uint8_t* image, size_t image_len, const std::string* labels, std::unique_ptr<tflite::FlatBufferModel> &model, std::unique_ptr<tflite::Interpreter> &interpreter, TfLiteTensor* input_tensor) { /* Copy image to tensor. */ memcpy(input_tensor->data.uint8, image, image_len); /* Do inference on static image in first loop. */ auto start = GetTimeInUS(); if (interpreter->Invoke() != kTfLiteOk) { LOG(FATAL) << "Failed to invoke tflite!\r\n"; return; } auto end = GetTimeInUS(); const float threshold = (float)DETECTION_TRESHOLD /100; std::vector<std::pair<float, int>> top_results; int output = interpreter->outputs()[0]; TfLiteTensor *output_tensor = interpreter->tensor(output); TfLiteIntArray* output_dims = output_tensor->dims; // assume output dims to be something like (1, 1, ... , size) auto output_size = output_dims->data[output_dims->size - 1]; /* Find best image candidates. */ GetTopN<uint8_t>(interpreter->typed_output_tensor<uint8_t>(0), output_size, 1, threshold, &top_results, false); if (!top_results.empty()) { auto result = top_results.front(); const float confidence = result.first; const int index = result.second; if (confidence * 100 > DETECTION_TRESHOLD) { LOG(INFO) << "----------------------------------------\r\n"; LOG(INFO) << " Inference time: " << (end - start) / 1000 << " ms\r\n"; LOG(INFO) << " Detected: " << std::setw(10) << labels[index] << " (" << (int)(confidence * 100) << "%)\r\n"; LOG(INFO) << "----------------------------------------\r\n\r\n"; } } } /*! * @brief Main function */ int main(void) { const std::string labels[] = {"T-shirt/top", "Trouser","Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"}; /* Init board hardware. */ BOARD_ConfigMPU(); BOARD_InitPins(); BOARD_BootClockRUN(); BOARD_InitDebugConsole(); InitTimer(); std::unique_ptr<tflite::FlatBufferModel> model; std::unique_ptr<tflite::Interpreter> interpreter; TfLiteTensor* input_tensor = 0; InferenceInit(model, interpreter, &input_tensor, false); LOG(INFO) << "Fashion MNIST object recognition example using a TensorFlow Lite model.\r\n"; LOG(INFO) << "Detection threshold: " << DETECTION_TRESHOLD << "%\r\n"; /* Run inference on static ship image. */ LOG(INFO) << "\r\nStatic data processing:\r\n"; RunInference((uint8_t*)imageData, (size_t)LENET_MNIST_INPUT_SIZE, labels, model, interpreter, input_tensor); while(1) {} } Testing result After deploying the model in the demo project, then we'll run this demo on the MIMXRT1060 (Fig 12) board for testing. Fig 12 Run the below code to covert the Fashion MNIST image to text The process_image() function can convert a Fashion MNIST image to an include file as static data, then include this file in the demo project. 
def process_image(image, output_path, num_batch=1): img_data = np.transpose(image, (2, 0, 1)) # Repeat image for batch processing (resulting tensor is NCHW or NHWC) img_data = np.reshape(img_data, (num_batch, img_data.shape[0], img_data.shape[1], img_data.shape[2])) img_data = np.repeat(img_data, num_batch, axis=0) img_data = np.reshape(img_data, (num_batch, img_data.shape[1], img_data.shape[2], img_data.shape[3])) # Serialize image batch img_data_bytes = bytearray(img_data.tobytes(order='C')) image_bytes_per_line = 20 with open(output_path, 'wt') as f: idx = 0 for byte in img_data_bytes: f.write('0X%02X, ' % byte) if idx % image_bytes_per_line == (image_bytes_per_line - 1): f.write('\n') idx = idx + 1 # Return serialized image size return len(img_data_bytes)      2. Run the demo project on board.
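As a usage sketch for the conversion step above (not part of the original article), process_image() could be driven as follows to turn a single Fashion MNIST test image into the include file consumed by the firmware. The reloaded dataset, the added channel dimension, and the output file name clothes_select.inc are assumptions here.

# Hypothetical usage of process_image(); names and paths are assumptions
import tensorflow as tf

(_, _), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
image = test_images[1].reshape(28, 28, 1)            # HWC layout, uint8 pixels
size = process_image(image, 'clothes_select.inc')    # writes the bytes as C initializer data
print('Wrote %d bytes of image data' % size)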
View full article
Introduction A common need for GUI applications is to implement a clock function.  Whether it be to create a clock interface for the end user's benefit, or just to time animations or other actions, implementing an accurate clock is a useful and important feature for GUI applications.  The aim of this document is to help you implement clock functions in your AppWizard project.   Methods When implementing a real-time clock, there are a couple of general methods to do so.   Use an independent timer in your MCU Using animation objects Each of these methods have their advantages and disadvantages.  If you just need a timer that doesn't require extra code and you don't require control or assurance of precision, or maybe you can't spare another timer, using an animation object (method #2) may be a good option in that application.  If your application requires an assurance of precision or requires other real-time actions to be performed that AppWizard can't control, it is best to implement an independent timer in your MCU (method #1).  Method 1:  Independent MCU Timer Implementing a timer via an independent MCU timer allows better control and guarantees the precision because it isn't a shared clock and the developer can adjust the interrupt priorities such that the timer interrupt has the highest priority.  AppWizard timing uses a common timer and then time slices activities similar to how an operating system works.  It is for this reason that implementing an independent MCU timer is best when you need control over the precision of the timer or you need other real-time actions to be triggered by this timer.  When implementing a timer using an independent MCU timer (like the RTC module), an understanding of how to interact with Text widgets is needed. Let's look at this first.   Interacting with Text Widgets Editing Text widgets occurs through the use of the emWin library API (the emWin library is the underlying code that AppWizard builds upon). The Text widget API functions are documented in the emWin Graphic Library User Guide and Reference Manual, UM3001.  Most of the Text widget API functions require a Text widget handle.  Be sure to not confuse this handle for the AppWizard ID.  Imagine a clock example where there are two Text widgets in the interface:  one for the minutes and one for the seconds.  The AppWizard IDs of these objects might be ID_TEXT_MINS and ID_TEXT_SECONDS respectively (again, these are not to be confused with the handle to the Text widget for use by emWin library functions).  The first action software should take is to obtain the handle for the Text widgets.   This can be done using the WM_GetDialogItem function.  The code to get the active window handle and the handle for the two Text widgets is shown below: activeWin = WM_GetActiveWindow(); textBoxMins = WM_GetDialogItem(activeWin, ID_TEXT_MINS); textBoxSecs = WM_GetDialogItem(activeWin, ID_TEXT_SECONDS);‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Note that this function requires the handle to the parent window of the Text widget.  If your application has multiple windows or screens, you may need to be creative in how you acquire this handle, but for this example, the software can simply call the WM_GetActiveWindow function (since there is only one screen).  When to call these functions can be a bit tricky as well.  They can be called before the MainTask() function of the application is called and the application will not crash.  
However, the handles won't be correct and the Text widgets will not be updated as expected.  It's recommended that these handles be initialized when the screen is initialized.  An example of how this would be done is shown below: void cbID_SCREEN_CLOCK(WM_MESSAGE * pMsg) { extern WM_HWIN activeWin; extern WM_HWIN textBoxMins; extern WM_HWIN textBoxSecs; extern WM_HWIN textBoxDbg; if(pMsg->MsgId == WM_INIT_DIALOG) { activeWin = WM_GetActiveWindow(); textBoxMins = WM_GetDialogItem(activeWin, ID_TEXT_MINS); textBoxSecs = WM_GetDialogItem(activeWin, ID_TEXT_SECONDS); textBoxDbg = WM_GetDialogItem(activeWin, ID_TEXT_DBG); } GUI_USE_PARA(pMsg); }‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Once the Text widget handles have been acquired, the text can be updated using the TEXT_SetText() function or the TEXT_SetDec() function in this case, because the Text widgets are configured for decimal mode, since we want to display numbers.  An example of the code to do this is shown below.  /* TEXT_SetDec(Text Widget Handle, Value as Int, Length, Shift, Sign, Leading Spaces) */ if(TEXT_SetDec(textBoxSecs, (int)gSecs, 2, 0, 0, 0)) { /* Perform action here if necessary */ } if(TEXT_SetDec(textBoxMins, (int)gMins, 2, 0, 0, 0)) { /* Perform action here if necessary */ } ‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍ Method 2:  Animation Objects When implementing a real-time clock using animation objects, it is necessary to implement a loop.  This could be done outside of the AppWizard GUI (in your code) but because the timing precision can't be guaranteed, it's just as easy to implement a loop in the AppWizard GUI if you know how (it isn't very intuitive as to how to do this). Before examining the interactions to do this, let's look at the variables and objects needed to do this.  ID_VAR_SECS - This variable holds the current seconds value. ID_VAR_SECS_1 - This variable holds the next second value.  ID_TEXT_SECONDS - Text box that displays the current seconds value. ID_END_CNT - Variable that holds the value at which the seconds rolls over and increments the minute count ID_TEXT_MINS - Text box that holds the current minute count. ID_MIN_END_CNT - Variable that holds the value at which the minutes rolls over (which would also increment the hour count if the hours were implemented). ID_BUTTON_SECS - This is a hidden button that initiates actions when the seconds variable has reached the end count.  Now, here are the interactions used to implement the clock feature using animation interactions.  The heart of the loop are the interactions triggered by ID_VAR_SECS.  ID_VAR_SECS -> ID_VAR_SECS_1:  When ID_VAR_SECS changes, it needs to add one to ID_VAR_SECS_1 so that the animation will animate to one second from the current time. ID_VAR_SECS -> ID_TEXT_SECONDS:  When ID_VAR_SECS changes, it also needs to start the animation from the current value to the next second (ID_VAR_SECS_1). A very essential part of the loop is ensuring the animation restarts every time.  So ID_TEXT_SECONDS needs to change the value of ID_VAR_SECS when the animation ends. ID_VAR_SECS is changed to the current time value, ID_VAR_SECS_1. When the ID_TEXT_SECONDS animation ends, it must also decrement the ID_VAR_END_CNT variable.  This is analogous to the control variable of a "For" loop being updated. This is done using the ADDVALUE job, adding '-1' to the variable, ID_VAR_END_CNT. When ID_VAR_END_CNT changes, it updates the hidden button, ID_BUTTON_SECS, with the new value.  
This is analogous to a "For" loop checking whether its control variable is still within its limits.   The interactions in group 5 are interactions that restart the loop when the seconds reach the count that we desire.  When the loop is restarted, the following actions must be taken: Set ID_VAR_SECS and ID_VAR_SECS_1 to the initial value for the next loop ('0' in this case).  Note that ID_VAR_SECS_1 MUST be set before ID_VAR_SECS.  Additionally, if the loop is to continue, ID_VAR_SECS and ID_VAR_SECS_1 must be set to the same value.   ID_TEXT_SECONDS is set to the initial value.  If this isn't done, then the text box will try to animate from the final value to the initial value and then will look "weird". ID_VAR_END_CNT is reset to its initial value (60 in this case).  ID_BUTTON_SECS is also responsible for updating the minutes values.  In this case, it's incrementing the ID_TEXT_MINS value (counting up in minutes) and decrementing the ID_VAR_MIN_END_CNT  Adjusting the time of an animation object The animation object (as well as other emWin objects) use the GUI_X_DELAY function for timing.  It is up to the host software to implement this function.  In the i.MX RT examples, the General Purpose Timer (GPT) is used for this timer.  So how the GPT is configured will affect the timing of the application and the how fast or slow the animations run. The GPT is configured in the function BOARD_InitGPT() which resides in the main source file.  The recommended way to adjust the speed of the timer is by changing the divider value to the GPT. Conclusion So we have seen two different methods of implementing a real-time clock in an AppWizard GUI application.  Those methods are: Use an independent timer in your MCU Using animation objects Using an independent timer in your MCU may be preferred as it allows for better control over the timing, can allow for real-time actions to be performed that AppWizard can't control, and provides some assurance of precision.  Using animation objects may be preferred if you just need a quick timer implementation that doesn't require you to manually add code to your project or use a second timer.  
View full article
Overview of i.MX RT1050
The i.MX RT1050 is the industry's first crossover processor; it combines the high performance and high level of integration of an applications processor with the ease of use and real-time functionality of a microcontroller. The i.MX RT1050 runs the Arm Cortex-M7 core at 600 MHz, which means it definitely has the ability to do some complicated computing, such as floating-point arithmetic, matrix operations, etc. General MCUs have a hard time handling these complicated operations.
Fig 1 i.MX RT series
It has a rich set of peripherals which makes it suitable for a variety of applications; in this demo, the PXP (Pixel Pipeline), CSI (CMOS Sensor Interface), and eLCDIF (Enhanced LCD Interface) allow me to build up the camera display system easily.
Fig 2 i.MX RT1050 Block Diagram
Basic concept of Computer Vision (CV)
Machine Learning (ML) is moving to the edge for a variety of reasons, such as bandwidth constraints, latency, reliability, security, etc. People want to have edge computing capability on embedded devices to provide more advanced services, like voice recognition for smart speakers and face detection for surveillance cameras.
Fig 3 Reason
Convolutional Neural Networks (CNNs) are one of the main ways to do image recognition and image classification. CNNs use a variation of multilayer perceptrons that requires minimal pre-processing, based on their shared-weights architecture and translation invariance characteristics.
Fig 4 Structure of a typical deep neural network
Above is an example that shows the original image input on the left-hand side and how it progresses through each layer to calculate the probability on the right-hand side.
Hardware
MIMXRT1050 EVK Board; RK043FN02H-CT (LCD Panel)
Fig 5 MIMXRT1050 EVK board
Reference demo code
emwin_temperature_control: demonstrates graphical widgets of the emWin library.
cmsis_nn_cifar10: demonstrates a convolutional neural network (CNN) example with the use of convolution, ReLU activation, pooling and fully-connected functions from the CMSIS-NN software library. The CNN used in this example is based on the CIFAR-10 example from Caffe. The neural network consists of 3 convolution layers interspersed by ReLU activation and max-pooling layers, followed by a fully-connected layer at the end. The input to the network is a 32x32 pixel color image, which is classified into one of the 10 output classes.
Note: Both of these demo projects are from the SDK library.
Deploy the neural network model
Fig 6 illustrates the steps of deploying the neural network model on the embedded platform. The cmsis_nn_cifar10 demo project provides the quantized parameters for the 3 convolution layers, so in this implementation I use these parameters directly. BTW, I chose 100 images randomly from the test set as a round of input to evaluate the accuracy of this model, and through several rounds of testing I found the model's accuracy is about 65%, as the figure below shows.
Fig 6 Deploy the neural network model
Fig 7 cmsis_nn_cifar10 demo project test result
The CIFAR-10 dataset is a collection of images that is commonly used to train ML and computer vision algorithms; it consists of 60000 32x32 color images in 10 classes, with 6000 images per class ("airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"). There are 50000 training images and 10000 test images.
Embedded platform software structure
After POR, various components are initialized, like the system clock, pin mux, camera, CSI, PXP, LCD and emWin, etc. Then the control GUI will show up on the LCD; pressing the Play button will display the camera video on the LCD, and once an object comes into the camera's window, you can press the Capture button to pause the display and run the model to identify the object. Fig 8 presents the software structure of this demo.
Fig 8 Embedded platform software structure
Object identification test
The three figures below present the testing results.
Fig 9
Fig 10
Fig 11
Future work
Use the PyTorch framework to train a better and more complicated convolutional network for object recognition.
View full article
Source code: https://github.com/JayHeng/NXP-MCUBootUtility
【v2.0.0】
Features:
> 1. Support i.MXRT5xx A0, i.MXRT6xx A0
> 2. Support i.MXRT1011, i.MXRT117x A0
> 3. [RTyyyy] Support OTFAD encryption secure boot case (SNVS Key, User Key)
> 4. [RTxxx] Support both UART and USB-HID ISP modes (COM port / USB device auto-detected)
> 5. [RTxxx] Support for converting bare image into bootable image
> 6. [RTxxx] Original image can be a bootable image (with FDCB)
> 7. [RTxxx] Support for loading bootable image into FlexSPI/QuadSPI NOR boot device
> 8. [RTxxx] Support development boot case (Unsigned, CRC)
> 9. Add Execute action support for Flash Programmer
> 10. [RTyyyy] Can show FlexRAM info in device status
Improvements:
> 1. [RTyyyy] Improve stability of USB connection of i.MXRT105x board
> 2. Can write/read RAM via Flash Programmer
> 3. [RTyyyy] Provide Flashloader resident option to adapt to different FlexRAM configurations
Bugfixes:
> 1. [RTyyyy] Sometimes tool will report error "xx.bat file cannot be found" when generating certificates, so the certificate cannot be generated
> 2. [RTyyyy] Editing mixed eFuse fields via the GUI is not working as expected
> 3. [RTyyyy] Cannot support 32MB or larger LPSPI NOR/EEPROM device
> 4. Cannot erase/read the last two pages of boot device via Flash Programmer
View full article
[Chinese translation] See the attached document. Original article: https://community.nxp.com/docs/DOC-342297
View full article
Introduction
This document is an extension of section 3.1.3, "Software implementation", of the application note AN12077, Using the i.MX RT FlexRAM. It's important that before continuing to read this document, you read this application note carefully. Link to the application note.
Section 3.1.3 of the application note explains how to reallocate the FlexRAM through software within the startup code of your application. This document will go into further detail on all the implications of making these modifications and the best way to do it.
Prerequisites
RT10xx-EVK
The latest SDK, which you can download from the following link: Welcome | MCUXpresso SDK Builder
MCUXpresso IDE
Internal SRAM
The amount of internal SRAM varies depending on the RT. In some cases, not all the internal SRAM can be reallocated with the FlexRAM.

RT      Internal SRAM   FlexRAM
RT1010  Up to 128 KB    Up to 128 KB
RT1015  Up to 128 KB    Up to 128 KB
RT1020  Up to 256 KB    Up to 256 KB
RT1050  Up to 512 KB    Up to 512 KB
RT1060  Up to 1 MB      Up to 512 KB
RT1064  Up to 1 MB      Up to 512 KB

In the case of the RT106x, only 512 KB out of the 1 MB of internal SRAM can be reallocated through the FlexRAM as DTCM, ITCM, and OCRAM. The remaining 512 KB are OCRAM and cannot be reallocated. For all the other RT10xx devices you can reallocate the whole internal SRAM as DTCM, ITCM, and OCRAM. Section 3.1.3.1 of the application note explains the size limitations when reallocating the FlexRAM. One thing that's important to mention is that the ROM bootloader in all the RT10xx parts uses the OCRAM, hence you should keep some OCRAM when reallocating the FlexRAM. This doesn't apply to the RT106x, since you will always have the 512 KB of OCRAM that cannot be reallocated. To know how much OCRAM each RT family needs, please refer to section 2.1.1.1 of the application note.
Implementation in MCUXpresso IDE
First, you need to import any of the SDK examples into your MCUXpresso IDE workspace. In my case, I imported the igpio_led_output example for the RT1050-EVKB. If you compile this project, you will see that the default configuration for the FlexRAM on the RT1050-EVKB is the following:

SRAM_DTC  128 KB
SRAM_ITC  128 KB
SRAM_OC   256 KB

Now we need to go to the Reset handler located in the file startup_mimxrt1052.c. Reallocating the FlexRAM has to be done before the FlexRAM is used, which is why it's done inside the Reset handler. The registers that we need to modify to reallocate the FlexRAM are IOMUXC_GPR_GPR16 and IOMUXC_GPR_GPR17, so first we need to have at hand the addresses of these registers.

Register          Address
IOMUXC_GPR_GPR16  0x400AC040
IOMUXC_GPR_GPR17  0x400AC044

Now, we need to determine how we want to reallocate the FlexRAM to see the value that we need to load into register IOMUXC_GPR_GPR17. In my case, I want to have the following configuration:

SRAM_DTC  256 KB
SRAM_ITC  128 KB
SRAM_OC   128 KB

When choosing the new sizes of the FlexRAM, be sure that you choose a configuration that you can also apply through the FlexRAM fuses; I will explain the reason for this later. The configurations that you can achieve through the fuses are shown in the Fusemap chapter of the reference manual, in the table named "Fusemap Descriptions"; the fuse name is "Default_FlexRAM_Part". Based on the following explanation of the IOMUXC_GPR_GPR17 register, the value that I need to load into the register is 0xAAAAFF55.
Where the first  4 banks correspond to the 128KB of SRAM_OC, the next 4 banks correspond to the 128KB of SRAM_ITC and the last 8 banks are the 256KB of SRAM_DTC.  Now, that we have all the addresses and the values that we need we can start writing the code in the Reset handler. The first thing to do is load the new value into the register IOMUXC_GPR_GPR17. After, we need to configure register IOMUXC_GPR_GPR16 to specify that the FlexRAM bank configuration should be taken from register IOMUXC_GPR_GPR17 instead of the fuses. Then if in your new configuration of the FlexRAM either the SRAM_DTC or SRAM_ITC are of size 0, you need to disable these memories in the register IOMUXC_GPR_GPR16. At the end your code should look like the following:    void ResetISR(void) { // Disable interrupts __asm volatile ("cpsid i"); /* Reallocating the FlexRAM */ __asm (".syntax unified\n" "LDR R0, =0x400ac044\n"//Address of register IOMUXC_GPR_GPR17 "LDR R1, =0xaaaaff55\n"//FlexRAM configuration DTC = 265KB, ITC = 128KB, OC = 128KB "STR R1,[R0]\n" "LDR R0,=0x400ac040\n"//Address of register IOMUXC_GPR_GPR16 "LDR R1,[R0]\n" "ORR R1,R1,#4\n"//The 4 corresponds to setting the FLEXRAM_BANK_CFG_SEL bit in register IOMUXC_GPR_GPR16 "STR R1,[R0]\n" #ifdef FLEXRAM_ITCM_ZERO_SIZE "LDR R0,=0x400ac040\n"//Address of register IOMUXC_GPR_GPR16 "LDR R1,[R0]\n" "AND R1,R1,#0xfffffffe\n"//Disabling SRAM_ITC in register IOMUXC_GPR_GPR16 "STR R1,[R0]\n" #endif #ifdef FLEXRAM_DTCM_ZERO_SIZE "LDR R0,=0x400ac040\n"//Address of register IOMUXC_GPR_GPR16 "LDR R1,[R0]\n" "AND R1,R1,#0xfffffffd\n"//Disabling SRAM_DTC in register IOMUXC_GPR_GPR16 "STR R1,[R0]\n" #endif ".syntax divided\n"); #if defined (__USE_CMSIS) // If __USE_CMSIS defined, then call CMSIS SystemInit code SystemInit(); ...‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍   If you compile your project you will see the memory distribution that appears on the console is still the default configuration.  This is because we did modify the Reset handler to reallocate the FlexRAM but we haven't modified the linker file to match these new sizes. To do this you need to go to the properties of your project. Once in the properties, you need to go to C/C++ Build -> MCU settings. Once you are in the MCU settings you need to modify the sizes of the SRAM memories to match the new configuration.  When you make these changes click Apply and Close. After making these changes if you compile the project you will see the memory distribution that appears in the console is now matching the new sizes.  Now we need to modify the Memory Protection Unit (MPU) to match these new sizes of the memories. To do this you need to go to the function BOARD_ConfigMPU inside the file board.c. Inside this function, you need to locate regions 5, 6, and 7 which correspond to SRAM_ITC, SRAM_DTC, and SRAM_OC respectively. 
Same as for register IOMUXC_GPR_GPR14, if the new size of your memory is not 32, 64, 128, 256, or 512 you need to choose the next greater number. Your configuration should look like the following:    /* Region 5 setting: Memory with Normal type, not shareable, outer/inner write back */ MPU->RBAR = ARM_MPU_RBAR(5, 0x00000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_128KB); /* Region 6 setting: Memory with Normal type, not shareable, outer/inner write back */ MPU->RBAR = ARM_MPU_RBAR(6, 0x20000000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_256KB); /* Region 7 setting: Memory with Normal type, not shareable, outer/inner write back */ MPU->RBAR = ARM_MPU_RBAR(7, 0x20200000U); MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0, ARM_MPU_REGION_SIZE_128KB);‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍   We need to change the image entry address to the Reset handler. To do this, you need to go to the file fsl_flexspi_nor_boot.c inside the xip folder. You need to declare the ResetISR and change the entry address in the image vector table.  Finally, we need to place the stack at the start of the DTCM memory. To do this, we need to go to the properties of your project. From there, we have to C/C++ Build and Manage Linker Script.  From there, we will need to add two more assembly instructions in our ResetISR function. We have to add these two instructions at the beginning of our assembly code:  In the attached c file, you'll find all the assembly instructions mentioned above.  That's it, these are all the changes that you need to make to reallocate the FlexRAM during the startup.  Debug Session  To verify that all the modifications that we just did were correct we will launch the debug session. As soon as we reach the main, before running the application, we will go to the peripheral view to see registers IOMUXC_GPR_GPR16, and IOMUXC_GPR_GPR17 and verify that the values are the correct ones. In register IOMUXC_GPR_GPR16 as shown in the image below we configure the FLEXRAM_BANK_CFG_SEL as 1 to use the use register IOMUXC_GPR_GPR17 to configure the FlexRAM.  Finally, in register IOMUXC_GPR_GPR17 we can see the value 0xAAAAFF55 that corresponds to the new configuration.  Reallocating the FlexRAM through the Fuses  We just saw how to reallocate the FlexRAM through software by writing some code in the Reset Handler. This procedure works fine, however, it's recommended that you use this approach to test the different sizes that you can configure but once you find the correct configuration for your application we highly recommend that you configure these new sizes through the fuses instead of using the register IOMUXC_GPR_GPR17. There are lots of dangerous areas in reconfiguring the FlexRAM in code. It pretty much all boils down to the fact that any code/data/stack information written to the RAM can end up changing location during the reallocation.  This is the reason why once you find the correct configuration, you should apply it through the fuses. If you use the fuses to configure the FlexRAM, then you don't have the same concerns about moving around code and data, as the fuse settings are applied as a hardware default.  Keep in mind that once you burn the fuses there's no way back! This is why it's important that you first try the configuration through the software method. 
Once you burn the fuses you won't need to modify the Reset handler, you only need to modify the MPU to change the size of regions as we saw before and the MCU settings of your project to match the new memory sizes that you configured through the fuses.  The fuse in charge of the FlexRAM configuration is Default_FlexRAM_Part, the address of this fuse is 0x6D0[15:13]. You can find more information about this fuse and the different configurations in the Fusemap chapter of the reference manual.  To burn the fuses I recommend using either the blhost or the MCUBootUtility.  Link to download the blhost.  Link to the MCUBootUtility webpage.    I hope you find this document helpful!  Víctor Jiménez 
View full article
[Chinese translation] See the attached document. Original article: https://community.nxp.com/community/imx/blog/2019/04/17/do-you-have-a-minute
View full article
[Chinese translation] See the attached document. Original article: https://community.nxp.com/docs/DOC-342954
View full article
This application note describes how to develop an H.264 video decoding application with the NXP i.MX RT1050 processor. Click here to access the full application note. Click here to access the GitHub repo of FFMPEG (code, no GPL). Note: the code is for evaluation purposes only.
View full article
The MCUXpresso Secure Provisioning Tool is a security-focused software tool that NXP released in the first half of this year. It is simple and convenient to operate, stable and reliable, and very friendly to users who are not familiar with the security features. Its only limitation is that the feature set is not yet complete: it currently supports only HAB-related operations, and features such as BEE will have to wait for later updates. For detailed information and the user manual, please refer to the official page: MCUXpresso Secure Provisioning Tool | Software Development for NXP Microcontrollers (MCUs) | NXP | NXP
At the moment, not many customers seem to know about this tool, and most are still using the MCU Boot Utility. So what can you do if you have already burned the fuses with the MCU Boot Utility and now want to use the official tool? After studying and comparing the two, it turns out that their underlying execution parts are the same, so we can start using the new tool by making the simple replacements below:
First, replace the crts and keys folders.
The MCU Boot Utility path is: ..\NXP-MCUBootUtility-2.2.0\NXP-MCUBootUtility-2.2.0\tools\cst
The corresponding MCUXpresso Secure Provisioning path is the root directory of the corresponding workspace.
In addition, there is the hab_cert content used in the encrypted boot mode; the following two files need to be replaced correspondingly, and since the two tools name them differently, remember to rename them accordingly.
The MCU Boot Utility path is: ..\NXP-MCUBootUtility-2.2.0\NXP-MCUBootUtility-2.2.0\gen\hab_cert
The MCUXpresso Secure Provisioning path, inside the workspace, is: ..\secure_provisioning_RT1050\gen_hab_certs
In the MCU Boot Utility the files are named: SRK_1_2_3_4_table.bin; SRK_1_2_3_4_fuse.bin
In MCUXpresso Secure Provisioning they are named: SRK_fuses.bin; SRK_hash.bin
With that, you can start using the new tool.
One last note: the new tool lets you create different workspaces to store projects with different keys, which makes it easy for users to keep them separate. Projects created with the new tool can also have their keys swapped with each other; just refer to the Secure Provisioning part of the steps above.
View full article