RT106L/S voice control system based on the Baidu cloud


1 Introduction

    The NXP RT106L and RT106S are voice recognition chips used for offline local voice control. SLN-LOCAL-IOT is based on the RT106L, and SLN-LOCAL2-IOT is a newer local speech recognition board based on the RT106S. The board includes a Murata 1DX Wi-Fi/BLE module, an AFE (audio front end), an ASR recognition engine, external flash, two microphones, an analog audio amplifier, and speakers. The voice recognition flow differs between SLN-LOCAL-IOT and SLN-LOCAL2-IOT; the newer SLN-LOCAL2-IOT is recommended.

    This article uses the voice control board SLN-LOCAL/2-IOT to implement the functions shown in the following block diagram:


Pic 1

Use the PC-side speech model tool (Cyberon DSMT) to generate the WW (wake word) and VC (voice command) engine binary files, which are then used by the demo code. This system targets Chinese word recognition: when the user says the Chinese wake word "小恩小恩", SLN-LOCAL/2-IOT wakes up and answers "小恩来了,请吩咐" ("Xiao En is here, please give your command"). The system then enters the voice recognition stage, where the user can say one of the recognition commands: "开红灯" (red LED on), "关红灯" (red LED off), "开绿灯" (green LED on), "关绿灯" (green LED off), "灯闪烁" (LED blink), "开远程灯" (remote LED on), "关远程灯" (remote LED off). After recognition, the board answers "好的" ("OK"). The first five commands switch the local LEDs, while "开远程灯" and "关远程灯" control the LED on an additional MIMXRT1060-EVK board through Baidu cloud. SLN-LOCAL/2-IOT accesses the Internet through its Wi-Fi module and communicates with Baidu cloud over MQTT: when a remote control command is detected, it publishes a JSON packet to Baidu cloud; MIMXRT1060-EVK subscribes to the Baidu cloud data, receives the message from the IoT board, parses it, and controls its LED accordingly. On the PC side, the MQTT.fx tool can subscribe to the Baidu cloud data and can also send data to the device directly to test the remote control function.

The following sections describe in detail how to use the SLN-LOCAL/2-IOT SDK demo to implement customized Chinese wake words and voice commands, and how to remotely control the MIMXRT1060-EVK through Baidu cloud.


2 Platform setup

2.1 Platforms used






Segger JLINK

Baidu Smart Cloud: Baidu cloud control+ TTS

Audacity: audio file format conversion tool

WAVToCode: converts WAV files to C array code, used for the demo title playback

MCUBootUtility: used to burn the feedback audio files to the filesystem

Cyberon DSMT: wake word and voice command generation tool

DSMT is the key tool for wake word and voice command detection; the application flow is:


Pic 2

2.2 Baidu Smart cloud

2.2.1 Baidu cloud IOT control system

Enter the IoT Hub:


    Click "Use Now" and create a device project.

Create a project, select the device type, and enter the project name. The device type can use shadows as cloud-side images of the devices, so you can see directly how the data changes. Once created, an endpoint is generated, along with the corresponding address:


Pic 3

Create Thing model

The Thing model mainly establishes the various properties needed in the shadow, such as temperature, humidity, and other variables, together with their value types; in fact, these are also the JSON items used in the actual MQTT communication.

   Click the newly created device-type project, where you can create a new Thing model or shadow:


Pic 4

   Here, create 3 attributes: LEDstatus, humid, temp.

These represent the LED status, humidity, temperature, and so on, which makes communication and control between the cloud and the RT board convenient. Once created, you get the following picture:



Pic 5

Create Thing shadow

In the device-type project, select the shadow, build your own shadow platform, enter a name, and select the newly created Thing model containing the three properties as the object model. After creation, we can see the details of the shadow:


Pic 6

It also generates the shadow-related addresses, name, and key; my test platform is as follows:

TCP Address: tcp://rndrjc9.mqtt.iot.gz.baidubce.com:1883

SSL Address: ssl://rndrjc9.mqtt.iot.gz.baidubce.com:1884

WSS Address: wss://rndrjc9.mqtt.iot.gz.baidubce.com:443

name: rndrjc9/RT1060BTCDShadow

key: y92ewvgjz23nzhgn

Port 1883: does not support transport data encryption.

Port 1884: supports SSL/TLS encrypted transport.

Port 8884: supports WebSocket-style connections, also with SSL encryption.

This article uses port 1883 with no transport data encryption for easy testing.

So far, the Baidu cloud device-type project and cloud shadow are complete, and the MQTT.fx tool can be used next to connect and test. In practice, customers are recommended to build their own Baidu cloud connection; the user key above is for reference only.


2.2.2 Online TTS

   When the SLN-LOCAL/2-IOT board recognizes wake words or recognition words, or when it powers on, it needs to play the corresponding demo audio, such as: "百度云端语音测试demo" ("Baidu cloud voice test demo"), "小恩来啦!请吩咐" ("Xiao En is here! Please give your command"), "好的" ("OK"). These phrases need text-to-WAV audio synthesis; here Baidu Smart Cloud's online TTS function is used. For the specific operation, refer to the following documents:


  Once the base audio library is enabled, use the main.py provided in the link above and modify it: add the Chinese text you want to convert to the "TEXT" field and set the output audio file name in "save_file", such as xxx .wav. Then run: python main.py to complete the conversion and generate the audio file corresponding to the text, such as .mp3 or .wav.


Pic 7
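As a sketch of what main.py does, the request query for Baidu's short-text TTS REST API can be built like this. The parameter names follow Baidu's public TTS documentation; the token value here is only a placeholder (main.py obtains a real token via the OAuth API):

```python
from urllib.parse import urlencode, quote_plus

def build_tts_query(text, token="YOUR_ACCESS_TOKEN"):
    """Build the query string for Baidu's short-text TTS endpoint
    (tsn.baidu.com/text2audio). The token argument is a placeholder."""
    params = {
        "tex": quote_plus(text),  # text to synthesize; the API doc asks for it URL-encoded twice
        "tok": token,             # OAuth access token
        "cuid": "demo-board",     # unique client id, any string
        "ctp": 1,                 # client type, fixed to 1
        "lan": "zh",              # language: Chinese
        "aue": 6,                 # output format: 6 = wav (3 = mp3, 4/5 = pcm)
    }
    return urlencode(params)

query = build_tts_query("小恩来啦!请吩咐")
print(query)
```

Note that aue selects the output encoding; this article needs WAV, which is then resampled as described below.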


The WAV file obtained this way can't be used directly. Note that the SLN-LOCAL/2-IOT board needs audio sources with a 48 kHz sample rate and 16-bit samples, so we need to use the Audacity audio tool to convert the audio file format to 48 kHz 16-bit WAV. Import the 16 kHz 16-bit WAV files generated by Baidu TTS into Audacity, select a project rate of 48 kHz, then File -> Export -> Export as WAV, select the encoding "signed 16-bit PCM", and regenerate a 48 kHz 16-bit WAV for use.


Pic 8
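If you prefer to script the conversion instead of using Audacity, here is a minimal sketch for mono files. It uses plain sample repetition (48000 / 16000 = 3 copies per sample) rather than Audacity's proper resampling filter, and the function name is only illustrative:

```python
import array
import wave

def upsample_16k_to_48k(in_path, out_path):
    """Upsample a mono 16 kHz 16-bit WAV to 48 kHz 16-bit by repeating
    each sample three times; keeps the signed 16-bit PCM encoding the
    board expects."""
    with wave.open(in_path, "rb") as src:
        assert src.getnchannels() == 1, "sketch handles mono files only"
        assert src.getframerate() == 16000 and src.getsampwidth() == 2
        samples = array.array("h", src.readframes(src.getnframes()))
    out = array.array("h")
    for s in samples:
        out.extend((s, s, s))  # 3x repetition = nearest-sample upsampling
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)        # signed 16-bit PCM
        dst.setframerate(48000)
        dst.writeframes(out.tobytes())
```

For production audio, Audacity's resampler gives cleaner results; this sketch is only for quick testing.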

"百度云端语音测试demo": used for power-on broadcasting of the demo name. It is stored in the RT demo code, so you need to convert it to a 16-bit C array and add it to the project.

"小恩来啦!请吩咐", "好的": voice detection feedback, saved in the filesystem ZH01 and ZH02 areas.

2.3 Playback audio data preparation and burning

  There are two playback audio files, "小恩来啦!请吩咐" and "好的", saved in the filesystem ZH01 and ZH02 areas. The filesystem memory map looks like this:


Pic 9

So we need to convert the 48 kHz 16-bit WAV files to the format needed by the filesystem, using the official tool: Ivaldi_sln_local2_iot.

Reference document: SLN-LOCAL2-IOT-DG, chapter 10.1 "Generating filesystem-compatible files".

In a bash shell, enter the commands as shown in the following picture:



Use the conversion command to get the playback bin file:

python file_format.py -if xiaoencoming_48k16bit.wav -of xiaoencoming_48k16bit.bin -ft H

At last, it generates the files:

"小恩来啦!请吩咐" -> xiaoencoming_48k16bit.bin, burned to flash address 0x6184_0000

"好的" -> OK_48k16bit.bin, burned to flash address 0x6180_0000

Then use the MCUBootUtility tool to burn the above two files to the related addresses.

Here, take OK_48k16bit.bin as an example. Put the board into serial download mode (J27-0), then power off and power on.

The flash chip needs to be set to Hyper Flash IS26KSXXS. Use the boot device memory window and the Write button to burn the .bin file to the related address; the length is 0x40000.





xiaoencoming_48k16bit.bin can be downloaded to 0x6184_0000 with the same method; the length is 0x40000.
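A quick sanity check before burning can be scripted; the addresses and the 0x40000 slot length come from the burn instructions above, while the helper name is only illustrative:

```python
import os

ADDR_OK     = 0x61800000  # OK_48k16bit.bin ("好的")
ADDR_XIAOEN = 0x61840000  # xiaoencoming_48k16bit.bin ("小恩来啦!请吩咐")
SLOT_SIZE   = 0x40000     # each playback slot is 256 KiB long

def check_fits(path):
    """Verify a generated .bin fits its 0x40000 playback slot before burning."""
    size = os.path.getsize(path)
    assert size <= SLOT_SIZE, f"{path} is {size} bytes, larger than 0x{SLOT_SIZE:X}"
    return size
```

Note that the two slots are adjacent: 0x6184_0000 - 0x6180_0000 = 0x40000.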


2.4 Demo audio preparation and integration

The prepared baiduclouddemo_48K16bit.wav ("百度云端语音测试demo") needs to be converted to a 16-bit C array and put into the project code, where it is called for demo-mode playback. The conversion uses WAVToCode; the operation looks like this:


Pic 13

Add the generated baiducloulddemo_48K16bit.c to the demo project's C files:
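The WAVToCode step can be approximated with a short script; this is a sketch of the idea, not the tool's exact output format (the variable name and layout here are illustrative):

```python
import array
import wave

def wav_to_c_array(wav_path, var_name="demo_clip"):
    """Dump the samples of a 16-bit WAV as a C int16_t array, similar in
    spirit to what the WAVToCode tool produces."""
    with wave.open(wav_path, "rb") as w:
        assert w.getsampwidth() == 2, "demo audio must be 16-bit PCM"
        samples = array.array("h", w.readframes(w.getnframes()))
    rows = [", ".join(str(s) for s in samples[i:i + 8])
            for i in range(0, len(samples), 8)]  # 8 samples per line
    body = ",\n    ".join(rows)
    return ("#include <stdint.h>\n"
            f"const int16_t {var_name}[{len(samples)}] = {{\n    {body}\n}};\n")
```

The returned string can be written to a .c file and added to the project like the WAVToCode output.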


2.5 WW and VC preparation

Wake words are generated with the Cyberon DSMT tool, which supports a wide range of languages; customers can request the tool through the flow in Pic 2. The Chinese wake words and voice commands in this article are also generated with DSMT.

A DSMT project can have multiple groups: group1 is the wake word configuration, with CmdMapID = 1. The other groups act as voice command groups, such as CMD_IOT in this article, with CmdMapID = 2.


Pic 1415.jpg


Pic 15

The wake word engine continuously scans the input audio stream using group1. After a successful wake-up, voice command detection runs using group2, or other recognition groups as well as custom groups. The wake word configuration in the DSMT tool is as follows:


Pic 16

The WW group can support more words; customers can add the needed ones in group1.

Configure the VC group in DSMT like this:


Pic 17

Then save the project; the files used by the code are _witMapID.bin, CMD_IOT.xml, and WW.xml.

Among the generated files, CYBase.mod is the base model, WW.mod is the WW model, and CMD_IOT.mod is the VC model.

After Pic 16 and 17, the WW and VC command preparation is finished; we can put the DSMT project into the RT106S demo project folder: sln_local2_iot_local_demo\local_voice\oob_demo_zh
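The two-stage flow (group1 wake word first, then group2 commands) can be sketched as a tiny state machine. The group contents below mirror this article's DSMT project; the class itself is only illustrative:

```python
# group1: wake word; group2: the CMD_IOT voice commands from this article
GROUP1_WW = {"小恩小恩"}
GROUP2_CMD_IOT = ["开红灯", "关红灯", "开绿灯", "关绿灯",
                  "灯闪烁", "开远程灯", "关远程灯"]

class AsrSession:
    def __init__(self):
        self.awake = False  # False: only group1 (wake word) is active

    def feed(self, phrase):
        """Return the command index (like keywordID[1]) or None."""
        if not self.awake:
            if phrase in GROUP1_WW:
                self.awake = True   # wake word matched: switch to group2
            return None
        if phrase in GROUP2_CMD_IOT:
            self.awake = False      # go back to wake-word listening
            return GROUP2_CMD_IOT.index(phrase)
        return None
```

A command spoken before the wake word is ignored, exactly as the group switching above describes.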

3 Code preparation

Based on the official SLN-LOCAL2-IOT SDK local_demo, the code in this article modifies the Chinese wake words and recognition words (or you can build a new customer-defined group directly), adds LED operations driven by local voice detection with Chinese audio feedback, adds the demo Chinese audio, and adds Wi-Fi MQTT communication code to publish to the Baidu cloud shadow connection.

Source reference code SDK path:



The SLN-LOCAL2-IOT and SLN-LOCAL-IOT code are nearly the same; the only difference is the ASR library file. The RT106S (SLN-LOCAL2-IOT) uses the SDK's own libsln_asr.a library, while the RT106L (SLN-LOCAL-IOT) needs the corresponding libsln_asr_eval.a library.

   Importing the code requires three projects: local_demo, bootloader, bootstrap. The three projects are stored in different flash regions; see SLN-LOCAL2-IOT-DG.pdf, chapter 3.3 "Device memory map".

   This is the three-project boot process on the chip:


Pic 18

This document is for demo testing and requires debugging, so this article turns off the encryption mechanism: configure the bootloader and bootstrap projects with the macro definition DISABLE_IMAGE_VERIFICATION = 1, and use a J-Link connected to SLN-LOCAL/2-IOT's SWD interface to burn the code.

The following modifications are made to the app local_demo project.

3.1 sln-local/2-iot code

The following modifications are the same for both the sln-local-iot and sln-local2-iot platforms.

3.1.1 Voice recognition related code

1) Demo audio playback

Play content: "百度云端语音测试demo"

The content of sln_local2_iot_local_demo_xe_ledwifi\audio\demos\smart_home.c is replaced by the previously generated baiducloulddemo_48K16bit.c.



This code is used by the announce_demo API in main.c for playback:

        case ASR_CMD_IOT:

            ret = demo_play_clip((uint8_t *)smart_home_demo_clip, sizeof(smart_home_demo_clip));


2) Command print information

#define NUMBER_OF_IOT_CMDS      7


static char *cmd_iot_en[] = {"Red led on", "Red led off", "Green led on", "Green led off",

                             "cycle led",        "remote led on",         "remote led off"};

static char *cmd_iot_zh[] = {"开红灯", "关红灯", "开绿灯", "关绿灯", "灯闪烁", "开远程灯", "关远程灯"};

Here the source code is modified using the IOT group; you can actually add your own speech recognition group directly and add the related command handling.



3) Line 757: add LED-related notification information in ASR_CMD_IOT mode.

oob_demo_control.ledCmd = g_asrControl.result.keywordID[1];    

This code obtains the recognized VC command data; the value of keywordID[1] is an index that tells the code which specific voice command was detected, so the app can act based on the value of ledCmd. The value of keywordID[1] corresponds to the Command List in Pic 17.

For example, for "开远程灯": if the board wakes up and recognizes "开远程灯", keywordID[1] is 5; it is transferred to oob_demo_control.ledCmd, which is then used in the appTask API to perform the actual control.
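As an illustration of the keywordID numbering, the two command tables above can be indexed directly (the helper function is hypothetical, not part of the demo code):

```python
# The demo code's command tables, reproduced for illustration:
# keywordID[1] indexes into these, so ID 5 -> "开远程灯" / "remote led on".
cmd_iot_en = ["Red led on", "Red led off", "Green led on", "Green led off",
              "cycle led", "remote led on", "remote led off"]
cmd_iot_zh = ["开红灯", "关红灯", "开绿灯", "关绿灯", "灯闪烁", "开远程灯", "关远程灯"]

def describe(keyword_id):
    """Map a keywordID[1] value to its Chinese and English command names."""
    return f"{cmd_iot_zh[keyword_id]} ({cmd_iot_en[keyword_id]})"
```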

4) main.c

void appTask(void *arg)

Under case kCommandGeneric:, if the language is Chinese, add the recognition-related control code. First it plays the feedback "好的".

Then it checks the detected voice command value and performs the related local LED control.

        else if (oob_demo_control.language == ASR_CHINESE)
        {
// play audio "OK" in Chinese
#if defined(SLN_LOCAL2_RD)
            ret = audio_play_clip((uint8_t *)AUDIO_ZH_01_FILE_ADDR, AUDIO_ZH_01_FILE_SIZE);
#elif defined(SLN_LOCAL2_IOT)
            ret = audio_play_clip(AUDIO_ZH_01_FILE);

            // kerry add operation code ======================================== begin
            if (oob_demo_control.ledCmd == LED_RED_ON)
            {
                /* ... turn on the red LED ... */
            }
            else if (oob_demo_control.ledCmd == LED_RED_OFF)
            {
                /* ... turn off the red LED ... */
            }
            else if (oob_demo_control.ledCmd == LED_BLUE_ON)
            {
                /* ... turn on the blue LED ... */
            }
            else if (oob_demo_control.ledCmd == LED_BLUE_OFF)
            {
                /* ... turn off the blue LED ... */
            }
            else if (oob_demo_control.ledCmd == CYCLE_SLOW)
            {
                for (int i = 0; i < 3; i++)
                {
                    /* ... blink the LEDs; see the attachment code ... */
                }
            }

In addition to local voice recognition control, this article also adds remote control, mainly through the Wi-Fi connection: the board uses the MQTT protocol to connect to the Baidu cloud server. When local speech recognition detects a remote control command, it publishes the corresponding control message to Baidu cloud; the cloud then forwards the message to the client that subscribed to it, and after receiving the message, that client performs the related control according to the message content.


3.1.3 Network connection code


    Add mqtt.c


Add mqtt.h, mqtt_opts.h, mqtt_prv.h

The related MQTT driver is from the RT1060 SDK and has already been added to the attachment project.


  Add the MQTT application layer API code: client ID, server host, MQTT server port number, user name, password, subscribe topic, publish topic and data, etc. For more details, check the attachment code.

   The MQTT application code is ported from the MQTT project of the RT1060 SDK and added to sln_tcp_server.c. The TCP_OTA_Server function initializes the Wi-Fi network, establishes the Wi-Fi connection, resolves the Baidu cloud server URL to an IP address, and then connects to the Baidu cloud server over MQTT. After a successful connection it publishes a message first, so that after power-up you can check in MQTT.fx whether the power-on publish succeeded.

The TCP_OTA_Server function code is as follows:

static void TCP_OTA_Server(void *param) // kerry: add mqtt related code
{
    err_t err      = ERR_OK;
    uint8_t status = kCommon_Failed;

    /* Start the WiFi and connect to the network */
    while (status != kCommon_Success)
    {
        status_t statusConnect;

        statusConnect = APP_NETWORK_Wifi_Connect(true, true);
        if (WIFI_CONNECT_SUCCESS == statusConnect)
        {
            status = kCommon_Success;
        }
        else if (WIFI_CONNECT_NO_CRED == statusConnect)
        {
            /* If there are no credentials in flash, delete the TCP server task */
            status = kCommon_Failed;
        }
    }

    /* Wait for wifi/eth to connect */
    while (0 == get_connect_state())
    {
        /* Give time to the network task to connect */
    }

    configPRINTF(("TCP server start\r\n"));
    configPRINTF(("MQTT connection start\r\n"));

    mqtt_client = mqtt_client_new();
    if (mqtt_client == NULL)
    {
        configPRINTF(("mqtt_client_new() failed.\r\n"));
        while (1)
        {
        }
    }

    if (ipaddr_aton(EXAMPLE_MQTT_SERVER_HOST, &mqtt_addr) && IP_IS_V4(&mqtt_addr))
    {
        /* Already an IP address */
        err = ERR_OK;
    }
    else
    {
        /* Resolve MQTT broker's host name to an IP address */
        configPRINTF(("Resolving \"%s\"...\r\n", EXAMPLE_MQTT_SERVER_HOST));
        err = netconn_gethostbyname(EXAMPLE_MQTT_SERVER_HOST, &mqtt_addr);
        configPRINTF(("Resolving status: %d.\r\n", err));
    }

    if (err == ERR_OK)
    {
        configPRINTF(("connect to mqtt\r\n"));
        /* Start connecting to MQTT broker from tcpip_thread */
        err = tcpip_callback(connect_to_mqtt, NULL);
        configPRINTF(("connect status: %d.\r\n", err));
        if (err != ERR_OK)
        {
            configPRINTF(("Failed to invoke broker connection on the tcpip_thread: %d.\r\n", err));
        }
    }
    else
    {
        configPRINTF(("Failed to obtain IP address: %d.\r\n", err));
    }

    int i = 0;
    /* Publish some messages */
    for (i = 0; i < 5;)
    {
        configPRINTF(("connect status enter: %d.\r\n", connected));
        if (connected)
        {
            err = tcpip_callback(publish_message_start, NULL);
            if (err != ERR_OK)
            {
                configPRINTF(("Failed to invoke publishing of a message on the tcpip_thread: %d.\r\n", err));
            }
            /* ... see the attachment code for the rest of the loop ... */
        }
    }
}

Please note that the following published JSON data can't be published directly in the code:


{
  "reported": {
    "LEDstatus": false,
    "humid": 88,
    "temp": 22
  }
}

It needs to be compressed and escaped (for example with the website https://www.bejson.com/) before being embedded in the code as a C string:

{\"reported\" : {     \"LEDstatus\" : true,     \"humid\" : 88,     \"temp\" : 11    } }
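The compression and escaping that the website performs can also be sketched with a few lines of Python (the variable names are illustrative):

```python
import json

# Compact the shadow update and escape the quotes so the result can be
# pasted into a C string literal for the publish call.
shadow_update = {"reported": {"LEDstatus": True, "humid": 88, "temp": 11}}

compact = json.dumps(shadow_update, separators=(",", ":"))   # one-line JSON
c_literal = '"' + compact.replace('"', '\\"') + '"'          # escaped for C source

print(compact)
print(c_literal)
```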


4) main.c appTask

Under case kCommandGeneric:, if the language is Chinese, add the corresponding voice recognition control code.

"开远程灯": turns on the local yellow LED and publishes the "remote led on" MQTT message to Baidu cloud, turning on the remote 1060-EVK board LED.

"关远程灯": turns on the local white LED and publishes the "remote led off" MQTT message to Baidu cloud, turning off the remote 1060-EVK board LED.

Related operation code:

                     else if (oob_demo_control.ledCmd == LED_REMOTE_ON)
                     {
                         err_t err = ERR_OK;
                         err = tcpip_callback(publish_message_on, NULL);
                         if (err != ERR_OK)
                         {
                             configPRINTF(("Failed to invoke publishing of a message on the tcpip_thread: %d.\r\n", err));
                         }
                     }
                     else if (oob_demo_control.ledCmd == LED_REMOTE_OFF)
                     {
                         err_t err = ERR_OK;
                         err = tcpip_callback(publish_message_off, NULL);
                         if (err != ERR_OK)
                         {
                             configPRINTF(("Failed to invoke publishing of a message on the tcpip_thread: %d.\r\n", err));
                         }
                     }

3.2 MIMXRT1060-EVK code

The main function of the MIMXRT1060-EVK code is to act as another client of the cloud: it subscribes to the messages that SLN-LOCAL/2-IOT publishes when a remote command is detected, and then drives the LED on the board, to test the voice recognition remote control function. This code is based on Ethernet: through the Ethernet port on the board it achieves network communication, then uses MQTT to connect to Baidu cloud and subscribes to the messages from LOCAL2, enabling the reception and execution of the LOCAL2 commands.

The network code is similar to the SLN-LOCAL2-IOT board network code; the server, cloud account, passwords, etc. are all the same, and the main function is to subscribe to messages. See the lwip_mqtt_freertos.c file in the attachment RT1060 code.

When it receives data published by the server, it needs to parse the data to get the status of the LED and then control it.

Normal data from the Baidu cloud shadow is sent as follows:

Received 253 bytes from the topic "$baidu/iot/shadow/RT1060BTCDShadow/update/accepted": "{"requestId":"2fc0ca29-63c0-4200-843f-e279e0f019d3","reported":{"LEDstatus":false,"humid":44,"temp":33},"desired":{},"lastUpdatedTime":{"reported":{"LEDstatus":1635240225296,"humid":1635240225296,"temp":1635240225296},"desired":{}},"profileVersion":159}"

Then you need to parse the LEDstatus value out of the received data: whether it is false or true.

Because the amount of data is small, no JSON library is used here, just plain string parsing. Add the following parsing code to the mqtt_incoming_data_cb function:

mqtt_rec_data.mqttindex = mqtt_rec_data.mqttindex + len;
    if (mqtt_rec_data.mqttindex >= 250)
    {
        PRINTF("kerry test \r\n");
        PRINTF("idex= %d", mqtt_rec_data.mqttindex);
        datap = strstr((char *)mqtt_rec_data.mqttrecdata, "LEDstatus");
        if (datap != NULL)
        {
            if (!strncmp(datap + 11, strtrue, 4))       // char strtrue[]  = "true";
            {
                GPIO_PinWrite(GPIO1, 3, 1U); // pull high
            }
            else if (!strncmp(datap + 11, strfalse, 5)) // char strfalse[] = "false";
            {
                GPIO_PinWrite(GPIO1, 3, 0U); // pull low
            }
        }
        mqtt_rec_data.mqttindex = 0;
    }

It uses strstr to search for "LEDstatus" in the received data, gets the pointer position, then adds a fixed offset to check whether the LED status is true or false. If it is true, it turns the LED on; if it is false, it turns the LED off.
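For clarity, the same offset-based parsing (and a JSON-based alternative) can be sketched in Python; the payload here is shortened from the shadow message above:

```python
import json

PAYLOAD = ('{"requestId":"2fc0ca29-63c0-4200-843f-e279e0f019d3",'
           '"reported":{"LEDstatus":false,"humid":44,"temp":33},"desired":{}}')

def led_status_by_offset(payload):
    """Mirror of the C strstr/strncmp logic: find "LEDstatus" and inspect the
    value 11 characters past the match ('LEDstatus":' is 11 characters)."""
    i = payload.find("LEDstatus")
    if i < 0:
        return None
    return payload[i + 11 : i + 15] == "true"

def led_status_by_json(payload):
    """A more robust alternative when a JSON parser is available."""
    return json.loads(payload)["reported"]["LEDstatus"]
```

The offset trick works because the shadow message always serializes the key as "LEDstatus": followed directly by the value.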

4 Test Result

   This section gives the test results and video of the system. Before testing the voice function, first use MQTT.fx to verify that the Baidu cloud connection, publish, and subscribe all work; then test SLN-LOCAL2-IOT combined with MIMXRT1060-EVK for voice wake-up recognition and remote control.

   To join SLN-LOCAL2-IOT to a Wi-Fi hotspot, enter the command in the print terminal:

setup AWS kerry123456


4.1 MQTT.fx test baidu cloud connection

MQTT.fx is an Eclipse Paho-based MQTT client tool written in Java that supports subscribing to and publishing messages on topics.

4.1.1 MQTT fx configuration

    Download and install the tool, then open it. First do the configuration: click Edit Connection:



Profile name: connection name

Profile type: MQTT broker

Broker address: the broker address generated by Baidu cloud, using port 1883 with no encryption

Broker port: 1883, no encryption

Client ID: RT1060BTCDShadow. Note that this name should be the same as the cloud shadow name, otherwise the connection will not be detected on the Baidu web page. If the Client ID matches the shadow name, then when MQTT.fx connects, the online side can also see that the connection is OK.

User credentials: add the thing's user name and password from Baidu cloud.

After the configuration, click Connect and refresh the website.

Before connection:


Pic 20

After connection:


Pic 21

4.1.2 MQTT fx subscribe

For subscribing and publishing, what topics should be used?

 Open your thing shadow and select "Interaction"; the page gives the corresponding topics:


Pic 22

The subscribe topic is:


The publish topic is:



Pic 23

Click Subscribe; we can see it is already able to receive data.


4.1.3 MQTT fx publish

Publishing needs the topic as input:


It also needs the content as input; here the JSON data is used.


Pic 24

Here, we can use this JSON data:


{
  "reported" : {
    "LEDstatus" : true,
    "humid" : 88,
    "temp" : 11
  }
}



The JSON data can also be checked with the website:



Pic 25

Input the publish data and click the Publish button:


Pic 26

4.1.4 Publish data test result

  Before publishing, clear the thing data on the website:


Pic 27

Publish data from MQTT.fx, then check the subscribed data and the website:


Pic 28

We can see that the published data appears both on the website and in the MQTT.fx subscribe area. So far, the connection and data transfer tests pass.


4.2 Voice recognition and remote control test

This is the device connection picture:


Pic 29

4.2.1 voice recognition local control


Pic 30

This is the SLN-LOCAL2-IOT print output after recognizing the voice WW and VC.

Red led on:

led cycle:

4.2.2 voice recognition remote control

  The following tests wake-up + remote on and wake-up + remote off, and also give the print results and the video.


Pic 31

remote control:


Revision 5 of 5, last update: 12-23-2021