Design an IoT edge node for a CV application based on the i.MX RT1050


Overview of i.MX RT1050

        The i.MX RT1050 is the industry's first crossover processor: it combines the high performance and high level of integration of an applications processor with the ease of use and real-time functionality of a microcontroller. The i.MX RT1050 runs an Arm Cortex-M7 core at 600 MHz, which means it can handle complicated computations, such as floating-point arithmetic and matrix operations, that a general-purpose MCU struggles with.

        It has a rich set of peripherals that makes it suitable for a variety of applications. In this demo, the PXP (Pixel Pipeline), CSI (CMOS Sensor Interface), and eLCDIF (Enhanced LCD Interface) allowed me to build up a camera display system easily.

pastedImage_2.png

Fig 1 i.MX RT series


pastedImage_8.png

Fig 2 i.MX RT1050 Block Diagram

Basic concepts of Computer Vision (CV)

         Machine Learning (ML) is moving to the edge for a variety of reasons, such as bandwidth constraints, latency, reliability, and security. People want edge computing capability on embedded devices to provide more advanced services, like voice recognition for smart speakers and face detection for surveillance cameras.

pastedImage_2.jpg

Fig 3 Reasons for moving ML to the edge

       Convolutional Neural Networks (CNNs) are one of the main approaches to image recognition and image classification. CNNs use a variation of multilayer perceptrons that requires minimal pre-processing, thanks to their shared-weights architecture and translation-invariance characteristics.

pastedImage_15.png

Fig 4 Structure of a typical deep neural network

        Above is an example that shows the original input image on the left-hand side and how it progresses through each layer to produce the class probabilities on the right-hand side.

Hardware

  • MIMXRT1050 EVK board
  • RK043FN02H-CT (LCD panel)

pastedImage_1.png

Fig 5 MIMXRT1050 EVK board

Reference demo code

  • emwin_temperature_control: demonstrates graphical widgets of the emWin library.
  • cmsis_nn_cifar10: demonstrates a convolutional neural network (CNN) example using the convolution, ReLU activation, pooling, and fully-connected functions from the CMSIS-NN software library. The CNN used in this example is based on the CIFAR-10 example from Caffe. The network consists of 3 convolution layers interspersed with ReLU activation and max-pooling layers, followed by a fully-connected layer at the end. The input to the network is a 32x32-pixel color image, which is classified into one of 10 output classes.

Note: Both demo projects come from the SDK library.

Deploy the neural network model

Fig 6 illustrates the steps of deploying the neural network model on the embedded platform. The cmsis_nn_cifar10 demo project already provides the quantized parameters for the 3 convolution layers, so this implementation uses those parameters directly. To evaluate the accuracy of the model, I randomly chose 100 images from the test set as one round of input; over several rounds of testing, the model's accuracy came out to about 65%, as the figure below shows.

pastedImage_6.png

Fig 6 Deploy the neural network model

pastedImage_1.png

Fig 7 cmsis_nn_cifar10 demo project test result

The CIFAR-10 dataset is a collection of images commonly used to train ML and computer vision algorithms. It consists of 60,000 32x32 color images in 10 classes ("airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"), with 6,000 images per class. There are 50,000 training images and 10,000 test images.

Embedded platform software structure

        After POR, the various components are initialized: system clock, pin mux, camera, CSI, PXP, LCD, emWin, etc. The control GUI then appears on the LCD. Pressing the Play button displays the camera video on the LCD; once an object enters the camera's view, you can press the Capture button to pause the display and run the model to identify the object. Fig 8 presents the software structure of this demo.

pastedImage_5.png

Fig 8 Embedded platform software structure

Object identification test

The three figures below present the testing results.

 pastedImage_2.jpg

Fig 9

pastedImage_4.jpg

Fig 10

pastedImage_6.jpg

Fig 11

Future work

         Use the PyTorch framework to train a better and more complicated convolutional network for object recognition.

Last update: 11-04-2019 01:25 AM