Fast GPU Image Processing in the i.MX 6x

Document created by grantw on Sep 26, 2012. Last modified by Jodi Paul on Mar 14, 2013.

by Guillermo Hernandez, Freescale



Color tracking is a useful base for complex image processing use cases. For example, determining which parts of an image belong to skin is very important for face detection and hand gesture applications.


In this example we present a method that is robust enough to tolerate some noise and blur, as well as different lighting conditions, thanks to the use of OpenGL ES 2.0 shaders running on the i.MX 6x multimedia processor.



This how-to assumes that the reader is an experienced developer who is familiar with the tools and techniques around this technology. It also assumes intermediate graphics knowledge and experience, such as the RGBA structure of pictures and video frames and the programming of OpenGL-based applications, as we will not dig into the details of the basic setup.



In this paper we will see how to implement a very fast color tracking application that runs its image processing on the GPU instead of the CPU, using OpenGL ES 2.0 shaders.


Step 1: Gather all the components


For this example we will use:

1.      i.MX6q ARD platform

2.      Linux ER5

3.      Oneiric rootfs with ER5 release packages

4.      OpenCV 2.0.0 source


Step 2: Building everything you need


Refer to the ER5 User's Guide and Release Notes for how to build and boot the board with the Ubuntu Oneiric rootfs. After you are done, you will need to build the OpenCV 2.0.0 source on the board, or you could add it to LTIB and have it built for you.


NOTE: We will be using OpenCV only for convenience. We will not use any of its advanced math or image processing features (they run entirely on the CPU, which is what we are trying to avoid); we only want an easy way of grabbing and managing frames from the USB camera.


Step 3: Application setup


Make sure that at this point you have a basic OpenGL ES 2.0 application running; a simple plane with a texture mapped to it should be enough to start (please refer to the Freescale GPU examples).


Step 4: OpenCV auxiliary code


The basic idea of the workflow is as follows:

a)      Get the live feed from the USB camera using the OpenCV capture functions (cvCreateCameraCapture() and cvQueryFrame()) and store each frame in an IplImage structure.

b)      Create an OpenGL  texture that reads the IplImage buffer every frame and map it to a plane in OpenGL ES 2.0.

c)      Use the fragment shader to perform fast image processing calculations. In this example we will examine the Sobel filter and binary images, which are the foundations for many complex image processing algorithms.

d)      If necessary, perform multi-pass rendering to chain several image processing shaders  and get an end result.


First we must include the relevant OpenCV headers:

#include "opencv/cv.h"

#include "opencv/cxcore.h"

#include "opencv/cvaux.h"

#include "opencv/highgui.h"


Then we should define a texture size. For this example we will be using 320x240, but this can easily be changed to 640x480:

#define TEXTURE_W 320

#define TEXTURE_H 240


We need to create an OpenCV capture device to enable its V4L camera and get the live feed:

CvCapture *capture;

capture = cvCreateCameraCapture (0);

cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_WIDTH,  TEXTURE_W);

cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_HEIGHT, TEXTURE_H);


Note: when we are done, remember to close the camera stream:

cvReleaseCapture (&capture);


OpenCV has a very convenient structure for storing pixel arrays (a.k.a. images), called IplImage:

IplImage *bgr_img1;

IplImage *frame1;

bgr_img1 = cvCreateImage (cvSize (TEXTURE_W, TEXTURE_H), 8, 4);


OpenCV also has a very convenient function for capturing a frame from the camera and storing it in an IplImage:

frame1 = cvQueryFrame (capture);


Next we want to separate the camera capture process from the post-processing filters and final rendering; hence, we should create a thread that exclusively handles the camera:

#include <pthread.h>

pthread_t camera_thread1;

pthread_create (&camera_thread1, NULL, UpdateTextureFromCamera1,(void *)&thread_id);


Your UpdateTextureFromCamera1() function should look something like this:

void *UpdateTextureFromCamera1 (void *ptr)
{
      while (1)   /* grab frames for as long as the thread runs */
      {
            frame1 = cvQueryFrame (capture);
            //cvFlip (frame1, frame1, 1);  // mirrored image
            cvCvtColor (frame1, bgr_img1, CV_BGR2BGRA);
      }
      return NULL;
}
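Because the camera thread writes frames while the render thread reads them, it can help to make the hand-off between the two explicit. The following is a minimal sketch of one way to do that in plain C, assuming a single mutex-protected buffer and a sequence counter. The names FrameSlot, frame_slot_init, frame_slot_publish and frame_slot_consume are hypothetical; they are not part of the original application or of OpenCV.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define FRAME_BYTES (320 * 240 * 4)   /* TEXTURE_W * TEXTURE_H * 4 channels (BGRA) */

/* Hypothetical slot shared by the camera thread and the render thread. */
typedef struct {
    pthread_mutex_t lock;
    unsigned char   pixels[FRAME_BYTES];
    unsigned long   sequence;          /* incremented on every published frame */
} FrameSlot;

void frame_slot_init(FrameSlot *slot)
{
    assert(slot != NULL);
    pthread_mutex_init(&slot->lock, NULL);
    memset(slot->pixels, 0, FRAME_BYTES);
    slot->sequence = 0;
}

/* Camera thread: copy a finished frame into the slot. */
void frame_slot_publish(FrameSlot *slot, const unsigned char *src)
{
    pthread_mutex_lock(&slot->lock);
    memcpy(slot->pixels, src, FRAME_BYTES);
    slot->sequence++;
    pthread_mutex_unlock(&slot->lock);
}

/* Render thread: copy the frame out only if it is newer than *last_seen.
 * Returns true when dst was updated. */
bool frame_slot_consume(FrameSlot *slot, unsigned char *dst, unsigned long *last_seen)
{
    bool updated = false;
    pthread_mutex_lock(&slot->lock);
    if (slot->sequence != *last_seen) {
        memcpy(dst, slot->pixels, FRAME_BYTES);
        *last_seen = slot->sequence;
        updated = true;
    }
    pthread_mutex_unlock(&slot->lock);
    return updated;
}
```

In such a design the camera thread would call frame_slot_publish() after each color conversion, and the render loop would call frame_slot_consume() before uploading the texture, skipping the upload when no new frame has arrived.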



Finally, the rendering loop should be something like this:

while (! window->Kbhit ())
{
            tt = (double)cvGetTickCount();
            Render ();
            tt = (double)cvGetTickCount() - tt;
            value = tt/(cvGetTickFrequency()*1000.);
            printf( "\ntime = %gms --- %.2lf FPS", value, 1000.0 / value);
            //key = cvWaitKey (30);
}
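The timing arithmetic in the loop above can be isolated into two small helpers. This is only a sketch: it relies on cvGetTickFrequency() reporting ticks per microsecond, and the helper names ticks_to_ms and ms_to_fps are hypothetical. The second helper also guards against a zero duration, which the printf above does not.

```c
#include <assert.h>

/* Convert a tick delta to milliseconds. ticks_per_us is the value
 * cvGetTickFrequency() reports: ticks per microsecond. */
double ticks_to_ms(double tick_delta, double ticks_per_us)
{
    assert(ticks_per_us > 0.0);
    return tick_delta / (ticks_per_us * 1000.0);
}

/* Frames per second for a frame that took frame_ms milliseconds.
 * Returns 0.0 for a non-positive duration instead of dividing by zero. */
double ms_to_fps(double frame_ms)
{
    return (frame_ms > 0.0) ? 1000.0 / frame_ms : 0.0;
}
```

As a worked example, a frame that takes 2,000,000 ticks on a clock running at 100 ticks per microsecond lasts 20 ms, which corresponds to 50 FPS.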




Step 5: Map the camera image to a GL Texture


As you can see, the Render function is called every frame. This white paper will not cover the basic OpenGL or EGL setup of the application in detail; we would rather focus on the ES 2.0 shaders. The texture-related variables we need are:

GLuint _texture;

GLeglImageOES g_imgHandle;

IplImage *_texture_data;


The function that maps the texture from the pixels stored in our IplImage is quite simple: we just need to get the image data, which is basically a pixel array:

void GLCVPlane::PlaneSetTex (IplImage *texture_data)
{
      // Convert the camera's BGR pixel order to the RGB order passed to OpenGL
      cvCvtColor (texture_data, _texture_data, CV_BGR2RGB);
      // Upload the converted pixels into our texture
      glBindTexture(GL_TEXTURE_2D, _texture);
      glTexImage2D (GL_TEXTURE_2D, 0, GL_RGB, _texture_w, _texture_h, 0, GL_RGB, GL_UNSIGNED_BYTE, _texture_data->imageData);
}
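To make the cost of the color conversion concrete, the sketch below shows what a BGR-to-RGB conversion amounts to for tightly packed 8-bit pixels: swapping the first and third byte of every pixel. It is an illustration only, assuming no row padding (a real IplImage may pad its rows, see its widthStep field); the function name swap_red_blue is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the first and third byte of every 3-byte pixel, in place.
 * Assumes tightly packed 8-bit data with no padding between rows. */
void swap_red_blue(unsigned char *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);
    for (size_t i = 0; i < pixel_count; ++i) {
        unsigned char *p = pixels + 3 * i;
        unsigned char tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```

Since this loop runs on the CPU for every frame, one alternative (not used in the original code) would be to upload the bytes unchanged and reorder the channels in the fragment shader with a swizzle such as texture2D(...).bgr.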




This function should be called inside our render loop:

void Render (void)
{
  glClearColor (0.0f, 0.0f, 0.0f, 0.0f);
  // ... update the plane texture with PlaneSetTex() and draw the plane ...
}





At this point the OpenGL texture is ready to be used as a sampler in our fragment shader, mapped to a 3D plane.



Image Processing.png


Lastly,  when you are ready to draw your plane with the texture in it:

// Set the shader program

glUseProgram (_shader_program);

// Binds this texture handle so we can load the data into it

/* Select Our Texture */


//Select eglImage

glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, g_imgHandle);

glDrawArrays (GL_TRIANGLES, 0, 6);


Step 6: Use the GPU to do Image Processing

First we need to make sure we have the correct vertex shader and fragment shader. We will focus only on the fragment shader, since this is where we will process our image from the camera.


Below you will find the simplest fragment shader; it only colors each pixel with the value sampled from the texture:

const char *planefrag_shader_src =

      "#ifdef GL_FRAGMENT_PRECISION_HIGH                    \n"

      "  precision highp float;                            \n"

      "#else                                          \n"

      "  precision mediump float;                    \n"

      "#endif                                        \n"

      "                                              \n"

      "uniform sampler2D s_texture;                  \n"

      "varying  vec3      g_vVSColor;                      \n"

      "varying  vec2 g_vVSTexCoord;                        \n"

      "                                              \n"

      "void main()                                    \n"

      "{                                              \n"

      "    gl_FragColor = texture2D(s_texture,g_vVSTexCoord);    \n"

      "}                                              \n";


Binary Image


The simplest image filter is the binary image, which converts a source image to a black-and-white output. To decide whether a pixel should be black or white we need a threshold: everything below the threshold will be black, and everything above it will be white.


Binary Filter.png



The shader code is as follows:

const char* g_strRGBtoBlackWhiteShader =
      "#ifdef GL_FRAGMENT_PRECISION_HIGH                    \n"
      "  precision highp float;                             \n"
      "#else                                                \n"
      "  precision mediump float;                           \n"
      "#endif                                               \n"
      "                                                     \n"
      "varying  vec2 g_vVSTexCoord;                         \n"
      "uniform sampler2D s_texture;                         \n"
      "uniform float threshold;                             \n"
      "                                                     \n"
      "void main() {                                        \n"
      "  vec3 current_Color = texture2D(s_texture,g_vVSTexCoord).xyz;    \n"
      "  float luminance = dot (vec3(0.299,0.587,0.114),current_Color);  \n"
      "  if(luminance>threshold)                            \n"
      "    gl_FragColor = vec4(1.0);                        \n"
      "  else                                               \n"
      "    gl_FragColor = vec4(0.0);                        \n"
      "}                                                    \n";

Notice that the main operation is to obtain the luminance of the pixel. To do so we take the dot product of the current pixel color with a known weight vector (the values used here match the ITU-R BT.601 luma coefficients), and then simply compare that luminance with the threshold. Anything below the threshold becomes a black pixel, and anything above it becomes a white pixel.
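When debugging the shader it can be handy to have the same decision available on the CPU, for example to check a few pixels by hand. The following is a small reference sketch in C that uses the same weights and the same comparison as the shader; the function names luminance and binarize are hypothetical, and color components are assumed to be in the range 0.0 to 1.0.

```c
#include <assert.h>

/* Luminance of an RGB color with components in [0, 1],
 * using the same weights as the shader. */
float luminance(float r, float g, float b)
{
    return 0.299f * r + 0.587f * g + 0.114f * b;
}

/* 1.0 (white) when the luminance is above the threshold, else 0.0 (black). */
float binarize(float r, float g, float b, float threshold)
{
    assert(threshold >= 0.0f && threshold <= 1.0f);
    return (luminance(r, g, b) > threshold) ? 1.0f : 0.0f;
}
```

With a threshold of 0.5, pure green (luminance 0.587) comes out white, while pure red (0.299) and pure blue (0.114) come out black, which shows how strongly the green channel dominates the result.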


SOBEL Operator


Sobel is a very common filter, since it is used as a foundation for many complex image processing techniques, particularly edge detection algorithms. The Sobel operator is based on convolution: the image is convolved with a particular mask, often called a kernel (usually a 3x3 matrix).


The Sobel operator calculates the gradient of the image at each pixel, so it tells us how the image changes across the pixels surrounding the current pixel, that is, how quickly it goes from darker to brighter values or vice versa.


Sobel Operation.png         


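The shader is a bit long, since several operations must be performed, so we will discuss each of its parts in turn. The listings show the GLSL source itself; in the application, each line is wrapped in a C string literal, as in the previous shaders.

First we need to get the texture coordinates from the vertex shader: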

const char* plane_sobel_filter_shader_src =
#ifdef GL_FRAGMENT_PRECISION_HIGH
  precision highp float;
#else
  precision mediump float;
#endif

varying  vec2 g_vVSTexCoord;
uniform sampler2D s_texture;


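Then we should define our kernels. As stated before, a 3x3 matrix is enough, and the following values have been tested with good results. The shader's main() function applies two kernels, kernel1 and kernel2; kernel2 is not shown in the original listing, so the transpose of kernel1 (the usual second Sobel kernel) is assumed here:

mat3 kernel1 = mat3 (-1.0, -2.0, -1.0,
                      0.0,  0.0,  0.0,
                      1.0,  2.0,  1.0);
mat3 kernel2 = mat3 (-1.0,  0.0,  1.0,
                     -2.0,  0.0,  2.0,
                     -1.0,  0.0,  1.0);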


We also need a convenient way to convert to grayscale, since the Sobel operator only needs grayscale information. Here a simple average of the three color channels is used:

float toGrayscale(vec3 source) {
  float average = (source.x+source.y+source.z)/3.0;
  return average;
}


Now we come to the important part: actually performing the convolutions. Remember that under the OpenGL ES 2.0 spec recursion is not supported and dynamic indexing cannot be relied on, so we need to do our operations the hard way, by defining vectors and multiplying them. See the following code:

float doConvolution(mat3 kernel) {
  float sum = 0.0;
  float current_pixelColor = toGrayscale(texture2D(s_texture,g_vVSTexCoord).xyz);

  // Distance to the neighboring texels, in texture coordinates
  float xOffset = float(1)/1024.0;
  float yOffset = float(1)/768.0;

  // Row at y - yOffset
  float new_pixel00 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y-yOffset)).xyz);
  float new_pixel01 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y-yOffset)).xyz);
  float new_pixel02 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y-yOffset)).xyz);
  vec3 pixelRow0 = vec3(new_pixel00,new_pixel01,new_pixel02);

  // Row of the current pixel
  float new_pixel10 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y)).xyz);
  float new_pixel11 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y)).xyz);
  float new_pixel12 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y)).xyz);
  vec3 pixelRow1 = vec3(new_pixel10,new_pixel11,new_pixel12);

  // Row at y + yOffset
  float new_pixel20 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y+yOffset)).xyz);
  float new_pixel21 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y+yOffset)).xyz);
  float new_pixel22 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y+yOffset)).xyz);
  vec3 pixelRow2 = vec3(new_pixel20,new_pixel21,new_pixel22);

  // Component-wise products of each kernel column with each pixel row
  vec3 mult1 = (kernel[0]*pixelRow0);
  vec3 mult2 = (kernel[1]*pixelRow1);
  vec3 mult3 = (kernel[2]*pixelRow2);

  sum = mult1.x+mult1.y+mult1.z+mult2.x+mult2.y+mult2.z+mult3.x+mult3.y+mult3.z;
  return sum;
}


In the last part of the function, notice that we add the products into a single sum; this sum tells us how much each pixel varies with respect to its neighbors.
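The convolution function hard-codes its neighbor offsets as 1/1024 and 1/768, which correspond to one texel of a 1024x768 texture. For the 320x240 texture used in this example, one texel would be 1/320 by 1/240. The sketch below computes the step from the texture size on the CPU; the TexelStep type and the texel_step function are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Size of one texel in normalized texture coordinates (hypothetical type). */
typedef struct {
    float x;
    float y;
} TexelStep;

/* One texel covers 1/width horizontally and 1/height vertically. */
TexelStep texel_step(int texture_w, int texture_h)
{
    TexelStep step;
    assert(texture_w > 0 && texture_h > 0);
    step.x = 1.0f / (float)texture_w;
    step.y = 1.0f / (float)texture_h;
    return step;
}
```

The two values could then be passed to the shader, for instance with glUniform2f, provided the shader declares a matching vec2 uniform; the original shader has no such uniform, so this would be an extension of it.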


The last part of the shader is where we use all our previous functions. It is worth noting that the convolution needs to be applied both horizontally and vertically for this technique to be complete:


void main() {
  float horizontalSum = 0.0;
  float verticalSum = 0.0;
  float averageSum = 0.0;

  horizontalSum = doConvolution(kernel1);
  verticalSum = doConvolution(kernel2);

  // A strong gradient in either direction marks an edge (drawn black)
  if( (verticalSum > 0.2)|| (horizontalSum >0.2)||(verticalSum < -0.2)|| (horizontalSum <-0.2))
    averageSum = 0.0;
  else
    averageSum = 1.0;

  gl_FragColor = vec4(averageSum,averageSum,averageSum,1.0);
}
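To check the shader's output against known values, the same per-pixel decision can be written as a CPU reference. The sketch below works on a grayscale image stored as floats in row-major order and clamps coordinates at the image border (comparable to GL_CLAMP_TO_EDGE, which is an assumption about the texture wrap mode). The function names gray_at, convolve3x3 and sobel_edge are hypothetical, and the second kernel is assumed to be the transpose of the first.

```c
#include <assert.h>

/* Read a grayscale sample, clamping the coordinates to the image border. */
float gray_at(const float *img, int w, int h, int x, int y)
{
    assert(img != 0 && w > 0 && h > 0);
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[y * w + x];
}

/* Weighted sum of the 3x3 neighborhood around (x, y).
 * kernel[r][c] weights the pixel at row y-1+r, column x-1+c. */
float convolve3x3(const float *img, int w, int h, int x, int y,
                  const float kernel[3][3])
{
    float sum = 0.0f;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            sum += kernel[r][c] * gray_at(img, w, h, x - 1 + c, y - 1 + r);
    return sum;
}

/* Same decision as the shader's main(): 0.0 on an edge, 1.0 elsewhere. */
float sobel_edge(const float *img, int w, int h, int x, int y)
{
    static const float across_rows[3][3] = { {-1, -2, -1}, { 0, 0, 0}, { 1, 2, 1} };
    static const float across_cols[3][3] = { {-1,  0,  1}, {-2, 0, 2}, {-1, 0, 1} };
    float a = convolve3x3(img, w, h, x, y, across_rows);
    float b = convolve3x3(img, w, h, x, y, across_cols);
    if (a > 0.2f || a < -0.2f || b > 0.2f || b < -0.2f)
        return 0.0f;
    return 1.0f;
}
```

For an image whose left half is black and right half is white, the pixels next to the boundary produce a large sum for the column-wise kernel and zero for the row-wise kernel, so they are classified as edges, while pixels inside either half are not.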




Conclusions and future work


At this point, if you have your application up and running, you will notice that image processing can be done quite fast, even with images larger than 640x480. This approach can be expanded to a variety of techniques such as tracking, feature detection and face detection.


However, these techniques are out of scope for now, because these algorithms (face detection, for example) need multiple rendering passes, where we perform an operation, write the result to an offscreen buffer, and use that buffer as the input for the next shader, and so on. Freescale is planning to release an Application Note in Q4 2012 that will expand this white paper and cover these techniques in detail.
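The multi-pass idea described above (run an operation, write to an offscreen buffer, feed that buffer to the next operation) can be illustrated on the CPU with two buffers that swap roles after every pass. This is only a structural sketch with hypothetical names (PassFn, run_passes, pass_invert, pass_threshold); on the GPU the two buffers would typically be textures attached to framebuffer objects, and each pass would be a shader.

```c
#include <assert.h>
#include <stddef.h>

/* A pass reads every value of src and writes its result to dst. */
typedef void (*PassFn)(const float *src, float *dst, size_t count);

/* Run the passes in order, alternating between two scratch buffers so that
 * the output of one pass becomes the input of the next ("ping-pong").
 * Returns the buffer holding the final result (input if pass_count is 0). */
const float *run_passes(const float *input, float *buf_a, float *buf_b,
                        size_t count, const PassFn *passes, size_t pass_count)
{
    const float *src = input;
    float *dst = buf_a;
    assert(buf_a != NULL && buf_b != NULL && buf_a != buf_b);
    for (size_t i = 0; i < pass_count; ++i) {
        passes[i](src, dst, count);
        src = dst;                             /* result feeds the next pass */
        dst = (dst == buf_a) ? buf_b : buf_a;  /* swap the target buffer */
    }
    return src;
}

/* Two toy passes used only for illustration. */
void pass_invert(const float *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = 1.0f - src[i];
}

void pass_threshold(const float *src, float *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = (src[i] > 0.5f) ? 1.0f : 0.0f;
}
```

Chaining pass_invert and pass_threshold over the values 0.25, 0.75 and 1.0 first produces 0.75, 0.25 and 0.0 in the first buffer, then 1.0, 0.0 and 0.0 in the second buffer, which is the one run_passes returns.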