Fast GPU Image Processing in the i.MX 6x


by Guillermo Hernandez, Freescale

Introduction

Color tracking is a useful building block for more complex image processing use cases. For example, determining which parts of an image belong to skin is an important first step in face detection and hand gesture applications.

In this example we present a method that is robust enough to tolerate some noise, blur, and varying lighting conditions, thanks to OpenGL ES 2.0 shaders running on the i.MX 6x multimedia processor.

Prerequisites

This how-to assumes that the reader is an experienced i.MX developer who is familiar with the tools and techniques around this technology. It also assumes intermediate graphics knowledge and experience, such as the RGBA structure of pictures and video frames and programming OpenGL-based applications, as we will not dig into the details of the basic setup.

Scope

In this paper, we will see how to implement a very fast color tracking application that runs on the GPU instead of the CPU, using OpenGL ES 2.0 shaders.

Step 1: Gather all the components

For this example we will use:

1. i.MX6q ARD platform

2. Linux ER5

3. Oneiric rootfs with ER5 release packages

4. OpenCV 2.0.0 source

Step 2: Build everything you need

Refer to the ER5 User's Guide and Release Notes for how to build and boot the board with the Ubuntu Oneiric rootfs. After that, you will need to build the OpenCV 2.0.0 source on the board, or you could add it to LTIB and have it built for you.

NOTE: We use OpenCV only for convenience, as an easy way of grabbing and managing frames from the USB camera. We will not use any of its advanced math or image processing features, because those run on the CPU, which is exactly what we are trying to avoid.

Step 3: Application setup

Make sure that at this point you have a basic OpenGL ES 2.0 application running; a simple plane with a texture mapped to it should be enough to start (please refer to the Freescale GPU examples).

Step 4: OpenCV auxiliary code

The basic idea of the workflow is as follows:

a) Get the live feed from the USB camera using the OpenCV CvCapture interface (cvCreateCameraCapture() and cvQueryFrame()) and store each frame in an IplImage structure.

b) Create an OpenGL texture that reads the IplImage buffer every frame and map it to a plane in OpenGL ES 2.0.

c) Use the fragment shader to perform fast image processing calculations. In this example we will examine the Sobel filter and binary images, which are the foundations of many complex image processing algorithms.

d) If necessary, perform multi-pass rendering to chain several image processing shaders and get an end result.

First we must include the relevant OpenCV headers:

#include "opencv/cv.h"

#include "opencv/cxcore.h"

#include "opencv/cvaux.h"

#include "opencv/highgui.h"

Then we should define a texture size. For this example we will use 320x240, but this can easily be changed to 640x480:

#define TEXTURE_W 320

#define TEXTURE_H 240

We need to create an OpenCV capture device to enable its V4L camera and get the live feed:

CvCapture *capture;

capture = cvCreateCameraCapture (0);

cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_WIDTH, TEXTURE_W);

cvSetCaptureProperty (capture, CV_CAP_PROP_FRAME_HEIGHT, TEXTURE_H);

Note: when we are done, remember to close the camera stream:

cvReleaseCapture (&capture);

OpenCV has a very convenient structure for storing pixel arrays (a.k.a. images) called IplImage:

IplImage *bgr_img1;

IplImage *frame1;

bgr_img1 = cvCreateImage (cvSize (TEXTURE_W, TEXTURE_H), 8, 4);

OpenCV also provides a function for capturing a frame from the camera and returning it as an IplImage:

frame1 = cvQueryFrame (capture);

Then we will want to separate the camera capture process from the post-processing filters and final rendering; hence, we should create a thread that exclusively handles the camera:

#include <pthread.h>

pthread_t camera_thread1;

pthread_create (&camera_thread1, NULL, UpdateTextureFromCamera1,(void *)&thread_id);

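Your UpdateTextureFromCamera1() function should look something like this:

void *UpdateTextureFromCamera1 (void *ptr)
{
    while (1)
    {
        /* Grab the next frame; the returned image is owned by OpenCV and must not be released */
        frame1 = cvQueryFrame (capture);
        if (frame1 == NULL)
            continue; /* no frame available yet */

        //cvFlip (frame1, frame1, 1); // mirrored image

        /* Expand the 3-channel BGR frame into the 4-channel buffer used for rendering */
        cvCvtColor (frame1, bgr_img1, CV_BGR2BGRA);
    }

    return NULL;
}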

Finally, the rendering loop should be something like this:

while (! window->Kbhit ())

{

tt = (double)cvGetTickCount();

Render ();

tt = (double)cvGetTickCount() - tt;

value = tt/(cvGetTickFrequency()*1000.);

printf( "\ntime = %gms --- %.2lf FPS", value, 1000.0 / value);

//key = cvWaitKey (30);

}
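The timing arithmetic in the loop above relies on cvGetTickFrequency() returning the number of ticks per microsecond, so dividing the elapsed ticks by the frequency times 1000 yields milliseconds. A minimal standalone sketch of the same conversion follows; the helper names ticks_to_ms and ms_to_fps are hypothetical and not part of OpenCV.

```c
#include <assert.h>

/* Hypothetical helper: convert an elapsed tick count to milliseconds.
 * ticks_per_us plays the role of cvGetTickFrequency() (ticks per microsecond). */
double ticks_to_ms(double elapsed_ticks, double ticks_per_us)
{
    assert(ticks_per_us > 0.0);
    return elapsed_ticks / (ticks_per_us * 1000.0);
}

/* Hypothetical helper: frames per second for a frame time given in milliseconds.
 * Returns 0.0 for a non-positive frame time instead of dividing by zero. */
double ms_to_fps(double frame_ms)
{
    return frame_ms > 0.0 ? 1000.0 / frame_ms : 0.0;
}
```

Guarding against a zero frame time is a small addition over the loop above, which divides unconditionally.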

Step 5: Map the camera image to a GL Texture

As you can see, Render() must be called every frame. This white paper will not cover the basic OpenGL or EGL setup of the application in detail; we would rather focus on the ES 2.0 shaders. The following variables hold the texture state:

GLuint _texture;

GLeglImageOES g_imgHandle;

IplImage *_texture_data;

The function that maps the pixels stored in the IplImage to the texture is quite simple. We just need the image data, which is basically a pixel array:

void GLCVPlane::PlaneSetTex (IplImage *texture_data)
{
    /* The source is the 4-channel BGRA image filled by the camera thread;
       convert it to the 3-channel RGB layout that is uploaded as GL_RGB */
    cvCvtColor (texture_data, _texture_data, CV_BGRA2RGB);

    glBindTexture (GL_TEXTURE_2D, _texture);
    glTexImage2D (GL_TEXTURE_2D, 0, GL_RGB, _texture_w, _texture_h, 0, GL_RGB, GL_UNSIGNED_BYTE, _texture_data->imageData);
}
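The color conversion in PlaneSetTex reorders the channels so that the pixel bytes match the GL_RGB layout that glTexImage2D expects. As a rough illustration of what that step does for a 4-channel BGRA source, here is a plain C sketch. The function bgra_to_rgb is a hypothetical helper, not an OpenCV function, and it assumes tightly packed rows with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Convert tightly packed 4-channel BGRA pixels to 3-channel RGB.
 * Plain-C illustration of the channel reordering, not OpenCV's implementation. */
void bgra_to_rgb(const unsigned char *bgra, unsigned char *rgb, size_t pixel_count)
{
    size_t i;

    assert(bgra != NULL && rgb != NULL);
    for (i = 0; i < pixel_count; ++i) {
        rgb[3 * i + 0] = bgra[4 * i + 2]; /* R */
        rgb[3 * i + 1] = bgra[4 * i + 1]; /* G */
        rgb[3 * i + 2] = bgra[4 * i + 0]; /* B */
        /* the alpha byte at bgra[4 * i + 3] is dropped */
    }
}
```

This per-pixel copy runs on the CPU every frame, which is one reason the upload step is a likely bottleneck in this pipeline.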

This function should be called inside our render loop:

void Render (void)

{

glClearColor (0.0f, 0.0f, 0.0f, 0.0f);

glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

PlaneSetTex(bgr_img1);

}

At this point the OpenGL texture is ready to be used as a sampler in our fragment shader, mapped to a 3D plane.

Lastly, when you are ready to draw your plane with the texture in it:

// Set the shader program

glUseProgram (_shader_program);

…

// Binds this texture handle so we can load the data into it

/* Select Our Texture */

glActiveTexture(GL_TEXTURE0);

//Select eglImage

glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, g_imgHandle);

glDrawArrays (GL_TRIANGLES, 0, 6);

Step 6: Use the GPU to do Image Processing

First we need to make sure we have the correct vertex shader and fragment shader. We will focus only on the fragment shader, since this is where we will process the image from the camera.

Below is the simplest possible fragment shader, which only colors each pixel from the sampled texture:

const char *planefrag_shader_src =

"#ifdef GL_FRAGMENT_PRECISION_HIGH \n"

" precision highp float; \n"

"#else \n"

" precision mediump float; \n"

"#endif \n"

" \n"

"uniform sampler2D s_texture; \n"

"varying vec3 g_vVSColor; \n"

"varying vec2 g_vVSTexCoord; \n"

" \n"

"void main() \n"

"{ \n"

" gl_FragColor = texture2D(s_texture,g_vVSTexCoord); \n"

"} \n";

Binary Image

The simplest image filter is the binary image, which converts a source image to a black-and-white output. To decide whether a pixel should be black or white we need a threshold: everything below the threshold will be black, and everything above it will be white.

The shader code is as follows:

const char* g_strRGBtoBlackWhiteShader =
"#ifdef GL_FRAGMENT_PRECISION_HIGH \n"
" precision highp float; \n"
"#else \n"
" precision mediump float; \n"
"#endif \n"
" \n"
"varying vec2 g_vVSTexCoord; \n"
"uniform sampler2D s_texture; \n"
"uniform float threshold; \n"
" \n"
"void main() { \n"
" vec3 current_Color = texture2D(s_texture,g_vVSTexCoord).xyz; \n"
" float luminance = dot (vec3(0.299,0.587,0.114),current_Color); \n"
" if(luminance>threshold) \n"
" gl_FragColor = vec4(1.0); \n"
" else \n"
" gl_FragColor = vec4(0.0); \n"
"} \n";

Notice that the main operation is to get the luminance of the pixel. To do that, we take the dot product of the pixel color with a vector of known weights (0.299, 0.587, 0.114), which are the standard ITU-R BT.601 luma coefficients, and then simply compare that luminance with the threshold. Anything below the threshold will be black, and anything above it will be white.
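To make the per-pixel logic easier to check off-target, the same decision can be written as a small CPU reference in C. This is only a sketch for verifying expected outputs; binarize_pixel is a hypothetical name, and the color components are assumed to be normalized to the range 0 to 1, as they are in the shader.

```c
#include <assert.h>

/* CPU reference for the binary-image shader: returns 1.0f (white) when the
 * luminance of an RGB color in [0, 1] exceeds the threshold, else 0.0f (black). */
float binarize_pixel(float r, float g, float b, float threshold)
{
    float luminance;

    assert(threshold >= 0.0f && threshold <= 1.0f);
    /* Same weights as the shader's dot product */
    luminance = 0.299f * r + 0.587f * g + 0.114f * b;
    return (luminance > threshold) ? 1.0f : 0.0f;
}
```

With a threshold of 0.5, pure green (luminance 0.587) maps to white while pure blue (luminance 0.114) maps to black, which shows how unevenly the three channels contribute.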

SOBEL Operator

Sobel is a very common filter, used as a foundation for many complex image processing techniques, particularly edge detection algorithms. The Sobel operator is based on convolutions; each convolution uses a particular mask, often called a kernel (in common terms, usually a 3x3 matrix).

The Sobel operator calculates the gradient of the image at each pixel, so it tells us how the image changes across the pixels surrounding the current pixel, that is, how strongly it increases or decreases (from darker to brighter values).

The shader is a bit long, since several operations must be performed. We discuss each of its parts below.

First we need to get the texture coordinates from the vertex shader. The shader source is stored in the C string plane_sobel_filter_shader_src; for readability, the GLSL in this section is shown without the surrounding quotes:

const char* plane_sobel_filter_shader_src =

#ifdef GL_FRAGMENT_PRECISION_HIGH

precision highp float;

#else

precision mediump float;

#endif

varying vec2 g_vVSTexCoord;

uniform sampler2D s_texture;

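Then we should define our kernels. As stated before, a 3x3 matrix is enough, and the following values have been tested with good results:

mat3 kernel1 = mat3 (-1.0, -2.0, -1.0,
                      0.0,  0.0,  0.0,
                      1.0,  2.0,  1.0);

// kernel2 is used in main() but its definition is not shown in this paper;
// the transpose of kernel1, the other standard Sobel kernel, is assumed here.
mat3 kernel2 = mat3 (-1.0, 0.0, 1.0,
                     -2.0, 0.0, 2.0,
                     -1.0, 0.0, 1.0);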

We also need a convenient way to convert to grayscale, since the Sobel operator only needs grayscale information. A simple way to convert to grayscale is to average the three color channels:

float toGrayscale(vec3 source) {
    float average = (source.x+source.y+source.z)/3.0;
    return average;
}

Now we come to the important part: actually performing the convolutions. Remember that in the shading language used by OpenGL ES 2.0, recursion is not allowed and dynamic indexing is not guaranteed to be supported in fragment shaders, so we need to do our operations the hard way: by defining vectors and multiplying them. See the following code:

float doConvolution(mat3 kernel) {
    float sum = 0.0;
    float current_pixelColor = toGrayscale(texture2D(s_texture,g_vVSTexCoord).xyz);

    // Distance to the neighboring samples, in texture coordinates.
    // These values are hard-coded; a step of exactly one texel would be
    // 1.0/width and 1.0/height of the texture.
    float xOffset = 1.0/1024.0;
    float yOffset = 1.0/768.0;

    // Top row of the 3x3 neighborhood
    float new_pixel00 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y-yOffset)).xyz);
    float new_pixel01 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y-yOffset)).xyz);
    float new_pixel02 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y-yOffset)).xyz);
    vec3 pixelRow0 = vec3(new_pixel00,new_pixel01,new_pixel02);

    // Middle row
    float new_pixel10 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y)).xyz);
    float new_pixel11 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y)).xyz);
    float new_pixel12 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y)).xyz);
    vec3 pixelRow1 = vec3(new_pixel10,new_pixel11,new_pixel12);

    // Bottom row
    float new_pixel20 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x-xOffset,g_vVSTexCoord.y+yOffset)).xyz);
    float new_pixel21 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x,g_vVSTexCoord.y+yOffset)).xyz);
    float new_pixel22 = toGrayscale(texture2D(s_texture, vec2(g_vVSTexCoord.x+xOffset,g_vVSTexCoord.y+yOffset)).xyz);
    vec3 pixelRow2 = vec3(new_pixel20,new_pixel21,new_pixel22);

    // Component-wise products of each kernel vector with the matching pixel row
    vec3 mult1 = (kernel[0]*pixelRow0);
    vec3 mult2 = (kernel[1]*pixelRow1);
    vec3 mult3 = (kernel[2]*pixelRow2);

    sum = mult1.x+mult1.y+mult1.z+mult2.x+mult2.y+mult2.z+mult3.x+mult3.y+mult3.z;
    return sum;
}

If you look at the last part of the function, you will notice that we add all the products into a single sum. This sum tells us how much each pixel varies with respect to its neighbors.

The last part of the shader is where we use all the previous functions. It is worth noting that the convolution needs to be applied both horizontally and vertically for this technique to be complete:
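For checking the arithmetic away from the GPU, the same weighted sum over a 3x3 neighborhood can be sketched in plain C. This is an illustrative reference and not the shader itself. The names convolve3x3 and pixel_at are hypothetical, the image is assumed to be a row-major grayscale float array, the kernel is a flat row-major array of 9 weights, and out-of-range coordinates are clamped to the border (comparable to clamp-to-edge texture wrapping).

```c
#include <assert.h>

/* Read a grayscale pixel, clamping coordinates to the image border. */
static float pixel_at(const float *img, int width, int height, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= width) x = width - 1;
    if (y < 0) y = 0;
    if (y >= height) y = height - 1;
    return img[y * width + x];
}

/* Weighted sum of the 3x3 neighborhood centered on (x, y).
 * kernel holds 9 weights in row-major order: kernel[row * 3 + col]. */
float convolve3x3(const float *img, int width, int height,
                  int x, int y, const float *kernel)
{
    float sum = 0.0f;
    int row, col;

    assert(img != 0 && kernel != 0 && width > 0 && height > 0);
    for (row = 0; row < 3; ++row)
        for (col = 0; col < 3; ++col)
            sum += kernel[row * 3 + col] *
                   pixel_at(img, width, height, x + col - 1, y + row - 1);
    return sum;
}
```

On an image with a sharp vertical edge, the kernel whose weights change from left to right responds strongly, while the kernel whose weights change from top to bottom returns zero. Here the neighbor step is exactly one pixel, whereas the shader steps by xOffset and yOffset in texture coordinates.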

void main() {
    float horizontalSum = 0.0;
    float verticalSum = 0.0;
    float averageSum = 0.0;

    horizontalSum = doConvolution(kernel1);
    verticalSum = doConvolution(kernel2);

    // A strong gradient in either direction is drawn as a black edge; everything else is white
    if( (verticalSum > 0.2) || (horizontalSum > 0.2) || (verticalSum < -0.2) || (horizontalSum < -0.2) )
        averageSum = 0.0;
    else
        averageSum = 1.0;

    gl_FragColor = vec4(averageSum,averageSum,averageSum,1.0);
}
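The four comparisons in main() amount to asking whether the absolute value of either sum exceeds 0.2. A compact CPU sketch of that decision is shown below; sobel_edge_value and abs_float are hypothetical names, and the threshold is passed as a parameter here rather than hard-coded as in the shader.

```c
#include <assert.h>

/* Absolute value without pulling in <math.h>. */
static float abs_float(float v)
{
    return v < 0.0f ? -v : v;
}

/* CPU reference for the shader's final step: 0.0f (black) marks an edge,
 * 1.0f (white) marks a flat region. */
float sobel_edge_value(float horizontal_sum, float vertical_sum, float threshold)
{
    assert(threshold >= 0.0f);
    if (abs_float(horizontal_sum) > threshold || abs_float(vertical_sum) > threshold)
        return 0.0f;
    return 1.0f;
}
```

Exposing the threshold as a parameter (or as a uniform in the shader) would make it possible to tune the edge sensitivity without recompiling.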

Conclusions and future work

At this point, if you have your application up and running, you will notice that image processing can be done quite fast, even with images larger than 640x480. This approach can be extended to a variety of techniques such as tracking, feature detection, and face detection.

However, these techniques are out of scope for now, because these algorithms (face detection, for example) need multiple rendering passes: we perform an operation, write the result to an offscreen buffer, and use that buffer as the input for the next shader, and so on. Freescale is planning to release an Application Note in Q4 2012 that will expand this white paper and cover these techniques in detail.
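The multi-pass idea can be illustrated on the CPU with two buffers that swap roles after every pass, often called ping-pong buffering. The sketch below is only an analogy under stated assumptions: the names pass_fn, pass_invert, pass_threshold and run_passes are hypothetical, and each pass is an ordinary C function standing in for a shader. On the GPU the two buffers would typically be textures attached to framebuffer objects.

```c
#include <assert.h>
#include <stddef.h>

/* A "pass" reads every pixel of src and writes the processed result to dst. */
typedef void (*pass_fn)(const float *src, float *dst, size_t count);

/* Example pass: invert intensities. */
void pass_invert(const float *src, float *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        dst[i] = 1.0f - src[i];
}

/* Example pass: threshold at 0.5. */
void pass_threshold(const float *src, float *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        dst[i] = (src[i] > 0.5f) ? 1.0f : 0.0f;
}

/* Run the passes in order, alternating between two buffers so that the
 * output of one pass becomes the input of the next. buf_a holds the input
 * image on entry; the returned pointer is the buffer with the final result. */
float *run_passes(float *buf_a, float *buf_b, size_t count,
                  const pass_fn *passes, size_t pass_count)
{
    float *src = buf_a;
    float *dst = buf_b;
    size_t p;

    assert(buf_a != NULL && buf_b != NULL && buf_a != buf_b);
    for (p = 0; p < pass_count; ++p) {
        float *tmp;
        passes[p](src, dst, count);
        tmp = src; src = dst; dst = tmp; /* swap roles for the next pass */
    }
    return src;
}
```

Only two buffers are needed regardless of how many passes are chained, because each pass consumes exactly the output of the previous one.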

JeremyStashluk · ‎10-11-2012

Investigate using GStreamer (v4l2src or mfw_v4lsrc) instead of OpenCV for the image capture. This would prevent you from having to do the Oneiric rootfs step.

See this post to improve the performance of step 5. The glTexImage2D method is rather slow.

Computer Vision on i.MX Processors: Video to Texture Streaming (Part 3) - i.MX6 processor

leima · ‎03-25-2017

Hi, jodipaul‌

Would you like to provide the complete sample code?

Best regards.

