OpenCL Hello World

Abstract

This is a short tutorial on running a simple OpenCL application on an i.MX6Q. It gives a brief introduction to OpenCL, explains the code, and shows how to compile and run it.

Requirements

Any i.MX6Q board.

A Linux BSP with the gpu-viv-bin-mx6q package installed (for instructions on how to build the BSP, see the BSP Users Guide).

OpenCL overview

OpenCL lets a program use the GPGPU (General-Purpose computing on Graphics Processing Units) features of the GC2000, the GPU in the i.MX6Q. In other words, any program can use the GPU's processing power for general computation.

OpenCL uses kernels, which are functions that execute on the GPU. These functions must be written in a C99-like language (OpenCL C). The current GPU does no scheduling between kernels, so kernels execute in FIFO order. The i.MX6Q GPU is conformant with the OpenCL 1.1 Embedded Profile (EP).
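
Whether a device reports the Embedded Profile can be checked at run time. The OpenCL specification defines the CL_DEVICE_PROFILE query as returning either the string FULL_PROFILE or EMBEDDED_PROFILE. The sketch below only classifies such a string so that it compiles without the CL headers; is_embedded_profile is a hypothetical helper name, not part of the OpenCL API, and the trailing comment shows how the string would be obtained on the target.

```c
#include <string.h>

/* Hypothetical helper (not an OpenCL function): returns 1 if the profile
 * string reported for CL_DEVICE_PROFILE names the Embedded Profile,
 * 0 otherwise (including when the pointer is NULL). */
static int is_embedded_profile(const char *profile)
{
    return profile != NULL && strcmp(profile, "EMBEDDED_PROFILE") == 0;
}

/* On the target the string would come from the device, for example:
 *
 *   char profile[64];
 *   clGetDeviceInfo(cdDevice, CL_DEVICE_PROFILE,
 *                   sizeof(profile), profile, NULL);
 *   if (is_embedded_profile(profile)) { ... }
 */
```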

The Code

The example provided here performs a simple addition of two arrays on the GPU. The header needed to use OpenCL is cl.h. It is installed under /usr/include/CL in your BSP rootfs by the gpu-viv-bin-mx6q package and is typically included like this: #include <CL/cl.h>. The libraries needed to link the program are libGAL.so and libOpenCL.so, both under /usr/lib in your BSP rootfs.

For details on the OpenCL API, see the Khronos page: http://www.khronos.org/opencl/

Our kernel source is as follows:

__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)

{

// Index of the elements to add

unsigned int n = get_global_id(0);

// Sum the nth element of vectors a and b and store in c

c[n] = a[n] + b[n];

}

The kernel is declared with the signature

__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)

It takes vectors a and b as arguments, adds them, and stores the result in vector c. It looks like a normal C99 function except for the qualifiers __kernel and __global: __kernel tells the compiler that this function is a kernel, and __global tells the compiler that these pointer arguments refer to the global address space.

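get_global_id built-in function

This function tells each kernel instance (work-item) which index of the vectors it is responsible for; get_global_id(0) returns that index in the first dimension. The last line of the kernel then adds the two elements at that index. The full commented source code is listed below.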
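
Before the full listing, it may help to see what a one-dimensional launch amounts to on the host side. The following is a minimal CPU sketch, not how the GPU driver actually dispatches work: vector_add_work_item and emulate_ndrange_1d are hypothetical names, and the loop index stands in for the value get_global_id(0) would return.

```c
#include <stddef.h>

/* Host-side stand-in for the kernel body: handles exactly one element.
 * On the GPU, n would be the value returned by get_global_id(0). */
static void vector_add_work_item(int *c, const int *a, const int *b, size_t n)
{
    c[n] = a[n] + b[n];
}

/* Emulates a one-dimensional launch with global_size work-items by calling
 * the work-item function once per index 0 .. global_size - 1. The GPU may
 * run work-items in any order or in parallel; this loop is sequential. */
static void emulate_ndrange_1d(int *c, const int *a, const int *b,
                               size_t global_size)
{
    size_t n;
    for (n = 0; n < global_size; n++)
        vector_add_work_item(c, a, b, n);
}
```

Each work-item touches only its own index, which is why the kernel needs no loop of its own.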

//************************************************************

// Demo OpenCL application to compute a simple vector addition

// computation between 2 arrays on the GPU

// ************************************************************

#include <stdio.h>

#include <stdlib.h>

#include <CL/cl.h>

//

// OpenCL source code

const char* OpenCLSource[] = {

"__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)",

"{",

" // Index of the elements to add \n",

" unsigned int n = get_global_id(0);",

" // Sum the nth element of vectors a and b and store in c \n",

" c[n] = a[n] + b[n];",

"}"

};

// Some interesting data for the vectors

int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17};

int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15};

// Number of elements in the vectors to be added

#define SIZE 100

// Main function

// ************************************************************

int main(int argc, char **argv)

{

// Two integer source vectors in Host memory

int HostVector1[SIZE], HostVector2[SIZE];

//Output Vector

int HostOutputVector[SIZE];

// Initialize with some interesting repeating data

for(int c = 0; c < SIZE; c++)

{

HostVector1[c] = InitialData1[c%20];

HostVector2[c] = InitialData2[c%20];

HostOutputVector[c] = 0;

}

//Get an OpenCL platform

cl_platform_id cpPlatform;

clGetPlatformIDs(1, &cpPlatform, NULL);

// Get a GPU device

cl_device_id cdDevice;

clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);

char cBuffer[1024];

clGetDeviceInfo(cdDevice, CL_DEVICE_NAME, sizeof(cBuffer), cBuffer, NULL);

printf("CL_DEVICE_NAME: %s\n", cBuffer);

clGetDeviceInfo(cdDevice, CL_DRIVER_VERSION, sizeof(cBuffer), cBuffer, NULL);

printf("CL_DRIVER_VERSION: %s\n\n", cBuffer);

// Create a context to run OpenCL enabled GPU

cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// Create a command-queue on the GPU device

cl_command_queue cqCommandQueue = clCreateCommandQueue(GPUContext, cdDevice, 0, NULL);

// Allocate GPU memory for source vectors AND initialize from CPU memory

cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY |

CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector1, NULL);

cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY |

CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector2, NULL);

// Allocate output memory on GPU

cl_mem GPUOutputVector = clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY,

sizeof(int) * SIZE, NULL, NULL);

// Create OpenCL program with source code

cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext, 7, OpenCLSource, NULL, NULL);

// Build the program (OpenCL JIT compilation)

clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL);

// Create a handle to the compiled OpenCL function (Kernel)

cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "VectorAdd", NULL);

// In the next step we associate the GPU memory with the Kernel arguments

clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem), (void*)&GPUOutputVector);

clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector1);

clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUVector2);

// Launch the Kernel on the GPU

// This kernel only uses global data

size_t WorkSize[1] = {SIZE}; // one dimensional Range

clEnqueueNDRangeKernel(cqCommandQueue, OpenCLVectorAdd, 1, NULL,

WorkSize, NULL, 0, NULL, NULL);

// Copy the output in GPU memory back to CPU memory

clEnqueueReadBuffer(cqCommandQueue, GPUOutputVector, CL_TRUE, 0,

SIZE * sizeof(int), HostOutputVector, 0, NULL, NULL);

// Cleanup

clReleaseKernel(OpenCLVectorAdd);

clReleaseProgram(OpenCLProgram);

clReleaseCommandQueue(cqCommandQueue);

clReleaseContext(GPUContext);

clReleaseMemObject(GPUVector1);

clReleaseMemObject(GPUVector2);

clReleaseMemObject(GPUOutputVector);

for( int i =0 ; i < SIZE; i++)

printf("[%d + %d = %d]\n",HostVector1[i], HostVector2[i], HostOutputVector[i]);

return 0;

}
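
The listing passes the literal 7 to clCreateProgramWithSource, and that value must match the number of strings in OpenCLSource. The strings are concatenated without separators, so any fragment containing a // comment has to end in \n or it would comment out the fragment that follows. The sketch below, which compiles without the CL headers, shows one way to let the compiler count the strings and to check the comment rule. kernel_source, KERNEL_SOURCE_COUNT and line_comments_terminated are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Same strings as OpenCLSource in the listing above. */
static const char *kernel_source[] = {
    "__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)",
    "{",
    " // Index of the elements to add \n",
    " unsigned int n = get_global_id(0);",
    " // Sum the nth element of vectors a and b and store in c \n",
    " c[n] = a[n] + b[n];",
    "}"
};

/* Number of strings, computed by the compiler instead of hardcoded. */
#define KERNEL_SOURCE_COUNT (sizeof(kernel_source) / sizeof(kernel_source[0]))

/* Returns 1 if every fragment that contains a "//" comment ends in '\n',
 * 0 otherwise. An unterminated comment would swallow the next fragment
 * once the fragments are concatenated. */
static int line_comments_terminated(void)
{
    size_t i;
    for (i = 0; i < KERNEL_SOURCE_COUNT; i++) {
        const char *s = kernel_source[i];
        size_t len = strlen(s);
        if (strstr(s, "//") != NULL && (len == 0 || s[len - 1] != '\n'))
            return 0;
    }
    return 1;
}

/* The call in the listing could then read:
 *
 *   clCreateProgramWithSource(GPUContext, (cl_uint)KERNEL_SOURCE_COUNT,
 *                             kernel_source, NULL, NULL);
 */
```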

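How to compile on the host

Go to your ltib folder and run

$ ./ltib -m shell

This way you use the same cross compiler that ltib uses, and the default include and lib directories are the ones in your BSP. Then run

LTIB> gcc -std=c99 cl_sample.c -lGAL -lOpenCL -o cl_sample

The -std=c99 flag is needed with compilers that default to C89, because the listing declares loop variables inside the for statements.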

How to run on the i.MX6Q

Insert the GPU kernel module:

root@freescale /home/user$ modprobe galcore

Copy the compiled CL program to the board and run it:

root@freescale /home/user$ ./cl_sample
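
If the sample fails on the target, the first thing to find out is which call failed, and the listing ignores every return code. A minimal sketch of a status check follows. cl_ok is a hypothetical helper, not part of the OpenCL API; it takes a plain int so that it compiles without the CL headers, relying on the fact that CL_SUCCESS is 0 and OpenCL error codes are negative cl_int values.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when status is CL_SUCCESS (0).
 * Otherwise prints the failing call and its error code to stderr
 * and returns 0. */
static int cl_ok(int status, const char *what)
{
    if (status != 0) {
        fprintf(stderr, "%s failed with OpenCL error %d\n", what, status);
        return 0;
    }
    return 1;
}

/* Possible use in the listing (calls that return a status directly):
 *
 *   if (!cl_ok(clGetPlatformIDs(1, &cpPlatform, NULL), "clGetPlatformIDs"))
 *       return 1;
 *
 * Calls that return an object report status through their last argument:
 *
 *   cl_int err;
 *   cl_context GPUContext =
 *       clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, &err);
 *   if (!cl_ok(err, "clCreateContextFromType"))
 *       return 1;
 */
```

The numeric code can be looked up among the error constants defined in CL/cl.h.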

References

[1] http://www.khronos.org/opencl/

Original Attachment has been moved to: libOpenCL.so.zip

Original Attachment has been moved to: libCLC_Android.so.zip

Original Attachment has been moved to: libOpenCL_Android.so.zip

Original Attachment has been moved to: libCLC.so.zip

andre_silva · ‎01-25-2018

I am going to look for more information about CL on Android and will let you know as soon as possible.

Regards,

Andre

1019599657 · ‎09-25-2019

Hi, spark zh:

Did you solve this error? I am hitting it on i.MX8 too. Can you give me some advice?

1019599657 · ‎09-25-2019

Hi, sateeshpedagadi,

Have you solved this error? I am hitting it on i.MX8 as well. Can you give me some advice?
