Execute OpenCL Kernel from Gstreamer Plugin

_at · ‎11-07-2014

Hi All,

Is it possible to execute OpenCL ' Kernel from Gstreamer Plugin during running Gstreamer pipeline?

I want to use OpenCL from gstreamer plugin while running any video through gst-launch.

I am adding one sample Gstreamer Plugin (e.g. sampleocl-plugin) into the Gstreamer Pipeline. From 'sampleocl-plugin' i am creating OpenCL Context, creating two Image2D memory object using clCreateImage2D, creating & loading kernel, creating command queue, setargs for kernel (clSetKernelArg(....., &srcimage2D) , clSetKernelArg(...., &dstimage3D)).

Then executing kernel clEnqueueNDRangeKernel with 2 dimensional NDRangeKernel, global work size set to global[0]=640, global[1]=480. & wait for completion of execution of kernel.

My kernel is simply doing memcpy from SRC image to DST image.

code snippet of kernel is :

////////////// START

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

_kernel void copy_image (

__read_only image2d_t input,

__write_only image2d_t output)

{

const int2 pos = {get_global_id(0), get_global_id(1)};

float4 sum = read_imagef(input, sampler, pos);

write_imagef (output, (int2)(pos.x, pos.y), sum);

}

/////////////// END

What i am observed here that, after completion of my kernel execution, i received only one pixel from SRC image to DST image & i.e from (x=0,y=0) only.

All global Work Items {global[0]=640, global[0]=480 } MUST execute same Kernel & result MUST all pixels copied from SRC to DST.

I have done/ executed exactly same thing from one simple unit test (executable file), & its working fine. I received entire SRC image in DST image.

In gstreamer, only one pixel copied ? Is something wrong ?

Or any other way should i execute OpenCL from gstreamer ?

I am using YOCTO build system.

-Ankit.

_at · ‎02-24-2015

Any response guyz ??

I am attaching zip here. Please build it using YOCTO build system.

Please give needful help.

andre_silva · ‎03-02-2015

Hi Ankit,

I tried your code and got the following result:

root@imx6qsabresd:/home/opencl_app# ./opencl_app
platformIdCount: 1
platformId: 0x76e6c660
device_count: 1
deviceId: 0x76e6c4e0
clCreateContext: 0
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS SIZE: 4, iret:0
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS VAL: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES SIZE: 12, iret:0
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024
CL_DEVICE_MAX_WORK_GROUP_SIZE SIZE: 4, iret:0
CL_DEVICE_MAX_WORK_GROUP_SIZE VAL: 1024
CL_DEVICE_MAX_COMPUTE_UNITS SIZE: 4, iret:0
CL_DEVICE_MAX_COMPUTE_UNITS VAL: 4
CL_DEVICE_EXTENSIONS SIZE: 31, iret:0
CL_DEVICE_EXTENSIONS VAL: cl_khr_byte_addressable_store
clCreateImage2D: 0
clCreateImage2D: 0
clCreateProgramWithSource: 0x6e3073e0
clBuildProgram: 0
clCreateCommandQueue: 0
clCreateKernel: 0
===IN FRAME	:3899000
===IN FRAME2	:1487667
===KERNEL	:10623000
===OUT FRAME	:12903000
Display Result:
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb

bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb

------------------------------------------

based in this function:

// This code will show only single channel data

void display_img_data(struct img_info *info)

{

int y, x;

unsigned int *int_ptr = (unsigned int *)info->data;

int shift = 24; // Only 1 channel out of ARGB shall be considered

for (y=0; y<16; y++) {

for (x=0; x<16; x++) {

int data = (((*int_ptr)>>(shift))&0xff);

printf("%x ",data);

}

printf("\n");

}

in image_data.c and this one:

void set_default_img_data(struct img_info *info, unsigned int val)

{

int y, x;

unsigned int fixval = 0x04040404;

unsigned int fixincr = 0x01010101;

unsigned int *int_ptr = (unsigned int *)info->data;

for (y=0; y<info->height; y++)

for (x=0; x<info->width; x++) {

*int_ptr = val; //(fixval + fixincr *x);

int_ptr++;

}

which you pass:

set_default_img_data(&param.src, 0xbbbbbbbb);

looks it is being correctly executed, right ?

let me know if I misunderstood it.

Cheers,

andre

_at · ‎03-03-2015

Hi Andre,

This unit-test is working fine. But actual question is: when i integrate its code under gstreamer plug-in it is not working there.

Those functions (set_default_img_data & display_img_data) are for just verification whether its working or not in unit-test.

Actual behavior in gstreamer plug-in is what i mentioned in my above query ( i received only one pixel from SRC image to DST image i.e from (x=0,y=0) only).

In Gstreamer, What i have next experimented is that, Instead of using Image objects (e.g. 'clCreateImage2D' ), now i m using Buffer objects (e.g 'clCreateBuffer') for memory object creation.

Wrote new kernel api shown below. Executed that api & its working fine

/////////////////// Kernel code in imgcopy.cl ///

__kernel void copy_buffer ( __global uchar *input,

__global uchar *output,

int width,

int height)

{

int x = get_global_id (0);

int y = get_global_id (1);

if(1) {

int id = (y * width * 4) + (x * 4);

unsigned int *in = &input[id];

unsigned int *out = &output[id];

*out = (unsigned int)*in;

} else {

int id = (y * width * 4) + (x * 4);

//output[id] = input[id];

output[id] = input[id]; id++;

}

/////////////////////////////////////////////////////////

So, whether i m doing something wrong in creation of memory object ?

-Ankit.

andre_silva · ‎03-03-2015

Hi Ankit, I will check if there is any issue related and let you know. But I would recommend you stick with buffers instead, ,this is something I took from the Vivante documentation:

"Note that for improved performance with Vivante Embedded Profile cores, buffers should be used instead of read_image and write_image."

regards,

Andre

_at · ‎03-03-2015

Hi Andre,

Thank you for your respond.

Now, in terms of Performance; When i execute copy_buffer (from above kernel code, for 800x480 width and height), it take approx 11.5 ms.

So, what will be approximate value (overhead for OpenCL in i.MX6Q ) OpenCL should take to Read from input memory and Write to output memory?

-Ankit

andre_silva · ‎03-17-2015

Hi Ankit, sorry for the late response, I will check that info and let you know.

Execute OpenCL Kernel from Gstreamer Plugin

Execute OpenCL Kernel from Gstreamer Plugin

i.MX6Quad