Execute OpenCL Kernel from Gstreamer Plugin

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Execute OpenCL Kernel from Gstreamer Plugin

2,602 Views
_at
Contributor III

Hi All,

Is it possible to execute OpenCL ' Kernel from Gstreamer Plugin during running Gstreamer pipeline?

I want to use OpenCL from gstreamer plugin while running any video through gst-launch.

I am adding one sample Gstreamer Plugin (e.g. sampleocl-plugin) into the Gstreamer Pipeline. From 'sampleocl-plugin' i am creating OpenCL Context, creating two Image2D memory object using clCreateImage2D, creating & loading kernel, creating command queue, setargs for kernel (clSetKernelArg(.....,  &srcimage2D)  , clSetKernelArg(...., &dstimage3D)).

Then executing kernel clEnqueueNDRangeKernel with 2 dimensional NDRangeKernel, global work size set to global[0]=640, global[1]=480. & wait for completion of execution of kernel.

My kernel is simply doing memcpy from SRC image to DST image.

code snippet of kernel is :

////////////// START

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

_kernel void copy_image (

__read_only image2d_t input,

__write_only image2d_t output)

{

  const int2 pos = {get_global_id(0), get_global_id(1)};

  float4 sum = read_imagef(input, sampler, pos);

  write_imagef (output, (int2)(pos.x, pos.y), sum);

}

/////////////// END

What i am observed here that, after completion of my kernel execution, i received only one pixel from SRC image to DST image & i.e from (x=0,y=0) only.

All global Work Items  {global[0]=640, global[0]=480 } MUST execute same Kernel & result MUST all pixels copied from SRC to DST.

I have done/ executed exactly same thing from one simple unit test (executable file), & its working fine. I received entire SRC image in DST image.

In gstreamer, only one pixel copied ? Is something wrong ?

Or any other way should i execute OpenCL from gstreamer ?

I am using YOCTO build system.

-Ankit.

Labels (1)
Tags (2)
0 Kudos
6 Replies

1,619 Views
_at
Contributor III

Any response guyz ??

I am attaching zip here. Please build it using YOCTO build system.

Please give needful help.

0 Kudos

1,619 Views
andre_silva
NXP Employee
NXP Employee

Hi Ankit,

I tried your code and got the following result:

root@imx6qsabresd:/home/opencl_app# ./opencl_app                          
platformIdCount: 1                                                        
platformId: 0x76e6c660                                                    
device_count: 1                                                           
deviceId: 0x76e6c4e0                                                      
clCreateContext: 0                                                        
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS  SIZE: 4, iret:0                       
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS VAL: 3                                 
CL_DEVICE_MAX_WORK_ITEM_SIZES  SIZE: 12, iret:0                           
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024                                   
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024                                   
CL_DEVICE_MAX_WORK_ITEM_SIZES VAL: 1024                                   
CL_DEVICE_MAX_WORK_GROUP_SIZE  SIZE: 4, iret:0                            
CL_DEVICE_MAX_WORK_GROUP_SIZE VAL: 1024                                   
CL_DEVICE_MAX_COMPUTE_UNITS  SIZE: 4, iret:0                              
CL_DEVICE_MAX_COMPUTE_UNITS VAL: 4                                        
CL_DEVICE_EXTENSIONS  SIZE: 31, iret:0                                    
CL_DEVICE_EXTENSIONS VAL: cl_khr_byte_addressable_store                   
clCreateImage2D: 0                                                        
clCreateImage2D: 0                                                        
clCreateProgramWithSource: 0x6e3073e0                                     
clBuildProgram: 0                                                         
clCreateCommandQueue: 0                                                   
clCreateKernel: 0                                                         
===IN FRAME         :3899000                                          
===IN FRAME2:1487667                                                  
===KERNEL           :10623000                                         
===OUT FRAME:12903000                                                 
Display Result:                                                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb                           

bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb

------------------------------------------

based in this function:

// This code will show only single channel data

void display_img_data(struct img_info *info)

{

    int y, x;

    unsigned int *int_ptr = (unsigned int *)info->data;

    int shift = 24; // Only 1 channel out of ARGB shall be considered

 

    for (y=0; y<16; y++) {

     for (x=0; x<16; x++) {

        int data =  (((*int_ptr)>>(shift))&0xff);

             printf("%x ",data);

        }

        printf("\n");

    }

}

in image_data.c and this one:

void set_default_img_data(struct img_info *info, unsigned int val)

{

    int y, x;

    unsigned int fixval = 0x04040404;

    unsigned int fixincr = 0x01010101;

    unsigned int *int_ptr = (unsigned int *)info->data;

    for (y=0; y<info->height; y++)

     for (x=0; x<info->width; x++) {

            *int_ptr = val;  //(fixval + fixincr *x);

            int_ptr++;

        }

}

which you pass:

set_default_img_data(&param.src, 0xbbbbbbbb);

looks it is being correctly executed, right ?

let me know if I misunderstood it.

Cheers,

andre

0 Kudos

1,619 Views
_at
Contributor III

Hi Andre,

This unit-test is working fine. But actual question is: when i integrate its code under gstreamer plug-in it is not working there.

Those functions (set_default_img_data & display_img_data) are for just verification whether its working or not in unit-test.

Actual behavior in gstreamer plug-in is what i mentioned in my above query ( i received only one pixel from SRC image to DST image  i.e from (x=0,y=0) only).

In Gstreamer, What i have next experimented is that, Instead of using Image objects (e.g. 'clCreateImage2D' ), now i m using Buffer objects (e.g 'clCreateBuffer') for memory object creation.

Wrote new kernel api shown below. Executed that api & its working fine

/////////////////// Kernel code in imgcopy.cl ///

__kernel void copy_buffer ( __global uchar *input,

  __global uchar *output,

  int width,

  int height)

{

  int x = get_global_id (0);

  int y = get_global_id (1);

  if(1) {

  int id = (y * width * 4) + (x * 4);

  unsigned int *in = &input[id];

  unsigned int *out = &output[id];

  *out = (unsigned int)*in;

  } else {

  int id = (y * width * 4) + (x * 4);

  //output[id] = input[id];

  output[id] = input[id]; id++;

  output[id] = input[id]; id++;

  output[id] = input[id]; id++;

  output[id] = input[id]; id++;

  }

}

/////////////////////////////////////////////////////////

So, whether i m doing something wrong in creation of memory object ?

-Ankit.

0 Kudos

1,619 Views
andre_silva
NXP Employee
NXP Employee

Hi Ankit, I will check if there is any issue related and let you know. But I would recommend  you stick with buffers instead, ,this is something I took from the Vivante documentation:

"Note that for improved performance with Vivante Embedded Profile cores, buffers should be used instead of read_image and write_image."

regards,

Andre

0 Kudos

1,619 Views
_at
Contributor III

Hi Andre,

Thank you for your respond.

Now, in terms of Performance; When i execute copy_buffer (from above kernel code, for 800x480 width and height), it take approx 11.5 ms.

So, what will be approximate value (overhead for OpenCL in i.MX6Q ) OpenCL should take  to Read from input memory and Write to output memory?


-Ankit

0 Kudos

1,619 Views
andre_silva
NXP Employee
NXP Employee

Hi Ankit, sorry for the late response, I will check that info and let you know.

0 Kudos