OpenCV 3.0 T-API (OpenCL) CL_Error


guyorfreeman
Contributor I
Hi,
I am running a Yocto distro on an IMX8 SOM with OpenCV 3.0 support.
The goal of this pilot is to run the OpenCV function goodFeaturesToTrack() with OpenCL (using the Vivante GPU on the IMX8).
Plain OpenCL code works well on the platform, but when I run the OpenCV goodFeaturesToTrack() function I get the OpenCL runtime error CL_INVALID_WORK_GROUP_SIZE (-54).
At first glance, I would expect OpenCV 3.0 to query the hardware and set up the OpenCL environment accordingly, or to tell the programmer how to control the work-group parameters.
From googling around I understand the nature of the error, but I could not find anything in OpenCV 3.0 on how to configure the OpenCL kernels or the OpenCL environment that OpenCV uses.
Please help.

fenil
Contributor I

Hello,
I'm trying the same goodFeaturesToTrack function from OpenCV 4.5 using OpenCL on the Vivante 7000 GPU of the IMX8MPlus, and I'm facing a different error:
CL_LINK_PROGRAM_FAILURE. Is this a similar problem, or does it require something else to be done?

Bio_TICFSL
NXP TechSupport

Hi Guy,

CL_DEVICE_MAX_WORK_GROUP_SIZE should return a single size_t value (for example 512 or 1024). This is the maximum number of work-items in a work-group, not the maximum in each dimension. So in your case you are trying to make a 2D work-group with 32*32 = 1024 work-items, and presumably CL_DEVICE_MAX_WORK_GROUP_SIZE is less than 1024 on your system.

See the OpenCL 1.1 spec, table 4.3, page 37, the definition of CL_DEVICE_MAX_WORK_GROUP_SIZE.
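
As a minimal sketch of the rule described above, the work-group size is the product of the local sizes across all dimensions, and that product is what gets compared with CL_DEVICE_MAX_WORK_GROUP_SIZE. The helper names `workGroupItems` and `fitsWorkGroupLimit` are hypothetical and not part of OpenCL or OpenCV; the limit is passed in as a plain number rather than queried from a device.

```cpp
#include <array>
#include <cstddef>

// Total number of work-items in one work-group: the product of the
// local sizes in each dimension (unused dimensions are set to 1).
std::size_t workGroupItems(const std::array<std::size_t, 3>& local) {
    return local[0] * local[1] * local[2];
}

// Hypothetical helper: true if the work-group fits under the single value
// that the device reports for CL_DEVICE_MAX_WORK_GROUP_SIZE.
bool fitsWorkGroupLimit(const std::array<std::size_t, 3>& local,
                        std::size_t maxWorkGroupSize) {
    return workGroupItems(local) <= maxWorkGroupSize;
}
```

With a local size of 32x32x1 the product is 1024, so `fitsWorkGroupLimit` returns false for a limit of 512 and true for a limit of 1024.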

You can work with OpenGL and OpenCV on your iMX8.

Regards

guyorfreeman
Contributor I

Thanks Bio;

Perhaps my question was not clear enough.

I am running OpenCV 3.0 code (after verifying that plain OpenCL code works on my IMX8M machine).

This OpenCV code (attached below) uses UMat instead of Mat, which instructs the OpenCV layer to use the T-API and OpenCL kernels for the implementation of the OpenCV function "goodFeaturesToTrack()".

The OpenCV OpenCL implementation launches work-items in a manner that causes the OpenCL layer to violate CL_DEVICE_MAX_WORK_GROUP_SIZE.

On my platform I found that CL_DEVICE_MAX_WORK_GROUP_SIZE is 1024, both in total and in each dimension:

clGetDeviceInfo(devices[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(WorkSize), &WorkSize, NULL);


Now, when I run the OpenCV code below, it implements the "goodFeaturesToTrack()" function on the OpenCL runtime, and OpenCV tries to enqueue the following CL kernels with a wrong "globalsize", which results in OpenCL errors:

OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: 

clEnqueueNDRangeKernel('sobel3', dims=2, globalsize=512x512x1, localsize=16x16x1)


OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: 
clEnqueueNDRangeKernel('MaxEigenVal', dims=1, globalsize=4096x1x1, localsize=1024x1x1)

My original question was:
1. Why does OpenCV 3.0, which uses the T-API, configure the work-items incorrectly so that it violates CL_DEVICE_MAX_WORK_GROUP_SIZE=1024?

2. How can I set or limit the globalsize/localsize that OpenCV uses when enqueuing CL kernels into the OpenCL runtime, to prevent this behavior?


Regards,
Guy

My Program Code

==================

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>

using namespace cv;
using namespace std;

/// Global variables
Mat src, src_gray;
UMat usrc,usrc_gray;


int maxCorners = 500;
int maxTrackbar = 100;

RNG rng(12345);
const char* source_window = "Image";

/// Function header
void goodFeaturesToTrack_Demo( int, void* );

/**
* @function main
*/
int main( int argc, char** argv )
{
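/// Load source image and convert it to gray
if( argc < 2 ) { cerr << "Usage: " << argv[0] << " <image>" << endl; return -1; }
//src = imread( argv[1], 1 );
imread( argv[1], 1 ).copyTo(usrc);
if( usrc.empty() ) { cerr << "Could not read " << argv[1] << endl; return -1; }
cvtColor( usrc, usrc_gray, COLOR_BGR2GRAY );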


/// Create Window
namedWindow( source_window, WINDOW_AUTOSIZE );

/// Create Trackbar to set the number of corners
createTrackbar( "Max corners:", source_window, &maxCorners, maxTrackbar, goodFeaturesToTrack_Demo );

imshow( source_window, usrc );

goodFeaturesToTrack_Demo( 0, 0 );

waitKey(0);
return(0);
}

/**
* @function goodFeaturesToTrack_Demo
* @brief Apply Shi-Tomasi corner detector
*/
void goodFeaturesToTrack_Demo( int, void* )
{
if( maxCorners < 1 ) { maxCorners = 1; }

/// Parameters for Shi-Tomasi algorithm
vector<Point2f> corners;
double qualityLevel = 0.01;
double minDistance = 10;
int blockSize = 3;
bool useHarrisDetector = false;
double k = 0.04;

/// Copy the source image
UMat ucopy;

ucopy = usrc.clone();

/// Apply corner detection
goodFeaturesToTrack( usrc_gray,
corners,
maxCorners,
qualityLevel,
minDistance,
Mat(),
blockSize,
useHarrisDetector,
k );


/// Draw corners detected
cout<<"** Number of corners detected: "<<corners.size()<<endl;
int r = 4;
for( size_t i = 0; i < corners.size(); i++ )
{ circle( ucopy, corners[i], r, Scalar(rng.uniform(0,255), rng.uniform(0,255),
rng.uniform(0,255)), -1, 8, 0 ); }

/// Show what you got
namedWindow( source_window, WINDOW_AUTOSIZE );
imshow( source_window, ucopy );
}

jrestifo
Contributor I

I was wondering if you reached a resolution on this or acquired any other info. I am working through the same problem on an i.MX8M-based DART-MX8M SOM. OpenCL kernels are not being enqueued correctly, with CL_INVALID_WORK_GROUP_SIZE as the error. I tried some of the suggestions here, such as setting maxWorkGroupSize_, to no avail. I believe one of the other GPU constraints is being violated, but I am not sure which one or how to determine it.

If I modify OpenCV to pass NULL as the localsize always, this error goes away, but that breaks kernels which expect a specific localsize so is not a general solution.
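
One way to narrow down which constraint is involved is to check a launch against each limit in turn. The sketch below is an illustration only: the `LaunchLimits` struct and `firstViolation` function are hypothetical names, the limits are passed in as plain numbers (on a real system they would come from clGetDeviceInfo and, for the per-kernel value, from clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE), and the divisibility check assumes the OpenCL 1.x rule that the global size must be a multiple of the local size.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Limits as they would be read from the OpenCL runtime. The field names are
// hypothetical; each comment names the query the value would come from.
struct LaunchLimits {
    std::size_t deviceMaxGroup;                 // CL_DEVICE_MAX_WORK_GROUP_SIZE
    std::array<std::size_t, 3> deviceMaxItems;  // CL_DEVICE_MAX_WORK_ITEM_SIZES
    std::size_t kernelMaxGroup;                 // CL_KERNEL_WORK_GROUP_SIZE
};

// Returns the name of the first limit the launch violates, or "ok".
std::string firstViolation(const std::array<std::size_t, 3>& global,
                           const std::array<std::size_t, 3>& local,
                           const LaunchLimits& limits) {
    std::size_t groupItems = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return "global size not a multiple of local size";
        if (local[d] > limits.deviceMaxItems[d])
            return "CL_DEVICE_MAX_WORK_ITEM_SIZES";
        groupItems *= local[d];
    }
    if (groupItems > limits.deviceMaxGroup)
        return "CL_DEVICE_MAX_WORK_GROUP_SIZE";
    if (groupItems > limits.kernelMaxGroup)
        return "CL_KERNEL_WORK_GROUP_SIZE";
    return "ok";
}
```

Fed with the two launches reported earlier in the thread (512x512x1 with 16x16x1, and 4096x1x1 with 1024x1x1) and device limits of 1024, this check reports "ok" for both as long as the per-kernel limit is also 1024. That is consistent with the guess that a per-kernel limit smaller than the device limit is involved, but the thread does not confirm which value the Vivante driver reports for these kernels.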

guyorfreeman
Contributor I

No, due to time constraints I moved on to looking for workarounds.

I tend to agree with you that it has to do with a wrong implementation or with the GPU constraints.

All the best 
