Hello,
I'm trying the same goodFeaturesToTrack function from OpenCV 4.5 using OpenCL on the Vivante 7000 GPU of an i.MX8M Plus, and I'm getting a different error,
CL_LINK_PROGRAM_FAILURE. Is this a similar problem, or does it require something else to be done?
Hi Guy,
CL_DEVICE_MAX_WORK_GROUP_SIZE should return a single size_t value (for example 512 or 1024). This is the maximum number of work-items in a work-group, not the maximum in each dimension. So in your case you are trying to make a 2D work-group with 32*32 = 1024 work-items, and presumably CL_DEVICE_MAX_WORK_GROUP_SIZE is less than 1024 on your system.
See the OpenCL 1.1 spec, table 4.3, page 37, the definition of CL_DEVICE_MAX_WORK_GROUP_SIZE.
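To make the distinction between the total limit and the per-dimension limits concrete, here is a minimal sketch in plain C++. The helper name localSizeFitsDevice is hypothetical (it is not part of OpenCL or OpenCV); it assumes you have already read CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES from clGetDeviceInfo and pass them in as plain numbers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, for illustration only (not an OpenCL/OpenCV API).
// maxGroupSize : value of CL_DEVICE_MAX_WORK_GROUP_SIZE (a single size_t)
// maxItemSizes : values of CL_DEVICE_MAX_WORK_ITEM_SIZES (one per dimension)
bool localSizeFitsDevice(const std::array<std::size_t, 3>& local,
                         std::size_t maxGroupSize,
                         const std::array<std::size_t, 3>& maxItemSizes)
{
    std::size_t total = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        // Each dimension has its own upper bound.
        if (local[d] == 0 || local[d] > maxItemSizes[d]) {
            return false;
        }
        total *= local[d];
    }
    // The product over all dimensions must also respect the total limit.
    return total <= maxGroupSize;
}
```

With this check, a 32x32x1 work-group needs a total limit of at least 1024: it fits when the limit is 1024, but not when it is 512, even if each dimension alone would allow 32.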
You can work with OpenGL and OpenCV on your i.MX8.
Regards
Thanks Bio;
Perhaps my question was not clear enough.
I am running OpenCV 3.0 code (after verifying that OpenCL code works on my i.MX8M machine).
This OpenCV code (attached below) uses UMat instead of Mat, which instructs the OpenCV layer to use the T-API
and run OpenCL kernels for the implementation of the OpenCV function "goodFeaturesToTrack()".
The OpenCV OpenCL implementation launches its work-items in a way that causes the OpenCL layer to report a violation of CL_DEVICE_MAX_WORK_GROUP_SIZE.
On my platform I found that CL_DEVICE_MAX_WORK_GROUP_SIZE is 1024, both in total and in each dimension:
clGetDeviceInfo(devices[j], CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(WorkSize), &WorkSize, NULL);
Now, when I run the OpenCV code below, "goodFeaturesToTrack()" is executed on the OpenCL runtime, and OpenCV tries to queue the following CL kernels with a wrong "globalsize", which results in OpenCL errors:
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call:
clEnqueueNDRangeKernel('sobel3', dims=2, globalsize=512x512x1, localsize=16x16x1)
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call:
clEnqueueNDRangeKernel('MaxEigenVal', dims=1, globalsize=4096x1x1, localsize=1024x1x1)
My original question was:
1. Why does OpenCV 3.0, which uses the T-API, configure the work-items incorrectly so that it violates CL_DEVICE_MAX_WORK_GROUP_SIZE=1024?
2. How can I set or limit the globalsize/localsize that OpenCV uses when queuing CL kernels into the OpenCL runtime, to prevent this behavior?
Regards,
Guy
My Program Code
==================
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
using namespace cv;
using namespace std;
/// Global variables
Mat src, src_gray;
UMat usrc,usrc_gray;
int maxCorners = 500;
int maxTrackbar = 100;
RNG rng(12345);
const char* source_window = "Image"; // string literals are const in C++
/// Function header
void goodFeaturesToTrack_Demo( int, void* );
/**
* @function main
*/
int main( int argc, char** argv )
{
    /// Load source image and convert it to gray
    //src = imread( argv[1], 1 );
    imread( argv[1], 1 ).copyTo(usrc);
    cvtColor( usrc, usrc_gray, CV_BGR2GRAY );
    /// Create Window
    namedWindow( source_window, CV_WINDOW_AUTOSIZE );
    /// Create Trackbar to set the number of corners
    createTrackbar( "Max corners:", source_window, &maxCorners, maxTrackbar, goodFeaturesToTrack_Demo );
    imshow( source_window, usrc );
    goodFeaturesToTrack_Demo( 0, 0 );
    waitKey(0);
    return(0);
}
/**
* @function goodFeaturesToTrack_Demo.cpp
* @brief Apply Shi-Tomasi corner detector
*/
void goodFeaturesToTrack_Demo( int, void* )
{
    if( maxCorners < 1 ) { maxCorners = 1; }
    /// Parameters for Shi-Tomasi algorithm
    vector<Point2f> corners;
    double qualityLevel = 0.01;
    double minDistance = 10;
    int blockSize = 3;
    bool useHarrisDetector = false;
    double k = 0.04;
    /// Copy the source image
    UMat ucopy;
    ucopy = usrc.clone();
    /// Apply corner detection
    goodFeaturesToTrack( usrc_gray,
                         corners,
                         maxCorners,
                         qualityLevel,
                         minDistance,
                         Mat(),
                         blockSize,
                         useHarrisDetector,
                         k );
    /// Draw corners detected
    cout<<"** Number of corners detected: "<<corners.size()<<endl;
    int r = 4;
    /// size_t index avoids a signed/unsigned comparison with corners.size()
    for( size_t i = 0; i < corners.size(); i++ )
    {
        circle( ucopy, corners[i], r, Scalar(rng.uniform(0,255), rng.uniform(0,255),
                rng.uniform(0,255)), -1, 8, 0 );
    }
    /// Show what you got
    namedWindow( source_window, CV_WINDOW_AUTOSIZE );
    imshow( source_window, ucopy );
}
I was wondering if you reached a resolution on this or have acquired any other info. I am working through the same problem on an i.MX8M based DART-MX8M SOM. OpenCL kernels fail to enqueue, with CL_INVALID_WORK_GROUP_SIZE as the error. I tried some of the suggestions here, such as setting maxWorkGroupSize_, to no avail. I believe one of the other GPU constraints is being violated, but I am not sure which one or how to determine it.
If I modify OpenCV to always pass NULL as the localsize, this error goes away, but that breaks kernels which expect a specific localsize, so it is not a general solution.
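One way to narrow down which constraint is hit is to replay the failing sizes against each rule separately. The sketch below is plain C++ with a hypothetical helper, checkNDRange (not an OpenCL or OpenCV function). It assumes OpenCL 1.x rules, where the global size must be evenly divisible by the local size, and it takes the per-kernel limit as a parameter. That limit is the value clGetKernelWorkGroupInfo returns for CL_KERNEL_WORK_GROUP_SIZE, which can be lower than the device-wide limit. The kernel limit of 256 used in the note after the code is an invented number for illustration, not something measured on this GPU.

```cpp
#include <array>
#include <cstddef>
#include <string>

using Size3 = std::array<std::size_t, 3>;

// Hypothetical diagnostic helper: returns the first violated rule, or "ok".
// deviceMaxGroup : CL_DEVICE_MAX_WORK_GROUP_SIZE (from clGetDeviceInfo)
// kernelMaxGroup : CL_KERNEL_WORK_GROUP_SIZE (from clGetKernelWorkGroupInfo)
std::string checkNDRange(const Size3& global, const Size3& local,
                         std::size_t deviceMaxGroup,
                         std::size_t kernelMaxGroup)
{
    std::size_t total = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        if (local[d] == 0) {
            return "local size is zero";
        }
        // Assumption: OpenCL 1.x uniform work-groups.
        if (global[d] % local[d] != 0) {
            return "global not divisible by local";
        }
        total *= local[d];
    }
    if (total > deviceMaxGroup) {
        return "exceeds CL_DEVICE_MAX_WORK_GROUP_SIZE";
    }
    if (total > kernelMaxGroup) {
        return "exceeds CL_KERNEL_WORK_GROUP_SIZE";
    }
    return "ok";
}
```

Fed with the two failing calls quoted earlier in this thread (512x512x1 with 16x16x1, and 4096x1x1 with 1024x1x1) and a device limit of 1024, the helper reports "ok" for both as long as the kernel limit is also 1024. With a hypothetical kernel limit of 256, the 1024x1x1 call is flagged as exceeding CL_KERNEL_WORK_GROUP_SIZE. So the device-wide limit alone does not explain the errors, and the per-kernel limit is one candidate worth querying on the actual hardware.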
No, due to time constraints I moved on to looking for workarounds.
I tend to agree with you that it has to do with a wrong implementation or with GPU constraints.
All the best