I am using an i.MX8M-based platform (Variscite DART-MX8M) with a Vivante GC7000Lite GPU. I am trying to run OpenCL-accelerated OpenCV code, but I am not getting the results I expected, and I wonder whether I am missing some piece of information. For instance, when I run the opencv_test_optflow test binary, several kernel launches fail and an accuracy test fails:
$ /usr/bin/opencv_test_optflow
[ WARN:0] Using world accessible cache directory. This may be not secure: /var/tmp/
[ INFO:0] Successfully initialized OpenCL cache directory: /var/tmp/opencv/3.4.0/opencl_cache/
[ INFO:0] Preparing OpenCL cache configuration for context: 32-bit--Vivante_Corporation--Vivante_OpenCL_Device_GC7000L_6214_0000--OpenCL_1_2_V6_2_4_p1_150331
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('sobel3', dims=2, globalsize=592x400x1, localsize=16x16x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('maxEigenVal', dims=1, globalsize=4096x1x1, localsize=1024x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=768x97x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=768x97x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=512x49x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=512x49x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=256x25x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=256x25x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=768x97x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=768x97x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=512x49x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=512x49x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=256x25x1, localsize=256x1x1) sync=false
OpenCL error CL_INVALID_WORK_GROUP_SIZE (-54) during call: clEnqueueNDRangeKernel('pyrDown', dims=2, globalsize=256x25x1, localsize=256x1x1) sync=false
/usr/src/debug/opencv/3.4+gitAUTOINC+6d4f66472e_81ca8dab86_a62e20676a_34e4206aef_fccf7cd6a4-r0/contrib/modules/optflow/test/test_OF_accuracy.cpp:236: Failure
Expected: (calcRMSE(GT, flow)) <= (target_RMSE), actual: 0.5805 vs 0.55
[ FAILED ] DenseOpticalFlow_PCAFlow.ReferenceAccuracy (65669 ms)
[----------] 1 test from DenseOpticalFlow_PCAFlow (65669 ms total)
What I do not understand is why CL_INVALID_WORK_GROUP_SIZE is returned. In every failing call, the local size divides the global size evenly in each dimension, it is no larger than the global size, and the total number of work-items per group (256 for sobel3 and pyrDown, 1024 for maxEigenVal) does not exceed 1024, the maximum work-group size. Why does the call fail? If I pass NULL as the local size to clEnqueueNDRangeKernel, the call succeeds, but this is not a general solution: I would have to modify by hand every kernel that relies on a known local size. This looks to me like a potential bug in the underlying platform, either in the GPU itself or in the BSP. However, libVivanteOpenCL appears to be closed source, so I cannot investigate further. Please advise, thank you!