Hi,
I'm trying to use OpenCL on a Sabrelite board from element14, based on the iMx6 Q chip (SoC reference: IMX6Q6AVT 10AC).
I'm running into some strange problems. The device reports the following limits:
Max work items dimensions: 3
Max work items size[0]:1024
Max work items size[1]:1024
Max work items size[2]:1024
Max work group size:1024
But when I try to run a kernel, I get different limits. For instance, with a very simple kernel that just performs this operation:
a[get_local_id(0)] = b[get_local_id(0)] + c[get_local_id(0)];
I can't run more than 128 work items per work group, and the global work size is limited to 65536.
Why can't I run 1024 work items per work group? Why am I limited to 65536 for the global work size?
I encountered the same problem before, but found no solution!
When my local work size is invalid, I reduce it, and then I get a message that my global work size is invalid. So I reduce the global work size, and then I get a message that my local work size is invalid. For instance:
LWS = [8, 8, 0] -> ok, it's <= 256
GWS = [256, 256, 0] -> ok, it's <= 65536
------> INVALID GLOBAL WORK SIZE
LWS = [8, 8, 0] -> ok, it's <= 256
GWS = [128, 128, 0] -> ok, it's <= 65536
------> INVALID LOCAL WORK SIZE
LWS = [4, 4, 0] -> ok, it's <= 256
GWS = [128, 128, 0] -> ok, it's <= 65536
------> INVALID GLOBAL WORK SIZE
LWS = [1, 1, 0] -> ok, it's <= 256
GWS = [1, 1, 0] -> ok, it's <= 65536
------> INVALID GLOBAL WORK SIZE
So I don't understand how to configure the execution dimensions to comply with the hardware.
For information, when I query the work group info for this last kernel, I get the following:
Work group size : 48
Compile work group size 0 : 0
Compile work group size 2 : 0
Compile work group size 3 : 0
Local mem size : 0
Prefered work group size multiple : 16
Private mem size : 0
Thanks for any help!
Etienne
Take a look at CL_KERNEL_WORK_GROUP_SIZE using clGetKernelWorkGroupInfo(). For the Vivante GC2000, it's 192. That is, for a particular kernel, the maximum number of work items per work group would be 192.
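To make that check concrete, here is a minimal sketch in plain C of the work size rules as I understand them from the OpenCL 1.x description of clEnqueueNDRangeKernel: every global size must be non-zero, each global size must be divisible by the matching local size, and the product of the local sizes must not exceed the per-kernel limit. The helper check_ndrange and its result codes are hypothetical names made up for this example, not part of OpenCL. The kernel_max_wg argument is assumed to hold the value you read with clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &value, NULL). A real driver may enforce further limits that this sketch does not model.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch (these are NOT OpenCL error codes). */
enum ndrange_status {
    NDRANGE_OK = 0,
    NDRANGE_BAD_GLOBAL, /* a global size is 0 */
    NDRANGE_BAD_LOCAL   /* a local size is 0, does not divide the global size,
                           or the work group holds too many work items */
};

/*
 * Check a launch configuration before enqueueing it.
 * kernel_max_wg is assumed to be the CL_KERNEL_WORK_GROUP_SIZE value
 * queried for this kernel on this device.
 */
static enum ndrange_status check_ndrange(unsigned work_dim,
                                         const size_t *gws,
                                         const size_t *lws,
                                         size_t kernel_max_wg)
{
    size_t group_items = 1; /* work items per work group = product of local sizes */

    for (unsigned d = 0; d < work_dim; ++d) {
        if (gws[d] == 0)
            return NDRANGE_BAD_GLOBAL;
        if (lws[d] == 0 || gws[d] % lws[d] != 0)
            return NDRANGE_BAD_LOCAL;
        group_items *= lws[d];
    }

    return group_items <= kernel_max_wg ? NDRANGE_OK : NDRANGE_BAD_LOCAL;
}
```

With the work group size of 48 reported in the question, check_ndrange(2, gws, lws, 48) rejects LWS = [8, 8] because 8 * 8 = 64 work items exceed 48, while LWS = [4, 4] with GWS = [128, 128] passes. If the arrays are passed with work_dim = 3, a 0 in the third entry is rejected as well; an unused dimension would normally be given a size of 1.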