I have been experimenting with OpenCL support on the Vivante GPU of the i.MX6 (SABRE Lite board). Many cases work just fine, but whenever I use local memory in a kernel (i.e. the __local keyword), the computed output is incorrect. The exact same kernel with the __local keyword removed produces correct output.
I do know that the GC2000 GPU does not physically have dedicated local memory, and that local memory is therefore stored in global memory. So there is probably no performance advantage to using it. Still, at least for compatibility reasons, I would expect the use of local memory to produce functionally correct output.
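To make "functionally correct" concrete, here is a small CPU-only model in plain C. It is not the actual kernel, only a sketch of the equivalence I would expect: one variant stages each work-group's data in a scratch buffer (the role __local plays), the other reads global memory directly, and both should produce identical output. The names GROUP_SIZE, reverse_groups_staged, reverse_groups_direct and variants_agree are hypothetical, and the per-group reversal is chosen purely for illustration.

```c
#include <assert.h>
#include <string.h>

#define GROUP_SIZE 4  /* hypothetical work-group size */
#define N 8           /* hypothetical global size, a multiple of GROUP_SIZE */

/* Models a kernel that stages data in a per-group scratch buffer,
   standing in for a __local array. The two inner loops stand in for the
   code before and after a barrier(CLK_LOCAL_MEM_FENCE) call. */
static void reverse_groups_staged(const int *in, int *out, int n)
{
    assert(n % GROUP_SIZE == 0);
    for (int base = 0; base < n; base += GROUP_SIZE) {
        int scratch[GROUP_SIZE];

        /* Phase 1: each work-item stores its own element. */
        for (int lid = 0; lid < GROUP_SIZE; ++lid)
            scratch[lid] = in[base + lid];

        /* (the barrier would sit here in a real kernel) */

        /* Phase 2: each work-item reads the slot written by another one. */
        for (int lid = 0; lid < GROUP_SIZE; ++lid)
            out[base + lid] = scratch[GROUP_SIZE - 1 - lid];
    }
}

/* Models the same kernel with the scratch buffer removed:
   every work-item reads straight from global memory. */
static void reverse_groups_direct(const int *in, int *out, int n)
{
    assert(n % GROUP_SIZE == 0);
    for (int base = 0; base < n; base += GROUP_SIZE)
        for (int lid = 0; lid < GROUP_SIZE; ++lid)
            out[base + lid] = in[base + GROUP_SIZE - 1 - lid];
}

/* Returns 1 if both variants produce identical output, 0 otherwise. */
static int variants_agree(void)
{
    int in[N], staged[N], direct[N];
    for (int i = 0; i < N; ++i)
        in[i] = i;
    reverse_groups_staged(in, staged, N);
    reverse_groups_direct(in, direct, N);
    return memcmp(staged, direct, sizeof staged) == 0;
}
```

On the CPU, variants_agree() returns 1. On the GC2000, the equivalent of the staged variant is the one giving me wrong results.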
Has anyone gotten correct results in a similar situation?