Hi,
I was trying out a simple OpenCL program that assigns the values of one memory object to another using a kernel. However, this step alone takes around 10 to 12 ms. Reading the data back using the zero-copy mechanism also takes at least 15 ms. Is there a way to reduce these timings?
P.S. The size of the image buffer is 1024x1024.
Hi Anusree,
Are you using GPU vectors to initialize the CPU memory or the GPU memory? You can also copy the output from GPU memory to CPU memory with the clEnqueueReadBuffer command; this should improve the performance.
In case you are using atomic functions, note that these are currently disabled in the OpenCL compiler. If the kernel uses them, the compiler will report an error.
Examples can be found in the Graphics reference manual in the BSP documentation.
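To judge whether the 10 to 12 ms and 15 ms figures leave room for improvement, it can help to convert them into an effective throughput and compare that with the memory bandwidth of your board. The sketch below is only a rough aid: the helper names are made up for illustration, and the bytes-per-pixel value is an assumption, since the thread only gives the 1024x1024 dimensions.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of a width x height buffer.
 * bytes_per_pixel is an assumption; the original post only states
 * the dimensions (1024x1024), not the pixel format. */
size_t buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Hypothetical helper: effective throughput in MB/s (1 MB = 1e6 bytes)
 * for moving `bytes` bytes in `ms` milliseconds. */
double throughput_mb_per_s(size_t bytes, double ms)
{
    double megabytes = (double)bytes / 1e6;
    double seconds = ms / 1e3;
    return megabytes / seconds;
}
```

For example, if the buffer held 4 bytes per pixel, throughput_mb_per_s(buffer_bytes(1024, 1024, 4), 10.0) comes out at roughly 419 MB/s for the kernel copy. If that is far below what the memory system can sustain, the time is probably dominated by overhead rather than by the copy itself.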
Hope this helps