I am currently using the GPU in the i.MX6x processor for OpenCL computation. The data I work on is an image larger than VGA resolution. The image has to be copied to GPU memory and the result later copied back to the host processor.
When I measure the total processing time, the memory copies account for most of it; the OpenCL kernel itself takes comparatively little. So for our application, the memory transfer is the bottleneck.
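A minimal sketch of how the per-frame copy time and effective bandwidth could be measured, for anyone who wants to compare numbers. This is an illustration under assumptions, not the code from the actual application: the plain `memcpy` is only a stand-in for the real OpenCL host-to-device transfer, and the frame size (VGA, 4 bytes per pixel) and the function names `now_seconds`, `bandwidth_mb_per_s` and `time_frame_copy` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed frame layout, for illustration only: VGA, 4 bytes per pixel. */
enum { WIDTH = 640, HEIGHT = 480, BYTES_PER_PIXEL = 4 };

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Effective bandwidth in MB/s (1 MB = 1e6 bytes) for `bytes` moved in
 * `seconds`. Returns 0 if the elapsed time is not positive. */
double bandwidth_mb_per_s(size_t bytes, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return ((double)bytes / 1e6) / seconds;
}

/* Average time in seconds for one copy of a frame of `frame_bytes` bytes,
 * taken over `iterations` copies. Returns -1.0 if allocation fails.
 * The memcpy is a stand-in: in the real application this is where the
 * blocking OpenCL transfer would be timed instead. */
double time_frame_copy(size_t frame_bytes, int iterations)
{
    assert(frame_bytes > 0 && iterations > 0);

    unsigned char *src = malloc(frame_bytes);
    unsigned char *dst = malloc(frame_bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xAB, frame_bytes);
    memset(dst, 0, frame_bytes);

    double start = now_seconds();
    for (int i = 0; i < iterations; ++i) {
        memcpy(dst, src, frame_bytes);
    }
    double elapsed = now_seconds() - start;

    /* Read a byte back so the result of the copy is actually used. */
    volatile unsigned char sink = dst[frame_bytes - 1];
    (void)sink;

    free(src);
    free(dst);
    return elapsed / iterations;
}
```

With these assumptions one frame is 640 x 480 x 4 = 1,228,800 bytes, and `bandwidth_mb_per_s(frame_bytes, time_frame_copy(frame_bytes, 100))` gives a host-side copy rate that the measured GPU transfer rate could be compared against.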
In the kernel I am currently using OpenCL calls such as read_imageui and write_imageui.
Is there a smarter or better way of doing this memory copy? Does the GPU use some sort of DMA to copy to and from host memory? Is it possible to run the copy in the background while the computation is happening?
Also, is there any computation performance or memory bandwidth benchmark available?
Thanks & Regards,