Hi,
Currently I am using the GPU in the i.MX6x processor for OpenCL computation. The data I work on is an image larger than VGA resolution. The image needs to be copied to GPU memory and later sent back to the host processor.
When I measure the total computation time, the memory copy accounts for most of it. The OpenCL kernel itself takes comparatively little. So for our application, the memory transfer is the bottleneck.
Currently I am using OpenCL calls like read_imageui and write_imageui.
Is there a smarter or better way of doing this memory copy? Does the GPU use some sort of DMA to copy to and from host memory? Is it possible to run the copy in the background while the computation is happening?
Also, is there any computation performance or memory bandwidth benchmark available?
Thanks & Regards,
TNS
Sri, I solved the software memory-copy bottleneck by using the IPU as a 1:1 scaler. It won't work unless you specify at least one change in the destination, so I chose to enable horizontal flip. You can undo the flip by doing two such copies: one to a temp buffer and a second one to your desired destination. If your input buffer is larger than 1024 pixels x 1024 lines, you have to break it into multiple steps of no more than 1024x1024 each. Make sure that the horizontal and vertical resolutions are divisible by 8.
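For anyone trying the tiling step described above, here is a minimal sketch of how the split could be computed, assuming only the two constraints mentioned in the post (each pass no larger than 1024x1024, and resolutions divisible by 8). The names tile_t, split_into_tiles, TILE_MAX and TILE_ALIGN are made up for illustration and are not part of any IPU or Vivante API. The actual IPU copy call is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_MAX   1024  /* per-pass size limit mentioned in the post */
#define TILE_ALIGN 8     /* resolutions must be divisible by 8 */

/* Hypothetical helper type: one rectangle to hand to a single copy pass. */
typedef struct {
    int x, y;  /* top-left corner inside the full image */
    int w, h;  /* tile width and height in pixels/lines */
} tile_t;

/*
 * Split a width x height image into tiles of at most TILE_MAX per side.
 * Writes up to max_tiles entries into out (out may be NULL to only count).
 * Returns the number of tiles required, or -1 if the dimensions are not
 * positive multiples of TILE_ALIGN.
 */
int split_into_tiles(int width, int height, tile_t *out, int max_tiles)
{
    /* Remainder tiles stay aligned only if the limit itself is aligned. */
    assert(TILE_MAX % TILE_ALIGN == 0);

    if (width <= 0 || height <= 0)
        return -1;
    if (width % TILE_ALIGN != 0 || height % TILE_ALIGN != 0)
        return -1;

    int count = 0;
    for (int y = 0; y < height; y += TILE_MAX) {
        int h = (height - y < TILE_MAX) ? height - y : TILE_MAX;
        for (int x = 0; x < width; x += TILE_MAX) {
            int w = (width - x < TILE_MAX) ? width - x : TILE_MAX;
            if (out != NULL && count < max_tiles) {
                out[count].x = x;
                out[count].y = y;
                out[count].w = w;
                out[count].h = h;
            }
            count++;
        }
    }
    return count;
}
```

With a 1920x1080 frame this yields four tiles (1024x1024, 896x1024, 1024x56 and 896x56), each of which would then go through the two flipped copies. A VGA frame fits in a single pass.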
I don't quite understand how you're using the IPU to DMA into the GPU memory. Are you still using clCreateImage2D? Is there a way to backdoor this and access an IPU buffer from CL with read/write_image?
It turns out that there is a solution in OpenGL. Vivante has a neat function glTexDirectVIVMap() that directly maps memory into GPU space so no copy operation is needed. I don't know if there is a version of the function in OpenCL but perhaps you could use OpenGL to accomplish your task instead?
Since the Vivante drivers are missing the cl_khr_gl_sharing extension, there is no way (that I am aware of) to share textures between the two. I am not aware of any equivalent call supported by the Vivante OpenCL drivers either.
This is an older question, but since there is no response as yet, I would be curious to see if you ever got a response from Freescale on this. Specifically the write_imageui portion, in our case.
