i.MX6 OpenCL Zero Copy Buffer Usage is Slow

Question asked by Stephen Rhein on Aug 13, 2015

Issue description:

  1. Reference Board: SMARC-sAMX6i
  2. Kernel version: 3.10.17-rel1.0+g232293e
  3. Problem Description: Using CL_MEM_ALLOC_HOST_PTR together with clEnqueueMapBuffer and clEnqueueUnmapMemObject should allow fast sharing of data between host and device, but access to the mapped memory is instead very slow.
  4. Intel provides some examples of using zero copy buffers in OpenCL: Getting the Most from OpenCL™ 1.2: How to Increase Performance by Minimizing Buffer Copies on Intel® Processor Graphics …
  5. Reproducibility: always, whenever CL_MEM_ALLOC_HOST_PTR, clEnqueueMapBuffer and clEnqueueUnmapMemObject are used.
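For reference, the zero copy pattern from item 3 can be sketched roughly as below. This is only an illustration, not the code behind the benchmarks: BUF_BYTES and the fill value are made up, it assumes the first GPU device on the first platform, and error handling is kept to a minimum.

```c
#include <stdio.h>
#include <string.h>

#define CL_TARGET_OPENCL_VERSION 110  /* assumption: build against the 1.1 API */
#include <CL/cl.h>

#define BUF_BYTES (1024 * 1024)       /* hypothetical buffer size */

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if (err != CL_SUCCESS)
        return 1;

    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
    if (err != CL_SUCCESS) {
        clReleaseContext(ctx);
        return 1;
    }

    /* Ask the driver to allocate host-accessible memory for the buffer. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                BUF_BYTES, NULL, &err);
    if (err != CL_SUCCESS) {
        clReleaseCommandQueue(queue);
        clReleaseContext(ctx);
        return 1;
    }

    /* Blocking map: the pointer is valid for host access once this returns. */
    unsigned char *host = clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                             CL_MAP_READ | CL_MAP_WRITE,
                                             0, BUF_BYTES, 0, NULL, NULL, &err);
    if (err == CL_SUCCESS) {
        memset(host, 0x5a, BUF_BYTES);  /* host-side access to the mapped region */

        /* Unmap before any kernel touches buf, then wait for completion. */
        err = clEnqueueUnmapMemObject(queue, buf, host, 0, NULL, NULL);
        clFinish(queue);
    }

    printf("map/unmap %s\n", err == CL_SUCCESS ? "succeeded" : "failed");

    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return err == CL_SUCCESS ? 0 : 1;
}
```

The host only touches the buffer between the map and unmap calls; any kernel that uses the buffer would be enqueued after the unmap.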


I am seeing terrible performance with zero copy buffers on the i.MX6, as the following benchmarks show:


# host and device use separate buffers
NEON   framerate : 100.069185
OpenCL framerate : 751.673579

# host and device use same buffer
NEON   framerate : 40.988317
OpenCL framerate : 48.976948


I believe that the host pointer allocated by the Vivante driver is uncached, and that this is causing the terrible performance we are seeing. This is highly unfortunate, since memory shared between the CPU and GPU could otherwise be leveraged for greater performance.
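One way to check the uncached hypothesis would be to time plain CPU reads over the mapped pointer and over an ordinary malloc buffer of the same size. Below is a minimal sketch in plain C that makes no assumptions about the Vivante driver. The helper names checksum_bytes and read_throughput_mb_s are hypothetical, and clock() reports CPU time, so only the ratio between the two buffers is meaningful, not the absolute figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Sum every byte of buf. The volatile qualifier makes the compiler issue
 * every load, so the read loop cannot be optimized away. */
uint64_t checksum_bytes(const volatile uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += buf[i];
    return sum;
}

/* Read buf `passes` times and return the approximate throughput in MB/s
 * (1 MB = 1e6 bytes). Returns 0.0 if the run was too short for clock()
 * to measure. The accumulated checksum is stored in *checksum_out when
 * that pointer is non-NULL. */
double read_throughput_mb_s(const void *buf, size_t len, int passes,
                            uint64_t *checksum_out)
{
    assert(buf != NULL && len > 0 && passes > 0);

    uint64_t sum = 0;
    clock_t t0 = clock();
    for (int p = 0; p < passes; ++p)
        sum += checksum_bytes(buf, len);
    clock_t t1 = clock();

    if (checksum_out != NULL)
        *checksum_out = sum;

    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? ((double)len * passes) / (secs * 1e6) : 0.0;
}
```

Calling read_throughput_mb_s once on the pointer returned by clEnqueueMapBuffer and once on a malloc buffer of equal size gives two figures to compare. A much lower figure for the mapped pointer would be consistent with an uncached mapping, though it would not prove it.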