We are using a dual-core i.MX6 in a computer-vision application. We
receive a stream of images from a camera at a frame rate of about 14 fps.
Each frame is 2592 x 1944 (5 megapixels), with 8 bits per pixel.
We need to process these images to detect edges.
Please note that our application is not concerned with visualizing
these images (we do not even have a display connected to our custom
board). The frames are loaded into main memory by the IPU, and we have to
process them so that the edge points end up in main memory as well (either as a
list of points or as a binary image).
Performance is important: with a standard Canny edge detector from the OpenCV
library, processing one 5 Mpx, 8 bpp frame takes about 500 ms, which is far too
long. We need to use the GPU's horsepower to speed up the edge detection.
This should be a typical "GPGPU" application, so OpenCL seems to be the right
way to go. Unfortunately, our first attempts with it were not encouraging: a
simple 3x3 Gaussian-blur filter -- which is the first step of the Canny
algorithm -- appears to take about 240 ms.
So our question is: how can we optimize our OpenCL code to achieve better
performance? It should be possible, given that the Vivante GC2000 is rated at
about 16 GFLOPS of compute throughput, while a 3x3 Gaussian filter over 5 Mpx
should take on the order of 50-100 million operations.
I am attaching the OpenCL code we are using.
Since this issue is crucial for our application, any help or hint would be
greatly appreciated :-)
Original Attachment has been moved to: conv05.c.zip
Original Attachment has been moved to: conv05.cl.zip
Original Attachment has been moved to: Makefile.zip