V4L2 capture buffer performance


Contributor II

I'm seeing some strange performance behaviour when processing images captured with V4L2.  I've reduced it to the following test case (I've attached the full source code):


int process_image(const unsigned char *p, int size)
{
    int i, j, sum;

    /* Optionally copy the frame into an ordinary buffer first */
    if (do_copy) {
        memcpy(copy_buf, p, size);
        p = copy_buf;
    }

    sum = 0;
    if (do_rows) {
        /* Read all the pixels in address order */
        for (i = 0; i < size; i++)
            sum |= p[i];
    } else {
        /* Read all the pixels in non-optimal order */
        for (i = 0; i < fmt.fmt.pix.bytesperline; i++) {
            for (j = 0; j < fmt.fmt.pix.height; j++) {
                sum |= p[i + j * fmt.fmt.pix.bytesperline];
            }
        }
    }

    return sum;
}

I'm measuring the performance using 'perf stat -e cpu-clock ./capture-test -c 100', running on a Boundary Devices SABRE Lite with an OV5642 sensor. The kernel is the Boundary Devices kernel from the dora branch of Yocto.


Here are the results I get:

do_copy=0, do_rows=0: 1458.861001 cpu-clock

do_copy=0, do_rows=1: 2207.192003 cpu-clock

do_copy=1, do_rows=0: 624.319663 cpu-clock

do_copy=1, do_rows=1: 461.399335 cpu-clock


There are two strange things about these results. First, doing a memcpy makes things a lot faster. Second, when not doing a memcpy, row-wise traversal is much slower than column-wise traversal.


Does anyone know why this is happening, and how I can make it faster without doing a memcpy?


The kernel is allocating the buffers with dma_alloc_coherent. I've tried changing to kmalloc and dma_map_single/dma_unmap_single in the QBUF/DQBUF ioctls, but that made no difference.


Also note that if I run the same test code on my laptop, then I get the expected behaviour (row-wise traversal is faster, and memcpy is slower).
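
For comparison, the two traversal orders can be timed on an ordinary malloc'ed buffer, which is the situation the laptop run and the memcpy case correspond to. This is only a minimal sketch, not the attached capture-test.c: the helper names (sum_rows, sum_cols, cpu_ms, time_traversal) and the 0x5a fill value are made up for illustration, and it measures process CPU time with clock_gettime rather than perf stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* OR together every byte, walking memory in address order. */
static int sum_rows(const unsigned char *p, size_t stride, size_t height)
{
    int sum = 0;
    for (size_t i = 0; i < stride * height; i++)
        sum |= p[i];
    return sum;
}

/* OR together every byte, walking column by column (strided access). */
static int sum_cols(const unsigned char *p, size_t stride, size_t height)
{
    int sum = 0;
    for (size_t i = 0; i < stride; i++)
        for (size_t j = 0; j < height; j++)
            sum |= p[i + j * stride];
    return sum;
}

/* CPU time consumed by this process so far, in milliseconds. */
static double cpu_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Time one traversal of a malloc'ed frame (hypothetical helper).
 * Stores the OR of all bytes in *sum_out and returns elapsed CPU ms. */
static double time_traversal(size_t stride, size_t height, int do_rows,
                             int *sum_out)
{
    unsigned char *frame = malloc(stride * height);
    assert(frame != NULL);
    memset(frame, 0x5a, stride * height);

    double t0 = cpu_ms();
    *sum_out = do_rows ? sum_rows(frame, stride, height)
                       : sum_cols(frame, stride, height);
    double t1 = cpu_ms();

    free(frame);
    return t1 - t0;
}
```

Calling time_traversal(stride, height, 1, &sum) and time_traversal(stride, height, 0, &sum) with the frame geometry from fmt.fmt.pix gives a baseline to compare against the numbers measured on the capture buffer.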

Original Attachment has been moved to: capture-test.c.zip


Contributor IV

V4L2 buffers allocated with V4L2_MEMORY_MMAP are not cacheable.

CPU accesses to such buffers are very slow (every access goes all the way to DDR).

You should use V4L2_MEMORY_USERPTR ... but I couldn't make it work.

Were you able to resolve the issue eventually?




Contributor I

Did anyone find a solution?
memcpy from V4L2_MEMORY_MMAP buffers is very slow.

V4L2_MEMORY_USERPTR doesn't work even if I allocate the memory with memalign(page_size, framesize);
I'm using an i.MX6ULL with the mx6s_capture module.
Yes, I know this is an old thread...
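
A page-aligned allocation of the kind described above can be sketched as follows. This uses posix_memalign rather than memalign, and additionally rounds the length up to a whole number of pages, which is an assumption on my part about what a driver might want, not something confirmed for mx6s_capture. The names round_up_to_page and alloc_frame are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round a frame size up to a whole number of pages. */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    return (size + page_size - 1) / page_size * page_size;
}

/* Allocate a page-aligned buffer big enough for one frame (hypothetical
 * helper). Stores the padded length in *padded_out.
 * Returns NULL on failure; release the buffer with free(). */
static void *alloc_frame(size_t frame_size, size_t *padded_out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t padded = round_up_to_page(frame_size, (size_t)page);
    void *mem = NULL;

    if (posix_memalign(&mem, (size_t)page, padded) != 0)
        return NULL;

    *padded_out = padded;
    return mem;
}
```

Alignment alone does not guarantee that a driver will accept the pointer; some capture hardware needs physically contiguous memory, which an ordinary user-space allocation does not provide.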


Contributor II

Hey Philip,

Did you figure out what the problem is or what's happening? I also tried your code and it gives the same result on an iMX6 processor.
