<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: V4L2 capture buffer performance in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263299#M26895</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hey Philip,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did you figure out what the problem is or what's happening? I also tried your code, and it gives the same result on an i.MX6 processor.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 30 Jan 2015 11:56:44 GMT</pubDate>
    <dc:creator>vladspiridonesc</dc:creator>
    <dc:date>2015-01-30T11:56:44Z</dc:date>
    <item>
      <title>V4L2 capture buffer performance</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263298#M26894</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I'm seeing some strange performance behaviour when processing images captured with V4L2.&amp;nbsp; I've reduced it to the following test case (I've attached the full source code):&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;int process_image(const unsigned char *p, int size)&lt;/P&gt;&lt;P&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; int i, j, sum;&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (do_copy) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memcpy(copy_buf, p, size);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; p = copy_buf;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum = 0;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (do_rows) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i = 0; i &amp;lt; size; i++)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum |= p[i];&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; } else {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Read all the pixels in non-optimal order */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i = 0; i &amp;lt; fmt.fmt.pix.bytesperline; i++) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (j = 0; j &amp;lt; fmt.fmt.pix.height; j++) 
{&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sum |= p[i + j * fmt.fmt.pix.bytesperline];&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return sum;&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm measuring the performance with 'perf stat -e cpu-clock ./capture-test -c 100' on a Boundary Devices SABRE Lite with an OV5642 sensor. The kernel is the Boundary Devices kernel from the dora branch of Yocto.&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here are the results I get:&lt;/P&gt;&lt;P&gt;do_copy=0, do_rows=0: 1458.861001 cpu-clock&lt;/P&gt;&lt;P&gt;do_copy=0, do_rows=1: 2207.192003 cpu-clock&lt;/P&gt;&lt;P&gt;do_copy=1, do_rows=0: 624.319663 cpu-clock&lt;/P&gt;&lt;P&gt;do_copy=1, do_rows=1: 461.399335 cpu-clock&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are two strange things about these results. First, doing a memcpy makes things a lot faster. Second, without the memcpy, row-wise traversal is much slower than column-wise traversal.&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone know why this is happening, and how I can make it faster without doing a memcpy?&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The kernel is allocating the buffers with dma_alloc_coherent. 
I've tried changing to kmalloc and dma_map_single/dma_unmap_single in the QBUF/DQBUF ioctls, but that made no difference.&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also note that if I run the same test code on my laptop, then I get the expected behaviour (row-wise traversal is faster, and memcpy is slower).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Original Attachment has been moved to: &lt;A _jive_internal="true" href="https://community.nxp.com/docs/DOC-335828"&gt;capture-test.c.zip&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
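      <!--
      A minimal standalone sketch of the two traversal orders from the post above, run on an
      ordinary malloc'd (cached) buffer instead of a V4L2 capture buffer. It may help to confirm
      the "expected" behaviour the poster saw on a laptop before comparing against the capture
      path. The 640x480 frame at 2 bytes per pixel, the 100 passes, and the names or_rows,
      or_cols, time_traversal and compare_traversals are assumptions for illustration; none of
      them come from the attached capture-test.c.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>

      /* OR together every byte, walking the buffer in memory order. */
      static int or_rows(const unsigned char *p, size_t bytesperline, size_t height)
      {
          int sum = 0;
          for (size_t i = 0; i < bytesperline * height; i++)
              sum |= p[i];
          return sum;
      }

      /* OR together every byte, walking column by column (stride = bytesperline). */
      static int or_cols(const unsigned char *p, size_t bytesperline, size_t height)
      {
          int sum = 0;
          for (size_t i = 0; i < bytesperline; i++)
              for (size_t j = 0; j < height; j++)
                  sum |= p[i + j * bytesperline];
          return sum;
      }

      /* Run `frames` passes of one traversal; returns the CPU time in seconds. */
      static double time_traversal(int (*fn)(const unsigned char *, size_t, size_t),
                                   const unsigned char *p, size_t bpl, size_t h,
                                   int frames, int *result)
      {
          clock_t start = clock();
          int sum = 0;
          for (int n = 0; n < frames; n++)
              sum |= fn(p, bpl, h);
          *result = sum;
          return (double)(clock() - start) / CLOCKS_PER_SEC;
      }

      /* Fill a hypothetical 640x480, 2-bytes-per-pixel frame and time both orders.
       * Returns 0 on success, -1 if the buffer could not be allocated. */
      static int compare_traversals(void)
      {
          const size_t bpl = 640 * 2, h = 480;
          unsigned char *buf = malloc(bpl * h);
          int r, c;

          if (!buf)
              return -1;
          memset(buf, 0x5a, bpl * h);

          double t_rows = time_traversal(or_rows, buf, bpl, h, 100, &r);
          double t_cols = time_traversal(or_cols, buf, bpl, h, 100, &c);
          assert(r == c); /* both orders must visit the same bytes */
          printf("rows: %.3f s, cols: %.3f s\n", t_rows, t_cols);

          free(buf);
          return 0;
      }
      ```

      Calling compare_traversals() from main prints one timing per traversal order. Returning the
      OR-ed value keeps the compiler from discarding the loops as dead code.
      -->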
      <pubDate>Mon, 16 Dec 2013 03:45:28 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263298#M26894</guid>
      <dc:creator>philipcraig</dc:creator>
      <dc:date>2013-12-16T03:45:28Z</dc:date>
    </item>
    <item>
      <title>Re: V4L2 capture buffer performance</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263299#M26895</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hey Philip,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did you figure out what the problem is or what's happening? I also tried your code, and it gives the same result on an i.MX6 processor.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 30 Jan 2015 11:56:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263299#M26895</guid>
      <dc:creator>vladspiridonesc</dc:creator>
      <dc:date>2015-01-30T11:56:44Z</dc:date>
    </item>
    <item>
      <title>Re: V4L2 capture buffer performance</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263300#M26896</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;V4L2 buffers allocated with V4L2_MEMORY_MMAP are not cacheable.&lt;/P&gt;&lt;P&gt;CPU accesses to such buffers are very slow (every access goes all the way to DDR).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You should use V4L2_MEMORY_USERPTR instead, but I couldn't make it work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Were you able to resolve the issue eventually?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Erez&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
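      <!--
      For readers unfamiliar with the V4L2_MEMORY_USERPTR path mentioned above, here is a rough
      sketch of the two ioctl calls involved, using the standard V4L2 structures from
      linux/videodev2.h. The function names request_userptr_buffers and queue_userptr_buffer are
      hypothetical. Whether the i.MX capture drivers accept user-pointer buffers is not confirmed
      anywhere in this thread (two posters report that it did not work for them), so treat this as
      an illustration of the generic API under that assumption, not as a known fix.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>
      #include <sys/ioctl.h>
      #include <linux/videodev2.h>

      /* Ask the driver for `count` user-pointer capture buffers.
       * Returns 0 on success, -1 (with errno set) if the request is refused.
       * A driver that does not support this I/O method is expected to fail
       * the call with EINVAL. */
      static int request_userptr_buffers(int fd, unsigned int count)
      {
          struct v4l2_requestbuffers req;

          memset(&req, 0, sizeof(req));
          req.count  = count;
          req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          req.memory = V4L2_MEMORY_USERPTR;

          return ioctl(fd, VIDIOC_REQBUFS, &req);
      }

      /* Queue one application-allocated buffer for capture.
       * Returns 0 on success, -1 (with errno set) on failure. */
      static int queue_userptr_buffer(int fd, unsigned int index,
                                      void *mem, size_t length)
      {
          struct v4l2_buffer buf;

          assert(mem != NULL);

          memset(&buf, 0, sizeof(buf));
          buf.type      = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          buf.memory    = V4L2_MEMORY_USERPTR;
          buf.index     = index;
          buf.m.userptr = (unsigned long)mem;
          buf.length    = (unsigned int)length;

          return ioctl(fd, VIDIOC_QBUF, &buf);
      }
      ```

      Checking errno after a failed request_userptr_buffers call is the quickest way to tell an
      unsupported I/O method apart from some other setup problem.
      -->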
      <pubDate>Mon, 20 Jul 2015 13:12:46 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263300#M26896</guid>
      <dc:creator>erezsteinberg</dc:creator>
      <dc:date>2015-07-20T13:12:46Z</dc:date>
    </item>
    <item>
      <title>Re: V4L2 capture buffer performance</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263301#M26897</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Did anyone find a solution?&lt;BR /&gt;With V4L2_MEMORY_MMAP, memcpy is very slow.&lt;/P&gt;&lt;P&gt;V4L2_MEMORY_USERPTR doesn't work, even if I allocate the memory with memalign(page_size, framesize);&lt;BR /&gt;I'm using an i.MX6ULL with the mx6s_capture module.&lt;BR /&gt;Yes, I know this is an old thread...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
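      <!--
      The memalign(page_size, framesize) call above aligns the start of the buffer but leaves its
      length as the raw frame size. A small sketch of an allocation that is page-aligned and also
      padded to a whole number of pages, a precaution some V4L2 user-pointer examples take, is
      shown here. The helper names round_up and alloc_frame_buffer are hypothetical, and nothing
      in this thread establishes that the padding makes a difference with the mx6s_capture module.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <stdlib.h>
      #include <unistd.h>

      /* Round `size` up to the next multiple of `align` (a power of two). */
      static size_t round_up(size_t size, size_t align)
      {
          return (size + align - 1) & ~(align - 1);
      }

      /* Allocate a page-aligned buffer whose length is a whole number of pages.
       * Stores the padded length in *length; returns NULL on failure.
       * The caller releases the buffer with free(). */
      static void *alloc_frame_buffer(size_t framesize, size_t *length)
      {
          long page = sysconf(_SC_PAGESIZE);
          void *mem = NULL;

          assert(length != NULL);
          if (page <= 0)
              return NULL;

          *length = round_up(framesize, (size_t)page);
          if (posix_memalign(&mem, (size_t)page, *length) != 0)
              return NULL;
          return mem;
      }
      ```

      The padded length returned through *length, not the raw frame size, would then be the value
      passed as the buffer length when queueing.
      -->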
      <pubDate>Fri, 24 Aug 2018 14:24:47 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/V4L2-capture-buffer-performance/m-p/263301#M26897</guid>
      <dc:creator>firex</dc:creator>
      <dc:date>2018-08-24T14:24:47Z</dc:date>
    </item>
  </channel>
</rss>

