<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sporadic delay when using GPU with OpenCL in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912278#M137461</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I didn't change any input parameters for now.&lt;/P&gt;&lt;P&gt;If the execution-time results were a signal, we could decompose it into three major components:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Consistent execution time: ~10µs ± 5µs&lt;/LI&gt;&lt;LI&gt;Small spikes: ~100µs ± 50µs&lt;/LI&gt;&lt;LI&gt;Big spikes: ~2000µs ± 1000µs&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I used clFinish in the following manner:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;clFinish(..);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;gettimeofday(&amp;amp;start, NULL);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;err = clEnqueueNDRangeKernel(..., &amp;amp;hEvent);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;clFinish(..);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;if (err != CL_SUCCESS)&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;{ /* ..error&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;check.. */&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;}&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;gettimeofday(&amp;amp;end, NULL);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Using clFinish(..) mostly eliminates the small spikes, those around ~100µs.&lt;/P&gt;&lt;P&gt;What remains is a mostly consistent signal with 15% big spikes at ~1000µs-1500µs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="pastedImage_1.png"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/87940i91B6C1E74B9F95FC/image-size/large?v=v2&amp;amp;px=999" role="button" title="pastedImage_1.png" alt="pastedImage_1.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Measuring the execution time with your "direct" measurement approach (in contrast to the profiling information from the cl_event "hEvent")&lt;/P&gt;&lt;P&gt;results&amp;nbsp;in a similar picture of execution times, but with a higher percentage of "big spikes" and an offset of around ~500µs on all values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will probably ignore the spikes for now since the max. exec. time is "only" 1.2ms, but if it stacks up with more complex functions I will need to find additional solutions.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 23 Sep 2019 08:04:34 GMT</pubDate>
    <dc:creator>peter_eberl</dc:creator>
    <dc:date>2019-09-23T08:04:34Z</dc:date>
    <item>
      <title>Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912273#M137456</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have an imx8mqevk board and am developing GPU processing applications with OpenCL.&lt;/P&gt;&lt;P&gt;Sporadically I see big delays when executing kernels on the GPU.&lt;/P&gt;&lt;P&gt;The problem is also visible with the gtec-demo-framework. When running the FastFourierTransform DemoApp, I get values like these in some runs:&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0): 0.000003 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 1): 0.000615 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 2): 0.000002 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 3): 0.000002 seconds&lt;BR /&gt;Total Kernel execution time on GPU: 0.000622 seconds&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would expect values like these:&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0): 0.000003 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 1): 0.000001 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 2): 0.000002 seconds&lt;BR /&gt;Kernel execution time on GPU (kernel 3): 0.000002 seconds&lt;BR /&gt;Total Kernel execution time on GPU: 0.000008 seconds&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm using:&lt;/P&gt;&lt;P&gt;repo init -u &lt;A href="https://source.codeaurora.org/external/imx/imx-manifest" target="test_blank"&gt;https://source.codeaurora.org/external/imx/imx-manifest&lt;/A&gt; -b imx-linux-sumo -m imx-4.14.98-2.0.0_ga.xml&lt;/P&gt;&lt;P&gt;DISTRO=fsl-imx-xwayland MACHINE=imx8mqevk source fsl-setup-release.sh -b build-xwayland&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are there any problems in the imx-gpu-viv driver?&lt;/P&gt;&lt;P&gt;Or are there any other limitations?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 12 Aug 2019 13:07:43 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912273#M137456</guid>
      <dc:creator>peter_eberl</dc:creator>
      <dc:date>2019-08-12T13:07:43Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912274#M137457</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello peter,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't know why you are running into limitations, but try running the code again. On my i.MX8M, I get 0.000008 s.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 13 Aug 2019 16:49:35 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912274#M137457</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2019-08-13T16:49:35Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912275#M137458</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A class="jx-jive-macro-user" href="https://community.nxp.com/people/Bio_TICFSL"&gt;Bio_TICFSL&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="pastedImage_5.png"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/84377iAD342885D8F5DB83/image-size/large?v=v2&amp;amp;px=999" role="button" title="pastedImage_5.png" alt="pastedImage_5.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The graph shows the kernel execution times of a simple vector-vector addition with vector size 1000.&lt;/P&gt;&lt;P&gt;The addition was executed 1000 times.&lt;/P&gt;&lt;P&gt;My problem is the variance of the values:&lt;/P&gt;&lt;P&gt;Min. exec. time: &amp;nbsp;2E-6 seconds&lt;/P&gt;&lt;P&gt;Max. exec. time: &amp;nbsp;9.61E-4 seconds&lt;/P&gt;&lt;P&gt;Median: &amp;nbsp;9E-5 seconds&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Even with 10E4 executions the max. exec. times vary up to around 2ms.&lt;/P&gt;&lt;P&gt;The execution times are extracted with the clGetEventProfilingInfo() function (end minus start timestamps).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Questions:&lt;/P&gt;&lt;P&gt;What causes this extreme variance of ~500%?&lt;/P&gt;&lt;P&gt;Is Yocto Linux a real-time operating system? Could interrupts be delaying the enqueueing?&lt;/P&gt;&lt;P&gt;Am I measuring the timestamps correctly?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 12 Sep 2019 08:26:40 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912275#M137458</guid>
      <dc:creator>peter_eberl</dc:creator>
      <dc:date>2019-09-12T08:26:40Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912276#M137459</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;DIV class=""&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are you changing any of the arguments passed to cl kernel for each kernel execution? are the kernel parameters the same?&lt;/P&gt;&lt;P&gt;what do you get if not using wait for event when running the cl kernel like this:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;gettimeofday(&amp;amp;start, NULL);&amp;nbsp;&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; ret = clEnqueueNDRangeKernel (cq, kernel, dimension, NULL, global, local, 0, NULL, NULL);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; if&amp;nbsp; (ret == CL_SUCCESS)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; printf( "\nReading data from GPU memory = ..\n");&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; }else&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; printf( "\nKernel failed = ..\n");&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; }&amp;nbsp;&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; // Should be the barrier here?&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; clFinish(cq);&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;gettimeofday(&amp;amp;end, NULL);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; //compute and print the elapsed time in millisec - For writting data into input buffer&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; seconds&amp;nbsp; = end.tv_sec&amp;nbsp; - start.tv_sec;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; useconds = end.tv_usec - start.tv_usec;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp; printf( "\n CL code = %ld ms\n", 
mtime);&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Regards&lt;/STRONG&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 12 Sep 2019 18:55:53 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912276#M137459</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2019-09-12T18:55:53Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912277#M137460</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Also, please add the following line:&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; clFinish (commandQueue);&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;in clutil.cpp at line 498 (the end of the runKernelFFT function), before the close statement. It will make the results more consistent. I don't have the MQ board to test on, but I tested on an 8QXP with FFT length = 32:&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000546 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000655 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000177 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000650 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000623 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002651 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000544 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000657 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000176 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000640 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000621 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002638 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000509 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000644 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000173 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000628 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000620 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002574 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000541 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000634 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000180 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000640 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000621 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002616 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000541 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000643 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000183 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000631 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000617 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002615 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kernel execution time on GPU (kernel 0) : 0.000540 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 1) : 0.000660 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 2) : 0.000180 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 3) : 0.000653 seconds &lt;BR /&gt;Kernel execution time on GPU (kernel 4) : 0.000627 seconds &lt;BR /&gt;Total Kernel execution time on GPU : 0.002660 seconds&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;Regards&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 12 Sep 2019 18:57:17 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912277#M137460</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2019-09-12T18:57:17Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912278#M137461</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I didn't change any input parameters for now.&lt;/P&gt;&lt;P&gt;If the execution-time results were a signal, we could decompose it into three major components:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Consistent execution time: ~10µs ± 5µs&lt;/LI&gt;&lt;LI&gt;Small spikes: ~100µs ± 50µs&lt;/LI&gt;&lt;LI&gt;Big spikes: ~2000µs ± 1000µs&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I used clFinish in the following manner:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;clFinish(..);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;gettimeofday(&amp;amp;start, NULL);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;err = clEnqueueNDRangeKernel(..., &amp;amp;hEvent);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;clFinish(..);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;if (err != CL_SUCCESS)&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;{ /* ..error&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;check.. */&lt;/SPAN&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;}&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'courier new', courier, monospace;"&gt;gettimeofday(&amp;amp;end, NULL);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Using clFinish(..) mostly eliminates the small spikes, those around ~100µs.&lt;/P&gt;&lt;P&gt;What remains is a mostly consistent signal with 15% big spikes at ~1000µs-1500µs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="pastedImage_1.png"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/87940i91B6C1E74B9F95FC/image-size/large?v=v2&amp;amp;px=999" role="button" title="pastedImage_1.png" alt="pastedImage_1.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Measuring the execution time with your "direct" measurement approach (in contrast to the profiling information from the cl_event "hEvent")&lt;/P&gt;&lt;P&gt;results&amp;nbsp;in a similar picture of execution times, but with a higher percentage of "big spikes" and an offset of around ~500µs on all values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will probably ignore the spikes for now since the max. exec. time is "only" 1.2ms, but if it stacks up with more complex functions I will need to find additional solutions.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 23 Sep 2019 08:04:34 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912278#M137461</guid>
      <dc:creator>peter_eberl</dc:creator>
      <dc:date>2019-09-23T08:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sporadic delay when using GPU with OpenCL</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912279#M137462</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;DIV class=""&gt;&lt;P&gt;Hi, what is /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor on your board?&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I use performance&amp;nbsp;or ondemand for your scaling_governor&amp;nbsp; on my imx8qm board like this&amp;nbsp;&lt;/P&gt;&lt;P&gt;echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor&lt;/P&gt;&lt;P&gt;The execution time is quite consistent.&amp;nbsp;&lt;/P&gt;&lt;P style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My default scaling_governor is scheutil on my imx8qm board, yes, for many executions, I can see the the big spike for execution time from 1400 us to 2600us .&amp;nbsp; But, most of time, they are consistent around 1500us.&amp;nbsp; I think it is caused by cpu speed, not gpu. performance governor mode will make sure the consistent high cpu speed for cpu and gpu interactions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;/DIV&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 24 Sep 2019 20:42:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Sporadic-delay-when-using-GPU-with-OpenCL/m-p/912279#M137462</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2019-09-24T20:42:42Z</dc:date>
    </item>
  </channel>
</rss>

