<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why is the GPU taking more time than the CPU for inference? in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/Why-GPU-taking-more-time-than-CPU-for-inference/m-p/1445147#M189435</link>
    <description>&lt;P&gt;Image:&amp;nbsp;LF_v5.15.5-1.0.0_images_IMX8MQEVK&lt;/P&gt;&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/34846"&gt;@Bio_TICFSL&lt;/a&gt;, the GPU seems to underperform when I run the prebuilt model file: the same model runs faster on the CPU once more than one thread is used. The average inference times for the CPU and the GPU are listed below.&lt;/P&gt;&lt;P&gt;For CPU&amp;nbsp; ==&amp;gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 179.697 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 2:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 2
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 92.645 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 3:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 3
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 64.785 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 4:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 4
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 48.975 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For GPU ==&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -a 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
INFO: Applied NNAPI delegate.
W [query_hardware_caps:71]Unsupported evis version
INFO: invoked
INFO: average time: 103.217 ms
INFO: 0.784314: 653 military uniform
INFO: 0.105882: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.00784314: 466 bulletproof vest
INFO: 0.00392157: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With more than one CPU thread, the GPU (103.217 ms through the NNAPI delegate) is slower than the CPU (92.645 ms with 2 threads, 48.975 ms with 4 threads). Is there a way to reduce the GPU inference time so that it is faster than the CPU?&lt;/P&gt;</description>
    <pubDate>Thu, 21 Apr 2022 06:14:09 GMT</pubDate>
    <dc:creator>Swapnil_Shah</dc:creator>
    <dc:date>2022-04-21T06:14:09Z</dc:date>
    <item>
      <title>Why is the GPU taking more time than the CPU for inference?</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/Why-GPU-taking-more-time-than-CPU-for-inference/m-p/1445147#M189435</link>
      <description>&lt;P&gt;Image:&amp;nbsp;LF_v5.15.5-1.0.0_images_IMX8MQEVK&lt;/P&gt;&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/34846"&gt;@Bio_TICFSL&lt;/a&gt;, the GPU seems to underperform when I run the prebuilt model file: the same model runs faster on the CPU once more than one thread is used. The average inference times for the CPU and the GPU are listed below.&lt;/P&gt;&lt;P&gt;For CPU&amp;nbsp; ==&amp;gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 1:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 179.697 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 2:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 2
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 92.645 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 3:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 3
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 64.785 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thread = 4:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -t 4
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: invoked
INFO: average time: 48.975 ms
INFO: 0.764706: 653 military uniform
INFO: 0.121569: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.0117647: 466 bulletproof vest
INFO: 0.00784314: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For GPU ==&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;./label_image -i grace_hoopper.bmp -l lables.txt -m mobilenet_v1_1.0_224_quant.tflite -a 1
INFO: Loaded model mobilenet_v1_1.0_224_quant.tflite
INFO: resolved reporter
INFO: Created TensorFlow Lite delegate for NNAPI.
NNAPI delegate created.
INFO: Applied NNAPI delegate.
W [query_hardware_caps:71]Unsupported evis version
INFO: invoked
INFO: average time: 103.217 ms
INFO: 0.784314: 653 military uniform
INFO: 0.105882: 907 Windsor tie
INFO: 0.0156863: 458 bow tie
INFO: 0.00784314: 466 bulletproof vest
INFO: 0.00392157: 835 suit&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With more than one CPU thread, the GPU (103.217 ms through the NNAPI delegate) is slower than the CPU (92.645 ms with 2 threads, 48.975 ms with 4 threads). Is there a way to reduce the GPU inference time so that it is faster than the CPU?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2022 06:14:09 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/Why-GPU-taking-more-time-than-CPU-for-inference/m-p/1445147#M189435</guid>
      <dc:creator>Swapnil_Shah</dc:creator>
      <dc:date>2022-04-21T06:14:09Z</dc:date>
    </item>
  </channel>
</rss>

