<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: eIQ inference performance issue with GPU in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385131#M184293</link>
    <description>&lt;P&gt;&lt;SPAN&gt;NPU tensors do not support float input/output. The NPU supports&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN&gt;8/16-bit integer &lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;tensor data formats and an 8-, 16-, and 32-bit integer operations pipeline.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 11 Dec 2021 05:55:22 GMT</pubDate>
    <dc:creator>Zhiming_Liu</dc:creator>
    <dc:date>2021-12-11T05:55:22Z</dc:date>
    <item>
      <title>eIQ inference performance issue with GPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385030#M184275</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We built our i.MX8MP target image and the SDK with Yocto, using the&amp;nbsp;Linux 5.10.52_2.1.0 version.&lt;/P&gt;&lt;P&gt;We used the eIQ ArmNN and ONNX Runtime inference engines to run inference on one of our networks (fp32 data), exported in ONNX format.&lt;/P&gt;&lt;P&gt;For inference with ArmNN, we slightly modified the&amp;nbsp;mnist_tf.cpp sample program to adapt it to our network. The results are functionally correct with all three available backends (CpuRef, CpuAcc and VsiNpu). In terms of performance, CpuRef is terribly slow, which is expected. What is surprising is that the&amp;nbsp;VsiNpu backend, which executes on the GPU/NPU, is 13 times slower than the&amp;nbsp;CpuAcc backend, which executes on the Arm CPU with Neon. When running&amp;nbsp;mnist_tf.cpp with its original network (simple_mnist_tf.prototxt), the&amp;nbsp;VsiNpu backend is also slower.&lt;/P&gt;&lt;P&gt;We also ran the ready-made ArmNN "Onnx test" described in §5.3.4 of the Machine Learning UG (&lt;A href="https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf" target="_blank"&gt;https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf&lt;/A&gt;). The&amp;nbsp;OnnxMobileNet-Armnn test is also more than 3 times slower with the&amp;nbsp;VsiNpu backend than with the&amp;nbsp;CpuAcc backend.&lt;/P&gt;&lt;P&gt;We did the same with the ONNX Runtime inference engine. 
We ran the provided sample code (C_Api_Sample.cpp), both on its original network (squeezenet in that case) and adapted to our network, and observed the same behaviour: in all cases, the backend targeting the GPU/NPU is much slower than the backend targeting the CPU with Neon.&lt;/P&gt;&lt;P&gt;In all cases, inference is performed twice and the second run is measured, to exclude the warm-up time.&lt;/P&gt;&lt;P&gt;Tests with our model converted to fp16 data, run with the ONNX Runtime inference engine (ArmNN does not seem to support it), show the same results, except that the Nnapi backend, which also targets the GPU/NPU, performs the same as the CPU backends.&lt;/P&gt;&lt;P&gt;So the GPU/NPU backends perform at best like the CPU backends, and are several times slower in the worst cases.&lt;/P&gt;&lt;P&gt;We would like to know what could explain this behaviour.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Fri, 10 Dec 2021 18:19:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385030#M184275</guid>
      <dc:creator>mbrundler</dc:creator>
      <dc:date>2021-12-10T18:19:44Z</dc:date>
    </item>
    <item>
      <title>Re: eIQ inference performance issue with GPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385131#M184293</link>
      <description>&lt;P&gt;&lt;SPAN&gt;NPU tensors do not support float input/output. The NPU supports&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN&gt;8/16-bit integer &lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;tensor data formats and an 8-, 16-, and 32-bit integer operations pipeline.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 11 Dec 2021 05:55:22 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385131#M184293</guid>
      <dc:creator>Zhiming_Liu</dc:creator>
      <dc:date>2021-12-11T05:55:22Z</dc:date>
    </item>
    <item>
      <title>Re: eIQ inference performance issue with GPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385629#M184353</link>
      <description>&lt;P&gt;So, the NPU is not efficient at floating-point calculations, but:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;would the GPU perform better?&lt;/LI&gt;&lt;LI&gt;if so, is there a way to request that calculations be scheduled on the GPU?&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Mon, 13 Dec 2021 11:25:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/eIQ-inference-performance-issue-with-GPU/m-p/1385629#M184353</guid>
      <dc:creator>mbrundler</dc:creator>
      <dc:date>2021-12-13T11:25:44Z</dc:date>
    </item>
  </channel>
</rss>

