<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: iMX8M Plus: onnxruntime_perf_test is slower on NPU than CPU in eIQ Machine Learning Software</title>
    <link>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1418486#M592</link>
    <description>&lt;P&gt;You are seeing this because the model you are running on the NPU is an FP32 model; you can verify this by opening the ONNX model in Netron. The NPU is designed to accelerate INT8 inference, so the result you see is expected behavior. To get better performance, quantize the FP32 model and then deploy the quantized model on the NPU, as the example linked here shows.&lt;/P&gt;
&lt;P&gt;I suggest taking a look at the following example from the ONNX Runtime GitHub repository:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/notebooks/imagenet_v2/mobilenet.ipynb" target="_blank" rel="noopener"&gt;https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/notebooks/imagenet_v2/mobilenet.ipynb&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;It shows how to go from a PyTorch MobileNetV2 FP32 model to a quantized ONNX model. You can then take the output model and run it on the i.MX 8M Plus NPU.&lt;/P&gt;
&lt;P&gt;I hope this helps!&lt;/P&gt;</description>
    <pubDate>Wed, 23 Feb 2022 17:45:32 GMT</pubDate>
    <dc:creator>HiramRTR</dc:creator>
    <dc:date>2022-02-23T17:45:32Z</dc:date>
    <item>
      <title>iMX8M Plus: onnxruntime_perf_test is slower on NPU than CPU</title>
      <link>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1412878#M588</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Hi all,&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;I have the 8MPLUSLPD4-EVK Evaluation Kit and I am running onnxruntime_perf_test as described in &lt;/SPAN&gt;&lt;SPAN&gt;"i.MX Machine Learning User's Guide, Rev. LF5.10.72_2.2.0, 17".&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;However, onnxruntime_perf_test is slower on the NPU than on the CPU.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;The EVK is running the i.MX Yocto Project release (hardknott-5.10.72-2.2.0).&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;STRONG&gt;Running on NPU&lt;/STRONG&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;PRE&gt;&lt;SPAN&gt;/usr/bin/onnxruntime-1.8.2/onnxruntime_perf_test /usr/bin/onnxruntime-1.8.2/squeezenet/model.onnx -r 1 -e vsi_npu&lt;/SPAN&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;PRE&gt;&lt;SPAN&gt;Session creation time cost: 0.126173 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total time cost (including warm-up): 1.1651 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total inference requests: 2&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Warm-up inference time cost: 744.977 ms&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Average inference time cost (excluding warm-up): 420.121 ms&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total inference run time: 0.420148 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Avg CPU usage: 0 %&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peak working set size: 81121280 bytes&lt;/SPAN&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;STRONG&gt;Running on CPU&lt;/STRONG&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;PRE&gt;&lt;SPAN&gt;/usr/bin/onnxruntime-1.8.2/onnxruntime_perf_test /usr/bin/onnxruntime-1.8.2/squeezenet/model.onnx -r 1 -e cpu&lt;/SPAN&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;PRE&gt;&lt;SPAN&gt;Session creation time cost: 0.0570905 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total time cost (including warm-up): 0.11501 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total inference requests: 2&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Warm-up inference time cost: 58.0624 ms&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Average inference time cost (excluding warm-up): 56.9481 ms&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Total inference run time: 0.0569692 s&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Avg CPU usage: 91 %&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peak working set size: 46661632 bytes&lt;/SPAN&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;Is this expected?&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;I have attached the log from onnxruntime_perf_test run with the -v option.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 14 Feb 2022 06:18:01 GMT</pubDate>
      <guid>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1412878#M588</guid>
      <dc:creator>makotosato</dc:creator>
      <dc:date>2022-02-14T06:18:01Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8M Plus: onnxruntime_perf_test is slower on NPU than CPU</title>
      <link>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1418486#M592</link>
      <description>&lt;P&gt;You are seeing this because the model you are running on the NPU is an FP32 model; you can verify this by opening the ONNX model in Netron. The NPU is designed to accelerate INT8 inference, so the result you see is expected behavior. To get better performance, quantize the FP32 model and then deploy the quantized model on the NPU, as the example linked here shows.&lt;/P&gt;
&lt;P&gt;I suggest taking a look at the following example from the ONNX Runtime GitHub repository:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/notebooks/imagenet_v2/mobilenet.ipynb" target="_blank" rel="noopener"&gt;https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/notebooks/imagenet_v2/mobilenet.ipynb&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;It shows how to go from a PyTorch MobileNetV2 FP32 model to a quantized ONNX model. You can then take the output model and run it on the i.MX 8M Plus NPU.&lt;/P&gt;
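&lt;P&gt;To give a rough idea of what quantizing an FP32 model to 8-bit integers involves, here is a minimal pure-Python sketch of affine quantization for a single list of values. It is not taken from the linked notebook and does not use the ONNX Runtime API: the helper names (quantize_params, quantize, dequantize) are hypothetical, and the unsigned 8-bit range 0..255 with one scale and zero point per tensor is an assumption made for illustration. In practice the quantization tooling chooses these parameters from calibration data.&lt;/P&gt;

```python
def quantize_params(values, qmin=0, qmax=255):
    """Pick a scale and zero point that map the float range onto [qmin, qmax]."""
    # Force the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all values are zero; any scale works
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers and clamp them to the 8-bit range."""
    return [min(qmax, max(qmin, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


# Illustrative values only, not real model weights.
weights = [-1.0, 0.0, 0.5, 1.55]
scale, zero_point = quantize_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
```

&lt;P&gt;Each restored value differs from the original by a small rounding error, which is the accuracy cost paid for running integer arithmetic on the NPU.&lt;/P&gt;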
&lt;P&gt;I hope this helps!&lt;/P&gt;</description>
      <pubDate>Wed, 23 Feb 2022 17:45:32 GMT</pubDate>
      <guid>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1418486#M592</guid>
      <dc:creator>HiramRTR</dc:creator>
      <dc:date>2022-02-23T17:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8M Plus: onnxruntime_perf_test is slower on NPU than CPU</title>
      <link>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1419551#M593</link>
      <description>&lt;P&gt;Thank you very much.&lt;BR /&gt;I will try that example.&lt;/P&gt;&lt;P&gt;Also, I think the Machine Learning User's Guide should be updated to explain this.&lt;/P&gt;&lt;P&gt;Best regards.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Feb 2022 06:19:07 GMT</pubDate>
      <guid>https://community.nxp.com/t5/eIQ-Machine-Learning-Software/iMX8M-Plus-onnxruntime-perf-test-is-slower-on-NPU-than-CPU/m-p/1419551#M593</guid>
      <dc:creator>makotosato</dc:creator>
      <dc:date>2022-02-25T06:19:07Z</dc:date>
    </item>
  </channel>
</rss>

