<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1286107#M174901</link>
    <description></description>
    <pubDate>Wed, 02 Jun 2021 13:00:20 GMT</pubDate>
    <dc:creator>Bio_TICFSL</dc:creator>
    <dc:date>2021-06-02T13:00:20Z</dc:date>
    <item>
      <title>iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284374#M174691</link>
      <description>&lt;P&gt;We have a fully quantized (uint8) model to run on the iMX8MPlus.&lt;BR /&gt;&lt;BR /&gt;When run on the CPU, inference returns exactly the neural activations that we algebraically expect (exactly the same as in the training phase).&lt;/P&gt;&lt;P&gt;On the NPU, however, inference (via the NNAPI Delegate) gives different activations and, in some rare cases, completely incorrect ones.&lt;BR /&gt;&lt;BR /&gt;This is probably due to the accumulation of multiple internal approximations in some operation(s). We obviously want the inference output on the NPU to be exactly the same as on the CPU and in the training phase (on the server). Any advice?&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Is there any technical information about the NPU and how it handles int8 and uint8, and the corresponding int8xint8 and uint8xuint8 accumulations? (Already asked here:&amp;nbsp;&lt;A href="https://community.nxp.com/t5/i-MX-Processors/iMX-8M-Plus-NPU-info-and-Arm-Compute-Library/m-p/1241327#M170457" target="_self"&gt;https://community.nxp.com/t5/i-MX-Processors/iMX-8M-Plus-NPU-info-and-Arm-Compute-Library/m-p/1241327#M170457&lt;/A&gt;)&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;V.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 29 May 2021 13:38:12 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284374#M174691</guid>
      <dc:creator>NeuralBlue</dc:creator>
      <dc:date>2021-05-29T13:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284417#M174698</link>
      <description>&lt;P&gt;As per page 12 of this manual dated 31 March 2021,&amp;nbsp;&lt;A href="https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf" target="_self"&gt;https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;you can easily benchmark&amp;nbsp;&lt;STRONG&gt;mobilenet_v1_1.0_224_quant.tflite &lt;/STRONG&gt;on the CPU and on the NPU (--use_nnapi=true).&lt;BR /&gt;&lt;BR /&gt;These are the inference results:&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Question_NXP.PNG" style="width: 999px;"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/145840iE4070114C6BA0B09/image-size/large?v=v2&amp;amp;px=999" role="button" title="Question_NXP.PNG" alt="Question_NXP.PNG" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;The CPU activations are the "correct" ones, obtained by&amp;nbsp;&lt;SPAN&gt;algebraically&amp;nbsp;doing the calculations on any other computing platform for the same input image. This means that some approximation is introduced by the NNAPI delegation or by the NPU itself. Considering that this is an already quantized model, this is not good.&lt;BR /&gt;&lt;BR /&gt;Hypotheses:&lt;BR /&gt;&lt;/SPAN&gt;- per-layer vs per-tensor quantization?&lt;BR /&gt;- asymmetric vs symmetric quantization?&lt;BR /&gt;- int8 &amp;lt;-&amp;gt; uint8 conversions?&lt;BR /&gt;&lt;BR /&gt;I kindly ask NXP to clarify the source of the error in the calculations for&amp;nbsp;&lt;STRONG&gt;mobilenet_v1_1.0_224_quant.tflite. &lt;/STRONG&gt;We can then train better models.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;V.&lt;/P&gt;</description>
      <pubDate>Sun, 30 May 2021 15:28:48 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284417#M174698</guid>
      <dc:creator>NeuralBlue</dc:creator>
      <dc:date>2021-05-30T15:28:48Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284905#M174768</link>
      <description>&lt;P&gt;Running inference with the same .tflite model using (ArmNN + vsi_npu) instead of (TFLite + NNAPI) gives exactly the same (wrong) results.&lt;BR /&gt;This certainly means that the problem is in the lower layers of the stack: NNRT, OVXLIB, OpenVX or the hardware.&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ArmNN-TFL.PNG" style="width: 305px;"&gt;&lt;img src="https://community.nxp.com/t5/image/serverpage/image-id/145904i6D348FD8FA9E4BC8/image-size/large?v=v2&amp;amp;px=999" role="button" title="ArmNN-TFL.PNG" alt="ArmNN-TFL.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 31 May 2021 18:34:46 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1284905#M174768</guid>
      <dc:creator>NeuralBlue</dc:creator>
      <dc:date>2021-05-31T18:34:46Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1286107#M174901</link>
      <description>&lt;P&gt;Hello NeuralBlue,&lt;/P&gt;
&lt;P&gt;Please check the answer provided here:&lt;/P&gt;
&lt;P&gt;" CPU and NPU is different . While CPU uses 32bit registers, the NPU uses 16bit register for normalized multiplier and 48bit post multiplier output during quantized inference. This way, the CPU suffers from double rounding error, while NPU does not. "&lt;/P&gt;
&lt;P&gt;Therefore the output is not equal.&lt;/P&gt;
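&lt;P&gt;The double rounding mentioned above can be illustrated with a minimal sketch. It assumes the CPU path rescales an accumulator with a rounded high multiply followed by a rounding right shift (two roundings), and that the NPU path rounds a wider intermediate product only once. The function names and values are hypothetical, only non-negative inputs are handled, and the actual NPU arithmetic is not documented in this thread.&lt;/P&gt;

```python
def requantize_double_rounding(acc, multiplier, shift):
    """Scale acc by multiplier * 2**-(31 + shift), rounding twice."""
    # First rounding: keep the high part of the product (scaled by 2**-31).
    high = (acc * multiplier + 2 ** 30) // 2 ** 31
    if shift == 0:
        return high
    # Second rounding: rounding right shift of the already rounded value.
    return (high + 2 ** (shift - 1)) // 2 ** shift


def requantize_single_rounding(acc, multiplier, shift):
    """Same scaling, but the full-width product is rounded only once."""
    total_shift = 31 + shift
    return (acc * multiplier + 2 ** (total_shift - 1)) // 2 ** total_shift


# Hypothetical values: multiplier 2**30 encodes 0.5 and shift 1 halves again,
# so the effective scale is 0.25 and the exact result for acc = 5 is 1.25.
acc, multiplier, shift = 5, 2 ** 30, 1
twice = requantize_double_rounding(acc, multiplier, shift)
once = requantize_single_rounding(acc, multiplier, shift)
print(twice, once)
```

&lt;P&gt;With these values the two-step version returns 2, while the single-rounding version returns 1, the nearest integer to the exact 1.25. A one-step difference like this in a single layer can propagate through later layers.&lt;/P&gt;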
&lt;P&gt;Try to measure the overall accuracy difference of the model between CPU and NPU on a larger dataset (not a single example). We did this kind of accuracy validation for&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;PCQ&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;models, with the following results:&lt;/P&gt;
&lt;DIV class="table-wrap"&gt;
&lt;TABLE class="confluenceTable"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="confluenceTd"&gt;Model&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;CPU (4 cores; TF Lite), PCQ accuracy (Top-1; Top-5)&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;VSI NPU (TF Lite; NN API), PCQ accuracy (Top-1; Top-5)&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="confluenceTd"&gt;Mobilenet v1 1.0 224&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;70.80%; 88.20%&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;68.48%; 88.01%&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="confluenceTd"&gt;Mobilenet v2 1.0 224&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;70.74%; 89.77%&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;70.75%; 89.75%&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="confluenceTd"&gt;Efficientnet lite4 v2&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;77.30%; 94.00%&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;76.40%; 93.70%&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="confluenceTd"&gt;Resnet v2 101 299&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;75.92%; 93.20%&lt;/TD&gt;
&lt;TD class="confluenceTd"&gt;76.25%; 93.31%&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;P&gt;The largest difference we see in Top-1 accuracy is 2.32 percentage points for Mobilenet v1 (its Top-5 difference is 0.19), and the largest Top-5 difference is 0.30 percentage points (Efficientnet model).&lt;/P&gt;
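&lt;P&gt;A dataset-level comparison of this kind can be sketched as follows. This is a minimal illustration with made-up uint8 output scores and labels, not the validation code behind the table; the function names and the data are hypothetical. It computes top-k accuracy for the outputs of each backend and reports the gap.&lt;/P&gt;

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def accuracy_gap(cpu_scores, npu_scores, labels, k):
    """CPU top-k accuracy minus NPU top-k accuracy on the same samples."""
    return top_k_accuracy(cpu_scores, labels, k) - top_k_accuracy(npu_scores, labels, k)


# Made-up outputs for four samples over three classes (illustration only).
labels = [0, 1, 2, 1]
cpu_scores = [[200, 30, 25], [10, 180, 60], [40, 50, 160], [90, 100, 20]]
npu_scores = [[198, 31, 26], [11, 179, 61], [41, 49, 161], [96, 95, 20]]

top1_gap = accuracy_gap(cpu_scores, npu_scores, labels, 1)
top2_gap = accuracy_gap(cpu_scores, npu_scores, labels, 2)
print(top1_gap, top2_gap)
```

&lt;P&gt;In this toy data the NPU scores differ slightly from the CPU scores on every sample, yet only the last sample changes its top-1 class. This is why an aggregate metric over many samples is more informative than comparing the activations of a single image.&lt;/P&gt;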
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Ignore the Python use case; the behavior occurs because the hardware precision of the CPU and the NPU differs.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Let me know if further clarifications are needed.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Jun 2021 13:00:20 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1286107#M174901</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-06-02T13:00:20Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1286321#M174914</link>
      <description>&lt;P&gt;Hi Bio_TICFSL,&lt;BR /&gt;thank you for your answer.&lt;BR /&gt;&lt;BR /&gt;Our own neural net achieves &amp;gt;99% Top-1 accuracy when executed on a CPU. We obviously use the CPU/GPU for loss calculation during quantization-aware training. &lt;STRONG&gt;It's critical to maintain the same &amp;gt;99% Top-1 accuracy on the NPU. &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;To do this, we could account for the NPU's extra precision during training and somehow simulate it, but we obviously need to understand very well how it works and how it can affect us.&amp;nbsp; If you have other methods in mind to make the &lt;STRONG&gt;NPU give us exactly the same results we expect from training&lt;/STRONG&gt;, please tell us (training on the NPU? not so practical).&lt;BR /&gt;&lt;BR /&gt;Furthermore, is this rounding error the only source of difference between CPU and NPU?&lt;BR /&gt;&lt;BR /&gt;Can you explain this in more detail with an example?&lt;BR /&gt;"&lt;SPAN&gt;While CPU uses 32bit registers, the NPU uses 16bit register for normalized multiplier and 48bit post multiplier output during quantized inference. This way, the CPU suffers from double rounding error, while NPU does not. "&lt;BR /&gt;&lt;BR /&gt;Thank you very much, very useful.&lt;BR /&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;NB&lt;/P&gt;</description>
      <pubDate>Wed, 02 Jun 2021 20:01:26 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1286321#M174914</guid>
      <dc:creator>NeuralBlue</dc:creator>
      <dc:date>2021-06-02T20:01:26Z</dc:date>
    </item>
    <item>
      <title>Re: iMX8 MPlus - NNAPI Delegate on NPU has different accuracy/correctness than on CPU</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1287600#M175058</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;Currently we are not aware of another root cause for the difference in accuracy.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;I will check internally whether we can share more details. &lt;/SPAN&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;Do you have an NDA? If you do, it is better to open an internal ticket for this.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;regards&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jun 2021 14:09:15 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/iMX8-MPlus-NNAPI-Delegate-on-NPU-has-different-accuracy/m-p/1287600#M175058</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-06-04T14:09:15Z</dc:date>
    </item>
  </channel>
</rss>

