<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic i.mx8m plus npu failure log in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</link>
    <description>&lt;P&gt;Dear NXP,&lt;/P&gt;&lt;P&gt;I'm trying to run a segmentation network on the i.MX 8M Plus NPU. The problem is that no matter what I try, the model never runs entirely on the NPU; it either falls back to the CPU or is rejected.&lt;/P&gt;&lt;P&gt;Below are some details and my logs, so hopefully someone can tell me what I'm doing wrong.&lt;/P&gt;&lt;P&gt;For testing purposes I created three different sequential models, just to demonstrate the errors I'm running into.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model architectures:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Model 1 contains three convolutional layers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;These models are trained to produce the identity output; again, nothing special, just to demonstrate the case.&lt;/P&gt;&lt;P&gt;I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;a tflite model and a Python script using the TFLite runtime&lt;/LI&gt;&lt;LI&gt;a tflite model and a Python script using pyarmnn&lt;/LI&gt;&lt;LI&gt;an onnx model and a C++ program using ONNX Runtime&lt;/LI&gt;&lt;LI&gt;an onnx model and a Python script using pyarmnn&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Tensorflow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Creation of a tflite model: I converted my models using the Tensorflow recipe (&lt;A href="https://www.tensorflow.org/lite/performance/post_training_integer_quant" target="_blank"&gt;https://www.tensorflow.org/lite/performance/post_training_integer_quant&lt;/A&gt;) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.representative_dataset = representative_data_gen&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set to False to use TOCO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; #converter.experimental_new_converter = False&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_input_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_output_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;I used both Tensorflow v2.4 and v2.3 to convert my models.&lt;/P&gt;&lt;P&gt;Here are the logs I got when running the models on the i.MX 8M Plus:&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 1-11&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 13-24&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;No problems here, but this model has no upsampling or transpose convolution layer.&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 26-40&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 47-63&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 65-76 (esp. Line 70)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 80-92&lt;/LI&gt;&lt;LI&gt;NNAPI does not support half_pixel_centers == true&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 95-111&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 114-130&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 133-143 (esp. Line 137)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 146-157&lt;/LI&gt;&lt;LI&gt;Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 160-171&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 174-190&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Pytorch:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Same model architectures as before, but trained with Pytorch instead of Tensorflow. The models are saved in onnx format using opset versions 10, 11 and 12. In the next step all models are converted to int8 models using the instructions given in &lt;A href="https://www.onnxruntime.ai/docs/how-to/quantization.html" target="_blank"&gt;https://www.onnxruntime.ai/docs/how-to/quantization.html&lt;/A&gt; and executed on the i.MX 8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of C_Api_Sample.cpp.&lt;/P&gt;&lt;P&gt;Python code for model quantization:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from onnxruntime.quantization import quantize_dynamic, QuantType&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_fp32 = 'model path to fp model'&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_quant = 'path to new model.onnx'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)&lt;/P&gt;&lt;P&gt;Model 1, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v10 see log_file.txt Line 193-331&lt;/LI&gt;&lt;LI&gt;op v11 see log_file.txt Line 334-473&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 476-615&lt;/LI&gt;&lt;LI&gt;lots of unsupported node messages&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v11 see log_file.txt Line 618-629&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 632-643&lt;/LI&gt;&lt;LI&gt;op v13 see log_file.txt Line 646-657&lt;/LI&gt;&lt;LI&gt;only support for float, int32, int64 -&amp;gt; not working on the NPU&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The big question now is: what do I have to do to get my models working on the NPU, or does the i.MX 8M Plus NPU simply not support modern (upsampling) neural networks?&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;Peter Woltersdorf&lt;/P&gt;</description>
    <pubDate>Mon, 12 Apr 2021 14:34:32 GMT</pubDate>
    <dc:creator>woltersd-drfe</dc:creator>
    <dc:date>2021-04-12T14:34:32Z</dc:date>
    <item>
      <title>i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</link>
      <description>&lt;P&gt;Dear NXP,&lt;/P&gt;&lt;P&gt;I'm trying to run a segmentation network on the i.MX 8M Plus NPU. The problem is that no matter what I try, the model never runs entirely on the NPU; it either falls back to the CPU or is rejected.&lt;/P&gt;&lt;P&gt;Below are some details and my logs, so hopefully someone can tell me what I'm doing wrong.&lt;/P&gt;&lt;P&gt;For testing purposes I created three different sequential models, just to demonstrate the errors I'm running into.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model architectures:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Model 1 contains three convolutional layers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;These models are trained to produce the identity output; again, nothing special, just to demonstrate the case.&lt;/P&gt;&lt;P&gt;I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;a tflite model and a Python script using the TFLite runtime&lt;/LI&gt;&lt;LI&gt;a tflite model and a Python script using pyarmnn&lt;/LI&gt;&lt;LI&gt;an onnx model and a C++ program using ONNX Runtime&lt;/LI&gt;&lt;LI&gt;an onnx model and a Python script using pyarmnn&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Tensorflow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Creation of a tflite model: I converted my models using the Tensorflow recipe (&lt;A href="https://www.tensorflow.org/lite/performance/post_training_integer_quant" target="_blank"&gt;https://www.tensorflow.org/lite/performance/post_training_integer_quant&lt;/A&gt;) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.representative_dataset = representative_data_gen&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set to False to use TOCO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; #converter.experimental_new_converter = False&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_input_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_output_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;I used both Tensorflow v2.4 and v2.3 to convert my models.&lt;/P&gt;&lt;P&gt;Here are the logs I got when running the models on the i.MX 8M Plus:&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 1-11&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 13-24&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;No problems here, but this model has no upsampling or transpose convolution layer.&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 26-40&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 47-63&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 65-76 (esp. Line 70)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 80-92&lt;/LI&gt;&lt;LI&gt;NNAPI does not support half_pixel_centers == true&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 95-111&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 114-130&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 133-143 (esp. Line 137)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 146-157&lt;/LI&gt;&lt;LI&gt;Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 160-171&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 174-190&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Pytorch:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Same model architectures as before, but trained with Pytorch instead of Tensorflow. The models are saved in onnx format using opset versions 10, 11 and 12. In the next step all models are converted to int8 models using the instructions given in &lt;A href="https://www.onnxruntime.ai/docs/how-to/quantization.html" target="_blank"&gt;https://www.onnxruntime.ai/docs/how-to/quantization.html&lt;/A&gt; and executed on the i.MX 8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of C_Api_Sample.cpp.&lt;/P&gt;&lt;P&gt;Python code for model quantization:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from onnxruntime.quantization import quantize_dynamic, QuantType&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_fp32 = 'model path to fp model'&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_quant = 'path to new model.onnx'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)&lt;/P&gt;&lt;P&gt;Model 1, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v10 see log_file.txt Line 193-331&lt;/LI&gt;&lt;LI&gt;op v11 see log_file.txt Line 334-473&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 476-615&lt;/LI&gt;&lt;LI&gt;lots of unsupported node messages&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v11 see log_file.txt Line 618-629&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 632-643&lt;/LI&gt;&lt;LI&gt;op v13 see log_file.txt Line 646-657&lt;/LI&gt;&lt;LI&gt;only support for float, int32, int64 -&amp;gt; not working on the NPU&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The big question now is: what do I have to do to get my models working on the NPU, or does the i.MX 8M Plus NPU simply not support modern (upsampling) neural networks?&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;Peter Woltersdorf&lt;/P&gt;</description>
      <pubDate>Mon, 12 Apr 2021 14:34:32 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-12T14:34:32Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1263720#M172753</link>
      <description>&lt;P&gt;Hello Woltersd,&lt;/P&gt;
&lt;P&gt;Is it possible to share the model? If it is a proprietary model based on your own dataset, you can instead share the same model trained on an openly available dataset.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Apr 2021 18:53:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1263720#M172753</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-16T18:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264322#M172822</link>
      <description>&lt;P&gt;- nothing here -&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 08:45:28 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264322#M172822</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T08:45:28Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264976#M172898</link>
      <description>&lt;P&gt;Hi Bio_TICFSL,&lt;/P&gt;&lt;P&gt;no problem. I zipped some models into the attached archive, both as regular Tensorflow (Keras) models and as the quantized versions. The training data is just some randomly generated data:&lt;/P&gt;&lt;P&gt;import tensorflow as tf&lt;BR /&gt;import numpy as np&lt;/P&gt;&lt;P&gt;def create_dataset(size_dataset=1000, size=(224, 224)):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array = np.random.randn(size_dataset, size[0], size[1], 3)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mask = np.argmax(array, axis=-1)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return tf.convert_to_tensor(array), tf.convert_to_tensor(mask)&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 08:46:20 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264976#M172898</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T08:46:20Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265158#M172930</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You have captured a lot of detail in the original ticket. For my better understanding, though, can you summarize the issue with the 3 models you shared with me?&lt;/P&gt;
&lt;P&gt;I would recommend sharing something like the following.&lt;/P&gt;
&lt;P&gt;Model X (1 to 3):&lt;/P&gt;
&lt;P&gt;Brief model description (which TF version and which quantization were used)&lt;/P&gt;
&lt;P&gt;Issue description&lt;/P&gt;
&lt;P&gt;Steps followed to reproduce the issue&lt;/P&gt;
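For the "steps to reproduce" part, a minimal TFLite inference script is usually enough. Below is a sketch of what such a script could look like (assumptions: the tflite_runtime package shipped with the BSP image is installed on the board, and on the BSP's TFLite build the NNAPI delegate is applied automatically, so refused ops show up as the warnings quoted in this thread; the helper name make_random_input is purely illustrative):

```python
import sys
import numpy as np

def make_random_input(shape, dtype):
    # Build a random tensor matching the model's input spec; integer
    # (quantized) inputs get values spanning the full range of the type.
    if np.issubdtype(np.dtype(dtype), np.integer):
        info = np.iinfo(dtype)
        return np.random.randint(info.min, info.max + 1, size=shape, dtype=dtype)
    return np.random.randn(*shape).astype(dtype)

def main(model_path):
    # Deferred import: tflite_runtime only exists on the target image.
    import tflite_runtime.interpreter as tflite
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], make_random_input(tuple(inp["shape"]), inp["dtype"]))
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    # Any delegate warnings will have been printed to stderr by now.
    print(interpreter.get_tensor(out["index"]).shape)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run on the board as: python3 script.py path_to_model.tflite, and attach the full console output.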
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 12:58:37 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265158#M172930</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-20T12:58:37Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265226#M172935</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;for the models supplied in the zip file I used Tensorflow v2.3, but I also tried v2.4.&lt;/P&gt;&lt;P&gt;The model conversion is done with the same script every time.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Steps for model conversion:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;converter.representative_dataset = representative_data_gen&lt;BR /&gt;converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;# Set to False to use TOCO&lt;BR /&gt;#converter.experimental_new_converter = False&lt;BR /&gt;converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;converter.inference_input_type = tf.int8&lt;BR /&gt;converter.inference_output_type = tf.int8&lt;BR /&gt;tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayer&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;the only model with no issues in all my test setups, included just for demonstration purposes -&amp;gt; skip to the next one&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayerTranspConv&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Description:&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;TF v2.3&lt;/LI&gt;&lt;LI&gt;quantization: see "&lt;STRONG&gt;Steps for model conversion&lt;/STRONG&gt;" above&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;Issue:&lt;UL&gt;&lt;LI&gt;the model falls back to the CPU&lt;/LI&gt;&lt;LI&gt;LogOutput: WARNING: Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Steps to reproduce:&lt;UL&gt;&lt;LI&gt;copy the tflite model (provided in the zip file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py &lt;EM&gt;path_to_model&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayerUpsample&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Description:&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;TF v2.3&lt;/LI&gt;&lt;LI&gt;quantization: see "&lt;STRONG&gt;Steps for model conversion&lt;/STRONG&gt;" above&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;Issue:&lt;UL&gt;&lt;LI&gt;the model falls back to the CPU&lt;/LI&gt;&lt;LI&gt;LogOutput: WARNING: Operator RESIZE_NEAREST_NEIGHBOR (v3) refused by NNAPI delegate: NNAPI does not support half_pixel_centers == true.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Steps to reproduce:&lt;UL&gt;&lt;LI&gt;copy the tflite model (provided in the zip file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py &lt;EM&gt;path_to_model&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;Regards&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 15:33:56 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265226#M172935</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T15:33:56Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1267283#M173111</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I checked the attached models. Both models are running on the NPU. We are profiling them to see which layer takes the most time and how we can optimize it.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2021 13:01:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1267283#M173111</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-23T13:01:06Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1269752#M173372</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;any news?&lt;/P&gt;&lt;P&gt;If my model runs on the NPU without falling back to the CPU, how did you do it?&lt;/P&gt;&lt;P&gt;Do you have an image you can share, so I can install it on our eval board and get my models running?&lt;/P&gt;&lt;P&gt;Greets&lt;/P&gt;</description>
      <pubDate>Wed, 28 Apr 2021 14:46:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1269752#M173372</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-28T14:46:30Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1270848#M173463</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Is this a general question or specific to a framework? Can you be a bit more specific about what type of model you are referring to?&lt;/P&gt;
&lt;P&gt;In any case, we share the list of operators that get executed on the NPU for each framework:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf&lt;/A&gt;&lt;/P&gt;
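On the operator side, one warning quoted earlier in this thread is RESIZE_NEAREST_NEIGHBOR being refused because half_pixel_centers == true. The difference between the two coordinate conventions can be illustrated with a small numpy sketch (the formulas below follow the commonly used definitions and are given purely for illustration, not copied from the TFLite kernels):

```python
import numpy as np

def nearest_src_index(dst_idx, scale, half_pixel_centers):
    # Map an output pixel index to its nearest-neighbor source index.
    # scale = input_size / output_size.
    if half_pixel_centers:
        # Half-pixel convention: sample relative to pixel centers.
        return int(np.floor((dst_idx + 0.5) * scale))
    # Legacy convention: sample at dst_idx * scale.
    return int(np.floor(dst_idx * scale))

# Downsampling 4 pixels to 2 (scale = 2): the two conventions pick
# different source pixels, so the attribute changes the op's semantics
# and a backend that lacks one convention cannot silently use the other.
legacy = [nearest_src_index(i, 2.0, False) for i in range(2)]    # [0, 2]
centered = [nearest_src_index(i, 2.0, True) for i in range(2)]   # [1, 3]
print(legacy, centered)
```

This is why the delegate refuses the op outright instead of approximating it.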
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Apr 2021 13:00:51 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1270848#M173463</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-30T13:00:51Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1271221#M173514</link>
      <description>&lt;P&gt;You wrote:&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I checked the attached models. Both models are running on the NPU. We are profiling the model to see which layer is taking more time and how we can optimize it.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;---------------------------------------------------------------------------------------------------------------------------------------&lt;BR /&gt;My answer to &lt;STRONG&gt;your&lt;/STRONG&gt; post:&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;any news? (referring to: We are profiling the model to see which layer is taking more time and how we can optimize it.)&lt;/P&gt;&lt;P&gt;If my model runs on the NPU without falling back to the CPU, how did you do it? (referring to: I checked the attached models. Both models are running on the NPU.)&lt;/P&gt;&lt;P&gt;Do you have an image you can share, so I can install it on our eval board and get my models running?&lt;/P&gt;&lt;P&gt;I created our image exactly following the recipe in &lt;A href="https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf" target="_blank" rel="noopener"&gt;https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf&lt;/A&gt; using zeus imx-5.4.70-2.3.2.xml.&lt;/P&gt;&lt;P&gt;Some real help would be nice.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Mon, 03 May 2021 10:12:25 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1271221#M173514</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-05-03T10:12:25Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324170#M178525</link>
      <description>&lt;P&gt;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/185055"&gt;@woltersd-drfe&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Hi woltersd-drfe,&lt;/P&gt;&lt;P&gt;did you solve your i.MX 8M Plus NPU segmentation failure issue? I have encountered it as well.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 01:35:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324170#M178525</guid>
      <dc:creator>fjpmbb_abc</dc:creator>
      <dc:date>2021-08-17T01:35:06Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324564#M178562</link>
      <description>&lt;P&gt;Hi fjpmbb_abc,&lt;/P&gt;&lt;P&gt;unfortunately not really. The information I got is that the NPU does not yet support transpose convolution, and that image resizing is only supported with Tensorflow v2.1 (tested and verified).&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 08:33:16 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324564#M178562</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-08-17T08:33:16Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324586#M178565</link>
      <description>&lt;P&gt;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/185055"&gt;@woltersd-drfe&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Hi woltersd-drfe,&lt;/P&gt;&lt;P&gt;regarding your reply "image resizing is only supported with Tensorflow v2.1 (tested and verified)": do you mean that if I use Tensorflow v2.1 on the i.MX 8M Plus, the NPU acceleration will work well?&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 08:47:50 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324586#M178565</guid>
      <dc:creator>fjpmbb_abc</dc:creator>
      <dc:date>2021-08-17T08:47:50Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324599#M178567</link>
      <description>&lt;P&gt;Hi fjpmbb_abc,&lt;/P&gt;&lt;P&gt;you have to use Tensorflow v2.1 for the quantization of your model; the quantized model will then run on the NPU.&lt;/P&gt;&lt;P&gt;Note: TF v2.1 does not support int8 input and output, only float.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 09:03:33 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324599#M178567</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-08-17T09:03:33Z</dc:date>
    </item>
  </channel>
</rss>

