i.MX 8M Plus NPU failure log

woltersd-drfe
Contributor II

 

Dear NXP,

I'm trying to run a segmentation network on the i.MX 8M Plus NPU. The problem is that no matter what I try, the model does not run entirely on the NPU; it either falls back to the CPU or is rejected.

Below are some details and my logs; hopefully someone can tell me what I'm doing wrong here.

For testing purposes I created three different sequential models just to demonstrate the errors I'm running into.

Model architectures:

Model 1 contains three convolutional layers.

     Input -> Conv1 -> Conv2 -> Conv3 -> Output

Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer

    Input -> Conv1 -> Maxpooling -> Conv2 -> Upsampling2D -> Conv3 -> Output

Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer

    Input -> Conv1 -> Maxpooling -> Conv2 -> TransConv -> Conv3 -> Output

These models are trained to produce the identity output; again, nothing special, just enough to demonstrate the issue.
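For concreteness, here is a minimal Keras sketch of how the two architectures that later cause problems could be built (the filter counts, kernel sizes and the 3-channel input/output are assumptions for illustration; the actual models may differ):

    from tensorflow.keras import layers, models

    def build_model_2(input_shape=(224, 224, 3)):
        # Model 2: Conv1 -> Maxpooling -> Conv2 -> Upsampling2D -> Conv3
        return models.Sequential([
            layers.InputLayer(input_shape=input_shape),
            layers.Conv2D(8, 3, padding="same", activation="relu"),  # Conv1
            layers.MaxPooling2D(2),                                   # Maxpooling
            layers.Conv2D(8, 3, padding="same", activation="relu"),  # Conv2
            layers.UpSampling2D(2),                                   # Upsampling2D
            layers.Conv2D(3, 3, padding="same"),                      # Conv3
        ])

    def build_model_3(input_shape=(224, 224, 3)):
        # Model 3: Conv1 -> Maxpooling -> Conv2 -> TransConv -> Conv3
        return models.Sequential([
            layers.InputLayer(input_shape=input_shape),
            layers.Conv2D(8, 3, padding="same", activation="relu"),  # Conv1
            layers.MaxPooling2D(2),                                   # Maxpooling
            layers.Conv2D(8, 3, padding="same", activation="relu"),  # Conv2
            layers.Conv2DTranspose(8, 3, strides=2, padding="same"),  # TransConv
            layers.Conv2D(3, 3, padding="same"),                      # Conv3
        ])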

I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):

  1. through a TFLite model and a Python script using the TFLite runtime (a minimal sketch of this option is shown after this list)
  2. through a TFLite model and a Python script using PyArmNN
  3. through an ONNX model and a C++ program using ONNX Runtime
  4. through an ONNX model and a Python script using PyArmNN
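For reference, a minimal sketch of option 1 (running a quantized TFLite model with the tflite_runtime Python API). The commented-out delegate lines are an assumption: on this BSP the NNAPI delegate is applied automatically according to the logs, and the delegate library name is BSP-specific.

    import sys
    import numpy as np
    import tflite_runtime.interpreter as tflite

    model_path = sys.argv[1]

    # Optionally load a hardware delegate explicitly (library name is an assumption):
    # delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
    # interpreter = tflite.Interpreter(model_path=model_path, experimental_delegates=[delegate])
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Random input matching the (possibly int8-quantized) input tensor spec.
    shape = input_details[0]["shape"]
    dtype = input_details[0]["dtype"]
    if dtype == np.int8:
        data = np.random.randint(-128, 128, size=shape, dtype=np.int8)
    else:
        data = np.random.random_sample(shape).astype(dtype)

    interpreter.set_tensor(input_details[0]["index"], data)
    interpreter.invoke()
    print(interpreter.get_tensor(output_details[0]["index"]).shape)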

Models trained with TensorFlow:

Creation of a TFLite model: I converted my models using the TensorFlow recipe (https://www.tensorflow.org/lite/performance/post_training_integer_quant) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':

    import tensorflow as tf

    # representative_data_gen yields calibration samples (see the sketch after this block)
    converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Set to False to use TOCO
    #converter.experimental_new_converter = False
    converter.target_spec.supported_types = [tf.int8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    tflite_quant_model = converter.convert()
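For completeness, a minimal sketch of the representative_data_gen used above (the sample count and the random calibration data are assumptions; real calibration data should resemble the actual deployment inputs):

    import numpy as np

    def representative_data_gen():
        # Yield one sample per step, shaped like the model input (batch size 1).
        for _ in range(100):
            yield [np.random.randn(1, 224, 224, 3).astype(np.float32)]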

I used both TensorFlow v2.4 and v2.3 to convert my models.

Here are the logs I got when running the models on the i.MX 8M Plus:

Model 1, TF v2.4 using TFLite runtime:

  •  see log_file.txt Line 1-11

Model 1, TF v2.3 using TFLite runtime:

  • see log_file.txt Line 13-24

No problems here, but Model 1 contains no upsampling or transpose convolution layer.

Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 26-40
  • input data type QAsymmS8 and output data type QAsymmS8 not supported

Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 47-63
  • input data type QAsymmS8 and output data type QAsymmS8 not supported

Model 2, TF v2.4 using TFLite runtime:

  • see log_file.txt Line 65-76 (esp. Line 70)
  • Failed to apply NNAPI delegate

Model 2, TF v2.3 using TFLite runtime:

  • see log_file.txt Line 80-92
  • NNAPI does not support half_pixel_centers == true

Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 95-111
  • input data type QAsymmS8 and output data type QAsymmS8 not supported

Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 114-130
  • input data type QAsymmS8 and output data type QAsymmS8 not supported

Model 3, TF v2.4 using TFLite runtime:

  • see log_file.txt Line 133-143 (esp. Line 137)
  • Failed to apply NNAPI delegate.

Model 3, TF v2.3 using TFLite runtime:

  • see log_file.txt Line 146-157
  • Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1

Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 160-171
  • as expected, Arm NN does not support transpose convolution

Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:

  • see log_file.txt Line 174-190
  • as expected, Arm NN does not support transpose convolution

Models trained with PyTorch:

Same model architectures as before, but I used PyTorch to train the models instead of TensorFlow. The models are saved in ONNX format using opset versions 10, 11 and 12. In the next step, all models are converted to int8 models using the instructions given at https://www.onnxruntime.ai/docs/how-to/quantization.html and executed on the i.MX 8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of the C_Api_Sample.cpp.

Python code for model quantization:

    from onnxruntime.quantization import quantize_dynamic, QuantType

    model_fp32 = 'model path to fp model'
    model_quant = 'path to new model.onnx'

    quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)
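For reference, a minimal Python sketch of loading the quantized ONNX model with onnxruntime (the actual runs below used an adapted C_Api_Sample.cpp in C++; the NCHW input shape and simply passing all available execution providers are assumptions, since the NPU execution provider name depends on the eIQ onnxruntime build):

    import numpy as np
    import onnxruntime as ort

    model_quant = 'path to new model.onnx'  # same path as above

    # Show which execution providers this onnxruntime build exposes.
    print(ort.get_available_providers())

    session = ort.InferenceSession(model_quant, providers=ort.get_available_providers())
    input_meta = session.get_inputs()[0]

    # Dynamically quantized models keep float inputs; shape assumed NCHW 1x3x224x224.
    data = np.random.randn(1, 3, 224, 224).astype(np.float32)
    outputs = session.run(None, {input_meta.name: data})
    print(outputs[0].shape)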

 

Model 1 Pytorch v1.8 using onnx runtime:

  • op v10 see log_file.txt Line 193-331
  • op v11 see log_file.txt Line 334-473
  • op v12 see log_file.txt Line 476-615
  • lots of unsupported node messages

Model 2 Pytorch v1.8 using onnx runtime:

  • skipped since Model 1 is not working

Model 3 Pytorch v1.8 using onnx runtime:

  • skipped since Model 1 is not working

Model 1 Pytorch v1.8 using pyarmnn:

  • op v11 see log_file.txt Line 618-629
  • op v12 see log_file.txt Line 632-643
  • op v13 see log_file.txt Line 646-657
  • only support for float, int32, int64 -> not working on the NPU

Model 2 Pytorch v1.8 using pyarmnn:

  • skipped since Model 1 is not working

Model 3 Pytorch v1.8 using pyarmnn:

  • skipped since Model 1 is not working

The big question now is: what do I have to do to get my models running on the NPU, or does the i.MX 8M Plus NPU simply not support modern (upsampling) segmentation networks?

Thanks in advance

Peter Woltersdorf

13 Replies


Bio_TICFSL
NXP TechSupport

Hello Woltersd,

Is it possible to share the model? If it is a proprietary model based on your own data set, then you can share the model trained with an openly available data set.

Regards

 

woltersd-drfe
Contributor II

Hi Bio_TICFSL,

no problem, I zipped some models into the attached archive, both as plain TensorFlow (Keras) models and as their quantized versions. The training data is just randomly generated:

import tensorflow as tf
import numpy as np

def create_dataset(size_dataset=1000, size=(224, 224)):

    # random images; the segmentation mask is simply the per-pixel argmax over the 3 channels
    array = np.random.randn(size_dataset, size[0], size[1], 3)
    mask = np.argmax(array, axis=-1)

    return tf.convert_to_tensor(array), tf.convert_to_tensor(mask)

 

Regards

 

 

 

Bio_TICFSL
NXP TechSupport

Hi,

You have captured a lot of detail in the original ticket. For my better understanding, though, can you summarize the issue with the three models you shared with me?

I would recommend sharing something like the following for each model.

Model X (1 to 3):

Brief model description (which TF version was used and which quantization was used)

Issue description

Steps followed to reproduce the issue

Regards

 

woltersd-drfe
Contributor II

Hi,

for the models supplied in the zip file I used TensorFlow v2.3, but I also tried v2.4.

The model conversion is done with the same script every time.

Steps for model conversion:

converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set to False to use TOCO
#converter.experimental_new_converter = False
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()

Model ThreeConvLayer:

  • architecture: Input -> Conv1 -> Conv2 -> Conv3 -> Output
  • the only model with no issues in any of my test setups, included just for demonstration purposes -> skip to the next one

Model ThreeConvLayerTranspConv:

  • Description:
    • architecture: Input -> Conv1 -> Maxpooling -> Conv2 -> TransConv -> Conv3 -> Output
    • TF v2.3
    • quantization: see above "Steps for model conversion"
  • Issue:
    • Model falls back to the CPU
    • LogOutput: WARNING: Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1
  • Steps to Reproduce:
    • copy the tflite-model (provided in the zip-file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py path_to_model

Model ThreeConvLayerUpsample:

  • Description:
    • architecture: Input -> Conv1 -> Maxpooling -> Conv2 -> Upsampling2D -> Conv3 -> Output
    • TF v2.3
    • quantization: see above "Steps for model conversion"
  • Issue:
    • Model falls back to the CPU
    • LogOutput: WARNING: Operator RESIZE_NEAREST_NEIGHBOR (v3) refused by NNAPI delegate: NNAPI does not support half_pixel_centers == true.
  • Steps to Reproduce:
    • copy the tflite-model (provided in the zip-file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py path_to_model

 

Regards

Bio_TICFSL
NXP TechSupport

Hi,

I checked the attached models. Both models are running on the NPU. We are profiling the models to see which layer is taking more time and how we can optimize it.

Regards

 

woltersd-drfe
Contributor II

Hi,

any news?

If my models run on the NPU without falling back to the CPU, how did you do it?

Do you have an image you can share, so I can install it on our eval board and get my models running?

 

greets

Bio_TICFSL
NXP TechSupport

Hello,

Is this a general question or specific to a framework? Can you be a bit more specific about what type of model you are referring to?

We do share the list of operators per framework that get executed on the NPU:

https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf

Regards

 

woltersd-drfe
Contributor II

you wrote:

Hi,

I checked the attached models. Both models are running on the NPU. We are profiling the models to see which layer is taking more time and how we can optimize it.

Regards

---------------------------------------------------------------------------------------------------------------------------------------
My answer to your post:

Hi,

any news? (referring to: "We are profiling the models to see which layer is taking more time and how we can optimize it.")

If my models run on the NPU without falling back to the CPU, how did you do it? (referring to: "I checked the attached models. Both models are running on the NPU.")

Do you have an image you can share, so I can install it on our eval board and get my models running?

I built our image exactly following the recipe in https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf, using zeus with imx-5.4.70-2.3.2.xml.


Some real help would be nice.


Regards

 

 

fjpmbb_abc
Contributor I

@woltersd-drfe 

Hi woltersd-drfe

Did you solve your i.MX 8M Plus NPU segmentation failure issue? I encountered it as well.

Thanks.

woltersd-drfe
Contributor II

Hi fjpmbb_abc,

Unfortunately, not really. The information I got is that the NPU does not support transpose convolution yet, and image resizing is only supported with TensorFlow v2.1 (tested and verified).

Regards

 

fjpmbb_abc
Contributor I

@woltersd-drfe 

Hi woltersd-drfe

According to your reply, "image resizing is only supported with TensorFlow v2.1 (tested and verified)": do you mean that if I use TensorFlow v2.1 for the i.MX 8M Plus, NPU acceleration will work well?

 

Regards.

woltersd-drfe
Contributor II

Hi fjpmbb_abc,

you have to use TensorFlow v2.1 for the quantization of your model. The quantized model will then run on the NPU.

Note: TF v2.1 does not support int8 input and output, only float.
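A minimal sketch of this TF v2.1-style conversion (assuming the float input/output is kept, so no inference_input_type/inference_output_type is set; the random calibration data is an assumption, as earlier in the thread):

    import numpy as np
    import tensorflow as tf

    model_path = 'path/to/saved_model'  # as in the conversion steps above

    def representative_data_gen():
        # Random calibration samples shaped like the model input (assumption).
        for _ in range(100):
            yield [np.random.randn(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # No inference_input_type / inference_output_type here:
    # with TF v2.1 the model keeps float input and output.
    tflite_quant_model = converter.convert()

    with open('model_quant.tflite', 'wb') as f:
        f.write(tflite_quant_model)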

Regards
