<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic i.mx8m plus npu failure log in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</link>
    <description>&lt;P&gt;Dear NXP,&lt;/P&gt;&lt;P&gt;I'm trying to run a segmentation network on the i.MX 8M Plus NPU. The problem is that no matter what I try, the model never runs entirely on the NPU; it either falls back to the CPU or is rejected.&lt;/P&gt;&lt;P&gt;Below are some details and my logs, so hopefully someone can tell me what I'm doing wrong.&lt;/P&gt;&lt;P&gt;For testing purposes I created three different sequential models, just to demonstrate the errors I'm running into.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model architectures:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Model 1 contains three convolutional layers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;These models are trained to produce the identity output; again, nothing special, just to demonstrate the case.&lt;/P&gt;&lt;P&gt;I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;a tflite model and a Python script using the TFLite runtime&lt;/LI&gt;&lt;LI&gt;a tflite model and a Python script using pyarmnn&lt;/LI&gt;&lt;LI&gt;an onnx model and a C++ program using ONNX Runtime&lt;/LI&gt;&lt;LI&gt;an onnx model and a Python script using pyarmnn&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Tensorflow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Creation of a tflite model: I converted my models using the Tensorflow recipe (&lt;A href="https://www.tensorflow.org/lite/performance/post_training_integer_quant" target="_blank"&gt;https://www.tensorflow.org/lite/performance/post_training_integer_quant&lt;/A&gt;) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.representative_dataset = representative_data_gen&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set to False to use TOCO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; #converter.experimental_new_converter = False&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_input_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_output_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;I used both Tensorflow v2.4 and v2.3 to convert my models.&lt;/P&gt;&lt;P&gt;Here are the logs I got when running the models on the i.MX 8M Plus:&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 1-11&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 13-24&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;No problems here, but this model has no upsampling or transpose convolution layer.&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 26-40&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 47-63&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 65-76 (esp. Line 70)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 80-92&lt;/LI&gt;&lt;LI&gt;NNAPI does not support half_pixel_centers == true&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 95-111&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 114-130&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 133-143 (esp. Line 137)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 146-157&lt;/LI&gt;&lt;LI&gt;Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 160-171&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 174-190&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Pytorch:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Same model architectures as before, but trained with Pytorch instead of Tensorflow. The models are saved in onnx format using opset versions 10, 11 and 12. In the next step all models are converted to int8 models using the instructions given in &lt;A href="https://www.onnxruntime.ai/docs/how-to/quantization.html" target="_blank"&gt;https://www.onnxruntime.ai/docs/how-to/quantization.html&lt;/A&gt; and executed on the i.MX 8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of C_Api_Sample.cpp.&lt;/P&gt;&lt;P&gt;Python code for model quantization:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from onnxruntime.quantization import quantize_dynamic, QuantType&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_fp32 = 'model path to fp model'&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_quant = 'path to new model.onnx'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)&lt;/P&gt;&lt;P&gt;Model 1, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v10 see log_file.txt Line 193-331&lt;/LI&gt;&lt;LI&gt;op v11 see log_file.txt Line 334-473&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 476-615&lt;/LI&gt;&lt;LI&gt;lots of unsupported node messages&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v11 see log_file.txt Line 618-629&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 632-643&lt;/LI&gt;&lt;LI&gt;op v13 see log_file.txt Line 646-657&lt;/LI&gt;&lt;LI&gt;only support for float, int32, int64 -&amp;gt; not working on the NPU&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The big question now is: what do I have to do to get my models working on the NPU, or does the i.MX 8M Plus NPU simply not support modern (upsampling) neural networks?&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;Peter Woltersdorf&lt;/P&gt;</description>
    <pubDate>Mon, 12 Apr 2021 14:34:32 GMT</pubDate>
    <dc:creator>woltersd-drfe</dc:creator>
    <dc:date>2021-04-12T14:34:32Z</dc:date>
    <item>
      <title>i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</link>
      <description>&lt;P&gt;Dear NXP,&lt;/P&gt;&lt;P&gt;I'm trying to run a segmentation network on the i.MX 8M Plus NPU. The problem is that no matter what I try, the model never runs entirely on the NPU; it either falls back to the CPU or is rejected.&lt;/P&gt;&lt;P&gt;Below are some details and my logs, so hopefully someone can tell me what I'm doing wrong.&lt;/P&gt;&lt;P&gt;For testing purposes I created three different sequential models, just to demonstrate the errors I'm running into.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model architectures:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Model 1 contains three convolutional layers.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/P&gt;&lt;P&gt;These models are trained to produce the identity output; again, nothing special, just to demonstrate the case.&lt;/P&gt;&lt;P&gt;I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;a tflite model and a Python script using the TFLite runtime&lt;/LI&gt;&lt;LI&gt;a tflite model and a Python script using pyarmnn&lt;/LI&gt;&lt;LI&gt;an onnx model and a C++ program using ONNX Runtime&lt;/LI&gt;&lt;LI&gt;an onnx model and a Python script using pyarmnn&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Tensorflow:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Creation of a tflite model: I converted my models using the Tensorflow recipe (&lt;A href="https://www.tensorflow.org/lite/performance/post_training_integer_quant" target="_blank"&gt;https://www.tensorflow.org/lite/performance/post_training_integer_quant&lt;/A&gt;) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.representative_dataset = representative_data_gen&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; # Set to False to use TOCO&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; #converter.experimental_new_converter = False&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_input_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; converter.inference_output_type = tf.int8&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;I used both Tensorflow v2.4 and v2.3 to convert my models.&lt;/P&gt;&lt;P&gt;Here are the logs I got when running the models on the i.MX 8M Plus:&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 1-11&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 13-24&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;No problems here, but this model has no upsampling or transpose convolution layer.&lt;/P&gt;&lt;P&gt;Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 26-40&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 47-63&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 65-76 (esp. Line 70)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 80-92&lt;/LI&gt;&lt;LI&gt;NNAPI does not support half_pixel_centers == true&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 95-111&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 114-130&lt;/LI&gt;&lt;LI&gt;input data type QAsymmS8 and output data type QAsymmS8 not supported&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 133-143 (esp. Line 137)&lt;/LI&gt;&lt;LI&gt;Failed to apply NNAPI delegate&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using TFLite runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 146-157&lt;/LI&gt;&lt;LI&gt;Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 160-171&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;see log_file.txt Line 174-190&lt;/LI&gt;&lt;LI&gt;as expected, Arm NN does not support transpose convolution&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Models trained with Pytorch:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Same model architectures as before, but trained with Pytorch instead of Tensorflow. The models are saved in onnx format using opset versions 10, 11 and 12. In the next step all models are converted to int8 models using the instructions given in &lt;A href="https://www.onnxruntime.ai/docs/how-to/quantization.html" target="_blank"&gt;https://www.onnxruntime.ai/docs/how-to/quantization.html&lt;/A&gt; and executed on the i.MX 8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of C_Api_Sample.cpp.&lt;/P&gt;&lt;P&gt;Python code for model quantization:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from onnxruntime.quantization import quantize_dynamic, QuantType&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_fp32 = 'model path to fp model'&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; model_quant = 'path to new model.onnx'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)&lt;/P&gt;&lt;P&gt;Model 1, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v10 see log_file.txt Line 193-331&lt;/LI&gt;&lt;LI&gt;op v11 see log_file.txt Line 334-473&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 476-615&lt;/LI&gt;&lt;LI&gt;lots of unsupported node messages&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using ONNX Runtime:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 1, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;op v11 see log_file.txt Line 618-629&lt;/LI&gt;&lt;LI&gt;op v12 see log_file.txt Line 632-643&lt;/LI&gt;&lt;LI&gt;op v13 see log_file.txt Line 646-657&lt;/LI&gt;&lt;LI&gt;only support for float, int32, int64 -&amp;gt; not working on the NPU&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 2, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Model 3, Pytorch v1.8 using pyarmnn:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;skipped since Model 1 is not working&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;The big question now is: what do I have to do to get my models working on the NPU, or does the i.MX 8M Plus NPU simply not support modern (upsampling) neural networks?&lt;/P&gt;&lt;P&gt;Thanks in advance&lt;/P&gt;&lt;P&gt;Peter Woltersdorf&lt;/P&gt;</description>
      <pubDate>Mon, 12 Apr 2021 14:34:32 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1260632#M172456</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-12T14:34:32Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1263720#M172753</link>
      <description>&lt;P&gt;Hello Woltersd,&lt;/P&gt;
&lt;P&gt;Is it possible to share the model? If it is a proprietary model based on your own dataset, you can instead share the same model trained on an openly available dataset.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Apr 2021 18:53:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1263720#M172753</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-16T18:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264322#M172822</link>
      <description>&lt;P&gt;- nothing here -&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 08:45:28 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264322#M172822</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T08:45:28Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264976#M172898</link>
      <description>&lt;P&gt;Hi Bio_TICFSL,&lt;/P&gt;&lt;P&gt;no problem. I zipped some models into the attached archive, both as regular Tensorflow (Keras) models and as the quantized versions. The training data is just some randomly generated data:&lt;/P&gt;&lt;P&gt;import tensorflow as tf&lt;BR /&gt;import numpy as np&lt;/P&gt;&lt;P&gt;def create_dataset(size_dataset=1000, size=(224, 224)):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array = np.random.randn(size_dataset, size[0], size[1], 3)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mask = np.argmax(array, axis=-1)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return tf.convert_to_tensor(array), tf.convert_to_tensor(mask)&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 08:46:20 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1264976#M172898</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T08:46:20Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265158#M172930</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You have captured a lot of detail in the original ticket. For my better understanding, though, can you summarize the issue with the 3 models you shared with me?&lt;/P&gt;
&lt;P&gt;I would recommend sharing something like the following.&lt;/P&gt;
&lt;P&gt;Model X (1 to 3):&lt;/P&gt;
&lt;P&gt;Brief model description (which TF version and which quantization were used)&lt;/P&gt;
&lt;P&gt;Issue description&lt;/P&gt;
&lt;P&gt;Steps followed to reproduce the issue&lt;/P&gt;
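For the "steps to reproduce" part, a minimal TFLite inference script is usually enough. Below is a sketch of what such a script could look like (assumptions: the tflite_runtime package shipped with the BSP image is installed on the board, and on the BSP's TFLite build the NNAPI delegate is applied automatically, so refused ops show up as the warnings quoted in this thread; the helper name make_random_input is purely illustrative):

```python
import sys
import numpy as np

def make_random_input(shape, dtype):
    # Build a random tensor matching the model's input spec; integer
    # (quantized) inputs get values spanning the full range of the type.
    if np.issubdtype(np.dtype(dtype), np.integer):
        info = np.iinfo(dtype)
        return np.random.randint(info.min, info.max + 1, size=shape, dtype=dtype)
    return np.random.randn(*shape).astype(dtype)

def main(model_path):
    # Deferred import: tflite_runtime only exists on the target image.
    import tflite_runtime.interpreter as tflite
    interpreter = tflite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], make_random_input(tuple(inp["shape"]), inp["dtype"]))
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    # Any delegate warnings will have been printed to stderr by now.
    print(interpreter.get_tensor(out["index"]).shape)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run on the board as: python3 script.py path_to_model.tflite, and attach the full console output.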
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 12:58:37 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265158#M172930</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-20T12:58:37Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265226#M172935</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;for the models supplied in the zip file I used Tensorflow v2.3, but I also tried v2.4.&lt;/P&gt;&lt;P&gt;The model conversion is done with the same script every time.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Steps for model conversion:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;converter = tf.lite.TFLiteConverter.from_saved_model(model_path)&lt;BR /&gt;converter.optimizations = [tf.lite.Optimize.DEFAULT]&lt;BR /&gt;converter.representative_dataset = representative_data_gen&lt;BR /&gt;converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]&lt;BR /&gt;# Set to False to use TOCO&lt;BR /&gt;#converter.experimental_new_converter = False&lt;BR /&gt;converter.target_spec.supported_types = [tf.int8]&lt;BR /&gt;converter.inference_input_type = tf.int8&lt;BR /&gt;converter.inference_output_type = tf.int8&lt;BR /&gt;tflite_quant_model = converter.convert()&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayer&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Conv2 -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;the only model with no issues in all my test setups, included just for demonstration purposes -&amp;gt; skip to the next one&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayerTranspConv&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Description:&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; TransConv -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;TF v2.3&lt;/LI&gt;&lt;LI&gt;quantization: see "&lt;STRONG&gt;Steps for model conversion&lt;/STRONG&gt;" above&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;Issue:&lt;UL&gt;&lt;LI&gt;the model falls back to the CPU&lt;/LI&gt;&lt;LI&gt;LogOutput: WARNING: Operator TRANSPOSE_CONV (v3) refused by NNAPI delegate: OP Version different from 1&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Steps to reproduce:&lt;UL&gt;&lt;LI&gt;copy the tflite model (provided in the zip file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py &lt;EM&gt;path_to_model&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Model ThreeConvLayerUpsample&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Description:&lt;UL&gt;&lt;LI&gt;architecture: Input -&amp;gt; Conv1 -&amp;gt; Maxpooling -&amp;gt; Conv2 -&amp;gt; Upsampling2D -&amp;gt; Conv3 -&amp;gt; Output&lt;/LI&gt;&lt;LI&gt;TF v2.3&lt;/LI&gt;&lt;LI&gt;quantization: see "&lt;STRONG&gt;Steps for model conversion&lt;/STRONG&gt;" above&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;Issue:&lt;UL&gt;&lt;LI&gt;the model falls back to the CPU&lt;/LI&gt;&lt;LI&gt;LogOutput: WARNING: Operator RESIZE_NEAREST_NEIGHBOR (v3) refused by NNAPI delegate: NNAPI does not support half_pixel_centers == true.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Steps to reproduce:&lt;UL&gt;&lt;LI&gt;copy the tflite model (provided in the zip file) to the i.MX 8M Plus and run it with the attached script: python3 simple_lite.py &lt;EM&gt;path_to_model&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;Regards&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 20 Apr 2021 15:33:56 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1265226#M172935</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-20T15:33:56Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1267283#M173111</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I checked the attached models. Both models are running on the NPU. We are profiling them to see which layer takes the most time and how we can optimize it.&lt;/P&gt;
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2021 13:01:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1267283#M173111</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-23T13:01:06Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1269752#M173372</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;any news?&lt;/P&gt;&lt;P&gt;If my model runs on the NPU without falling back to the CPU, how did you do it?&lt;/P&gt;&lt;P&gt;Do you have an image you can share, so I can install it on our eval board and get my models running?&lt;/P&gt;&lt;P&gt;Greets&lt;/P&gt;</description>
      <pubDate>Wed, 28 Apr 2021 14:46:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1269752#M173372</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-04-28T14:46:30Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1270848#M173463</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Is this a general question or specific to a framework? Can you be a bit more specific about what type of model you are referring to?&lt;/P&gt;
&lt;P&gt;In any case, we share the list of operators that get executed on the NPU for each framework:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf&lt;/A&gt;&lt;/P&gt;
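On the operator side, one warning quoted earlier in this thread is RESIZE_NEAREST_NEIGHBOR being refused because half_pixel_centers == true. The difference between the two coordinate conventions can be illustrated with a small numpy sketch (the formulas below follow the commonly used definitions and are given purely for illustration, not copied from the TFLite kernels):

```python
import numpy as np

def nearest_src_index(dst_idx, scale, half_pixel_centers):
    # Map an output pixel index to its nearest-neighbor source index.
    # scale = input_size / output_size.
    if half_pixel_centers:
        # Half-pixel convention: sample relative to pixel centers.
        return int(np.floor((dst_idx + 0.5) * scale))
    # Legacy convention: sample at dst_idx * scale.
    return int(np.floor(dst_idx * scale))

# Downsampling 4 pixels to 2 (scale = 2): the two conventions pick
# different source pixels, so the attribute changes the op's semantics
# and a backend that lacks one convention cannot silently use the other.
legacy = [nearest_src_index(i, 2.0, False) for i in range(2)]    # [0, 2]
centered = [nearest_src_index(i, 2.0, True) for i in range(2)]   # [1, 3]
print(legacy, centered)
```

This is why the delegate refuses the op outright instead of approximating it.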
&lt;P&gt;Regards&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Apr 2021 13:00:51 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1270848#M173463</guid>
      <dc:creator>Bio_TICFSL</dc:creator>
      <dc:date>2021-04-30T13:00:51Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1271221#M173514</link>
      <description>&lt;P&gt;You wrote:&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I checked the attached models. Both models are running on the NPU. We are profiling the model to see which layer is taking more time and how we can optimize it.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;---------------------------------------------------------------------------------------------------------------------------------------&lt;BR /&gt;My answer to &lt;STRONG&gt;your&lt;/STRONG&gt; post:&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;any news? (referring to: We are profiling the model to see which layer is taking more time and how we can optimize it.)&lt;/P&gt;&lt;P&gt;If my model runs on the NPU without falling back to the CPU, how did you do it? (referring to: I checked the attached models. Both models are running on the NPU.)&lt;/P&gt;&lt;P&gt;Do you have an image you can share, so I can install it on our eval board and get my models running?&lt;/P&gt;&lt;P&gt;I created our image exactly following the recipe in &lt;A href="https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf" target="_blank" rel="noopener"&gt;https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf&lt;/A&gt; using zeus imx-5.4.70-2.3.2.xml.&lt;/P&gt;&lt;P&gt;Some real help would be nice.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Mon, 03 May 2021 10:12:25 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1271221#M173514</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-05-03T10:12:25Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324170#M178525</link>
      <description>&lt;P&gt;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/185055"&gt;@woltersd-drfe&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Hi woltersd-drfe,&lt;/P&gt;&lt;P&gt;did you solve your i.MX 8M Plus NPU segmentation failure issue? I have encountered it as well.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 01:35:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324170#M178525</guid>
      <dc:creator>fjpmbb_abc</dc:creator>
      <dc:date>2021-08-17T01:35:06Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324564#M178562</link>
      <description>&lt;P&gt;Hi fjpmbb_abc,&lt;/P&gt;&lt;P&gt;unfortunately not really. The information I got is that the NPU does not yet support transpose convolution, and that image resizing is only supported with Tensorflow v2.1 (tested and verified).&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 08:33:16 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324564#M178562</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-08-17T08:33:16Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324586#M178565</link>
      <description>&lt;P&gt;&lt;a href="https://community.nxp.com/t5/user/viewprofilepage/user-id/185055"&gt;@woltersd-drfe&lt;/a&gt;&lt;/P&gt;&lt;P&gt;Hi woltersd-drfe,&lt;/P&gt;&lt;P&gt;regarding your reply "image resizing is only supported with Tensorflow v2.1 (tested and verified)": do you mean that if I use Tensorflow v2.1 on the i.MX 8M Plus, the NPU acceleration will work well?&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 08:47:50 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324586#M178565</guid>
      <dc:creator>fjpmbb_abc</dc:creator>
      <dc:date>2021-08-17T08:47:50Z</dc:date>
    </item>
    <item>
      <title>Re: i.mx8m plus npu failure log</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324599#M178567</link>
      <description>&lt;P&gt;Hi fjpmbb_abc,&lt;/P&gt;&lt;P&gt;you have to use Tensorflow v2.1 for the quantization of your model; the quantized model will then run on the NPU.&lt;/P&gt;&lt;P&gt;Note: TF v2.1 does not support int8 input and output, only float.&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Tue, 17 Aug 2021 09:03:33 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/i-mx8m-plus-npu-failure-log/m-p/1324599#M178567</guid>
      <dc:creator>woltersd-drfe</dc:creator>
      <dc:date>2021-08-17T09:03:33Z</dc:date>
    </item>
  </channel>
</rss>

