Dear NXP,
I'm trying to run a segmentation network on the NPU of the i.MX8M Plus. The problem is that no matter what I try, the model never runs on the NPU alone; it either falls back to the CPU or is rejected.
I'm posting some details and my logs below, so hopefully someone can tell me what I'm doing wrong.
For testing purposes, I created three different sequential models just to demonstrate the errors I'm running into.
Model architectures:
Model 1 contains three convolutional layers.
Input -> Conv1 -> Conv2 -> Conv3 -> Output
Model 2 contains three convolutional layers, a maxpool layer and an upsampling layer
Input -> Conv1 -> Maxpooling -> Conv2 -> Upsampling2D -> Conv3 -> Output
Model 3 contains three convolutional layers, a maxpool layer and a transpose convolution layer
Input -> Conv1 -> Maxpooling -> Conv2 -> TransConv -> Conv3 -> Output
These models are trained to reproduce their input (identity output); again, nothing special, just enough to demonstrate the issue.
I tried four different approaches to achieve my goal (getting a segmentation network running on the NPU):
Models trained with TensorFlow:
Creation of a TFLite model. I converted my models using the TensorFlow recipe (https://www.tensorflow.org/lite/performance/post_training_integer_quant) and the instructions given in chapter 3.6 of the 'i.MX Machine Learning User's Guide':
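For reference, a rough Keras sketch of what Model 2 looks like (the filter counts and the 224x224x3 input size here are placeholders, not my exact values):
import tensorflow as tf

# Rough sketch of Model 2 (filter counts are placeholders):
# Input -> Conv1 -> Maxpooling -> Conv2 -> Upsampling2D -> Conv3 -> Output
model2 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                           input_shape=(224, 224, 3)),                  # Conv1
    tf.keras.layers.MaxPooling2D(),                                     # Maxpooling
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),   # Conv2
    tf.keras.layers.UpSampling2D(),                                     # Upsampling2D
    tf.keras.layers.Conv2D(3, 3, padding="same"),                       # Conv3 -> Output
])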
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Full-integer quantization: restrict the converter to the int8 builtin ops
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set to False to use the old TOCO converter instead of the new MLIR converter
#converter.experimental_new_converter = False
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()
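For completeness, representative_data_gen follows the generator pattern from the linked TensorFlow guide; roughly (calibration_images is a placeholder for my real float32 input data, and the sample count is arbitrary):
def representative_data_gen():
    # Yield a few hundred calibration samples shaped like the model input
    for image in tf.data.Dataset.from_tensor_slices(calibration_images).batch(1).take(100):
        yield [tf.cast(image, tf.float32)]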
I used both TensorFlow v2.4 and v2.3 to convert my models.
Here are the logs I got when running the models on the i.MX8M Plus:
Model 1, TF v2.4 using TFLite runtime:
Model 1, TF v2.3 using TFLite runtime:
No problems here, but Model 1 contains no upsampling or transpose convolution layer.
Model 1, TF v2.4 using pyarmnn with VsiNpu as the only backend:
Model 1, TF v2.3 using pyarmnn with VsiNpu as the only backend:
Model 2, TF v2.4 using TFLite runtime:
Model 2, TF v2.3 using TFLite runtime:
Model 2, TF v2.4 using pyarmnn with VsiNpu as the only backend:
Model 2, TF v2.3 using pyarmnn with VsiNpu as the only backend:
Model 3, TF v2.4 using TFLite runtime:
Model 3, TF v2.3 using TFLite runtime:
Model 3, TF v2.4 using pyarmnn with VsiNpu as the only backend:
Model 3, TF v2.3 using pyarmnn with VsiNpu as the only backend:
Models trained with PyTorch:
Same model architectures as before, but trained with PyTorch instead of TensorFlow. The models are saved in ONNX format using opset versions 10, 11, and 12. In the next step, all models are converted to int8 using the instructions given at https://www.onnxruntime.ai/docs/how-to/quantization.html and executed on the i.MX8M Plus following the instructions in chapter 6.2 of the 'i.MX Machine Learning User's Guide', with an adapted version of C_Api_Sample.cpp.
Python code for model quantization:
from onnxruntime.quantization import quantize_dynamic, QuantType
model_fp32 = 'model path to fp model'
model_quant = 'path to new model.onnx'
quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QInt8, activation_type=QuantType.QInt8)
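As a quick sanity check on the host before deploying to the board, the quantized model can be run with ONNX Runtime; a rough sketch (the 1x3x224x224 NCHW input shape is an assumption matching the models above):
import numpy as np
import onnxruntime as ort

# Host-side check of the quantized model produced above (model_quant)
session = ort.InferenceSession(model_quant, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)  # assumed NCHW input
print(session.run(None, {input_name: dummy})[0].shape)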
Model 1, PyTorch v1.8 using ONNX Runtime:
Model 2, PyTorch v1.8 using ONNX Runtime:
Model 3, PyTorch v1.8 using ONNX Runtime:
Model 1, PyTorch v1.8 using pyarmnn:
Model 2, PyTorch v1.8 using pyarmnn:
Model 3, PyTorch v1.8 using pyarmnn:
The big question now is: what do I have to do to get my models running on the NPU, or does the i.MX8M Plus NPU simply not support modern (upsampling) neural networks?
Thanks in advance
Peter Woltersdorf
Hello Woltersd,
Is it possible to share your model? If it is a proprietary model based on your own dataset, you can instead share the model trained with an openly available dataset.
Regards
Hi Bio_TICFSL,
no problem, I zipped some models into the attached archive, both as regular TensorFlow (Keras) models and as the quantized versions. The training data is just randomly generated:
import tensorflow as tf
import numpy as np

def create_dataset(size_dataset=1000, size=(224, 224)):
    # Random float images and per-pixel argmax labels (3 channels -> 3 classes)
    array = np.random.randn(size_dataset, size[0], size[1], 3)
    mask = np.argmax(array, axis=-1)
    return tf.convert_to_tensor(array), tf.convert_to_tensor(mask)
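For completeness, training is nothing more than something along these lines (the optimizer, loss, batch size, and model path are placeholders, not my exact settings):
x, y = create_dataset()
# Placeholder path: load one of the Keras models from the attached archive
model = tf.keras.models.load_model("ThreeConvLayerUpsample")
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(x, y, batch_size=8, epochs=5)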
Regards
Hi,
You have captured a lot of detail in the original ticket. For my better understanding, though, can you summarize the issue with the three models you shared with me?
I would recommend sharing something like the following:
Model X (1 to 3):
Brief model description (which TF version and which quantization were used)
Issue description
Steps followed to reproduce the issue
Regards
Hi,
for the models supplied in the zip file I used TensorFlow v2.3, but I also tried v2.4.
The model conversion is done with the same script every time.
Steps for model conversion:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Full-integer quantization: restrict the converter to the int8 builtin ops
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set to False to use the old TOCO converter instead of the new MLIR converter
#converter.experimental_new_converter = False
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()
Model ThreeConvLayer:
Model ThreeConvLayerTranspConv:
Model ThreeConvLayerUpsample:
Regards
Hi,
I checked the attached models. Both models are running on the NPU. We are profiling the models to see which layer is taking more time and how we can optimize it.
Regards
Hi,
any news?
If my models run on the NPU without falling back to the CPU, how did you do it?
Do you have an image you can share that I can install on our eval board to get my models running?
Greets
Hello,
Is this a general question or specific to a framework? Can you be a bit more specific about what type of model you are referring to?
We do share the list of operators per framework that get executed on the NPU:
https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf
Regards
you wrote:
Hi,
I checked the attached models. Both models are running on the NPU. We are profiling the models to see which layer is taking more time and how we can optimize it.
Regards
---------------------------------------------------------------------------------------------------------------------------------------
My answer to your post:
Hi,
any news? (referring to: "We are profiling the models to see which layer is taking more time and how we can optimize it.")
If my models run on the NPU without falling back to the CPU, how did you do it? (referring to: "I checked the attached models. Both models are running on the NPU.")
Do you have an image you can share that I can install on our eval board to get my models running?
I created our image exactly according to the recipe in https://www.nxp.com/docs/en/user-guide/IMX_YOCTO_PROJECT_USERS_GUIDE.pdf, using the zeus imx-5.4.70-2.3.2.xml manifest.
Some real help would be nice.
Regards
Hi woltersd-drfe
Did you solve your i.MX8M Plus NPU segmentation failure issue? I ran into this as well.
Thanks.
Hi fjpmbb_abc,
unfortunately not really. The information I got is that the NPU does not support transpose convolution yet, and image resizing is only supported with TensorFlow v2.1 (tested and verified).
Regards
Hi woltersd-drfe
Regarding your reply "image resizing is only supported with TensorFlow v2.1 (tested and verified)": do you mean that if I use TensorFlow v2.1 on the i.MX8M Plus, NPU acceleration will work well?
Regards.
Hi fjpmbb_abc,
You have to use TensorFlow v2.1 for the quantization of your model. The quantized model will then run on the NPU.
Note: TF v2.1 does not support int8 input and output, only float.
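Roughly, the conversion then boils down to the following sketch (model_path and representative_data_gen are placeholders as in my earlier posts; note the missing inference_input_type/inference_output_type lines):
converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# No inference_input_type / inference_output_type here:
# with TF v2.1 the model keeps float32 inputs and outputs.
tflite_quant_model = converter.convert()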
Regards