Problems executing quantized models on NPU with ONNXRuntime

christhi
Contributor II

Hello there,

I have a problem executing quantized models on the NPU with ONNXRuntime. I downloaded the models mobilenet_v2_1.0_224.tflite, mobilenet_v2_1.0_224_quant.tflite, inception_v3.tflite and inception_v3_quant.tflite referenced in the Machine Learning User's Guide and converted them with the eIQ model converter.

For all of these models I get correct results when running them with the CPU_ACL EP. When I run the non-quantized models with the Vsi_Npu EP, I also get correct results. But when I run the quantized models with the Vsi_Npu EP, I get wrong results.
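For reference, this is roughly how I set up the sessions. It is only a sketch: the execution provider name strings ("ACLExecutionProvider", "VsiNpuExecutionProvider") and the model file names are assumptions and may differ on a given eIQ ONNX Runtime build.

```python
# Minimal sketch of how the sessions are created; the execution provider
# name strings and the model file names are assumptions and may differ
# on your eIQ ONNX Runtime build.
import onnxruntime as ort

# Float model on the Arm Compute Library (CPU) execution provider
cpu_session = ort.InferenceSession(
    "mobilenet_v2_1.0_224.onnx",
    providers=["ACLExecutionProvider", "CPUExecutionProvider"])

# Quantized model on the VSI NPU execution provider, with CPU fallback
npu_session = ort.InferenceSession(
    "mobilenet_v2_1.0_224_quant.onnx",
    providers=["VsiNpuExecutionProvider", "CPUExecutionProvider"])
```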

I also tried the following: converting the mobilenet_v2.tflite model to a mobilenet_v2.onnx model and then quantizing it with float as the input and output data type. With that model I get wrong results even when I run it on the CPU.
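The conversion and quantization steps look roughly like this. Again only a sketch: the opset, the dummy calibration reader and the input tensor name "input" are placeholders, not the exact values from my setup.

```python
# Sketch of the TFLite -> ONNX conversion plus post-training static
# quantization. Opset, calibration data and file names are placeholders.
#
# Conversion (command line):
#   python -m tf2onnx.convert --tflite mobilenet_v2.tflite \
#       --output mobilenet_v2.onnx --opset 13

import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class DummyCalibrationReader(CalibrationDataReader):
    """Feeds a few random float images; a real run should use real images."""
    def __init__(self, input_name, count=8):
        self._data = iter(
            {input_name: np.random.rand(1, 224, 224, 3).astype(np.float32)}
            for _ in range(count))

    def get_next(self):
        return next(self._data, None)

quantize_static(
    "mobilenet_v2.onnx",
    "mobilenet_v2_quant.onnx",
    DummyCalibrationReader("input"),   # "input" is a placeholder tensor name
    activation_type=QuantType.QUInt8,  # uint8 activations
    weight_type=QuantType.QUInt8)      # uint8 weights
```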

Is there a problem with the Vsi_Npu EP when running quantized models? Or is there a problem with my converted models (I have attached them here)?

Thanks for your help!

If anyone needs more information, please ask.

Kind regards, Chris

2 Replies

Zhiming_Liu
NXP TechSupport

The NPU tensors cannot use float input/output. The NPU supports 8/16-bit integer tensor data formats and an 8, 16 and 32-bit integer operations pipeline.
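You can verify the element types of your model's graph inputs and outputs with the onnx Python package, for example (the file name below is a placeholder):

```python
# Quick check of the graph input/output element types; on the NPU path the
# tensors should be an integer type (e.g. uint8). File name is a placeholder.
import onnx

model = onnx.load("mobilenet_v2_quant.onnx")
for value_info in list(model.graph.input) + list(model.graph.output):
    elem_type = value_info.type.tensor_type.elem_type
    print(value_info.name, onnx.TensorProto.DataType.Name(elem_type))
```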


christhi
Contributor II

Hey @Zhiming_Liu, thanks for your help.

Maybe that is correct, but when I run non-quantized models with float input/output on the NPU (using VsiNpu or NNAPI from ONNX Runtime), I get correct results. The NPU doesn't support those layers but falls back to the CPU to calculate the results. And the models I attached above are quantized with uint8_t input/output, so they should be supported by the NPU. Why do I get wrong results for them? If a layer in one of the models is not supported, it should fall back to the CPU and still give correct results.
With ArmNN and TFLite I don't have problems like that: for both, everything works fine with quantized models with uint8_t input and with non-quantized models with float input. So is there maybe a problem in ONNX Runtime? Or is there a problem in the conversion step from TFLite to ONNX format? Could you please have a look at the models? Maybe something is wrong with them.
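For completeness, this is roughly how I compare the CPU and NPU outputs (same assumptions about provider name strings, model path and input shape as in my first post):

```python
# Sketch: run the same uint8 input through the CPU EP and the NPU EP and
# report how far the outputs differ. Provider names, model path and input
# shape are assumptions.
import numpy as np
import onnxruntime as ort

MODEL = "mobilenet_v2_1.0_224_quant.onnx"   # placeholder path

def infer(providers, x):
    session = ort.InferenceSession(MODEL, providers=providers)
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: x})[0]

x = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype=np.uint8)
cpu_out = infer(["CPUExecutionProvider"], x)
npu_out = infer(["VsiNpuExecutionProvider", "CPUExecutionProvider"], x)

print("top-1 CPU:", int(np.argmax(cpu_out)), "top-1 NPU:", int(np.argmax(npu_out)))
print("max abs diff:",
      np.max(np.abs(cpu_out.astype(np.float32) - npu_out.astype(np.float32))))
```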

Thanks for your help.

Kind regards, Chris
