Hello there,
I have a problem executing quantized models on the NPU with ONNX Runtime. I downloaded the models mobilenet_v2_1.0_224.tflite, mobilenet_v2_1.0_224_quant.tflite, inception_v3.tflite and inception_v3_quant.tflite referenced in the Machine Learning User's Guide and converted them with the eIQ model converter.
All of the models give correct results when I run them with the CPU_ACL EP. The non-quantized models also give correct results with the Vsi_Npu EP. But when I run the quantized models with the Vsi_Npu EP, the results are wrong.
I also tried the following: I converted the mobilenet_v2.tflite model to mobilenet_v2.onnx and then quantized it with float as the input and output data type. With that model I get wrong results even when I run it on the CPU.
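For context, my understanding is that these quantized models use the standard affine scheme, q = round(x / scale) + zero_point, so if the converter and the runtime disagree on the scale or zero-point the outputs would be systematically wrong in just this way. A quick self-contained sketch (the scale and zero_point values here are made up for illustration, not taken from my models):

```python
# Affine (asymmetric) uint8 quantization: q = round(x / scale) + zero_point.

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float value to uint8 using scale/zero-point."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    """Map a uint8 value back to an approximate float."""
    return (q - zero_point) * scale

scale, zero_point = 0.02, 128  # hypothetical parameters

q = quantize(0.5, scale, zero_point)        # -> 153
x_hat = dequantize(q, scale, zero_point)    # -> 0.5
```

If the EP applied a different zero_point (or skipped dequantization of the output tensor), every value would be shifted or scaled consistently, which matches the kind of wrong results I see.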
Is there a problem with the Vsi_Npu EP when running quantized models, or is there a problem with my converted models? (I have attached them here.)
Thanks for your help!
If anyone needs more information, please ask.
Kind regards, Chris