We have a fully quantized (uint8) model to be run on the i.MX 8M Plus.
When run on the CPU, inference returns exactly the neural activations we expect algebraically (identical to those from the training phase).
On the NPU, however, inference (via the NNAPI delegate) gives different results, with different activations and, in some rare cases, completely incorrect activations.
This is probably due to the accumulation of multiple internal approximations in some operation(s). We obviously want the inference output on the NPU to match the CPU and the training phase (on the server) exactly. Any advice?
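For concreteness, this is the integer-only arithmetic we expect, a minimal sketch of the standard TFLite uint8 scheme (all quantization parameters below are made up for illustration, not taken from our model):

```python
import numpy as np

# Hypothetical per-tensor quantization parameters (illustrative only).
in_scale, in_zp   = 0.02, 128    # uint8 input
w_scale,  w_zp    = 0.005, 131   # uint8 weights
out_scale, out_zp = 0.05, 128    # uint8 output

def quantized_dot(q_in, q_w, bias):
    """Reference integer dot product + requantization (TFLite uint8 scheme):
    acc = sum((q_in - in_zp) * (q_w - w_zp)) + bias, then rescale by
    M = in_scale * w_scale / out_scale and add the output zero-point."""
    acc = int(np.dot(q_in.astype(np.int32) - in_zp,
                     q_w.astype(np.int32) - w_zp)) + bias
    m = in_scale * w_scale / out_scale
    q_out = int(round(acc * m)) + out_zp
    return max(0, min(255, q_out))       # saturate to the uint8 range

q_in = np.array([120, 140, 100, 255], dtype=np.uint8)
q_w  = np.array([130, 131, 140,  90], dtype=np.uint8)
print(quantized_dot(q_in, q_w, bias=50))   # one output activation
```

Any runtime that follows this scheme bit-exactly should reproduce the same activations; the differences we see on the NPU suggest it deviates somewhere in this chain.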
Is there technical information about the NPU and how it handles int8, uint8, and the corresponding accumulations int8×int8 and uint8×uint8? (Already asked here: https://community.nxp.com/t5/i-MX-Processors/iMX-8M-Plus-NPU-info-and-Arm-Compute-Library/m-p/124132...)
Thanks,
V.
Hi Bio_TICFSL,
thank you for your answer.
Our own neural net achieves >99% Top-1 accuracy when executed on the CPU. We obviously use the CPU/GPU during quantization-aware training for loss calculation. It is critical to maintain the same >99% Top-1 accuracy on the NPU.
To do this, we could try, during training, to take the NPU's extra precision into account and somehow simulate it, but we obviously need to understand very well how it works and how it can affect us. If you have other methods in mind to make the NPU give us exactly the results we expect from training, please tell us (training on the NPU is not very practical).
Furthermore, is this rounding error the only source of difference between CPU and NPU?
Can you better explain this with an example?
"While the CPU uses 32-bit registers, the NPU uses a 16-bit register for the normalized multiplier and a 48-bit post-multiplier output during quantized inference. This way, the CPU suffers from double rounding error, while the NPU does not."
Thank you very much, very useful.
Regards,
NB
Running inference with the same .tflite model using (Arm NN + vsi_npu) instead of (TFLite + NNAPI) gives exactly the same (wrong) results.
This strongly suggests the problem is in one of the lower blocks of the stack: NNRT, OVXLIB, OpenVX, or the hardware itself.
As per page 12 of this manual dated 31 March 2021, https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf
you can easily benchmark mobilenet_v1_1.0_224_quant.tflite on the CPU and on the NPU (--use_nnapi=true).
These are the results of inference:
The CPU activations are the "correct" ones, obtained by doing the calculations algebraically on any other computing platform for the same input image. This means that some approximation is introduced by the NNAPI delegation or by the NPU itself. Considering that this is an already-quantized model, this is not good.
Hypotheses:
- per-tensor vs. per-channel quantization?
- asymmetric vs. symmetric quantization?
- int8 <-> uint8 conversions?
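Regarding the last hypothesis: a correct int8 <-> uint8 conversion should by itself be lossless, since shifting the zero-point by 128 represents exactly the same real values. A minimal sketch (scale and zero-point values are hypothetical):

```python
import numpy as np

# uint8 and int8 quantizations represent the same real values when the
# zero-points differ by exactly 128 and the scales are equal:
#   real = scale * (q_uint8 - zp_u) = scale * (q_int8 - (zp_u - 128))
scale, zp_u = 0.02, 140           # hypothetical uint8 params
q_u = np.array([0, 100, 140, 255], dtype=np.uint8)

# Lossless conversion: subtract 128 and shift the zero-point by 128.
q_s = (q_u.astype(np.int16) - 128).astype(np.int8)
zp_s = zp_u - 128

real_u = scale * (q_u.astype(np.float32) - zp_u)
real_s = scale * (q_s.astype(np.float32) - zp_s)
assert np.array_equal(real_u, real_s)   # identical real values
print(q_s, zp_s)
```

So if the conversion is implemented this way, it cannot explain the discrepancy; a mismatch could only arise if a runtime converted the values without adjusting the zero-point accordingly.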
I kindly ask NXP to clarify the source of the error in the calculations for mobilenet_v1_1.0_224_quant.tflite. We can then train better models.
Thanks,
V.
Hello NeuralBlue,
Please check the answer provided here:
"The HW precision of CPU and NPU is different. While the CPU uses 32-bit registers, the NPU uses a 16-bit register for the normalized multiplier and a 48-bit post-multiplier output during quantized inference. This way, the CPU suffers from double rounding error, while the NPU does not."
Therefore the outputs are not equal.
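A rough numeric sketch of why the two paths can disagree. The requantization multiplier is itself rounded to a fixed number of bits (first rounding), and the product is then rounded again (second rounding); using a 16-bit instead of a 32-bit multiplier can flip borderline results. The bit-widths follow the description above, but the multiplier representation and rounding mode here are simplifications, not the exact hardware behavior:

```python
def quantize_multiplier(m, bits):
    """Approximate a real rescale factor m as m0 * 2**(shift - bits),
    with m0 an integer in [2**(bits-1), 2**bits). First rounding step."""
    shift = 0
    while m < 0.5:
        m *= 2.0
        shift -= 1
    while m >= 1.0:
        m *= 0.5
        shift += 1
    m0 = round(m * (1 << bits))
    if m0 == (1 << bits):        # rounding pushed m0 out of range
        m0 >>= 1
        shift += 1
    return m0, shift

def requantize(acc, m, bits):
    """Rescale an accumulator by m using a `bits`-bit multiplier; the
    wide product (e.g. 32x16 -> 48 bit) is rounded to nearest. Second
    rounding step. Rounding mode simplified to round-half-up."""
    m0, shift = quantize_multiplier(m, bits)
    prod = acc * m0
    total_shift = bits - shift
    return (prod + (1 << (total_shift - 1))) >> total_shift

acc, m = 5250, 0.002             # acc * m = 10.5 exactly: a borderline case
print(requantize(acc, m, bits=31))   # 32-bit-style multiplier path
print(requantize(acc, m, bits=15))   # 16-bit-style multiplier path
```

For this borderline accumulator the two multiplier widths land on different integers, which is exactly the kind of one-LSB activation difference that can accumulate across layers.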
Try to measure the overall accuracy difference of the model between CPU and NPU on a larger dataset (not a single example). We performed this kind of accuracy validation for PCQ models, with the following results:
Model (PCQ)           | CPU (4 cores; TF Lite) Top-1; Top-5 | VSI NPU (TF Lite; NNAPI) Top-1; Top-5
Mobilenet v1 1.0 224  | 70.80%; 88.20%                      | 68.48%; 88.01%
Mobilenet v2 1.0 224  | 70.74%; 89.77%                      | 70.75%; 89.75%
Efficientnet lite4 v2 | 77.30%; 94.00%                      | 76.40%; 93.70%
Resnet v2 101 299     | 75.92%; 93.20%                      | 76.25%; 93.31%
The highest difference we see for the Top-1 prediction is 2.32% for Mobilenet v1 (Top-5 is 0.19%), and 0.3% for the Top-5 prediction (Efficientnet model).
Ignore the Python use case; the behavior is due to the difference in HW precision between CPU and NPU.
Let me know if further clarifications are needed.
Regards
Currently we are not aware of another root cause for the difference in accuracy.
I will check internally whether we can share more details. Do you have an NDA? If you have an NDA, it is better to open an internal ticket for this.
Regards