Hello NeuralBlue,
Please check the answer provided here:
" CPU and NPU is different . While CPU uses 32bit registers, the NPU uses 16bit register for normalized multiplier and 48bit post multiplier output during quantized inference. This way, the CPU suffers from double rounding error, while NPU does not. "
Therefore the output is not equal.
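To illustrate the double rounding effect, here is a minimal numeric sketch in Python. It is not the actual CPU or NPU kernel code; the accumulator value, multiplier, and shift are made-up examples, and the two helper functions only mimic the idea of rounding the requantized result once versus rounding it in two steps.

```python
# Minimal numeric sketch (not the real kernels) showing how rounding a
# product in two steps ("double rounding") can differ by one LSB from
# rounding the exact product once.

def round_half_up(x):
    # Round-to-nearest, ties away from zero, for non-negative values.
    return int(x + 0.5)

def requantize_single_rounding(acc, multiplier, shift):
    # One rounding step: scale the accumulator by the real-valued
    # multiplier and round the exact result once.
    return round_half_up(acc * multiplier / (1 << shift))

def requantize_double_rounding(acc, multiplier_q31, shift):
    # Two rounding steps, similar in spirit to the reference path:
    # 1) fixed-point multiply by a Q31 multiplier with rounding,
    # 2) rounding right shift of the intermediate result.
    prod = acc * multiplier_q31
    high = round_half_up(prod / (1 << 31))      # first rounding
    return round_half_up(high / (1 << shift))   # second rounding

if __name__ == "__main__":
    acc = 19           # example accumulator value (made up)
    multiplier = 0.5   # example normalized requantization scale
    shift = 2
    multiplier_q31 = round(multiplier * (1 << 31))
    a = requantize_single_rounding(acc, multiplier, shift)
    b = requantize_double_rounding(acc, multiplier_q31, shift)
    print(a, b)  # prints 2 3: the two paths disagree by one LSB
```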
Try to measure the overall accuracy difference of the model between the CPU and the NPU on a larger dataset (not just a single example). We did this kind of accuracy validation for PCQ models, with the following results (a sketch of how such a comparison can be scripted is shown after the table):
| Model | CPU (4 cores; TF Lite), PCQ accuracy (Top-1; Top-5) | VSI NPU (TF Lite; NN API), PCQ accuracy (Top-1; Top-5) |
|---|---|---|
| Mobilenet v1 1.0 224 | 70.80%; 88.20% | 68.48%; 88.01% |
| Mobilenet v2 1.0 224 | 70.74%; 89.77% | 70.75%; 89.75% |
| Efficientnet lite4 v2 | 77.30%; 94.00% | 76.40%; 93.70% |
| Resnet v2 101 299 | 75.92%; 93.20% | 76.25%; 93.31% |
The highest difference we see for the Top-1 prediction is 2.32% for Mobilenet v1 (Top-5 is 0.19%), and 0.3% for the Top-5 prediction (Efficientnet model).
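For reference, below is a minimal sketch of how such a Top-1/Top-5 comparison could be scripted with the TF Lite Python interpreter. The model path, delegate library path, and dataset loader are placeholders (assumptions), not the exact setup used for the numbers above.

```python
# Sketch: evaluate the same quantized model on CPU and on the NPU delegate
# and compare Top-1 / Top-5 accuracy over a labeled dataset.
import numpy as np
import tflite_runtime.interpreter as tflite

def make_interpreter(model_path, delegate_path=None):
    # delegate_path=None -> CPU; otherwise load the HW delegate .so
    # (the path below is an assumption, adjust to your BSP).
    delegates = [tflite.load_delegate(delegate_path)] if delegate_path else []
    interp = tflite.Interpreter(model_path=model_path,
                                experimental_delegates=delegates,
                                num_threads=4)
    interp.allocate_tensors()
    return interp

def evaluate(interp, dataset):
    # dataset yields (input_array, label) pairs, already preprocessed and
    # quantized to the model's input type.
    in_det = interp.get_input_details()[0]
    out_det = interp.get_output_details()[0]
    top1 = top5 = total = 0
    for image, label in dataset:
        interp.set_tensor(in_det['index'], image[np.newaxis, ...])
        interp.invoke()
        scores = interp.get_tensor(out_det['index'])[0]
        ranked = np.argsort(scores)[::-1]
        top1 += int(ranked[0] == label)
        top5 += int(label in ranked[:5])
        total += 1
    return top1 / total, top5 / total

# Usage (paths and dataset are hypothetical):
# cpu = make_interpreter('mobilenet_v1_1.0_224_quant.tflite')
# npu = make_interpreter('mobilenet_v1_1.0_224_quant.tflite',
#                        '/usr/lib/libvx_delegate.so')
# print(evaluate(cpu, dataset), evaluate(npu, dataset))
```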
The Python use case itself is not the cause; the behavior occurs because the HW precision differs between the CPU and the NPU.
Let me know if further clarifications are needed.
Regards