YOLOv8l TFLite pretrained model invoke gives different results on CPU and NPU (for both Python and C++)

yidema
Contributor I

Hi,

When I run the YOLOv8l TFLite pretrained quantized model (only the input and output tensors are float; all other layers are int) and invoke it on the CPU and on the NPU, it gives different results with the same Python or C++ script.

In Python, this is the output from YOLOv8 on the CPU; it outputs a (1, 84, 8400) tensor:

[[[ 8.4694298e-03 2.5408305e-02 2.5408305e-02 ... 8.5541326e-01
9.0199518e-01 9.4857705e-01]
[ 1.6938869e-02 2.1173587e-02 2.1173587e-02 ... 9.8245484e-01
9.8245484e-01 9.8245484e-01]
[ 1.6938869e-02 4.2347182e-02 4.6581902e-02 ... 3.6418584e-01
3.4724694e-01 3.3454281e-01]
...
[-8.3819032e-09 -8.3819032e-09 -8.3819032e-09 ... -8.3819032e-09
-8.3819032e-09 -8.3819032e-09]
[-8.3819032e-09 -8.3819032e-09 -8.3819032e-09 ... -8.3819032e-09
-8.3819032e-09 -8.3819032e-09]
[-8.3819032e-09 -8.3819032e-09 -8.3819032e-09 ... -8.3819032e-09
-8.3819032e-09 -8.3819032e-09]]]

On the NPU:

W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [op_optimize:676]stride slice copy tensor.
(the warning above is repeated 16 times in total)
[[[0.02933936 0.06706139 0.0461047 ... 1.0352602 1.0352602 1.0352602 ]
[0.00838267 0. 0.01676535 ... 0.85084134 0.96400744 0.8047366 ]
[0.07125273 0.14250545 0.06706139 ... 0.7879713 0.7418666 0.821502 ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]]

C++ behaves the same way: using the same script on the CPU and on the NPU, changing only the line that loads the VX delegate, we get different results.

We have checked the preprocessing very carefully to make sure every pixel is the same float value in both Python and C++, but the same script still gives different results on the CPU and the NPU (the only change is commenting out the line that loads libvxdelegate). If anyone has any thoughts on this, we would really appreciate it!
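For reference, here is a minimal sketch of what we mean by "only changing one line" (the file names and paths below are placeholders, not our exact script); the sole difference between the two runs is whether libvx_delegate.so is loaded:

import numpy as np
import tflite_runtime.interpreter as tflite

USE_NPU = True  # set to False for the CPU run

delegates = []
if USE_NPU:
    # this is the single line we comment out for the CPU run
    delegates.append(tflite.load_delegate("/usr/lib/libvx_delegate.so"))

interpreter = tflite.Interpreter(model_path="yolov8l_int8.tflite",  # placeholder name
                                 experimental_delegates=delegates)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # the preprocessed image goes here
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)  # (1, 84, 8400)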

 

best


3 Replies
brian14
NXP TechSupport (accepted solution)

Hi @yidema

Thank you for contacting NXP Support.

It is important to say that we do not expect exactly the same results from CPU and NPU execution of the model, due to differences between the architectures and in how floating-point values are computed.
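As an illustration (the .npy file names below are placeholders), small per-element differences can be quantified like this instead of compared by eye:

import numpy as np

cpu_out = np.load("cpu_output.npy")  # output tensor saved from the CPU run
npu_out = np.load("npu_output.npy")  # output tensor saved from the NPU run

diff = np.abs(cpu_out - npu_out)
print("max  abs diff:", diff.max())
print("mean abs diff:", diff.mean())

# For an int8-quantized graph, differences up to a few output quantization
# steps are expected; completely unrelated values point to a real problem.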

Could you please tell me your BSP version?

Please try our latest BSP version and let me know whether it resolves the issue; if not, we can look into other solutions.

Thank you and have a wonderful day!

yidema
Contributor I

Hi,

As you suggested, we updated to the new BSP version and it solved the issue; the NPU result is now even more accurate than the CPU one. We really appreciate the fast help!

 

best

yide

 

yidema
Contributor I

In addition, we also tested YOLOv5 in C++, and it is the same thing: we change only the VX delegate line, everything else stays the same, and we get different results:

On CPU:

 Tensorflow Test 
Reading image 
IMAGE SIZE IS 281776
Reading image 
IMAGE SIZE IS 348944
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Creating dst
Creating dst2
Creating dst3
Creating dst4
Creating dst5
Creating dst6
Creating dst7
 IDX is 7
 Rect is 1340.86 643.08 1782.28 1080.39 
 IDX is 5
 Rect is 1484.66 141.859 1570.49 179.698

 

On the NPU:

Tensorflow Test
Reading image
IMAGE SIZE IS 281776
Reading image
IMAGE SIZE IS 348944
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
ERROR: hybrid data type is not supported in conv2d.
ERROR: hybrid data type is not supported in conv2d.
ERROR: hybrid data type is not supported in conv2d.
ERROR: hybrid data type is not supported in conv2d.
Creating dst
Creating dst2
Creating dst3
Creating dst4
Creating dst5
Creating dst6
Creating dst7
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
W [HandleLayoutInfer:257]Op 19: default layout inference pass.
W [HandleLayoutInfer:257]Op 18: default layout inference pass.
 IDX is 0
 Rect is 1776 999 1776 999 
 IDX is 1
 Rect is 1776 999 1776 999 
 IDX is 0
 Rect is 1872 249.75 1872 1856.25 
 IDX is 0
 Rect is 1872 641.25 1872 1464.75 
 IDX is 4
 Rect is 1776 1107 1776 1107

 

I also want to mention that for both YOLOv8l and YOLOv5, the result from the CPU is correct and the result from the NPU is wrong.
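For anyone hitting the same thing: the "hybrid data type is not supported in conv2d" errors in the NPU log above suggest some conv layers are dynamic-range (hybrid) quantized, which the VX delegate cannot run, so those ops fall back. A minimal sketch (the model path is a placeholder) to list which tensors carry no quantization parameters:

import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="yolov5s_int8.tflite")  # placeholder
interpreter.allocate_tensors()

for t in interpreter.get_tensor_details():
    scale, zero_point = t["quantization"]
    # scale == 0.0 means the tensor has no quantization parameters, i.e. it is
    # float (fine for the input/output tensors, suspicious anywhere else)
    if scale == 0.0:
        print("float tensor:", t["index"], t["name"], t["dtype"])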
