Yolov5 Tflite CPU vs VX_delegate NPU

3,707 Views
taklause
Contributor II

Hello, we have successfully trained a YOLOv5 model and converted it to uint8.

The benchmark

./benchmark_model --graph=model.tflite --external_delegate_path=/usr/lib/libvx_delegate.so --enable_op_profiling=true

benchmark output : 

External delegate path: [/usr/lib/libvx_delegate.so]
Loaded model model.tflite
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
EXTERNAL delegate created.
Going to apply 1 delegates one after another.
Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 1.98869

The benchmark shows an inference time of ~66 ms, which is not too bad. The output also shows that the inference is executed completely on the NPU.

I started to implement the extraction of the bounding boxes from the prediction for an input image. I used Python for a quick evaluation:

delegate = [tflite.load_delegate(library="/usr/lib/libvx_delegate.so", options={"logging-severity": "info"})]
interpreter = tflite.Interpreter(model_path=weights, experimental_delegates=delegate)
interpreter.allocate_tensors()
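
For readers following along, the box extraction step could look roughly like the sketch below. This is not the code used in this post. It assumes the output has already been dequantized to floats, that each row is laid out as [cx, cy, w, h, objectness, class scores...] with coordinates normalized to [0, 1], and it leaves out non-maximum suppression. The function name `extract_boxes` and the threshold value are made up for illustration.

```python
def extract_boxes(rows, img_w, img_h, conf_threshold=0.25):
    """Turn prediction rows into pixel-space boxes.

    Assumed row layout (hypothetical, check your own export):
    [cx, cy, w, h, objectness, class_score_0, class_score_1, ...]
    with cx, cy, w, h normalized to [0, 1].
    """
    boxes = []
    for row in rows:
        cx, cy, w, h, obj = row[:5]
        class_scores = row[5:]
        # Pick the class with the highest score.
        best = max(range(len(class_scores)), key=class_scores.__getitem__)
        conf = obj * class_scores[best]
        if conf < conf_threshold:
            continue
        # Convert center/size to corner coordinates in pixels.
        x1 = (cx - w / 2) * img_w
        y1 = (cy - h / 2) * img_h
        x2 = (cx + w / 2) * img_w
        y2 = (cy + h / 2) * img_h
        boxes.append((x1, y1, x2, y2, conf, best))
    return boxes


# Made-up rows for a single-class (person) model on a 256x256 input.
rows = [
    [0.5, 0.5, 0.5, 0.5, 1.0, 1.0],     # confident detection
    [0.25, 0.25, 0.5, 0.5, 0.5, 0.25],  # confidence 0.125, filtered out
]
boxes = extract_boxes(rows, img_w=256, img_h=256)
print(boxes)
```

Overlapping boxes would still need non-maximum suppression before drawing.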
 
The following image was run through inference twice, once with experimental_delegates and once without.
The "good" result is from the CPU. The NPU delivers random bounding boxes. The code for the box extraction is exactly the same in both cases.
 
How can that be?
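
One way to narrow this down is to compare the raw output tensors of the two runs before any box extraction, so the post-processing is ruled out. The sketch below is only an illustration under assumptions: the helper names `dequantize` and `compare_outputs` are hypothetical, the numbers are made up, and both runs are assumed to share the same output quantization parameters.

```python
def dequantize(values, scale, zero_point):
    """Map quantized integers to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


def compare_outputs(cpu_raw, npu_raw, scale, zero_point):
    """Return (max_abs_diff, mean_abs_diff) between two dequantized outputs."""
    if len(cpu_raw) != len(npu_raw):
        raise ValueError("outputs differ in length")
    if not cpu_raw:
        raise ValueError("outputs are empty")
    cpu = dequantize(cpu_raw, scale, zero_point)
    npu = dequantize(npu_raw, scale, zero_point)
    diffs = [abs(a - b) for a, b in zip(cpu, npu)]
    return max(diffs), sum(diffs) / len(diffs)


# Made-up flattened uint8 outputs, for illustration only.
cpu_raw = [0, 10, 128, 255]
npu_raw = [0, 12, 128, 251]
max_diff, mean_diff = compare_outputs(cpu_raw, npu_raw, scale=0.5, zero_point=0)
print(max_diff, mean_diff)
```

On the board, the raw lists could be taken from `interpreter.get_tensor(...)` for each interpreter, and the scale and zero point from the `"quantization"` entry of `interpreter.get_output_details()`. A small difference would point at the post-processing, a large one at the delegate or the model conversion.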

[Image: Inference with NPU]

[Image: Inference on CPU]

0 Kudos
14 Replies

2,043 Views
bb4567
Contributor I

If you haven't resolved your issue yet you might need to open a support ticket.

 

It took me two different tickets but I finally received a patch from NXP support that I applied to op_map.cc in the vx-delegate.  That resolved the issue for us and we can now get good results from the NPU that are very close to the CPU results.

0 Kudos

1,420 Views
Olivier_B
Contributor II

Is that patch public? Can you share it with us? We are facing the same issues.

0 Kudos

3,483 Views
abhishek_ml
Contributor I

@taklause, can you share the inference code? I am facing a similar issue.

0 Kudos

3,442 Views
taklause
Contributor II

Hi @abhishek_ml,

There you go, props to NXP.

3,666 Views
taklause
Contributor II

Hi @Zhiming_Liu ,

I did as the document suggested.

 

  1. I exported the .pb file from the fine-tuned Ultralytics model with a 256x256x3 input format.
  2. I converted the .pb model with the settings shown in the following image:
     [Image: Converter Settings]
  3. Then I uploaded the converted .tflite model to the NPU and ran the same test as before, with similar results:
    1. The one with the correct prediction is the model converted directly from Ultralytics, executed on the CPU.
    2. The one with the correct prediction is the eIQ-converted model, executed on the CPU.
    3. The one with no prediction is the eIQ-converted model, executed on the NPU.

As a remark, all the models are quantized with the same images. The models (raw - pb and converted tflite from Ultralytics) are also attached.

Regards

[Image: NPU EIQ CONVERTED]

[Image: CPU EIQ CONVERTED]

[Image: CPU ULTRALYTICS DIRECT]


0 Kudos

3,695 Views
Zhiming_Liu
NXP TechSupport

Hi @taklause 

I have sent you the test code.

0 Kudos

3,597 Views
taklause
Contributor II

Hello @Zhiming_Liu ,

I am replying directly in the hope that someone can help me fix the problem ;).

I'm still getting no detections when running inference on the NPU. Only the CPU gives me good detections, even when I convert the models with the eIQ tool you suggested.

Thanks, regards Daniel

0 Kudos

3,593 Views
Zhiming_Liu
NXP TechSupport

Hello @taklause 

Please provide a test image and the labels.

0 Kudos

3,584 Views
taklause
Contributor II

Hello @Zhiming_Liu,

thanks for the support. 

I have attached two models: the unconverted YOLOv5 as a .pb file and an already converted EfficientDet-Lite0.

Both deliver no results on the NPU, only on the CPU.

It is only a person detector, trained with the person class from COCO17 and a little bit of our own data. There are three images and the label.txt.

Thanks again for looking into it.

Daniel

 

0 Kudos

3,580 Views
Zhiming_Liu
NXP TechSupport

Which YOLOv5 tag did you use, v7.0?

Did you change the network structure in any way for your training?

0 Kudos

3,388 Views
taklause
Contributor II

Attached the Yolov5 v6.0 nano input size, person only. 

0 Kudos

3,389 Views
taklause
Contributor II
We have both YOLOv5 v6.0 and v7.0 available. I am still struggling. Our business partner is still interested in this model and this board, so please help us.
0 Kudos

3,573 Views
taklause
Contributor II
No, we did not change the structure apart from the input dimension. It is version 5.0.
0 Kudos

3,691 Views
taklause
Contributor II

Hi @Zhiming_Liu ,

Access to the file was denied, so I can't view it.

0 Kudos