Yolov5 Tflite CPU vs VX_delegate NPU

3,708 Views
taklause
Contributor II

Hello, we have successfully trained a YoloV5 model and converted it to uint8.

The benchmark command:

./benchmark_model --graph=model.tflite --external_delegate_path=/usr/lib/libvx_delegate.so --enable_op_profiling=true

Benchmark output:

External delegate path: [/usr/lib/libvx_delegate.so]
Loaded model model.tflite
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
EXTERNAL delegate created.
Going to apply 1 delegates one after another.
Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 1.98869

This shows an inference speed of ~66 ms, which is not too bad. The output also confirms that inference is executed entirely on the NPU.

I started to implement the extraction of the bounding boxes from the prediction on the input image. I used Python for a quick evaluation:

delegate = [tflite.load_delegate(library="/usr/lib/libvx_delegate.so", options={"logging-severity": "info"})]
interpreter = tflite.Interpreter(model_path=weights, experimental_delegates=delegate)
interpreter.allocate_tensors()
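For completeness, the bounding-box extraction step looks roughly like this. This is only a minimal sketch, assuming the standard YOLOv5 output layout [1, N, 5 + num_classes] with normalized xywh coordinates; `decode_yolov5` is an illustrative helper name, not our actual code:

```python
import numpy as np

def decode_yolov5(pred, conf_thres=0.25):
    """Extract boxes from a raw YOLOv5 output of shape [1, N, 5 + num_classes].

    Each row is (cx, cy, w, h, objectness, class scores...), with
    coordinates normalized to the input size. Returns xyxy boxes,
    combined scores, and class indices for rows above the threshold.
    """
    pred = pred[0]                      # drop batch dim -> [N, 5 + C]
    keep = pred[:, 4] > conf_thres      # filter by objectness first
    pred = pred[keep]
    cls = pred[:, 5:].argmax(axis=1)    # best class per box
    score = pred[:, 4] * pred[:, 5:].max(axis=1)
    cx, cy, w, h = pred[:, 0], pred[:, 1], pred[:, 2], pred[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, score, cls
```

The same function is fed the output of both interpreters, so any difference in the boxes comes from the inference itself, not from the decoding.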
 
The following image is inferenced twice, once with the experimental_delegates and once without.
The "good" result comes from the CPU; the NPU delivers random bounding boxes. The extraction code behind both runs is exactly the same.
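One thing worth ruling out when comparing the two runs: the raw uint8 outputs have to be dequantized with each tensor's own scale and zero point (from `get_output_details()`), since the CPU and NPU interpreters may report different quantization parameters. A generic numpy sketch of that check, with illustrative helper names:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # Map raw uint8 tensor values back to float using the tensor's
    # quantization parameters from interpreter.get_output_details().
    return (q.astype(np.float32) - zero_point) * scale

def outputs_match(cpu_q, cpu_params, npu_q, npu_params, tol=0.05):
    # Compare in float space, not on the raw integers: identical
    # detections can map to different uint8 values when scale or
    # zero_point differ between the two execution paths.
    cpu = dequantize(cpu_q, *cpu_params)
    npu = dequantize(npu_q, *npu_params)
    return float(np.abs(cpu - npu).max()) <= tol
```

In our case the divergence is far larger than any quantization rounding, so this alone does not explain the random boxes, but it keeps the comparison fair.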
 
How can that be?

Inference with NPU

 

Inference on CPU

 

 
 

 

14 Replies

2,044 Views
bb4567
Contributor I

If you haven't resolved your issue yet, you might need to open a support ticket.

 

It took me two different tickets, but I finally received a patch from NXP support that I applied to op_map.cc in the vx-delegate. That resolved the issue for us, and we can now get good results from the NPU that are very close to the CPU results.


1,421 Views
Olivier_B
Contributor II

Is that patch public? Can you share it with us? We are facing the same issues.


3,484 Views
abhishek_ml
Contributor I

@taklause can you share the inference code? I am facing a similar issue.


3,443 Views
taklause
Contributor II

Hi @abhishek_ml,

there you go, props to NXP.

3,667 Views
taklause
Contributor II

Hi @Zhiming_Liu ,

I did as the document suggested.

 

  1. I exported the .pb file from the fine-tuned Ultralytics model with a 256x256x3 input format.
  2. With the .pb model, I converted it using the settings shown in the following image:
     Converter Settings
  3. Then I uploaded the converted .tflite model to the NPU and ran the same test as before, with similar results:
    1. The one with the correct prediction is the model converted directly from Ultralytics, executed on the CPU.
    2. The one with the correct prediction is the eIQ-converted model, executed on the CPU.
    3. The one with no prediction is the eIQ-converted model, executed on the NPU.

As a remark, all the models are quantized with the same images. The models (the raw .pb and the .tflite converted from Ultralytics) are also attached.
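For reference, both the calibration images and the test images go through the same preprocessing: resize to the 256x256 input and keep uint8 (no /255 normalization, since the input tensor is quantized). Sketched here with plain-numpy nearest-neighbour resizing; the actual pipeline uses a proper image library:

```python
import numpy as np

def preprocess(img, size=256):
    """Resize an HxWx3 uint8 image to size x size with nearest-neighbour
    sampling and return it with a batch dimension, still uint8
    (the model input is quantized, so no float normalization here)."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size    # source row for each output row
    xs = np.arange(size) * w // size    # source column for each output column
    out = img[ys][:, xs]
    return out[np.newaxis]              # [1, size, size, 3]
```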

Regards

NPU EIQ CONVERTED

 

CPU EIQ CONVERTED

 

CPU ULTRALYTICS DIRECT

 

 


3,696 Views
Zhiming_Liu
NXP TechSupport

Hi @taklause 

I am sending you the test code.


3,598 Views
taklause
Contributor II

Hello @Zhiming_Liu ,

I am replying directly in the hope that someone can help me fix the problem ;).

I am still getting no detections when inferencing with the NPU. Only the CPU gives me good detections, even when I convert the models with the eIQ tool you suggested.

Thanks and regards, Daniel


3,594 Views
Zhiming_Liu
NXP TechSupport

Hello @taklause 

Please provide a test image and the labels.


3,585 Views
taklause
Contributor II

Hello @Zhiming_Liu,

thanks for the support. 

I have attached two models: the .pb file of the unconverted YoloV5 and the EfficientDet-Lite0, which is already converted.

Both deliver no results with the NPU, only with the CPU.

It is only a person detector, trained on the persons from COCO17 and a little bit of our own data. There are three images and the label.txt.

Thanks again for looking into it.

Daniel

 


3,581 Views
Zhiming_Liu
NXP TechSupport

Which yolov5 tag did you use, v7.0?

Have you changed any part of the network structure in your training?


3,389 Views
taklause
Contributor II

Attached is the YoloV5 v6.0 nano with our input size, person only.


3,390 Views
taklause
Contributor II
We have both YoloV5 v6.0 and v7.0 available, and I am still struggling. Our business partner is still interested in this model and this board, so please help us.

3,574 Views
taklause
Contributor II
No, we did not change the structure apart from the input dimensions. It is version 5.0.

3,692 Views
taklause
Contributor II

Hi @Zhiming_Liu ,

access to the file was denied; I can't view it.
