NPU bad detection with Yolov5 - i.MX8MP

6,217 Views
simoberny
Contributor III

Hi, 

I've been struggling for some time now trying to get NPU detection to work from a C++ program. The same code gets optimal results on the CPU, but with the VX delegate the detections are completely wrong. The code seems to run smoothly and the inference timing looks good (yolov5s model with 448x448 input, ~70 ms). 
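For context, applying the VX delegate from C++ typically looks like the sketch below (this is the general pattern, not my attached code; the /usr/lib/libvx_delegate.so path is where the NXP BSP usually installs the delegate, so adjust it for your image):

#include <cstdio>
#include <memory>

#include "tensorflow/lite/delegates/external/external_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the quantized model (path is just a placeholder).
  auto model = tflite::FlatBufferModel::BuildFromFile("yolov5s-int8.tflite");
  if (!model) { std::fprintf(stderr, "failed to load model\n"); return 1; }

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Load the VX delegate as an external delegate; the .so path is the usual
  // location on NXP BSP images, adjust if your build puts it elsewhere.
  TfLiteExternalDelegateOptions opts =
      TfLiteExternalDelegateOptionsDefault("/usr/lib/libvx_delegate.so");
  TfLiteDelegate* vx_delegate = TfLiteExternalDelegateCreate(&opts);

  if (interpreter->ModifyGraphWithDelegate(vx_delegate) != kTfLiteOk) {
    // Ops the delegate rejects silently fall back to the CPU.
    std::fprintf(stderr, "could not apply the VX delegate\n");
  }
  interpreter->AllocateTensors();

  // ... fill the input tensor, interpreter->Invoke(), read the outputs ...

  interpreter.reset();                        // destroy the interpreter first
  TfLiteExternalDelegateDelete(vx_delegate);  // then release the delegate
  return 0;
}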

Right now I'm trying with Yolov5 (uint8 quantized), but I have tried different pre-trained models and obtained the same behavior: good detection on the CPU, random detection on the NPU. 

To obtain the model I used the export script from the yolov5 repo: 

 python export.py --weights yolov5s.pt --imgsz 448 --include tflite --int8

I've also tried TFLite hub models such as SSD and MobileNet, which have already been converted to uint8. 
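A quick way to confirm what the converter actually produced is to dump the type and the scale/zero-point of every input and output tensor; a minimal sketch (the model path is just a placeholder):

#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Print type and quantization parameters of a tensor, so you can see whether
// the export really produced uint8 I/O or left float32 inputs/outputs.
static void DumpTensor(const TfLiteTensor* t, const char* kind, int i) {
  std::printf("%s %d: %s type=%s scale=%f zero_point=%d\n",
              kind, i, t->name ? t->name : "?", TfLiteTypeGetName(t->type),
              t->params.scale, t->params.zero_point);
}

int main(int argc, char** argv) {
  const char* path = argc > 1 ? argv[1] : "yolov5s-int8.tflite";
  auto model = tflite::FlatBufferModel::BuildFromFile(path);
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  interpreter->AllocateTensors();

  for (size_t i = 0; i < interpreter->inputs().size(); ++i)
    DumpTensor(interpreter->input_tensor(i), "input", static_cast<int>(i));
  for (size_t i = 0; i < interpreter->outputs().size(); ++i)
    DumpTensor(interpreter->output_tensor(i), "output", static_cast<int>(i));
  return 0;
}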

 

Attached are the piece of code I am using for the inference and the converted yolov5n model. 

What could be the cause? 

 

Thanks,

Best regards

1 Solution
6,087 Views
Bio_TICFSL
NXP TechSupport

Hi,

 

At a minimum, you have to move to the 5.15.71 BSP.

Regards


25 Replies
2,615 Views
simoberny
Contributor III

1. Who is the board vendor? 

2. The kernel version is 5.15.71, but what version of TFLite did the vendor preinstall in the Yocto recipe? Variscite preloads TensorFlow 2.9.1 (a quick way to check the runtime version is shown in the sketch after this list).

3. Is your issue now wrong bounding boxes?
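Regarding point 2, printing the version string from a tiny C++ program confirms which TFLite runtime the image actually ships, independent of what the recipe claims:

#include <cstdio>

#include "tensorflow/lite/c/c_api.h"  // declares TfLiteVersion()

int main() {
  // Prints the TFLite runtime version this binary is linked against,
  // e.g. "2.9.1" on an image that ships TensorFlow 2.9.1.
  std::printf("TensorFlow Lite runtime: %s\n", TfLiteVersion());
  return 0;
}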

 

I'm attaching the small model I'm currently using along with the inference code. I hope it helps you. 

378 Views
bb4567
Contributor I

Well, it only took a couple of months and opening a 2nd support ticket, but we finally resolved the issue we were having with processing yolov5 models.

We were on the proper BSP and TFLite versions as per NXP but still not getting valid results from the NPU.

Last week I received a patch file from NXP support for the vx-delegate op_map.cc file.

I didn't get a chance to apply and test it right away as I was traveling, but I tested yesterday and we now get nearly identical results when using the NPU. It was a fairly significant change, and I'm not sure why their Yocto build didn't already have this patch available/applied.

But at least that is resolved.

There's one other issue, which others have also reported, that we see with our C++ test application but not with the NXP Python test app, so I've asked support what they can suggest.

So, if you have the proper BSP and TFLite versions and are still not able to get NPU processing working with yolov5 models, try opening a ticket and requesting the patch file for the vx-delegate op_map.cc file. Hopefully that will fix it.

643 Views
bb4567
Contributor I

Thanks again for the model.  I've been traveling for a while and had meant to get back to you.

I tried with your model and it fails for us in the same way as our model.

What board are you running with?  We are currently working with a TechNexion development board.

We did a full Yocto build and the versions of TFLite and the other components appear to match, but no joy.

This happens even when running the Python test script that is in the zip file from the NXP YOLO how-to document, using the NXP model.

So we are still chasing some other issue.

Thanks,

 

790 Views
bb4567
Contributor I
Thanks, I'll try your model to see if it behaves any differently, but I suspect it will behave just like the one from the example in the NXP Yolov5 doc. We ultimately get bad bounding boxes, but invalid detections as well, so we will have to keep digging.
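For anyone chasing the same symptoms: one thing worth double-checking with a uint8 model is that the raw output values are dequantized with the output tensor's scale and zero point before box decoding and NMS, since skipping that produces exactly this mix of bad boxes and bogus confidences. A rough sketch of what I mean, assuming the usual yolov5 TFLite output layout of [1, N, 5 + num_classes] with normalized xywh (verify against your own model):

#include <cstdint>
#include <vector>

#include "tensorflow/lite/interpreter.h"

struct Detection { float x, y, w, h, conf; int cls; };

// Dequantize and decode a uint8 yolov5 output tensor.
// Assumes layout [1, N, 5 + num_classes] with xywh normalized to 0..1,
// which is what the yolov5 TFLite export typically produces.
std::vector<Detection> DecodeUint8Output(const TfLiteTensor* out,
                                         int num_classes,
                                         float conf_thresh) {
  const float scale = out->params.scale;
  const int zero = out->params.zero_point;
  const uint8_t* q = out->data.uint8;

  const int stride = 5 + num_classes;
  const int n = out->dims->data[1];  // number of candidate boxes

  std::vector<Detection> dets;
  for (int i = 0; i < n; ++i) {
    const uint8_t* row = q + i * stride;
    // Dequantize: real = scale * (quantized - zero_point)
    auto dq = [&](int j) { return scale * (static_cast<int>(row[j]) - zero); };

    float obj = dq(4);
    if (obj < conf_thresh) continue;

    // Pick the best class and combine it with the objectness score.
    int best = 0; float best_score = 0.f;
    for (int c = 0; c < num_classes; ++c) {
      float s = dq(5 + c);
      if (s > best_score) { best_score = s; best = c; }
    }
    float conf = obj * best_score;
    if (conf < conf_thresh) continue;

    dets.push_back({dq(0), dq(1), dq(2), dq(3), conf, best});
  }
  return dets;  // still needs NMS and scaling back to image pixels
}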
1,503 Views
simoberny
Contributor III

Unfortunately, this is not an option for me, since I'm using a Basler camera and there is still no driver available for that BSP version. With the Variscite board, the BSP currently only goes up to 5.15.60. 

Could I ask what changed in the 5.15.71 version that enables the use of the NPU with YOLO models? 

I will try it in the future, but for now I'm forced to use the CPU with a smaller model to get a decent inference time.

Thank you, 

Best regards 

 

 
