I've been struggling for some time now trying to get NPU detection to work from a C++ program. The same code gets optimal results on the CPU, but with the VX delegate the detections are completely wrong. The code runs smoothly and the inference timing looks good (yolov5s model with 448x448 input, ~70 ms).
Right now I'm trying with Yolov5 (uint8 quantized), but I have tried different pre-trained models and observed the same behavior: good detections on the CPU, random detections on the NPU.
To obtain the model I used the export script from the yolov5 repo:
python export.py --weights yolov5s.pt --imgsz 448 --include tflite --int8
I've also tried TFLite Hub models like SSD and MobileNet that have already been converted to uint8.
Attached are the piece of code I am using for inference and the converted yolov5n model.
What could be the cause?
Well, it only took a couple of months and opening a 2nd support ticket, but we finally resolved the issue we were having with processing yolov5 models.
We were on the proper BSP and TFLite versions as per NXP but still not getting valid results from the NPU.
Last week I received a patch file from NXP support for the vx-delegate op_map.cc file.
I didn't get a chance to apply and test it right away as I was traveling, but I tested yesterday and we now get nearly identical results when using the NPU. It was a fairly significant change, and I'm not sure why their Yocto build didn't already have this patch available/applied.
But at least that is resolved.
There's one other issue that others have reported, which we see with our C++ test application but not with the NXP Python test app, so I've asked support about it to see what they can suggest.
So, if you have the proper BSP and TFLite versions and still can't get the NPU processing working with yolov5 models, try opening a ticket and requesting the patch file for the vx-delegate op_map.cc file. Hopefully that will fix it.
Thanks again for the model. I've been traveling for a while and had meant to get back to you.
I tried with your model and it fails for us in the same way as our model.
What board are you running with? We are currently working with a TechNexion development board.
We did a full Yocto build and the version for TFlite and others appear to match, but no joy.
This happens even when running the Python test script that is in the zip file from the NXP Yolo how-to document, using the NXP model.
So we are still chasing some other issue.
Unfortunately, this is not an option for me since I'm using a Basler camera, and for that BSP version there is still no driver available. In fact, with the Variscite board the BSP currently only goes up to 5.15.60.
Could I ask what the difference is in the 5.15.70 version that enables the use of the NPU with Yolo models?
I will try in the future, but for now I'm forced to use the CPU with a smaller model to get a decent inference time.