I have been struggling for some time to get NPU detection working in a C++ program. The same code gives good results on the CPU, but with the VX delegate the detections are completely wrong. The code runs without errors and the inference time is good (about 70 ms for a yolov5s model with 448x448 input).
Right now I'm trying YOLOv5 (uint8 quantized), but I have tried several pre-trained models and see the same behavior: good detections on the CPU and random detections on the NPU.
To obtain the model I used the export script from the yolov5 repo:
python export.py --weights yolov5s.pt --imgsz 448 --include tflite --int8
I've also tried TFLite hub models such as SSD and MobileNet, which have already been converted to uint8.
Attached are the code I am using for inference and the converted yolov5n model.
What could be the cause?
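With uint8 models, one thing that may be worth ruling out is how the raw output tensor is mapped back to real values. The sketch below shows the usual affine dequantization, real = scale * (q - zero_point). The scale and zero point used here are made-up values for illustration; the real ones would come from the model's output tensor details (for example the "quantization" entry returned by get_output_details() in the TFLite Python API).

```python
def dequantize(raw_values, scale, zero_point):
    """Map quantized integers back to real values: scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in raw_values]


# Hypothetical quantization parameters, chosen only for illustration.
EXAMPLE_SCALE = 0.25
EXAMPLE_ZERO_POINT = 4

if __name__ == "__main__":
    raw_output = [4, 8, 0, 255]  # pretend uint8 values read from an output tensor
    print(dequantize(raw_output, EXAMPLE_SCALE, EXAMPLE_ZERO_POINT))
```

If the post-processing treats the raw uint8 values as if they were already floats, the results can look random even when the inference itself is fine, so comparing the dequantized values between CPU and NPU runs could help separate the two cases.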
@simoberny: I am facing the same issue. For me the detections are shown correctly when printed on the console. The issue is with the bounding box coordinates: the coordinates of detected objects are random, and some are even negative. Can you or NXP support help with this?
Which BSP version are you using?
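To judge whether the coordinates are really random, it may help to decode them explicitly. The sketch below assumes the model outputs boxes as normalized (center x, center y, width, height) in the range 0 to 1, which is the convention of the YOLOv5 TFLite export as far as I know; other models (SSD, for example) use a different layout, so treat this as an assumption. The function name is hypothetical.

```python
def xywh_to_xyxy_pixels(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel corners, clamped to the image."""
    cx, cy, w, h = box
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper].
        return max(0.0, min(value, upper))

    return (clamp(x1, img_w), clamp(y1, img_h), clamp(x2, img_w), clamp(y2, img_h))


if __name__ == "__main__":
    # A centered box covering half of a 448x448 image in each direction.
    print(xywh_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), 448, 448))
```

Slightly negative corners can appear legitimately with the center format when an object touches the image border, and clamping handles that. Values far outside the 0 to 1 range before conversion point to a different problem, and clamping would only hide it.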
So you have correct labels and predictions, but wrong bounding boxes?
In my case, everything seems wrong. The results seem totally random.
The Variscite customer helpdesk says that the model has to be rebuilt to be NPU-compatible. They sent me an optimized small MobileNet SSD model and its detections are perfect, but it is a real pain to train.
At least I now know for sure that the problem lies entirely in the model itself. For now, I'll use the CPU with a smaller YOLO model, in the hope of eventually finding a way to run it on the NPU.
My BSP version is 5.15.71.
I ran yolov5s-32fp-256.tflite on the NPU following your instructions. My program is written in Python, and the .tflite model gives correct results on the CPU. On the NPU, however, I get correct labels and predictions but wrong bounding boxes.
In addition, some errors appear when I use the VX delegate. Can you tell me what's wrong?
Thanks for the response and the documentation.
The guide actually describes what I already did.
To be thorough, I followed all the steps and created a new model, but the situation remains the same: on the CPU it works perfectly, whereas on the NPU I get nothing but random detections with very low confidence. I tried both INT8-quantized and float models.
I am on Yocto 5.15.52-2.1.0, which uses TensorFlow 2.5.0 by default. I'm now trying to compile a newer version.
Another strange behavior is that when I use the VX delegate I can't close the application cleanly, because a segmentation fault occurs. The VX delegate is built from the latest version of the official git repo.
Unfortunately, this is not an option for me, since I'm using a Basler camera and no driver is available yet for that BSP version. Also, for the Variscite board the BSP currently only goes up to 5.15.60.
Could I ask what changed in version 5.15.70 that enables the use of the NPU with YOLO models?
I will try it in the future, but for now I'm forced to use the CPU with a smaller model to get a decent inference time.
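A way to narrow down where CPU and NPU diverge could be to feed the same input to both and compare the raw output tensors before any post-processing. The sketch below is Python rather than C++ for brevity. It uses the tflite_runtime Interpreter with an optional delegate; the helper names, the model file name and the delegate path /usr/lib/libvx_delegate.so are assumptions and may differ on a given BSP.

```python
def run_model(model_path, input_array, delegate_path=None):
    """Run one inference and return the first output tensor as a flat list."""
    # Imported lazily so the comparison helper below works without TFLite installed.
    import tflite_runtime.interpreter as tflite

    delegates = [tflite.load_delegate(delegate_path)] if delegate_path else []
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]
    # input_array must be a NumPy array matching the input shape and dtype.
    interpreter.set_tensor(input_info["index"], input_array)
    interpreter.invoke()
    return interpreter.get_tensor(output_info["index"]).flatten().tolist()


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Usage on the target (file name and delegate path are assumptions):
# cpu_out = run_model("model.tflite", frame)
# npu_out = run_model("model.tflite", frame, "/usr/lib/libvx_delegate.so")
# print(max_abs_diff(cpu_out, npu_out))
```

For a quantized model, a difference of a few quantization steps would be expected. A large difference in the raw tensors suggests the problem is in how the NPU executes the graph rather than in the post-processing code.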