I have been struggling for some time to get NPU detection working in a C++ program. The same code gives good results on the CPU, but with the VX delegate the detections are completely wrong. The code runs without errors and the inference time looks good (yolov5s model with 448x448 input, ~70 ms).
Right now I'm trying YOLOv5 (uint8 quantized), but I have tried different pre-trained models and see the same behavior: good detections on the CPU and random detections on the NPU.
To obtain the model I used the export script from the yolov5 repo:
python export.py --weights yolov5s.pt --imgsz 448 --include tflite --int8
I've also tried TFLite hub models such as SSD and MobileNet, which have already been converted to uint8.
Attached are the code I am using for inference and the converted yolov5n model.
What could be the cause?
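One thing that is easy to rule out with a uint8 model is the handling of the quantization parameters. TFLite stores a scale and a zero point with each quantized tensor, and real values are recovered as scale * (q - zero_point). The following is only a minimal sketch of that arithmetic; the helper names and the numbers used are made up for illustration and are not taken from the attached code.

```python
def dequantize(q_values, scale, zero_point):
    """Map uint8 tensor values back to real numbers: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def quantize(real_values, scale, zero_point):
    """Map real numbers to uint8, clamping to the valid 0..255 range."""
    out = []
    for r in real_values:
        q = round(r / scale) + zero_point
        out.append(max(0, min(255, q)))
    return out


# Illustrative parameters only; read the real ones from the model's tensor details.
print(dequantize([0, 128, 255], scale=0.5, zero_point=128))
```

If the raw output is interpreted without these parameters, or with the parameters of a different tensor, the decoded values will look random even when the inference itself is fine.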
@simoberny Can you provide your complete code for the TFLite deployment on the board? I am having trouble finding such code, and the version I wrote myself has a lot of errors, so I would like to cross-check it against yours. I can't find anything for the TFLite model in the YOLO repository.
Thanks in advance
I have compiled the latest SDK (Linux 6.1.1_1.0.0) for imx8m+ and flashed the new image on our custom board.
I tried to run val.py from the yolov5 repository:
python3 val.py --weights yolov5s-int8.tflite --data data/coco128.yaml --img 640
But the observed inference time is 2 seconds. It seems to be running on the CPU, with vx_delegate not enabled.
Any help is appreciated.
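A TFLite interpreter only uses an external delegate if it is created with one, so a quick standalone check can help here. The sketch below assumes the tflite_runtime Python package is installed on the board and that the delegate library lives at /usr/lib/libvx_delegate.so; both are assumptions to verify on your image. The timing helper discards the first run(s), since the first inference through a delegate is often much slower than the following ones.

```python
import time


def make_interpreter(model_path, delegate_path=None):
    """Create an interpreter, optionally with an external delegate.

    delegate_path is an assumption, e.g. "/usr/lib/libvx_delegate.so".
    """
    # Imported here so the timing helper below works without tflite_runtime.
    import tflite_runtime.interpreter as tflite

    delegates = None
    if delegate_path:
        delegates = [tflite.load_delegate(delegate_path)]
    interpreter = tflite.Interpreter(
        model_path=model_path, experimental_delegates=delegates
    )
    interpreter.allocate_tensors()
    return interpreter


def time_inference(run_once, warmup=1, runs=10):
    """Return the mean time in seconds of run_once(), excluding warm-up runs."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs
```

On the board this could be used as time_inference(interpreter.invoke) after setting the input tensor, once with and once without the delegate path, to see whether the delegate makes any difference.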
We are working on a project using a YOLOv5 model. As others have experienced, the model runs fine on the CPU, with valid bounding boxes and results that match what is expected. But we are having issues getting the model working on the NPU. We have processed the model as described in the Object Detection using YOLOv5 document mentioned in this discussion. The closest I've gotten is when I convert the saved_model.pb flavor (the frozen graph). But if I follow the documentation literally, I get the error about the missing header and the failure to build the shader, and at the end of the run we don't get any results. I'm trying to zero in on where the problem lies. We are currently using BSP version 5.15.52, but I see it mentioned above that a minimum of 5.15.71 is needed. Could someone provide some guidance on how to resolve this issue? Note that we are using C++. Thanks.
@simoberny: I am facing the same issue. For me the detections are shown correctly when printed on the console. The issue is with the bounding box coordinates: the coordinates of detected objects are random, and some are negative as well. Can you or NXP support help with this?
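When only the coordinates look wrong, decoding one box by hand can help separate a post-processing bug from bad raw output. This is a rough sketch that assumes the model emits normalized (center x, center y, width, height) boxes in the 0..1 range, which should be checked against your own export; the function name is hypothetical.

```python
def xywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2), clamped to the image."""
    cx, cy, w, h = box

    def clamp(value, limit):
        return max(0.0, min(float(limit), value))

    x1 = clamp((cx - w / 2) * img_w, img_w)
    y1 = clamp((cy - h / 2) * img_h, img_h)
    x2 = clamp((cx + w / 2) * img_w, img_w)
    y2 = clamp((cy + h / 2) * img_h, img_h)
    return x1, y1, x2, y2


# A centered box covering half the width and height of a 448x448 image.
print(xywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 448, 448))
```

Clamping only hides small overshoots at the image border. If the raw values are far outside 0..1 before clamping, the problem is more likely in the model output than in the decoding.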
Which version of BSP?
So you have correct labels and predictions, but wrong bounding boxes?
In my case, everything seems wrong. The results seem totally random.
The Variscite customer helpdesk says that the model should be rebuilt to be NPU-compatible. They sent me an optimized small MobileNet SSD model and the detections are perfect. But it is actually a pain in the ass to train.
At least I know for sure that the problem is entirely related to the model itself. For now, I'll use the CPU with a smaller YOLO model, in the hope of finding a way to use it with the NPU.
My version of BSP is 5.15.71.
I ran yolov5s-32fp-256.tflite on the NPU following your instructions. My program is in Python, and the .tflite model gives correct results on the CPU. However, on the NPU I get correct labels and predictions, but wrong bounding boxes.
In addition, some errors appear when I use the VX delegate. Can you tell me what's wrong?
I added cl_viv_vx_ext.h in version 5.15.71, and the error message in the picture disappeared.
However, I got the same result: correct labels and predictions, but wrong bounding boxes.
I rolled back the BSP to version 5.15.32 and the bounding boxes are correct!
I want to confirm whether version 5.15.71 is the cause of the incorrect results. Has anyone encountered the same problem?
I am now using 5.15.71 and the results are correct with YOLOv5.
To be precise, I'm using a yolov5s model trained for car recognition. I partially followed the official guide that was uploaded in this topic, but in the conversion to .tflite I used uint8 quantization for the input. This way the NPU is fully exploited.
Results are very similar in terms of correctness to the CPU ones.
What version of TFLite do you have? I am currently on 2.9.1.
Ah, sorry, I only saw this message now; I had already replied to the other one.
OK, in your case it actually seems like something much harder to fix, and it doesn't seem to be related only to the versions. Anyway, as a last try if you want, take a look at the files I uploaded.
To be honest, in my opinion there is too much useless documentation on these NPUs, and too little of what is really needed. I don't know how many hours I wasted on it, only to find out, here on this forum, that a higher kernel version was needed.
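Since the kernel version comes up repeatedly in this thread, here is a small sketch for checking it from Python before debugging anything else. The 5.15.71 minimum is the value quoted in this discussion, not an official requirement I can confirm, and other posts here report different experiences with it. The release string format (for example a "+g..." suffix) is an assumption about what the board reports.

```python
import platform
import re


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a release string such as '5.15.71+gabcdef'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def kernel_at_least(release, minimum=(5, 15, 71)):
    """Return True if the release is at or above the minimum quoted in this thread."""
    return parse_kernel_release(release) >= minimum


def running_kernel_ok():
    """Check the kernel of the machine this script runs on."""
    return kernel_at_least(platform.release())
```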
Thanks for the response and the documentation.
The guide actually describes what I already did.
To be thorough, I followed all the steps and recreated a new model. But the situation remains the same: on the CPU it works perfectly, while on the NPU I get nothing except random detections with really low confidence. I tried both INT8-quantized and float models.
I am on Yocto 5.15.52-2.1.0, which uses TensorFlow 2.5.0 by default. I'm now trying to compile a newer version.
Another strange behavior is that when I use the VX delegate I can't close the application cleanly, because a segmentation fault occurs. The VX delegate is built from the latest version of the official git repo.
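To put a number on how far the NPU output is from the CPU output, the raw output tensors of both runs on the same input image can be compared element by element before any box decoding. This is a minimal sketch that assumes both outputs have already been flattened into plain lists of floats; the function name is hypothetical.

```python
def compare_outputs(cpu_out, npu_out):
    """Return (max absolute difference, mean absolute difference) of two flat outputs."""
    if len(cpu_out) != len(npu_out) or not cpu_out:
        raise ValueError("outputs must be non-empty and have the same length")
    diffs = [abs(a - b) for a, b in zip(cpu_out, npu_out)]
    return max(diffs), sum(diffs) / len(diffs)


# Toy values only, to show the call; real tensors would come from both interpreters.
print(compare_outputs([0.0, 1.0, 2.0], [0.0, 1.5, 1.0]))
```

Small differences would point at ordinary quantization or rounding effects, while large differences across the whole tensor would suggest the graph itself is being executed incorrectly on the NPU.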
I confirm that starting from version 5.15.71, the NPU works perfectly even with more complex models such as Yolov5. I get great results with reliability equal to the CPU.
I was initially unable to upgrade as there were no updated Basler drivers available for the release.
I am now getting ~10 FPS in a real-time vehicle detection and tracking application, using a Variscite i.MX8 Plus module on a custom board + Basler da2500-60mci camera + Yocto Kirkstone 5.15.71_2.2.0.
Thanks for the support.
Glad you got it to work for you.
Unfortunately, even though we are now on the 5.15.71 BSP, we continue to have the same issues when trying to use the NPU. This includes using the .tflite model from the zip file together with the example script mentioned in the NXP YOLOv5 document, so that should rule out anything strange about our model.
So it sounds like there might still be a mismatch of some kind in the Yocto build for our board. The board is from a different vendor than NXP, and the Yocto build details are from that company as well. So at this point I'm thinking that there is a subtle difference in the build that is causing our issues.