NPU bad detection with Yolov5 - i.MX8MP

simoberny · ‎01-25-2023

Hi,

I have been struggling for some time to get NPU detection working in a C++ program. The same code on the CPU gives good results, but with the VX delegate the detections are completely wrong. The code runs without errors and inference timing looks good (yolov5s model with 448x448 input, ~70 ms).

Right now I'm trying Yolov5 (uint8 quantized), but I have tried several pre-trained models and see the same behavior: good detections on the CPU and random detections on the NPU.

To obtain the model I used the export script from the yolov5 repo:

 python export.py --weights yolov5s.pt  --imgsz 448 --include tflite --int8

I've also tried TFLite Hub models such as SSD and MobileNet, which have already been converted to uint8.

Attached are the inference code I am using and the converted yolov5n model.

What could be the cause?

Thanks,

Best regards

Bio_TICFSL · ‎02-07-2023

Hi,

You need to move to at least the 5.15.71 BSP.

Regards

wamiqraza · ‎11-20-2023

@simoberny Can you share your complete code for deploying the tflite model on the board? I can't find any such code in the yolo repository, and the version I wrote myself has a lot of errors, so I would like to cross-check it against yours.
Thanks in advance

Amal_Antony3331 · ‎06-15-2023

Hi @simoberny

I have compiled the latest SDK (Linux 6.1.1_1.0.0) for imx8m+ and flashed the new image on our custom board.

I tried to run val.py from the yolov5 repository:

python3 val.py --weights yolov5s-int8.tflite --data data/coco128.yaml --img 640

But the observed inference time is 2 seconds. It seems to be running on the CPU, with the VX delegate not enabled.

How can we enable the GPU/NPU hardware accelerator through the VX delegate on the i.MX8M+?

Any help is appreciated.

bb4567 · ‎06-15-2023

I don't think you can run on the NPU with the val.py script. If you look at the script, you will see options for a CUDA (GPU) device and for the CPU. The default appears to be the CPU, as you have discovered.

There is an NXP document on running yolov5 models that may help a bit, although your mileage may vary. We are still struggling to get ANY yolov5 model to work properly with the NPU, even on BSP 5.15.71 with TFLite 2.9.1, and even with the model and script from the zip file mentioned in the NXP Yolov5 document. We had to open a support ticket to finally get the zip file because the link in the document was dead, as appears to be true of much of NXP's documentation. Hopefully that will point you in the right direction. As that document also mentions, there are environment variables you can set to select the NPU or the GPU instead of running on the CPU.
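Since val.py has no NPU option, the delegate has to be loaded explicitly in your own script. Here is a minimal sketch of how that might look in Python with tflite_runtime. The delegate path `/usr/lib/libvx_delegate.so` is an assumption about where the BSP installs the library, and `delegate_available` and `build_interpreter` are hypothetical helper names, not part of any NXP package.

```python
import os

# Assumption: install path of the VX delegate on the board; adjust for your image.
VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"


def delegate_available(path=VX_DELEGATE_PATH):
    """Return True if the delegate shared library exists at `path`."""
    return os.path.isfile(path)


def build_interpreter(model_path, delegate_path=VX_DELEGATE_PATH):
    """Create a TFLite interpreter, using the VX delegate when it is present.

    Returns (interpreter, used_delegate). Falls back to the CPU otherwise.
    """
    # Imported lazily so delegate_available() works without tflite_runtime.
    import tflite_runtime.interpreter as tflite

    if delegate_available(delegate_path):
        delegate = tflite.load_delegate(delegate_path)
        interpreter = tflite.Interpreter(
            model_path=model_path, experimental_delegates=[delegate]
        )
        return interpreter, True
    return tflite.Interpreter(model_path=model_path), False
```

On the board you would call `build_interpreter("yolov5s-int8.tflite")`, then `allocate_tensors()` on the returned interpreter before running inference. Checking the returned flag tells you whether you silently fell back to the CPU.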

bb4567 · ‎04-26-2023

We are working on a project using a Yolov5 model. As others have experienced, the model runs fine on the CPU, with valid bounding boxes and results that match what is expected. But we are having issues getting the model working with the NPU. We have processed the model as described in the Object Detection using YOLOv5 document mentioned in this discussion. The closest I've gotten is when I convert the saved_model.pb flavor (the frozen graph). But if I follow the documentation literally, I get the error about the missing header and the failure to build the shader, and at the end of the run we don't get any results. I'm trying to zero in on where the problem lies. We are currently using BSP version 5.15.52, but I see it mentioned above that you need a minimum of 5.15.71. Could someone provide some guidance on how to resolve this issue? Note that we are using C++. Thanks.

sams4 · ‎02-22-2023

@simoberny: I am facing the same issue. For me the detections are shown correctly when printed on the console. The issue is with the bounding box coordinates: the coordinates of detected objects are random, and some are negative as well. Can you or NXP support help with this?

Thanks!!!

simoberny · ‎02-24-2023

Which version of BSP?

So you have correct labels and predictions, but wrong bounding boxes?

In my case, everything seems wrong. The results seem totally random.

The Variscite customer helpdesk says that the model should be rebuilt to be NPU-compatible. They sent me an optimized small MobileNet SSD model and the detections are perfect. But it is actually a pain in the ass to train.

At least I know for sure that the problem is entirely related to the model itself. For now, I'll use the CPU with a smaller YOLO model, in the hope of finding a way to use it with the NPU.

Bests

sams4 · ‎03-01-2023

@simoberny: I upgraded the BSP to version 5.15.32_2.0.0 and it worked.

Detections and bounding boxes started appearing.

Thanks!!!

Bio_TICFSL · ‎01-26-2023

Hello,

Attached you will find some benchmarks for the VX delegate on the i.MX8M Plus, along with an app note on object detection.

Hope this helps

hy982530 · ‎03-22-2023

Hello, @Bio_TICFSL

My version of BSP is 5.15.71.

I ran yolov5s-32fp-256.tflite on the NPU following your instructions. My program is written in Python, and the .tflite model gives correct results on the CPU. However, on the NPU I get correct labels and predictions, but wrong bounding boxes.

In addition, some errors appear when I use the VX delegate. Can you tell me what's wrong?

Thank you,

Best regards

adarshkv

Can you share the Python detection code for this?

hy982530 · ‎03-25-2023

I added cl_viv_vx_ext.h in version 5.15.71, and the error message in the picture disappeared.

However, I got the same result: correct labels and predictions, but wrong bounding boxes.
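One way to narrow down a "right labels, wrong boxes" symptom is to check the raw box values before drawing anything. The sketch below assumes the model emits normalized `(cx, cy, w, h)` boxes in the 0..1 range, which is how I understand the yolov5 TFLite export to behave; verify that against your own model. The helper names are hypothetical.

```python
def xywh_norm_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    x1 = (cx - w / 2.0) * img_w
    y1 = (cy - h / 2.0) * img_h
    x2 = (cx + w / 2.0) * img_w
    y2 = (cy + h / 2.0) * img_h
    return (x1, y1, x2, y2)


def is_plausible(box, tol=0.05):
    """True if every normalized value lies in [-tol, 1 + tol] and w, h > 0.

    Assumes normalized xywh output; values far outside 0..1 suggest the
    raw tensor is being misread (or miscomputed) rather than a bad detection.
    """
    cx, cy, w, h = box
    in_range = all(-tol <= v <= 1.0 + tol for v in box)
    return in_range and w > 0 and h > 0
```

Running `is_plausible` over the same image's boxes from the CPU and from the NPU shows quickly whether the NPU output is garbage at the tensor level or only goes wrong later in post-processing.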

hy982530 · ‎03-25-2023

I rolled back the BSP to version 5.15.32 and the bounding boxes are correct!

I want to confirm whether version 5.15.71 is the cause of the incorrect results. Has anyone encountered the same problem?

Thanks

simoberny · ‎05-11-2023

I'm now using 5.15.71 and the results are correct with Yolov5.

To be precise, I'm using a yolov5s model trained for car recognition. I partially followed the official guide that was uploaded to this topic, but in the conversion to .tflite I used uint8 quantization for the input. This way the NPU is fully exploited.
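For anyone unsure what a uint8 input implies on the application side, here is a small sketch of the standard TFLite affine quantization mapping, `real = scale * (q - zero_point)`. In practice `scale` and `zero_point` would be read from the `"quantization"` entry of `interpreter.get_input_details()` and `get_output_details()`; the function names here are hypothetical.

```python
def quantize_input(value, scale, zero_point):
    """Map a real value to uint8: round(value / scale) + zero_point, clamped to 0..255."""
    q = int(round(value / scale)) + zero_point
    return max(0, min(255, q))


def dequantize_output(q, scale, zero_point):
    """Map a quantized integer back to a real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)
```

Feeding float pixels into a uint8 input, or reading a uint8 output without dequantizing it, gives results that look random, so this is worth ruling out early.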

The results are very close to the CPU ones in terms of correctness.

What version of tflite do you have? I'm currently on 2.9.1.

Bests

bb4567 · ‎05-11-2023

Thanks. We are also running tflite 2.9.1. Our model is trained to recognize 10 classes of objects. It was also a yolov5s model before conversion. It runs fine on the CPU, and also runs fine with the Coral TPU, but the NPU is not loving the model. So I tested with NXP's pre-converted .tflite model for their YOLOv5 example, and that fails in the same manner. So something else in our configuration has to be the issue.

simoberny · ‎05-11-2023

Ah sorry, I only saw this message now; I had already replied to the other one.

OK, in your case it actually seems like something much harder to fix; it doesn't seem related only to the versions. Anyway, as a last try if you want, take a look at the files I uploaded.

To be honest, in my opinion there is too much useless documentation on these NPUs, and too little of what is really needed. I don't know how many hours I wasted on it, only to find out, here on this forum, that a higher kernel version was needed.
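Since the kernel/BSP version comes up repeatedly in this thread, a quick check at program start can save some time. This is only a sketch: the 5.15.71 minimum is the value suggested in this thread, not an official requirement, and reports here are mixed (some posters had 5.15.32 working). The function names are hypothetical.

```python
import platform
import re

# Minimum kernel version suggested in this thread (an assumption, not an NXP spec).
MIN_KERNEL = (5, 15, 71)


def parse_kernel(release):
    """Extract (major, minor, patch) from a release string like '5.15.71-2.2.0+g123'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g) for g in m.groups())


def kernel_at_least(release, minimum=MIN_KERNEL):
    """True if the parsed release is greater than or equal to `minimum`."""
    return parse_kernel(release) >= minimum


def current_kernel_ok():
    """Check the running kernel, as reported by platform.release()."""
    return kernel_at_least(platform.release())
```

Logging the result of `current_kernel_ok()` next to the tflite version makes it easier to compare setups when asking for help.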

bb4567 · ‎05-11-2023

Yes, the documentation has been a frustration for me as well, and the explanations from NXP support have not been very good. For example, one of the issues we get is that the code appears to decide, partway through execution, that it should now be running on the GPU, and so it attempts to dynamically build a shader, which of course fails. A couple of others on the forums have reported a similar error, but with no explanation of what triggers it.

simoberny · ‎01-26-2023

Thanks for the response and the documentation.

The guide actually describes what I already did.

To be thorough, I followed all the steps and created a new model. But the situation remains the same: on the CPU it works perfectly, while on the NPU I get nothing except random detections with really low confidence. I tried both INT8-quantized and FLOAT models.

I am on Yocto 5.15.52-2.1.0, which uses TensorFlow 2.5.0 by default. I'm now trying to compile a newer version.

Another strange behavior is that when I use the VX delegate I can't close the application cleanly, because a segmentation fault occurs. The VX delegate is built from the latest version in the official git repo.

Thanks

simoberny · ‎01-31-2023

I wanted to clarify that the version I'm working on is 5.10.52.

Also, the yolov5_decode Python script used in the guide is not accessible.

simoberny · ‎05-11-2023

Good morning,

I can confirm that, starting from version 5.15.71, the NPU works perfectly even with more complex models such as Yolov5. I get great results, with reliability equal to the CPU.

I was initially unable to upgrade because there were no updated Basler drivers available for the release.

I'm now getting ~10 FPS in a real-time vehicle detection and tracking application, using a Variscite i.MX8 Plus module on a custom board + Basler da2500-60mci camera + Yocto Kirkstone 5.15.71_2.2.0.

Thanks for the support,

Best regards