Dear NXP,
I converted yolov7tiny.pt (the yolov7-tiny model) to yolov7tiny.onnx with uint8 weights, and then ran yolov7tiny.onnx on the i.MX 8M Plus NPU. The input size is 224x224, but the NPU inference time is 127 ms, which seems too slow. Is this time reasonable?
The following are my ONNX model conversion steps and my onnxruntime execution command:
1. Download yolov7-tiny.pt from https://github.com/WongKinYiu/yolov7/releases and rename it to yolov7tiny.pt.
2. Convert yolov7tiny.pt to yolov7tiny.onnx (this ONNX model still has fp32 weights), using onnx==1.10.0 and opset=15:
$ git clone https://github.com/WongKinYiu/yolov7.git
$ python export.py --weights ./yolov7tiny.pt --img-size 224
Note: I modified some code in export.py; the modified file is in the attachment. (A small sanity check I run on the exported model is sketched right after this step.)
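For completeness, this is roughly how I sanity-check the exported fp32 model before quantizing. It is only a minimal sketch (onnx.checker plus one dummy run on CPU), not one of the attached files:

import numpy as np
import onnx
import onnxruntime as ort

model_path = "yolov7tiny.onnx"

# Structural check of the exported graph
onnx.checker.check_model(onnx.load(model_path))

# Run one dummy 224x224 image through the fp32 model on CPU
sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {input_name: dummy})
print([o.shape for o in outputs])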
3. Quantize yolov7tiny.onnx; the output is called yolov7tiny_uint8.onnx.
Here I refer to https://github.com/microsoft/onnxruntime/issues/10787.
$ python quantize_yolo.py
Note: quantize_yolo.py in the attachment is the script I use to quantize the ONNX model (its general shape is sketched right after this step).
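For reference, the attached quantize_yolo.py follows the general shape below: static uint8 quantization with onnxruntime.quantization.quantize_static and a small calibration data reader, as suggested in the linked onnxruntime issue. The calibration image folder, the preprocessing, and the input name "images" are assumptions in this sketch; the attached script is the authoritative version.

# quantize_yolo.py (sketch) - static uint8 quantization of yolov7tiny.onnx.
# The calibration folder and preprocessing below are assumptions.
import glob
import cv2
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class YoloDataReader(CalibrationDataReader):
    def __init__(self, image_dir, input_name, size=224):
        files = glob.glob(f"{image_dir}/*.jpg")
        self.data = iter([{input_name: self._preprocess(f, size)} for f in files])

    @staticmethod
    def _preprocess(path, size):
        img = cv2.imread(path)
        img = cv2.resize(img, (size, size))
        img = img[:, :, ::-1].transpose(2, 0, 1)      # BGR->RGB, HWC->CHW
        return np.expand_dims(img, 0).astype(np.float32) / 255.0

    def get_next(self):
        return next(self.data, None)

quantize_static(
    "yolov7tiny.onnx",
    "yolov7tiny_uint8.onnx",
    YoloDataReader("./calib_images", "images"),       # "images" is yolov7's input name
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
)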
4. Run yolov7tiny_uint8.onnx on the NPU with onnxruntime_perf_test:
$ /usr/bin/onnxruntime-1.10.0/onnxruntime_perf_test ./yolov7tiny_uint8.onnx -r 1 -e nnapi
The result shows the inference time of about 127 ms mentioned above. I have put my relevant files in the attachment, and a Python timing sketch I also use on the board follows below.
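To cross-check the perf_test number, I also time the model from Python on the board with something like the sketch below. The provider name "NnapiExecutionProvider" is only an assumption chosen to match the -e nnapi flag; the provider actually exposed by the board's onnxruntime build should be checked with onnxruntime.get_available_providers().

# time_npu.py (sketch) - time yolov7tiny_uint8.onnx from Python on the board.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "yolov7tiny_uint8.onnx",
    providers=["NnapiExecutionProvider", "CPUExecutionProvider"],  # assumed names
)
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

sess.run(None, {input_name: dummy})        # warm-up run (graph compilation / first-run cost)

runs = 20
t0 = time.time()
for _ in range(runs):
    sess.run(None, {input_name: dummy})
print(f"average inference time: {(time.time() - t0) / runs * 1000:.1f} ms")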
Any help is much appreciated.
Hi @hy982530
Hi, @Sanket_Parekh
The version I am working with is 5.15.71.
Could you provide quantized .onnx or .tflite files of yolov3, yolov5, or yolov7 for my reference?
Or could you help me check whether my quantized ONNX file is correct? (A sketch of the comparison check I have in mind follows below.)
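For context, the correctness check I have in mind is a simple comparison of the fp32 and quantized model outputs on the same input, along the lines of this sketch (the metric and any threshold are arbitrary choices on my side):

# compare_models.py (sketch) - rough check that yolov7tiny_uint8.onnx still
# tracks the fp32 model on the same input.
import numpy as np
import onnxruntime as ort

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

def run(path):
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: dummy})[0]

fp32_out = run("yolov7tiny.onnx")
uint8_out = run("yolov7tiny_uint8.onnx")

# Cosine similarity between flattened outputs as a coarse similarity measure
cos = np.dot(fp32_out.ravel(), uint8_out.ravel()) / (
    np.linalg.norm(fp32_out.ravel()) * np.linalg.norm(uint8_out.ravel())
)
print(f"cosine similarity: {cos:.4f}")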
Thank you,
Best regards