How to deploy YOLOv8 on i.MX8MP?


Joshua2
Contributor II

Hi,

HW: imx8mp-evk.
SW: LF_v5.10.72-2.2.0_images_IMX8MPEVK
PC: ubuntu20.04
Reference document: i.MX_Machine_Learning_User's_Guide.pdf


We are deploying YOLOv8 on the i.MX8MP, but we are running into issues.
URL: https://github.com/NXP/eiq-model-zoo.git
branch: main
commit: 58c2b002e9f64f39b8c43e896e00446298544a33


We are following the README.md at eiq-model-zoo/tasks/vision/object-detection/yolov8.


1) The script recipe.sh (run via "bash recipe.sh") was not found.
2) "yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True" reported an error.


imx8mp@E480:~/git/imx8mp/cyberbee/NFS/gst/ultralytics_yolov8$ yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True
Traceback (most recent call last):
File "/usr/local/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 903, in entrypoint
check_dict_alignment(full_args_dict, overrides)
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 485, in check_dict_alignment
raise SyntaxError(string + CLI_HELP_MSG) from e
SyntaxError: 'separate_outputs' is not a valid YOLO argument.

Arguments received: ['yolo', 'export', 'model=yolov8n.pt', 'imgsz=640', 'format=tflite', 'int8', 'separate_outputs=True']. Ultralytics 'yolo' commands use the following syntax:

yolo TASK MODE ARGS


 

Thanks,

Joshua

 

Joshua2
Contributor II

Hi Zhiming,

    The problem with YOLO inference is still unresolved, and we need your help.

   IMAGE: LF_v6.6.52-2.2.0_images_IMX8MPEVK

   I tried NPU inference, but it is very slow: the CPU takes about 50 ms, while the NPU takes about 3500 ms. Is there a problem with the configuration?

NPU:

python3 main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
INFO: Vx delegate: allowed_cache_mode set to 0.
INFO: Vx delegate: device num set to 0.
INFO: Vx delegate: allowed_builtin_code set to 0.
INFO: Vx delegate: error_during_init set to 0.
INFO: Vx delegate: error_during_prepare set to 0.
INFO: Vx delegate: error_during_invoke set to 0.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.


##########Inference time: 3446.0 ms

img_width 256 img_height 256
[[[ 2.6509 15.906 15.906 ... 145.8 178.94 243.89]
[ 7.9528 11.929 11.929 ... 198.82 185.57 189.54]
[ 6.6274 33.137 35.788 ... 214.73 145.8 59.646]
...
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]]
[32.7551794052124, 240.449116230011, 771.6738510131836, 469.714515209198] 0.8750193 5
[57.9184627532959, 394.2247134447098, 162.1633529663086, 508.8573968410492] 0.7973549 0
[675.8167366683483, 455.7349169254303, 134.20414835214615, 419.387948513031] 0.7507562 0
[222.8777128458023, 402.6124691963196, 123.0204713344574, 447.3471450805664] 0.6316708 0

 

CPU:

python3 main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


##########Inference time: 48.3 ms

img_width 256 img_height 256
[[[ 2.6509 15.906 15.906 ... 143.15 180.26 245.21]
[ 6.6274 11.929 11.929 ... 193.52 185.57 189.54]
[ 6.6274 33.137 35.788 ... 212.08 132.55 58.321]
...
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]]
[32.7551794052124, 240.449116230011, 771.6738510131836, 469.714515209198] 0.8750193 5
[57.9184627532959, 394.2247134447098, 162.1633529663086, 508.8573968410492] 0.7973549 0
[678.6126579344273, 455.7349169254303, 128.61230581998825, 419.387948513031] 0.7507562 0
[225.67363411188126, 402.6124691963196, 117.4286288022995, 447.3471450805664] 0.6316708 0
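
To separate any one-off start-up cost from the steady-state speed, I will also time several consecutive invocations with a loop like the sketch below (the model path is a placeholder; for the NPU case the interpreter would be built with the same delegate setup that main.py already uses):

# Minimal timing sketch: invoke the interpreter several times so that any
# one-off warm-up cost only shows up in the first measurement.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="yolov8n_full_integer_quant.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

for i in range(5):
    interpreter.set_tensor(inp["index"], dummy)
    start = time.monotonic()
    interpreter.invoke()
    print(f"run {i}: {(time.monotonic() - start) * 1000:.1f} ms")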

 

Thanks,

Joshua

 

Zhiming_Liu
NXP TechSupport

Hello,

Please download it from here: https://github.com/DeGirum/ultralytics_yolov8

 

How to get the model

  1. Note that the model is released under the AGPL 3.0 license.
  2. Visit DeGirum's GitHub repository and clone it.
  3. Install all necessary dependencies.
  4. Run the following command to create a fully quantized int8 model with separate outputs:
yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True
  5. The TFLite model file for i.MX 8M Plus and i.MX 93 is yolov8n_full_integer_quant.tflite, located in the yolov8n_saved_model directory (a quick load check is sketched below).
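
As a quick sanity check after the export, you can load the quantized model in Python and print its input/output details (a minimal sketch, assuming the tflite-runtime package is installed and the default yolov8n_saved_model output directory):

# Minimal sketch: verify the exported model loads and inspect its tensors.
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="yolov8n_saved_model/yolov8n_full_integer_quant.tflite"
)
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"])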



Best Regards,
Zhiming

Joshua2
Contributor II

Hi Zhiming,

    Thank you for your reply!
    I am using ultralytics_yolov8.

https://github.com/DeGirum/ultralytics_yolov8
branch: master
commit 75cab2e0c68723d4344c69a3bcd85265a582ab3d

 

hwang@E480:~/git/imx8mp/cyberbee/NFS/gst/ultralytics_yolov8$ yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True

Traceback (most recent call last):
File "/usr/local/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 903, in entrypoint
check_dict_alignment(full_args_dict, overrides)
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 485, in check_dict_alignment
raise SyntaxError(string + CLI_HELP_MSG) from e
SyntaxError: 'separate_outputs' is not a valid YOLO argument.

Arguments received: ['yolo', 'export', 'model=yolov8n.pt', 'imgsz=640', 'format=tflite', 'int8', 'separate_outputs=True']. Ultralytics 'yolo' commands use the following syntax:

yolo TASK MODE ARGS

Where TASK (optional) is one of {'pose', 'detect', 'segment', 'obb', 'classify'}
MODE (required) is one of {'track', 'val', 'export', 'benchmark', 'train', 'predict'}
ARGS (optional) are any number of custom 'arg=value' pairs like 'imgsz=320' that override defaults.
See all ARGS at https://docs.ultralytics.com/usage/cfg or with 'yolo cfg'

1. Train a detection model for 10 epochs with an initial learning_rate of 0.01
yolo train data=coco8.yaml model=yolo11n.pt epochs=10 lr0=0.01

2. Predict a YouTube video using a pretrained segmentation model at image size 320:
yolo predict model=yolo11n-seg.pt source='https://youtu.be/LNwODJXcvt4' imgsz=320

3. Val a pretrained detection model at batch-size 1 and image size 640:
yolo val model=yolo11n.pt data=coco8.yaml batch=1 imgsz=640

4. Export a YOLO11n classification model to ONNX format at image size 224 by 128 (no TASK required)
yolo export model=yolo11n-cls.pt format=onnx imgsz=224,128

5. Streamlit real-time webcam inference GUI
yolo streamlit-predict

6. Ultralytics solutions usage
yolo solutions count or in ['heatmap', 'queue', 'speed', 'workout', 'analytics', 'trackzone'] source="path/to/video/file.mp4"

7. Run special commands:
yolo help
yolo checks
yolo version
yolo settings
yolo copy-cfg
yolo cfg
yolo solutions help

Docs: https://docs.ultralytics.com
Solutions: https://docs.ultralytics.com/solutions/
Community: https://community.ultralytics.com
GitHub: https://github.com/ultralytics/ultralytics

 

Thanks,

Joshua

 

Zhiming_Liu
NXP TechSupport

Hello,

The code has changed; you can refer to the commit below. I think yolo export model=yolov8n.pt imgsz=640 format=tflite int8 is enough.

[Attached screenshot of the commit: Zhiming_Liu_0-1735007812185.png]

 



Best Regards,
Zhiming

Joshua2
Contributor II

Hi Zhiming,

     Thank you very much for your help. The conversion issue has been resolved.


I have encountered a new problem now: i.MX8MP inference is very, very slow!

Example program running:
ultralytics_yolov8/examples/YOLOv8-ONNXRuntime-CPP
Model conversion:
yolo export model=yolov8n.pt imgsz=640 format=onnx int8
compile:
mkdir build && cd build; cmake -D AARCH=TRUE ..; make
result:

params.cudaEnable 0
[YOLO_V8(CUDA)]: Cuda warm-up cost 2205.53 ms.
start Detector
img_path ../bus.jpg
[YOLO_V8(CUDA)]: 96.488ms pre-process, 2129.51ms inference, 17.911ms post-process.
res 4
label person 0.87 0.870000
label person 0.86 0.860000
label bus 0.86 0.860000
label person 0.82 0.820000

How can I optimize it? How do I use GPU or NPU acceleration?

 

Thanks,

Joshua

Zhiming_Liu
NXP TechSupport

Hello,

To select a hardware accelerator, please refer to section 2.6.5 "Using hardware accelerators" in this guide: https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf

Best Regards,
Zhiming

Joshua2
Contributor II

Reference Documents

3.1 ONNX Runtime software stack

According to this section, ONNX Runtime only supports the CPU, which may be the reason it is so slow.


I tried the TFLite model instead and referred to "ultralytics_yolov8/examples/YOLOv8-OpenCV-int8-tflite-Python":
1. Default interface:
interpreter = tflite.Interpreter(model_path=self.tflite_model)
##########Inference time: 1267.3 ms
2. Multi-threaded optimization:
interpreter = tflite.Interpreter(model_path=self.tflite_model, experimental_delegates=None, num_threads=4)
##########Inference time: 513.8 ms
3. How do I configure NPU and GPU inference?

Thanks,

Joshua

 

 

Zhiming_Liu
NXP TechSupport

Hello,

Please load /usr/lib/libvx_delegate.so and pass it via experimental_delegates.
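
A minimal sketch of what this looks like in Python (the model path is a placeholder from the earlier runs):

# Minimal sketch: attach the VX delegate so TFLite dispatches the graph to the NPU.
import tflite_runtime.interpreter as tflite

vx_delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
interpreter = tflite.Interpreter(
    model_path="yolov8n_full_integer_quant.tflite",
    experimental_delegates=[vx_delegate],
)
interpreter.allocate_tensors()

Note that experimental_delegates takes a list of delegate objects returned by tflite.load_delegate(), not the library path string itself.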

Best Regards,
Zhiming
