Hi,
HW: imx8mp-evk.
SW: LF_v5.10.72-2.2.0_images_IMX8MPEVK
PC: Ubuntu 20.04
Reference document: i.MX_Machine_Learning_User's_Guide.pdf
We are deploying YOLOv8 on the i.MX8MP, but we are encountering issues.
URL: https://github.com/NXP/eiq-model-zoo.git
branch: main
commit: 58c2b002e9f64f39b8c43e896e00446298544a33
We are following the README.md in eiq-model-zoo/tasks/vision/object-detection/yolov8.
1) The script run by 'bash recipe.sh' was not found.
2) Running "yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True" reported an error:
///
imx8mp@E480:~/git/imx8mp/cyberbee/NFS/gst/ultralytics_yolov8$ yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True
Traceback (most recent call last):
File "/usr/local/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 903, in entrypoint
check_dict_alignment(full_args_dict, overrides)
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 485, in check_dict_alignment
raise SyntaxError(string + CLI_HELP_MSG) from e
SyntaxError: 'separate_outputs' is not a valid YOLO argument.
Arguments received: ['yolo', 'export', 'model=yolov8n.pt', 'imgsz=640', 'format=tflite', 'int8', 'separate_outputs=True']. Ultralytics 'yolo' commands use the following syntax:
yolo TASK MODE ARGS
''''''''''''''''''''''''
Thanks,
Joshua
Hi Zhiming,
The problem with YOLO inference is still unresolved, and we need your help.
IMAGE: LF_v6.6.52-2.2.0_images_IMX8MPEVK
I tried NPU inference, but it is very slow: the CPU takes about 50 ms, while the NPU requires about 3500 ms. Is there a problem with my configuration?
NPU:
python3 main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
INFO: Vx delegate: allowed_cache_mode set to 0.
INFO: Vx delegate: device num set to 0.
INFO: Vx delegate: allowed_builtin_code set to 0.
INFO: Vx delegate: error_during_init set to 0.
INFO: Vx delegate: error_during_prepare set to 0.
INFO: Vx delegate: error_during_invoke set to 0.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
##########Inference time: 3446.0 ms
img_width 256 img_height 256
[[[ 2.6509 15.906 15.906 ... 145.8 178.94 243.89]
[ 7.9528 11.929 11.929 ... 198.82 185.57 189.54]
[ 6.6274 33.137 35.788 ... 214.73 145.8 59.646]
...
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]]
[32.7551794052124, 240.449116230011, 771.6738510131836, 469.714515209198] 0.8750193 5
[57.9184627532959, 394.2247134447098, 162.1633529663086, 508.8573968410492] 0.7973549 0
[675.8167366683483, 455.7349169254303, 134.20414835214615, 419.387948513031] 0.7507562 0
[222.8777128458023, 402.6124691963196, 123.0204713344574, 447.3471450805664] 0.6316708 0
CPU:
python3 main.py --model yolov8n_full_integer_quant.tflite --img image.jpg --conf-thres 0.5 --iou-thres 0.5
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
##########Inference time: 48.3 ms
img_width 256 img_height 256
[[[ 2.6509 15.906 15.906 ... 143.15 180.26 245.21]
[ 6.6274 11.929 11.929 ... 193.52 185.57 189.54]
[ 6.6274 33.137 35.788 ... 212.08 132.55 58.321]
...
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]
[ 0 0 0 ... 0 0 0]]]
[32.7551794052124, 240.449116230011, 771.6738510131836, 469.714515209198] 0.8750193 5
[57.9184627532959, 394.2247134447098, 162.1633529663086, 508.8573968410492] 0.7973549 0
[678.6126579344273, 455.7349169254303, 128.61230581998825, 419.387948513031] 0.7507562 0
[225.67363411188126, 402.6124691963196, 117.4286288022995, 447.3471450805664] 0.6316708 0
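For reference, my understanding is that the first inference through the VX delegate includes graph compilation (warm-up), so I will also time a second invoke separately. A minimal timing sketch, assuming the same tflite_runtime setup as main.py (model and delegate paths are placeholders):

import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Sketch only: same model as above, VX delegate from /usr/lib.
interpreter = tflite.Interpreter(
    model_path="yolov8n_full_integer_quant.tflite",
    experimental_delegates=[tflite.load_delegate("/usr/lib/libvx_delegate.so")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

for i in range(3):
    t0 = time.monotonic()
    interpreter.invoke()
    print(f"invoke {i}: {(time.monotonic() - t0) * 1000:.1f} ms")
# The first invoke includes VX delegate graph compilation; later invokes
# should reflect the steady-state NPU latency.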
Thanks,
Joshua
Hello,
Please download the code from https://github.com/DeGirum/ultralytics_yolov8 and export the model with:
yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True
The resulting yolov8n_full_integer_quant.tflite is located in the yolov8n_saved_model directory.
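If it helps, the exported model's quantization and output layout can be checked with the TFLite interpreter; a minimal sketch, assuming TensorFlow is installed on the export host:

import tensorflow as tf

# Sketch: inspect the exported int8 model's inputs and outputs.
interpreter = tf.lite.Interpreter(
    model_path="yolov8n_saved_model/yolov8n_full_integer_quant.tflite")
interpreter.allocate_tensors()
for d in interpreter.get_input_details():
    print("input :", d["shape"], d["dtype"], d["quantization"])
for d in interpreter.get_output_details():
    print("output:", d["shape"], d["dtype"], d["quantization"])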
Best Regards,
Zhiming
Hi Zhiming,
Thank you for your reply!
I am using ultralytics_yolov8.
https://github.com/DeGirum/ultralytics_yolov8
branch:master
commit 75cab2e0c68723d4344c69a3bcd85265a582ab3d
hwang@E480:~/git/imx8mp/cyberbee/NFS/gst/ultralytics_yolov8$ yolo export model=yolov8n.pt imgsz=640 format=tflite int8 separate_outputs=True
Traceback (most recent call last):
File "/usr/local/bin/yolo", line 8, in <module>
sys.exit(entrypoint())
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 903, in entrypoint
check_dict_alignment(full_args_dict, overrides)
File "/home/xhwang/.local/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 485, in check_dict_alignment
raise SyntaxError(string + CLI_HELP_MSG) from e
SyntaxError: 'separate_outputs' is not a valid YOLO argument.
Arguments received: ['yolo', 'export', 'model=yolov8n.pt', 'imgsz=640', 'format=tflite', 'int8', 'separate_outputs=True']. Ultralytics 'yolo' commands use the following syntax:
yolo TASK MODE ARGS
Where TASK (optional) is one of {'pose', 'detect', 'segment', 'obb', 'classify'}
MODE (required) is one of {'track', 'val', 'export', 'benchmark', 'train', 'predict'}
ARGS (optional) are any number of custom 'arg=value' pairs like 'imgsz=320' that override defaults.
See all ARGS at https://docs.ultralytics.com/usage/cfg or with 'yolo cfg'
1. Train a detection model for 10 epochs with an initial learning_rate of 0.01
yolo train data=coco8.yaml model=yolo11n.pt epochs=10 lr0=0.01
2. Predict a YouTube video using a pretrained segmentation model at image size 320:
yolo predict model=yolo11n-seg.pt source='https://youtu.be/LNwODJXcvt4' imgsz=320
3. Val a pretrained detection model at batch-size 1 and image size 640:
yolo val model=yolo11n.pt data=coco8.yaml batch=1 imgsz=640
4. Export a YOLO11n classification model to ONNX format at image size 224 by 128 (no TASK required)
yolo export model=yolo11n-cls.pt format=onnx imgsz=224,128
5. Streamlit real-time webcam inference GUI
yolo streamlit-predict
6. Ultralytics solutions usage
yolo solutions count or in ['heatmap', 'queue', 'speed', 'workout', 'analytics', 'trackzone'] source="path/to/video/file.mp4"
7. Run special commands:
yolo help
yolo checks
yolo version
yolo settings
yolo copy-cfg
yolo cfg
yolo solutions help
Docs: https://docs.ultralytics.com
Solutions: https://docs.ultralytics.com/solutions/
Community: https://community.ultralytics.com
GitHub: https://github.com/ultralytics/ultralytics
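One thing I notice in the traceback is that it imports ultralytics from /home/xhwang/.local/lib/python3.8/site-packages rather than from the cloned DeGirum checkout. A quick check (just a sketch) of which package the yolo entry point actually resolves to:

# Sketch: confirm which ultralytics package the yolo CLI is importing.
import ultralytics
print(ultralytics.__version__)
print(ultralytics.__file__)  # should point into the DeGirum checkout if that fork is installed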
Thanks,
Joshua
Hello,
The code has changed; you can refer to the commit below. I think 'yolo export model=yolov8n.pt imgsz=640 format=tflite int8' is enough.
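For reference, the same export can also be done from the Python API; a minimal sketch, assuming the ultralytics package is installed:

from ultralytics import YOLO

# Sketch: equivalent of `yolo export model=yolov8n.pt imgsz=640 format=tflite int8`.
model = YOLO("yolov8n.pt")
model.export(format="tflite", imgsz=640, int8=True)
# The int8 model (yolov8n_full_integer_quant.tflite) is written into the
# yolov8n_saved_model directory.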
Best Regards,
Zhiming
Hi Zhiming,
Thank you very much for your help. The conversion issue has been resolved.
I have now run into a new problem: inference on the i.MX8MP is very slow!
Example program running:
ultralytics_yolov8/examples/YOLOv8-ONNXRuntime-CPP
Model conversion:
yolo export model=yolov8n.pt imgsz=640 format=onnx int8
compile:
mkdir build && cd build; cmake -D AARCH=TRUE ..; make
result:
params.cudaEnable 0
[YOLO_V8(CUDA)]: Cuda warm-up cost 2205.53 ms.
start Detector
img_path ../bus.jpg
[YOLO_V8(CUDA)]: 96.488ms pre-process, 2129.51ms inference, 17.911ms post-process.
res 4
label person 0.87 0.870000
label person 0.86 0.860000
label bus 0.86 0.860000
label person 0.82 0.820000
How can I optimize it? How do I use GPU or NPU acceleration?
Thanks,
Joshua
Hello,
To use the hardware accelerators, please refer to Section 2.6.5 "Using hardware accelerators" in this guide: https://www.nxp.com/docs/en/user-guide/IMX-MACHINE-LEARNING-UG.pdf
Best Regards,
Zhiming
From the reference document, Section 3.1 "ONNX Runtime software stack":
ONNX Runtime only supports the CPU, which may be the reason it is so slow.
I tried the TFLite model instead, referring to "ultralytics_yolov8/examples/YOLOv8-OpenCV-int8-tflite-Python":
1. Default interface:
interpreter = tflite.Interpreter(model_path=self.tflite_model)
##########Inference time: 1267.3 ms
2. Multi-threaded optimization:
interpreter = tflite.Interpreter(model_path=self.tflite_model, experimental_delegates=None, num_threads=4)
##########Inference time: 513.8 ms
3. How do I configure NPU and GPU inference?
Thanks,
Joshua
Hello,
Please pass the VX delegate library, /usr/lib/libvx_delegate.so, via experimental_delegates.
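For example, a minimal sketch assuming the tflite_runtime package on the i.MX8MP target (the delegate is loaded with load_delegate and passed as a list; the model path is a placeholder):

import tflite_runtime.interpreter as tflite

# Sketch: run the model on the NPU through the VX delegate.
vx_delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
interpreter = tflite.Interpreter(
    model_path="yolov8n_full_integer_quant.tflite",
    experimental_delegates=[vx_delegate],
)
interpreter.allocate_tensors()
# Note: the first invoke() compiles the graph for the NPU and is slow;
# subsequent invokes run at steady-state NPU speed.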
Best Regards,
Zhiming