Exporting YOLO Models for NXP i.MX Platforms

In this post, we will review the YOLO model export process for three popular NXP families: i.MX8MP, i.MX93, and i.MX95. These processors are increasingly used in edge AI applications such as smart vision, industrial automation, robotics, and intelligent HMI systems. Although they all support machine learning deployment, the export path, supported runtimes, and hardware acceleration options may differ depending on the device.

The purpose of this guide is to provide a clearer starting point for developers who want to take a trained YOLO model and prepare it for execution on these i.MX platforms. Whether your workflow targets the CPU or the NPU, the steps below cover export, platform-specific conversion, and benchmarking.

YOLO Model Export Workflow for i.MX Processors

1) Install Ultralytics

Install or upgrade the Ultralytics package from PyPI:

pip install -U ultralytics

2) Export the YOLO Model (TFLite INT8)
Export your trained YOLO model to TensorFlow Lite (TFLite) format with INT8 quantization:

yolo export model=<your_model>.pt format=tflite int8=True

Notes:

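The model must be exported in TFLite format and quantized to INT8.

At this stage, the exported model can already run on the CPU of all three platforms:

i.MX8MP
i.MX93
i.MX95

On i.MX8MP, this same TFLite model can also be deployed to the NPU, without further conversion, using the appropriate delegate (libvx_delegate.so in the benchmark runs under Results).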
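
The same export can also be driven from Python. The following is a minimal sketch, assuming the ultralytics package from step 1 is installed. The helper name export_int8_tflite, the calibration dataset coco8.yaml, and the 640-pixel image size are illustrative assumptions, not values taken from this guide.

```python
from pathlib import Path


def export_int8_tflite(weights: str, calib_data: str = "coco8.yaml", imgsz: int = 640) -> Path:
    """Export a trained YOLO checkpoint to an INT8-quantized TFLite model.

    Hypothetical helper: `weights`, `calib_data`, and `imgsz` are placeholders
    that you should replace with your own checkpoint and dataset.
    """
    # Imported lazily so this module can be loaded even where ultralytics
    # is not installed (for example on the target board).
    from ultralytics import YOLO

    model = YOLO(weights)
    # int8=True requests post-training INT8 quantization; `data` names the
    # dataset from which calibration images are drawn.
    exported = model.export(format="tflite", int8=True, data=calib_data, imgsz=imgsz)
    return Path(exported)


# Example call (run on the host PC, not on the board):
#     export_int8_tflite("<your_model>.pt")
```

The benchmark runs later in this post use a file named yolov8n_full_integer_quant.tflite. To my understanding, the exporter writes several .tflite variants into a <model>_saved_model directory, so check which file you copy to the board.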

3) i.MX93: Compile for the Ethos-U NPU (Vela)

For i.MX93, an additional compilation step is required to use the Ethos-U NPU.
Run the Vela compiler to convert the TFLite model into an optimized format:

vela <model>.tflite --output-dir <output_folder>

Notes:

This step generates a model optimized for the Ethos-U NPU. The resulting output files are required for deployment using the NPU delegate on the i.MX93 platform.

Please ensure that the model complies with the Ethos-U operator constraints, as only supported operations can be accelerated by the NPU.

This command can be executed directly on the i.MX93 target, or alternatively by using the eIQ Toolkit (please refer to the eIQ Converter documentation for more details).
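
If you script the conversion, the Vela call can be wrapped as shown in this sketch. It assumes that vela is on the PATH and that Vela names its output <stem>_vela.tflite, which matches the yolov8n_full_integer_quant_vela.tflite file used in the i.MX93 NPU run under Results. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_vela_command(model: Path, output_dir: Path) -> list[str]:
    """Return the argv for the Vela invocation shown in this step."""
    return ["vela", str(model), "--output-dir", str(output_dir)]


def expected_vela_output(model: Path, output_dir: Path) -> Path:
    # Assumption: Vela writes <stem>_vela.tflite into the output directory.
    return output_dir / f"{model.stem}_vela.tflite"


def compile_for_ethos_u(model: Path, output_dir: Path) -> Path:
    """Run Vela and return the path of the compiled model (hypothetical helper)."""
    if shutil.which("vela") is None:
        raise FileNotFoundError("vela was not found on PATH")
    # check=True raises CalledProcessError if Vela exits with an error.
    subprocess.run(build_vela_command(model, output_dir), check=True)
    compiled = expected_vela_output(model, output_dir)
    if not compiled.exists():
        raise RuntimeError(f"Vela finished but {compiled} was not created")
    return compiled
```

Checking that the output file exists catches the case where the naming assumption does not hold for your Vela version.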

4) i.MX95: Convert the Model Using the Neutron SDK

For i.MX95, the model must be converted with the Neutron Converter. Use the converter version that matches the BSP release installed on your board. The example below uses PowerShell on a Windows host:

.\neutron-converter.exe `
  --input "<model>.tflite" `
  --target imx95 `
  --output "<model_neutron>.tflite" `
  --optimization-level OOpt

Notes:

The Neutron toolchain prepares the model for i.MX95 NPU acceleration.
Supported formats and flags may vary depending on the Neutron SDK version.
Always verify compatibility with your BSP release.

You can check the compatibility details of the Neutron SDK in the "docs" folder of your downloaded Neutron SDK package.

5) Benchmark the Model
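After exporting and converting the model, you can validate its performance with the TFLite benchmark tool, which runs the model on the CPU or through a delegate:

benchmark_model --graph=<model>.tflite --num_threads=<N>

To benchmark on the NPU, add --external_delegate_path with the delegate library for your platform, as in the runs under Results.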
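
The benchmark tool only reports a misspelled flag as an unconsumed flag and then keeps running, so it can help to build the command line in code. The sketch below takes its delegate paths from the benchmark runs under Results. The platform keys and the function name are hypothetical labels chosen for this example.

```python
from typing import Optional

# Delegate libraries as they appear in the benchmark runs in this post.
DELEGATES = {
    "imx8mp": "/usr/lib/libvx_delegate.so",
    "imx93": "/usr/lib/libethosu_delegate.so",
    "imx95": "/usr/lib/libneutron_delegate.so",
}


def benchmark_command(graph: str, num_threads: int,
                      npu_platform: Optional[str] = None,
                      binary: str = "benchmark_model") -> list[str]:
    """Build the argv for a CPU run, or an NPU run if a platform is given."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    cmd = [binary, f"--graph={graph}", f"--num_threads={num_threads}"]
    if npu_platform is not None:
        # A KeyError here means the platform label is not one of the three above.
        cmd.append(f"--external_delegate_path={DELEGATES[npu_platform]}")
    return cmd


print(" ".join(benchmark_command("yolov8n_full_integer_quant.tflite", 4, "imx8mp")))
```

The resulting list can be passed to subprocess.run on the target. On the boards used here the binary lives under /usr/bin/tensorflow-lite-2.19.0/examples/, which you can supply through the binary argument.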

6) Results

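i.MX8MP

CPU

Note: in this run the thread flag was typed as --mum_threads, which the tool reports as unconsumed, so the benchmark used its default thread setting rather than 4 threads.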

root@imx8mpevk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant.tflite --mum_threads=4
INFO: STARTING!
WARN: Unconsumed cmdline flags: --mum_threads=4
INFO: Log parameter values verbosely: [0]
INFO: Graph: [yolov8n_full_integer_quant.tflite]
INFO: Signature to run: []
INFO: Loaded model yolov8n_full_integer_quant.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 3.42652
INFO: Initialized session in 86.368ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=1 curr=1029584 p5=1029584 median=1029584 p95=1029584

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=986237 curr=985536 min=983921 max=993982 avg=985863 std=1497 p5=984152 median=985947 p95=986715

INFO: Inference timings in us: Init: 86368, First inference: 1029584, Warmup (avg): 1.02958e+06, Inference (avg): 985863
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=11.207 overall=40.918
root@imx8mpevk:~#

NPU

root@imx8mpevk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant.tflite --num_threads=4 --external_delegate_path=/usr/lib/libvx_delegate.so
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [4]
INFO: Graph: [yolov8n_full_integer_quant.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [4]
INFO: #threads used for CPU inference: [4]
INFO: External delegate path: [/usr/lib/libvx_delegate.so]
INFO: Loaded model yolov8n_full_integer_quant.tflite
INFO: Vx delegate: allowed_cache_mode set to 0.
INFO: Vx delegate: device num set to 0.
INFO: Vx delegate: allowed_builtin_code set to 0.
INFO: Vx delegate: error_during_init set to 0.
INFO: Vx delegate: error_during_prepare set to 0.
INFO: Vx delegate: error_during_invoke set to 0.
INFO: EXTERNAL delegate created.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be completely executed by the delegate.
INFO: The input model file size (MB): 3.42652
INFO: Initialized session in 39.515ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.

INFO: count=1 curr=16831746 p5=16831746 median=16831746 p95=16831746

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=67167 curr=67190 min=67048 max=67366 avg=67187 std=64 p5=67094 median=67184 p95=67295

INFO: Inference timings in us: Init: 39515, First inference: 16831746, Warmup (avg): 1.68317e+07, Inference (avg): 67187
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=9.47266 overall=224.398
root@imx8mpevk:~#
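
In the NPU run above, the first inference takes about 16.8 s while the steady-state average is about 67 ms. This is consistent with the delegate preparing the graph during the first invocation, although that is an interpretation and not something the log states. The sketch below extracts both figures from the summary line; the regular expression and the function name are my own and are not part of the benchmark tool.

```python
import re

# Matches the "Inference timings in us" summary line printed by the tool.
TIMING_RE = re.compile(
    r"Init: (?P<init>[\d.e+]+), First inference: (?P<first>[\d.e+]+), "
    r"Warmup \(avg\): (?P<warmup>[\d.e+]+), Inference \(avg\): (?P<avg>[\d.e+]+)"
)


def parse_timings(log: str) -> dict[str, float]:
    """Return the summary timings converted from microseconds to milliseconds."""
    match = TIMING_RE.search(log)
    if match is None:
        raise ValueError("no 'Inference timings' summary line found")
    return {name: float(value) / 1000.0 for name, value in match.groupdict().items()}


# Summary line copied from the i.MX8MP NPU run.
line = ("INFO: Inference timings in us: Init: 39515, First inference: 16831746, "
        "Warmup (avg): 1.68317e+07, Inference (avg): 67187")
t = parse_timings(line)
print(f"first inference: {t['first']:.0f} ms, steady state: {t['avg']:.1f} ms")
# prints: first inference: 16832 ms, steady state: 67.2 ms
```

For an application, this means the start-up cost should be budgeted separately from the per-frame latency.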

i.MX93

CPU

root@imx93evk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant.tflite --num_threads=2
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [2]
INFO: Graph: [yolov8n_full_integer_quant.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [2]
INFO: #threads used for CPU inference: [2]
INFO: Loaded model yolov8n_full_integer_quant.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 3.42652
INFO: Initialized session in 57.963ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=3 first=247896 curr=198973 min=198973 max=247896 avg=215381 std=22991 p5=198973 median=199275 p95=247896

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=199533 curr=198880 min=197719 max=205262 avg=199032 std=1005 p5=198344 median=198886 p95=199961

INFO: Inference timings in us: Init: 57963, First inference: 247896, Warmup (avg): 215381, Inference (avg): 199032
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=11.2539 overall=40.9961
root@imx93evk:~#

NPU

root@imx93evk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant_vela.tflite --num_threads=2 --external_delegate_path=/usr/lib/libethosu_delegate.so
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [2]
INFO: Graph: [yolov8n_full_integer_quant_vela.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [2]
INFO: #threads used for CPU inference: [2]
INFO: External delegate path: [/usr/lib/libethosu_delegate.so]
INFO: Loaded model yolov8n_full_integer_quant_vela.tflite
INFO: Ethosu delegate: device_name set to /dev/ethosu0.
INFO: Ethosu delegate: cache_file_path set to .
INFO: Ethosu delegate: timeout set to 60000000000.
INFO: Ethosu delegate: enable_cycle_counter set to 0.
INFO: Ethosu delegate: enable_profiling set to 0.
INFO: Ethosu delegate: profiling_buffer_size set to 2048.
INFO: Ethosu delegate: pmu_event0 set to 0.
INFO: Ethosu delegate: pmu_event1 set to 0.
INFO: Ethosu delegate: pmu_event2 set to 0.
INFO: Ethosu delegate: pmu_event3 set to 0.
INFO: EXTERNAL delegate created.
INFO: EthosuDelegate: 8 nodes delegated out of 15 nodes with 8 partitions.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be partially executed by the delegate w/ 8 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 2.9511
INFO: Initialized session in 638.148ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=7 first=87215 curr=81264 min=81079 max=87215 avg=82056.4 std=2107 p5=81079 median=81187 p95=87215

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=81497 curr=81232 min=80887 max=81783 avg=81153.1 std=178 p5=80921 median=81148 p95=81497

INFO: Inference timings in us: Init: 638148, First inference: 87215, Warmup (avg): 82056.4, Inference (avg): 81153.1
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=7.36328 overall=8.73828
root@imx93evk:~#
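
The delegate line in the log above shows that only part of the graph runs on the NPU: 8 of 15 nodes are delegated and the rest fall back to the CPU. A small parser such as the sketch below can flag this automatically after each conversion. The function name is hypothetical, and the pattern is based only on the two delegate messages that appear in this post.

```python
import re

DELEGATED_RE = re.compile(
    r"(\d+) nodes delegated out of (\d+) nodes with (\d+) partitions"
)


def delegation_summary(log: str) -> tuple[int, int, int]:
    """Return (delegated_nodes, total_nodes, partitions) from a benchmark log."""
    match = DELEGATED_RE.search(log)
    if match is None:
        raise ValueError("no delegation summary line found")
    delegated, total, partitions = map(int, match.groups())
    return delegated, total, partitions


# Line copied from the i.MX93 NPU run.
line = "INFO: EthosuDelegate: 8 nodes delegated out of 15 nodes with 8 partitions."
delegated, total, partitions = delegation_summary(line)
print(f"{delegated}/{total} nodes on the NPU, {total - delegated} left on the CPU")
# prints: 8/15 nodes on the NPU, 7 left on the CPU
```

Treat the ratio as a hint rather than a measure of compute share. After compilation, a single delegated node may bundle many original operators, so node counts before and after conversion are not directly comparable.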

i.MX95

CPU

root@imx95evk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant.tflite --num_threads=6
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [6]
INFO: Graph: [yolov8n_full_integer_quant.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [6]
INFO: #threads used for CPU inference: [6]
INFO: Loaded model yolov8n_full_integer_quant.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 3.42652
INFO: Initialized session in 35.268ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=7 first=115073 curr=74468 min=74170 max=115073 avg=80310.4 std=14192 p5=74170 median=74581 p95=115073

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=74143 curr=74135 min=73657 max=76392 avg=74346.9 std=447 p5=73829 median=74307 p95=75020

INFO: Inference timings in us: Init: 35268, First inference: 115073, Warmup (avg): 80310.4, Inference (avg): 74346.9
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=11.5195 overall=40.8867
root@imx95evk:~#

NPU

root@imx95evk:~# /usr/bin/tensorflow-lite-2.19.0/examples/benchmark_model --graph=yolov8n_full_integer_quant_neutron.tflite --num_threads=6 --external_delegate_path=/usr/lib/libneutron_delegate.so
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [6]
INFO: Graph: [yolov8n_full_integer_quant_neutron.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [6]
INFO: #threads used for CPU inference: [6]
INFO: External delegate path: [/usr/lib/libneutron_delegate.so]
INFO: Loaded model yolov8n_full_integer_quant_neutron.tflite
INFO: EXTERNAL delegate created.
INFO: NeutronDelegate delegate: 1 nodes delegated out of 33 nodes with 1 partitions.

INFO: Neutron delegate version: v1.0.0-7399a58e, zerocp enabled.
INFO: Explicitly applied EXTERNAL delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 3.20989
INFO: Initialized session in 12.756ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=17 first=31509 curr=27588 min=27555 max=31509 avg=29101.2 std=1166 p5=27555 median=29071 p95=31509

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=28068 curr=29081 min=26573 max=31340 avg=29104.1 std=1204 p5=27306 median=29141 p95=31171

INFO: Inference timings in us: Init: 12756, First inference: 31509, Warmup (avg): 29101.2, Inference (avg): 29104.1
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=6.98438 overall=12.2344
root@imx95evk:~#
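
To compare the six runs side by side, the "Inference (avg)" values from the logs above can be turned into latency, throughput, and CPU-to-NPU speedup. The sketch below only restates numbers already present in the logs. The table layout and the function name are my own.

```python
# Average inference times in microseconds, copied from the
# "Inference (avg)" values in the six benchmark logs.
RESULTS_US = {
    "i.MX8MP": {"cpu": 985863.0, "npu": 67187.0},
    "i.MX93": {"cpu": 199032.0, "npu": 81153.1},
    "i.MX95": {"cpu": 74346.9, "npu": 29104.1},
}


def summarize(results):
    """Return (platform, cpu_ms, npu_ms, npu_inferences_per_s, speedup) rows."""
    rows = []
    for platform, t in results.items():
        rows.append((
            platform,
            t["cpu"] / 1000.0,    # microseconds to milliseconds
            t["npu"] / 1000.0,
            1e6 / t["npu"],       # inferences per second on the NPU
            t["cpu"] / t["npu"],  # how many times faster the NPU run is
        ))
    return rows


for platform, cpu_ms, npu_ms, npu_fps, speedup in summarize(RESULTS_US):
    print(f"{platform:8s} CPU {cpu_ms:7.1f} ms  NPU {npu_ms:5.1f} ms  "
          f"({npu_fps:4.1f} inferences/s, {speedup:4.1f}x)")
```

This gives speedups of roughly 14.7x on i.MX8MP, 2.5x on i.MX93, and 2.6x on i.MX95. The i.MX8MP figure should be read with care, because the CPU log for that board shows the thread flag as an unconsumed command-line flag, so that run did not use the requested 4 threads. The timings come from the benchmark tool and cover model inference only, not the pre- and post-processing of a full application.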

Disclaimer:

Ultralytics YOLO models have not been officially validated/supported by NXP. Therefore, compatibility with i.MX processors and their corresponding NPUs cannot be guaranteed. Some models or configurations may not work as expected depending on operator support and hardware limitations.