yolov11n,yolov8n,yolov5nu model not getting any output after running on i.MX95 NPU

[i.MX95 NPU] YOLOv5n/v8n/v11n Neutron-converted Models Run but Return No Detections (Zero Output)

Issue Description

I am evaluating YOLO object detection models on the i.MX95 NPU using the Neutron converter. While the INT8 quantized TFLite models run successfully and detect objects on the Cortex-A55 CPU, the compiled neutron.tflite versions yield zero detections (empty/no output) when offloaded to the NPU, despite executing inference without crashing.

Environment & Hardware Setup

* Hardware: i.MX95 19x19 LPDDR5 EVK (A1 Revision)
* OS/Kernel: Linux 6.12.34-lts-next-gbe78e49cb433 #1 SMP PREEMPT (aarch64)
* NXP Toolchain: MCU-SDK v25.09.00 + Linux 6.12.34_2.1.0
* Models Tested: YOLOv5nu, YOLOv8n, YOLOv11n (Ultralytics)

Workflow Steps & Commands Used

1. Quantization (Ultralytics Export)

Models were exported to INT8 full integer quantization with a 320x320 resolution:

    yolo export model=yolov8n.pt format=tflite int8=True imgsz=320
    # (Repeated identically for yolov11n.pt and yolov5nu.pt)

Status: Works perfectly on CPU. yolovXn_full_integer_quant.tflite detects objects correctly on the A55 cores.

2. Neutron Compilation

The TFLite models were compiled for the i.MX95 NPU using the Neutron converter from MCU_SDK_25.09.00+Linux_6.12.34_2.1.0:

    ./neutron-converter --input yolov8n_full_integer_quant.tflite --target imx95 --output yolov8n_full_integer_quant_neutron.tflite

Status: Fails to detect objects on NPU. The compiled model loads and runs inference without throwing syntax or execution errors, but the output tensors yield zero detections for the exact same test images.

Observed Symptoms & Suspected Root Causes

* Operator Fallbacks: Did the converter fall back to CPU for specific YOLO layers (like custom Anchors, SiLU/Swish activations, or Non-Max Suppression)?
* Quantization Scaling/Asymmetry: YOLO models exported via Ultralytics often use asymmetric quantization or have specific output tensor scaling that the Neutron NPU driver might misinterpret.
* Output Tensor Formatting: The inference runs, which suggests the input pipeline is fine, but the output bounding boxes/scores are either blank or completely garbage values.

Questions for NXP Experts

* Are there known limitations or mandatory optimization flags needed in the neutron-converter specifically for Ultralytics YOLO architectures?
* Should the NMS (Non-Max Suppression) layer be stripped out before passing the TFLite model to the Neutron converter?
* Does the i.MX95 Neutron SDK require symmetric quantization (per_channel=True or False) to parse the output layers properly?

Any guidance, reference scripts, or working YOLO deployment notes for the i.MX95 NPU would be highly appreciated.

Re: yolov11n,yolov8n,yolov5nu model not getting any output after running on i.MX95 NPU

Hi Alejandro,

Thank you for your response. I believe there may be a misunderstanding regarding my hardware platform. My issue is not related to the i.MX91.
I am using the following platform:

* Board: i.MX95 19x19 LPDDR5 EVK (IMX95LPD5EVK-19CM, A1 Revision)
* Board Quick Start Guide: https://www.nxp.com/docs/en/quick-reference-guide/IMX95LPD5EVK-19CM.pdf
* Neutron SDK: MCU_SDK_25.09.00+Linux_6.12.34_2.1.0
* Kernel: Linux 6.12.34-lts-next-gbe78e49cb433 #1 SMP PREEMPT (aarch64)

The guide you shared appears to be for the i.MX91, whereas my question is specifically about YOLO deployment on the i.MX95 Neutron NPU.

The original INT8 TFLite models (YOLOv5nu, YOLOv8n, and YOLOv11n) run correctly on the Cortex-A55 CPU and produce valid detections. However, after compiling the same models using the Neutron converter included in MCU_SDK_25.09.00+Linux_6.12.34_2.1.0, inference executes successfully on the NPU without any runtime errors, but the output tensors contain no valid detections.

For easier investigation, I have already attached the following files to my original post:

* Original INT8 quantized TFLite models.
* Neutron-converted TFLite models for YOLOv8n and YOLOv11n.
* A Python inference script that can be used to reproduce the issue.

Since these are the original pretrained Ultralytics models converted to TFLite, you can use the standard COCO class names directly with the provided script. It should allow you to reproduce the behavior on your i.MX95 platform without requiring any additional modifications.

I would appreciate it if you could reproduce the issue using the attached files and let me know whether this is a known limitation or issue with the current Neutron SDK for the i.MX95.

Thank you.

Re: yolov11n,yolov8n,yolov5nu model not getting any output after running on i.MX95 NPU

Hi @vijayranaACL,

Thank you for contacting NXP Support.

Please refer to this guide. Since you are using the i.MX91 A1 silicon revision, it is possible that some features or functionality may not operate correctly, as A1 is an early silicon revision intended primarily for evaluation and development purposes.
For this reason, we recommend using the i.MX91 B0 silicon revision for your testing and validation activities. The guide was developed and validated using the B0 silicon version, so the documented behavior and results are based on that revision.

If possible, please confirm which silicon revision you are using and whether you have access to a B0 device for comparison.

Best regards,
Alejandro Garcia
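
For anyone trying to narrow down the "Quantization Scaling/Asymmetry" and "Output Tensor Formatting" suspicions raised in the original post, the following is a minimal sketch (not taken from the attached script) of how raw INT8 output tensors from the CPU and NPU runs could be compared after dequantization. It relies only on the standard TFLite affine mapping real = scale * (q - zero_point). The helper names `dequantize` and `compare_outputs`, the tolerance, and the sample values are hypothetical. In a real run the (scale, zero_point) pair would presumably be read from the "quantization" field of the interpreter's `get_output_details()` for each model.

```python
# Sketch only: helper names and sample values are hypothetical and are not
# part of any NXP, Neutron, or TFLite API.

def dequantize(q_values, scale, zero_point):
    """Map quantized integers to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def compare_outputs(cpu_q, cpu_params, npu_q, npu_params, tol=0.05):
    """Compare two flattened quantized output tensors after dequantization.

    cpu_params / npu_params are (scale, zero_point) tuples.
    """
    if len(cpu_q) != len(npu_q):
        raise ValueError("output tensors differ in length; check tensor layout")
    cpu_real = dequantize(cpu_q, *cpu_params)
    npu_real = dequantize(npu_q, *npu_params)
    max_err = max(abs(a - b) for a, b in zip(cpu_real, npu_real))
    return {
        "max_abs_error": max_err,
        # A tensor where every element is identical (for example, all equal
        # to the zero point) points at a dead output rather than a scaling issue.
        "npu_output_constant": len(set(npu_q)) == 1,
        "match": max_err <= tol,
    }


# Made-up sample data: the CPU tensor holds varied scores, the NPU tensor is flat.
cpu_q = [-128, -100, 0, 127]
cpu_params = (0.004, -128)
npu_q = [-128, -128, -128, -128]
npu_params = (0.004, -128)

report = compare_outputs(cpu_q, cpu_params, npu_q, npu_params)
print(report)
```

If the NPU tensor turns out to be constant, the problem is more likely in the converted graph or an operator fallback than in post-processing. If the values vary but differ from the CPU result by a consistent factor or offset, the output scale and zero point are the first thing to check.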