Quantized ONNX models are supposedly supported on the NPU on imx95 as of the latest release (LF6.12.20_2.0.0). Running both the int8 and int4 ONNX versions of Gemma3-1B on the CPU gives the expected results, while the NPU produces nothing but nonsense. Running quantized TFLite models on the NPU on the imx95 board requires conversion with the Neutron converter, specifying imx95 as the target. Do quantized ONNX models need a similar conversion step before they can run on the NPU? The converter does not seem to support ONNX.
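For context, the CPU check was done with a plain ONNX Runtime session, roughly like the sketch below (the model path is just a placeholder for this example):

```python
import onnxruntime as ort

# Sanity check on CPU: load the quantized model with the default CPU provider.
# "gemma3-1b-int8.onnx" is a placeholder path for this sketch.
session = ort.InferenceSession(
    "gemma3-1b-int8.onnx",
    providers=["CPUExecutionProvider"],
)

# Confirm which execution providers the session actually ended up with.
print(session.get_providers())
```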
Hi @1o_o1
Currently, ONNX LLMs are not supported on the Neutron NPU. We will add ONNX Runtime support for LLMs in the Q3 BSP release. At that point you will only need to specify the Neutron provider in the ONNX Runtime API to deploy the supported ops of the LLM on the Neutron NPU.
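As a rough illustration, selecting that provider would look something like the snippet below. The provider name is only an assumed placeholder here, since the final name has not been published in this thread:

```python
import onnxruntime as ort

# Sketch only: "NeutronExecutionProvider" is an assumed placeholder name, not
# a confirmed identifier. Ops the NPU cannot handle would fall back to the
# CPU provider listed second.
session = ort.InferenceSession(
    "gemma3-1b-int8.onnx",
    providers=["NeutronExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```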
Regards
Daniel
Okay, I see. It is a bit misleading to put it in the machine learning guide then. This means that no ONNX Runtime models are supported on the NPU as of yet, I would presume?