We're running a float32 model on the NPU of the imx8mp, via NNAPI. However, when the model is running it appears to put lots of load on the CPU, at least according to htop. htop shows all four cores at about 60% utilization.
I initially thought it might be data transfers or casts, but the input is small (1200 floats) and already in the form the model expects. Besides, the model only runs at about 10 Hz, so even if the data were larger or needed casting, that should be fairly trivial work for the CPU.
We're running the model using onnxruntime in Python (for now):

import numpy as np
import onnxruntime as ort

data = np.random.rand(1, 6, 200).astype('float32')
ort_sess = ort.InferenceSession("model.onnx", providers=["NnapiExecutionProvider"])
for _ in range(1000):
    outputs = ort_sess.run(None, {'input': data})
Can anyone give advice on why the CPU is so taxed, or what we can do to diagnose the cause?
(Note that I know this NPU is not well geared for float32 inference - we're also working on quantization but having problems there too.)
Is there a way to directly measure load on the NPU to confirm it is even running there?
Thanks
Hello Colin,
The i.MX 8M Plus NPU doesn't support float32, so a float32 model will fall back to running on the CPU.
Regards