I'm trying to run an ONNX model on the NPU, and I think it's working, but it's very slow. I suspect I'm accidentally repeating the warmup on every inference, but I can't see how to prevent it. The code I'm using is here:
It is based on the code linked in the Machine Learning User's Guide, with very few changes.
I have added the line
as recommended by the Machine Learning User's Guide, at the location suggested in the code comments. I have also added a loop that runs inference ten times; it starts at line 140 in my example, at the time of writing. The ten iterations take just under five seconds in total, while the same code without the line specifying the execution provider takes less than one second.
As far as I can tell, the loop only contains the data preparation and inference steps, so I don't understand why it would be so slow.
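In case it helps while the link is being looked at, this is roughly the structure I believe I have: the session is created once, before the loop, and only `run()` is called inside it. This is a simplified sketch, not my actual code; `make_session`, the model path, and the provider list are placeholders, and the real execution-provider string is the one the user's guide recommends:

```python
def make_session(model_path, providers):
    """Create one ONNX Runtime session up front, so that any graph
    compilation / NPU warmup cost is paid exactly once."""
    import onnxruntime as ort  # imported here so the helper is self-contained
    return ort.InferenceSession(model_path, providers=providers)

def run_many(session, input_name, batches):
    """Reuse the same session for every batch; calling session.run()
    repeatedly should not re-trigger session creation or warmup."""
    return [session.run(None, {input_name: x}) for x in batches]
```

So session creation (and whatever warmup it implies) is outside the loop, and only `run_many`'s inner call repeats.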
Any help would be much appreciated.
Any chance anyone has any ideas here? We're coming back to this chip after a long hiatus and this is the one problem we need to solve to move forward with it.
As stated, I suspect it's a warmup-time problem, but I don't know how to get around it. I wasn't able to find an example that deals with ONNX models specifically.
I have read it and it has not answered the question. Can you actually look at the code and make a suggestion based on that?
To be clear, we have one model and we want to use it for repeated/continuous inference on different data. Surely it can't be the case that we have to wait out the warmup every time we provide new data? That would make this device next to useless.
I've also changed the loop to do 100 inferences, since the application note says it can take a few spins to overcome the warmup time. But there is no change in rate up to 100 inferences; they still take about half a second each.
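To separate the two possibilities, I've also been timing each iteration individually: a one-off warmup should show up as a slow first call only, while per-call re-initialization would make every call slow (which is what I'm seeing). A simplified, self-contained sketch of the timing harness; the `_WarmupOnly` class is a hypothetical stand-in for `session.run`, simulating what a one-time warmup would look like:

```python
import time

def time_iterations(infer, inputs):
    """Time each call separately. A one-off warmup appears as a slow
    first iteration; per-call overhead makes every iteration slow."""
    timings = []
    for x in inputs:
        t0 = time.perf_counter()
        infer(x)
        timings.append(time.perf_counter() - t0)
    return timings

class _WarmupOnly:
    """Hypothetical stand-in for session.run: slow only on the first call."""
    def __init__(self):
        self.warm = False
    def __call__(self, x):
        if not self.warm:
            time.sleep(0.05)  # simulated one-time compile/warmup cost
            self.warm = True
        return x

timings = time_iterations(_WarmupOnly(), range(5))
# With a genuine one-time warmup, only timings[0] is slow.
```

In my real runs the per-iteration times stay flat at about half a second, which is why I don't think the warmup is ever actually being absorbed.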