i.MX 8M Plus NPU much slower than CPU; warmup time?

colinbroderick
Contributor III

I'm trying to run an ONNX model on the NPU, and I think it's working, but it's very slow. I suspect I might be accidentally repeating the warmup time, but I can't see how to prevent it. The code I'm using is here:

https://github.com/cbroderickgreeve/onnx_example/blob/master/onnx_test_npu.cpp 

It is based on the code linked in the Machine Learning User's guide, with very few changes.

I have added the line

OrtSessionOptionsAppendExecutionProvider_VsiNpu(session_options, 0);
as recommended by the Machine Learning User's Guide, at the location suggested in the code comments. I have also added a loop that runs inference ten times; it starts at line 140 in my example, at the time of writing. The ten iterations take just under five seconds in total, while the same code without the line specifying the execution provider takes less than one second.

As far as I can tell the loop only contains the data preparation and inference steps, so I don't understand why it would be so slow.

Any help, much appreciated.

3 Replies

colinbroderick
Contributor III

Can anybody provide any further insight on this, besides what is in the warmup time application note?

Thanks


Bio_TICFSL
NXP TechSupport

Hello colinbroderick,

I think the warmup-time application note AN12964 will help you:

https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-proces...

Regards


colinbroderick
Contributor III

I have read it and it has not answered the question. Can you actually look at the code and make a suggestion based on that?

To be clear, we have one model and we want to use it for repeated/continuous inference on different data. Surely it can't be the case that we have to wait out the warmup every time we provide new data? That would make this device next to useless.

I've also changed the loop to do 100 inferences, since the application note says it can take a few runs to overcome the warmup time. But there is no change in rate up to 100 inferences; they still take about half a second each.
