i.MX 8M Plus NPU much slower than CPU; warmup time?


colinbroderick
Contributor III

I'm trying to run an onnx model on the NPU, and I think it's working, but it's very slow. I suspect I might be accidentally repeating the warmup time, but I can't see how to prevent it. The code I'm using is here:

https://github.com/cbroderickgreeve/onnx_example/blob/master/onnx_test_npu.cpp 

It is based on the code linked in the i.MX Machine Learning User's Guide, with very few changes.

I have added the line

OrtSessionOptionsAppendExecutionProvider_VsiNpu(session_options, 0);

as recommended by the i.MX Machine Learning User's Guide, at the location suggested in the code comments. I have also added a loop that runs inference ten times; it starts at line 140 in my example at the time of writing. The ten iterations take just under five seconds in total, while the same code without the execution-provider line takes less than one second.

As far as I can tell, the loop contains only the data preparation and inference steps, so I don't understand why it would be so slow.

Any help would be much appreciated.

4 Replies

colinbroderick
Contributor III

Any chance anyone has any ideas here? We're coming back to this chip after a long hiatus and this is the one problem we need to solve to move forward with it.

As stated, I suspect it's a warmup-time problem, but I don't know how to get around it. I wasn't able to find an example that deals with ONNX models specifically.

colinbroderick
Contributor III

Can anybody provide any further insight on this, besides what is in the warmup time application note?

Thanks

Bio_TICFSL
NXP TechSupport

Hello colinbroderick,

 

I think the warmup-time application note AN12964 will help you:

https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-proces...

Regards

colinbroderick
Contributor III

I have read it and it has not answered the question. Can you actually look at the code and make a suggestion based on that?

To be clear, we have one model and we want to use it for repeated/continuous inference on different data. Surely it can't be the case that we have to wait out the warmup every time we provide new data? That would make this device next to useless.

I've also changed the loop to do 100 inferences, since the application note says it can take a few spins to overcome the warmup time. But there is no change in rate up to 100 inferences; they still take about half a second each.
