Batch size and inference time on NPU


kevin_allen
Contributor I

Hello there, 

I am using a board from the company PhyTec that is equipped with the i.MX 8M Plus chip and its NPU. I am using a TensorFlow Lite model with a ResNet50 architecture to perform classification. The model is quantized (8-bit uint) so that it can run on the NPU. The inference time for a single image of shape 224x224x3 is approximately 23 ms. When I increase the batch size, the inference time is approximately equal to 23 ms x batchSize.

I wonder whether this is normal. On larger desktop GPUs, processing a small batch of images (e.g., 16) usually takes less time than the inference time for a single image multiplied by the batch size. On the NPU of the i.MX 8M Plus chip, I don't see such a gain. Should a faster inference time be expected when processing images with a batch size larger than one?
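
For reference, here is a rough sketch of how the batch size can be set and the inference timed with the TFLite Python API and the VX delegate for the NPU (the model filename is a placeholder, the delegate path assumes the usual location on the i.MX 8M Plus BSP, and an all-zero dummy input is used purely for timing):

import time
import numpy as np
import tflite_runtime.interpreter as tflite

MODEL_PATH = "resnet50_uint8.tflite"          # placeholder model filename
DELEGATE_PATH = "/usr/lib/libvx_delegate.so"  # assumed VX delegate location on the board

# Load the VX delegate once so inference runs on the NPU.
delegate = tflite.load_delegate(DELEGATE_PATH)

for batch in (1, 2, 4, 8):
    interpreter = tflite.Interpreter(model_path=MODEL_PATH,
                                     experimental_delegates=[delegate])
    inp = interpreter.get_input_details()[0]
    # Resize the input from [1, 224, 224, 3] to [batch, 224, 224, 3].
    interpreter.resize_tensor_input(inp["index"], [batch, 224, 224, 3])
    interpreter.allocate_tensors()

    # All-zero dummy input in the model's 8-bit input dtype, used only for timing.
    interpreter.set_tensor(inp["index"], np.zeros([batch, 224, 224, 3], dtype=inp["dtype"]))

    interpreter.invoke()  # warm-up run (the first invoke includes NPU graph compilation)
    start = time.monotonic()
    interpreter.invoke()
    ms = (time.monotonic() - start) * 1000.0
    print(f"batch {batch}: {ms:.1f} ms total, {ms / batch:.1f} ms per image")

Printing the per-image time for each batch size makes it easy to see whether the NPU amortizes any cost across the batch.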

Any feedback will be appreciated.

2 Replies

kevin_allen
Contributor I

Thanks for the quick reply. 

I am not sure I fully understand your answer. So, if it takes 23 ms to process a single image (using a batch size of 1), you would expect that a batch of 6 images takes approximately 23 ms x 6  (or 138 ms) to process. Is this right?


Bio_TICFSL
NXP TechSupport

Hello,

Yes, a speed-up is expected: the total time should not be 23 ms x batch size, but more like 10 ms x batch size, depending on the size.

Regards
