Hello there,
I am using a board from the company PHYTEC that is equipped with the i.MX 8M Plus chip and its NPU. I am using a TensorFlow Lite model with a ResNet50 architecture to perform classification. The model is quantized (8-bit unsigned integer, uint8) to run on the NPU. The inference time for a single image of shape 224x224x3 is approximately 23 ms. When I increase the batch size, the inference time is approximately 23 ms * batchSize.
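For reference, here is a minimal sketch of how I time this, assuming the tflite_runtime Python API with NXP's VX delegate for the NPU; the delegate .so path and the model file name below are placeholders that may differ on a PHYTEC image:

```python
import time
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the VX delegate so inference runs on the NPU; this path is an
# assumption based on NXP's eIQ BSPs and may differ on your image.
delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
interpreter = tflite.Interpreter(
    model_path="resnet50_uint8.tflite",  # hypothetical model file name
    experimental_delegates=[delegate],
)

def time_batch(batch_size, runs=10):
    """Return the average invoke time in ms for the given batch size."""
    inp = interpreter.get_input_details()[0]
    # Resize the input from [1, 224, 224, 3] to the requested batch size.
    interpreter.resize_tensor_input(inp["index"], [batch_size, 224, 224, 3])
    interpreter.allocate_tensors()
    images = np.random.randint(0, 256, (batch_size, 224, 224, 3), dtype=np.uint8)
    interpreter.set_tensor(inp["index"], images)
    interpreter.invoke()  # warm-up: the first invoke includes graph compilation
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000.0

for n in (1, 2, 4, 8):
    t = time_batch(n)
    print(f"batch {n}: {t:.1f} ms total, {t / n:.1f} ms per image")
```

Note that the warm-up invoke matters: the first run on this NPU triggers graph compilation and is much slower than steady-state inference, so it is excluded from the timing.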
I wonder whether this is normal. On larger desktop GPUs, processing a small batch of images (e.g., 16) usually takes less time than the single-image inference time multiplied by the batch size. On the NPU of the i.MX 8M Plus chip, I don't see such a gain. Should faster inference be expected when processing images with a batch size larger than one?
Any feedback will be appreciated.
Thanks for the quick reply.
I am not sure I fully understand your answer. So, if it takes 23 ms to process a single image (using a batch size of 1), you would expect a batch of 6 images to take approximately 23 ms x 6 (i.e., 138 ms) to process. Is this right?
Hello,
Yes, it is expected, although not exactly 23 ms * batchSize; in practice it is more like 10 ms * batchSize, depending on the input size.
Regards