Is there optimized number of channels for int8 Machine Learning model on NXP board?


345 Views
nnxxpp
Contributor III

I have int8 model and I run it on MIMXRT1060-EVKB.

I pruned this model, which means that some channel weights of the layers are removed. The model size is reduced, but the inference time is NOT reduced; it is increased.

Usually, with a machine learning model, the number of channels for each layer is something like 8, 16, 32, ... But after pruning, the number of channels for each layer changes, for example to 5, 11, 27, ...

Is there an optimal number of channels for an int8 machine learning model on an NXP board? I ask because I encountered this problem with an Nvidia Jetson board (there we need to make sure that the number of channels is divisible by 8, 16, or similar). I don't know whether the same issue exists on NXP boards.

Thank you.

1 Solution
118 Views
Sam_Gao
NXP Employee

Yes, exactly. It is a general recommendation (divisible by 4) for these processors (Cortex-M7 or below); the RT1170 combines a CM4 and a CM7.

View solution in original post

9 Replies
273 Views
Sam_Gao
NXP Employee

Hi @nnxxpp 

The inference time is not directly related to the number of channels, and it is not determined only by the amount of computation in the model.

You need to compare the computational graph of the pruned model with the computational graph of the model before pruning.

Certainly, if the topology does not change but the number of channels has decreased, the inference time should not become longer. The NXP eIQ Model Tool can display the computational graph corresponding to a model in tflite format.

https://www.nxp.com/design/design-center/software/eiq-ml-development-environment/eiq-toolkit-for-end...
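If a quick script check is more convenient than the GUI, below is a minimal Python sketch (the file names are placeholders and TensorFlow is assumed to be installed) that lists the tensor names and shapes of each tflite file, so you can confirm whether anything other than the channel dimensions changed:

```python
# Minimal sketch: list tensor names/shapes of two tflite files to compare graphs.
# "original.tflite" and "pruned.tflite" are placeholder file names.
import tensorflow as tf

def dump_tensors(path):
    interpreter = tf.lite.Interpreter(model_path=path)
    interpreter.allocate_tensors()
    print(f"--- {path} ---")
    for t in interpreter.get_tensor_details():
        print(t["index"], t["name"], t["shape"], t["dtype"])

dump_tensors("original.tflite")
dump_tensors("pruned.tflite")
```

If the two listings differ only in the channel dimension of the convolution weights and activations, the topology is unchanged.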

 

Best Regards

Sam

258 Views
nnxxpp
Contributor III

@Sam_Gao 

Thank you for your response.

"The inference time is not directly related to the number of channels, and it is not determined only by the amount of computation in the model." As I understand it, NXP does not have any recommendation about the number of channels of convolution layers for a PTQ int8 model. Is that right?

"You need to compare the changes in the computational graph of the pruned model with the computational graph of the model before pruning." Yes, I checked model architecture. It is not changed after pruning. Only the number of channels is changed. You can see in attached images (it is a part of the whole model). For example, after pruning the number of channels of the 1-st Convolution layer changes from 32 to 25. The number of channels of the 2-nd Convolution layer changes from 64 to 61. But inference time of pruned model is larger than the inference time of the original model.

I have experenced many cases and this phenomenon was happen.

 

"Certainly, if the topology does not change, but the number of channels has decreased, the inference time will not be longer." From theory, it is true. But from practice, it is not always true. It may be related to computational process of hardware.

 

252 Views
Sam_Gao
NXP Employee

Hi @nnxxpp 

I mistook it for other chips with an NPU. It is certain that the RT1170 does not have an NPU (e.g. 4 computation pipelines inside). Please make sure the channel number is divisible by 4 on the RT1170 (Cortex-M7/M4); 25 or 61 is not.
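As a minimal sketch (assuming you control the channel counts in your own pruning script, which is an assumption on my side), you could round each pruned count up to the next multiple of 4 before rebuilding and re-quantizing the model:

```python
# Minimal sketch (assumption: the pruned channel counts come from your own
# pruning script): round each count up to the next multiple of 4 to follow
# the divisible-by-4 recommendation for Cortex-M7/M4 int8 kernels.
def round_up(channels, multiple=4):
    return ((channels + multiple - 1) // multiple) * multiple

pruned_counts = [25, 61]                      # counts produced by pruning
aligned_counts = [round_up(c) for c in pruned_counts]
print(aligned_counts)                         # -> [28, 64]
```

This keeps the model close to its pruned size while matching the alignment recommendation.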

I will send you a mail. If you still have the issue, please send me by mail a sample model that does not involve any privacy concerns, and I will try to reproduce it.

B.R.

Sam

173 Views
nnxxpp
Contributor III

@Sam_Gao 

Thank you so much. That is very good news, and it is very helpful when deploying a quantized model on the board. Please let me confirm this point:

"please be sure the channel number should be divisible by 4 on RT1170 (Cortex M7/M4), but 25 or 61 is not.". My board is MIMXRT1060 (Cortex M7). It means that I need to make sure the channel number should be divisible by 4 same as on RT1170. It that right?

119 Views
Sam_Gao
NXP Employee

Yes, exactly. It is a general recommendation (divisible by 4) for these processors (Cortex-M7 or below); the RT1170 combines a CM4 and a CM7.

108 Views
nnxxpp
Contributor III

Sure, thank you so much for supporting me.

277 Views
nnxxpp
Contributor III

@Sam_Gao 

Do you have any progress?

327 Views
Sam_Gao
NXP Employee

Hi @nnxxpp 

Would you please clarify the details below to help me understand more?

Which SDK example and version are you using? How do you reproduce it, and how do you qualify it?

Thanks and Best Regards.

317 Views
nnxxpp
Contributor III

@Sam_Gao 

Sorry for the lack of information.

I use tflm_cifar10 as the base project from the latest MCUXpresso SDK in MCUXpresso IDE. My board is MIMXRT1060-EVKB.

Sorry, I cannot share my machine learning models, but I will give an example to explain better.

For example, my original model has 4 layers: Convolution layer (128 channels) ==> Convolution layer (256 channels) ==> Convolution layer (512 channels) ==> Dense layer (10 classes of CIFAR-10). After that we apply pruning to the model, and after pruning we get a model with the architecture:

Convolution layer (128 channels) ==> Convolution layer (231 channels) ==> Convolution layer (455 channels) ==> Dense layer (10 classes of CIFAR-10).

You can see that the number of channels changed for the second and third convolution layers: 256 => 231 and 512 => 455.

The model size is reduced, but the inference time of the pruned model is larger than that of the original model (without pruning). Both models are INT8 quantized. Is there an optimal number of channels for an int8 machine learning model on an NXP board?
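To make the toy example concrete, here is a minimal Keras sketch of that topology (the kernel sizes, pooling layer, and input shape are placeholders I chose for illustration, not my real model):

```python
# Toy sketch of the example topology above; all hyperparameters are placeholders.
import tensorflow as tf

def build_model(channels):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),    # CIFAR-10 input
        tf.keras.layers.Conv2D(channels[0], 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(channels[1], 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(channels[2], 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),            # 10 CIFAR-10 classes
    ])

original = build_model([128, 256, 512])   # before pruning
pruned = build_model([128, 231, 455])     # after pruning (same topology, narrower layers)
```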

I hope that it can help you.
