Hi,
I'm working with the i.MX 8M Plus EVK to test the ML accelerator integrated in the SoC.
To do that, I've created a Keras model with TensorFlow 1.15. Now I would like to run it on the GPU/ML module of the i.MX 8M Plus.
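For context, the model looks roughly like this (the layers and shapes below are placeholders, not my actual network):

```python
import tensorflow as tf  # TensorFlow 1.15

# Placeholder stand-in for my actual network: a small CNN built with
# the Keras API that ships with TF 1.15.
def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation='relu',
                               input_shape=(96, 96, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
```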
The eIQ user's guide (rev. L5.4.24_2.1.0) says:
"The GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case".
However, Keras quantization-aware training (part of the TensorFlow Model Optimization Toolkit) appears to support only TensorFlow v2, and its quantization of convolutional layers is per-channel/per-axis: https://www.tensorflow.org/model_optimization/guide/quantization/training#general_support_matrix.
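As far as I understand the guide, the TF v1 flow would look something like the sketch below. tf.contrib.quantize inserts per-tensor fake-quantization ops, which is why I assume it's the recommended path; build_model() is the placeholder from above, and the checkpoint path and input stats are also placeholders. I'm not sure this is the intended workflow:

```python
import tensorflow as tf  # TensorFlow 1.15

# 1) Quantization-aware training: rewrite the default graph with
#    per-tensor fake-quantization ops, then fine-tune.
train_model = build_model()
train_model.compile(optimizer='adam',
                    loss='sparse_categorical_crossentropy')
tf.contrib.quantize.create_training_graph(
    input_graph=tf.get_default_graph(), quant_delay=0)
train_sess = tf.keras.backend.get_session()
train_sess.run(tf.global_variables_initializer())
# ... train_model.fit(...) here so the quantization ranges settle ...
tf.train.Saver().save(train_sess, './qat_ckpt')  # hypothetical path

# 2) Export: rebuild the model in a fresh inference graph, insert the
#    matching fake-quant ops, and restore the trained weights/ranges.
eval_graph = tf.Graph()
with eval_graph.as_default():
    eval_model = build_model()
    tf.contrib.quantize.create_eval_graph(input_graph=eval_graph)
    eval_sess = tf.Session(graph=eval_graph)
    tf.train.Saver().restore(eval_sess, './qat_ckpt')

    # 3) Convert to a fully uint8-quantized (per-tensor) TFLite model.
    converter = tf.lite.TFLiteConverter.from_session(
        eval_sess, [eval_model.input], [eval_model.output])
    converter.inference_type = tf.uint8
    input_name = eval_model.input.name.split(':')[0]
    # (mean, std_dev) depend on the input preprocessing; placeholders.
    converter.quantized_input_stats = {input_name: (127.5, 127.5)}
    tflite_model = converter.convert()

with open('model_quant_per_tensor.tflite', 'wb') as f:
    f.write(tflite_model)
```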
Is there a way to apply per-tensor quantization to a Keras model with convolutional layers so that it can run on the GPU/ML module? If not, how am I supposed to run my model there?