Hello, I am trying to convert certain networks to run on the NPU with the best possible performance.
I saw in the NXP i.MX 8 ML Guide that the NPU provides faster inference for "per-tensor" quantized models.
In some example models (e.g. PoseNet) I saw that all the convolution layers are quantized layer-wise (per-tensor), not channel-wise as is the TensorFlow default.
However, I have not yet been able to quantize the convolutions in a layer-wise manner in a simple example of my own. Do you perhaps have an example of how to achieve this?
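For reference, here is a minimal sketch of the post-training integer quantization flow I am starting from. The `_experimental_disable_per_channel` converter attribute is my assumption for forcing per-tensor weight quantization; it is not part of the public TFLite API, and I have not been able to confirm it is the right knob:

```python
import numpy as np
import tensorflow as tf

# Minimal Keras model with a single conv layer, just to exercise the converter.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Dummy calibration data; replace with real input samples.
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# Experimental/private attribute that (as far as I can tell) is supposed to
# disable the default per-channel quantization of conv weights in favour of
# per-tensor quantization. This is an assumption on my side.
converter._experimental_disable_per_channel = True

tflite_model = converter.convert()
with open("model_per_tensor_int8.tflite", "wb") as f:
    f.write(tflite_model)
```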
Thanks Daniel