From my recollection the eIQ tools are exclusively for image recognition workloads, so unfortunately they are of no use to us. Please let me know if I have misunderstood that.
I'm afraid I can't share the model itself, but it is a fairly standard residual convolutional network with nothing particularly exotic going on. The code we used to quantize it is as follows.
import sys

from onnxruntime.quantization import QuantType, quantize_dynamic

# Paths to the float32 input model and the quantized output model.
model_in = sys.argv[1]
model_out = sys.argv[2]

# Dynamic quantization: weights are stored as unsigned 8-bit,
# activations are quantized on the fly at runtime.
model_quant_dynamic = quantize_dynamic(
    model_in,
    model_out,
    optimize_model=False,
    weight_type=QuantType.QUInt8,
)
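For completeness, this is roughly how the quantized model can then be loaded and timed on the CPU with onnxruntime (the input name and shape below are placeholders rather than our real values):

import sys
import time

import numpy as np
import onnxruntime as ort

# Load the quantized model on the CPU execution provider only.
session = ort.InferenceSession(sys.argv[1], providers=["CPUExecutionProvider"])

# Placeholder input name and shape; substitute the model's real input here.
feeds = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

# One warm-up run, then a timed run.
session.run(None, feeds)
start = time.perf_counter()
session.run(None, feeds)
print(f"inference took {(time.perf_counter() - start) * 1000:.1f} ms")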
We have been trying static quantization today, using broadly similar code with the addition of a calibration data reader (a sketch follows the results below), but have come up against a different issue. We are working on two methods in tandem: a Python script and a C++ programme. After static quantization:
Python:
* We cannot get the model to run at all on the NPU using the Python onnxruntime module: the error states "The graph is not acyclic".
* It runs well on the CPU, taking about 8 ms per inference. This is fast enough that (pending accuracy analysis) we might even consider abandoning NPU inference altogether, given the extreme difficulty in getting good results.
C++:
* Running the same model from C++, it works on both the NPU and the CPU, but it is extraordinarily slow for reasons we haven't been able to discern: over 1000 ms per inference on the NPU and about 100 ms on the CPU.
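For reference, the static quantization path is broadly along the lines of the sketch below. The calibration reader here just feeds a handful of random tensors as a stand-in for our real data reader, and the input name, shape and quantization options are placeholders rather than our exact settings:

import sys

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class DummyCalibrationReader(CalibrationDataReader):
    # Stand-in reader: yields a handful of random tensors shaped like the
    # model input. Our real reader feeds genuine preprocessed samples.
    def __init__(self, input_name, input_shape, num_samples=16):
        self._samples = iter(
            {input_name: np.random.rand(*input_shape).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self._samples, None)

model_in = sys.argv[1]
model_out = sys.argv[2]

# Placeholder input name and shape; substitute the model's real input here.
reader = DummyCalibrationReader("input", (1, 3, 224, 224))

quantize_static(
    model_in,
    model_out,
    calibration_data_reader=reader,
    quant_format=QuantFormat.QDQ,      # QDQ graph representation
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
)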
I can share the C++ code if you like, but that is arguably a different problem, so perhaps a new thread would be better. Please let me know.