Hi Edwin,
We have isolated the issue to a numerical corruption bug in the CMSIS-NN optimized kernels, specifically in the dilated convolution paths, which our ASR model relies on heavily.
Our Proof:
1. We recompiled "libtflm.a" using only the standard TFLM C++ reference kernels (excluding the "cmsis_nn" directory from the build). We kept the SDRAM cache enabled and used the same DMA-safe DTCM buffers for DMA transfers.
2. With CMSIS-NN bypassed, the model decodes correct speech tokens. For example, Chunk 7 outputs "[ easy]" and Chunk 10 outputs "[ ch]".
3. When we switch back to the precompiled CMSIS-NN library, inference time drops to 537 ms, but the output is corrupted again: every chunk decodes to empty brackets.
Because the memory setup, cache configuration, and DMA buffers were identical in both builds, this indicates that they are correct and that the error occurs in the CMSIS-NN optimized convolution/depthwise convolution paths when the dilation rate is greater than 1.
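To make the suspected failure mode concrete, below is a minimal sketch of what a dilated int8 convolution should compute. It is not the CMSIS-NN or TFLM implementation: the function names are hypothetical, and it assumes a 1-D single-channel layer with stride 1, no padding, and raw int32 accumulators (no requantization). The key point is that tap k must read input[o + k * dilation]; a kernel that mishandles the dilation factor reads the wrong samples. Comparing per-layer accumulators from a reference like this against the optimized kernel's output is one way the failing layer could be pinpointed.

```c
#include <assert.h>
#include <stdint.h>

/* Effective span of a dilated kernel: (kernel_size - 1) * dilation + 1. */
static int effective_kernel_size(int kernel_size, int dilation) {
    return (kernel_size - 1) * dilation + 1;
}

/*
 * Hypothetical naive reference: 1-D, single channel, stride 1, no padding.
 * int8 input and weights, int32 accumulators (no requantization step).
 * input_offset is added to every input sample, similar to how int8 kernels
 * fold in the input zero point. Returns the number of outputs written.
 */
static int dilated_conv1d_ref(const int8_t *input, int input_len,
                              const int8_t *filter, int kernel_size,
                              int dilation, int32_t input_offset,
                              int32_t *output) {
    assert(kernel_size >= 1 && dilation >= 1);
    const int out_len = input_len - effective_kernel_size(kernel_size, dilation) + 1;
    for (int o = 0; o < out_len; ++o) {
        int32_t acc = 0;
        for (int k = 0; k < kernel_size; ++k) {
            /* Tap k reads input[o + k * dilation]; a kernel that drops the
               dilation factor would read input[o + k] instead. */
            const int32_t x = (int32_t)input[o + k * dilation] + input_offset;
            acc += x * (int32_t)filter[k];
        }
        output[o] = acc;
    }
    return out_len;
}

/* Hypothetical helper: count elements where an optimized kernel's output
   differs from the reference output. */
static int count_mismatches(const int32_t *ref, const int32_t *opt, int n) {
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (ref[i] != opt[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For example, with input {1, ..., 8} and filter {1, 0, -1}, dilation 1 gives six outputs of -2 while dilation 2 gives four outputs of -4, so a kernel that silently ignores dilation would disagree with the reference at every position.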
Our Questions:
- Is there an eIQ SDK / CMSIS-NN middleware release that fixes dilated convolutions (dilation rate greater than 1) in arm_convolve_s8 and arm_depthwise_conv_s8, and if so, which version?
- Can you provide us with a patch or an updated libtflm.a with corrected CMSIS-NN kernels, so that we can achieve both full accuracy and the optimized 537 ms inference time?
Regards,
Priyesh Shahi