Hello
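I also needed NEON to accelerate my C++ program on an i.MX8M Mini. In my case, I only had to include the NEON header:
#include <arm_neon.h>
and compile without -mfpu=neon, and the program worked. (The i.MX8M Mini has 64-bit Cortex-A53 cores; with an AArch64 toolchain, NEON is enabled by default, and AArch64 GCC does not accept -mfpu at all.)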
I use vmulq_f32 and vmlaq_f32 in the program. It does run faster, and the results are still correct.
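For reference, here is a minimal sketch of how those two intrinsics can be used on float arrays, four lanes at a time. The function names mul_f32 and mla_f32 are hypothetical, not taken from my actual program. The plain loop at the end of each function handles any leftover elements, and the __ARM_NEON guard lets the sketch also build on non-ARM hosts as a scalar fallback.

```cpp
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: out[i] = a[i] * b[i]
void mul_f32(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Process 4 floats per iteration with 128-bit NEON registers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));  // lane-wise multiply
    }
#endif
    // Scalar tail (or full scalar path when NEON is unavailable).
    for (; i < n; ++i) out[i] = a[i] * b[i];
}

// Hypothetical helper: acc[i] += a[i] * b[i]
void mla_f32(float* acc, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t vacc = vld1q_f32(acc + i);
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        // vmlaq_f32(acc, a, b) returns acc + a * b, lane by lane.
        vst1q_f32(acc + i, vmlaq_f32(vacc, va, vb));
    }
#endif
    for (; i < n; ++i) acc[i] += a[i] * b[i];
}
```

With n = 5, the first four elements go through the NEON path and the last one through the scalar loop, so both paths get exercised.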
Maybe this will help you.