I needed a Q31 in-place IFFT for a Kinetis K20 project with very limited flash and RAM.
The fast implementations I tried were too resource-hungry, and the low-footprint ones were just too slow.
So I ended up writing this from scratch: stg/SYLT-FFT · GitHub
While optimizing and comparing against CMSIS DSP, I was a bit surprised to see my implementation's performance creep closer and closer.
Eventually I reached break-even, and in its current state it seems to outperform CMSIS DSP.
I have only benchmarked fft_inverse, and only for N=256, as this was all I ever needed for my own project.
Perhaps the CMSIS DSP library that comes precompiled with µVision is not a fair basis for comparison?
Is anyone interested in having a look at it, and perhaps letting me know where I have gone wrong in my comparison process?
Compiler and library versions, compilation flags, etc. are all documented in the README.md file.
General comments are more than welcome.
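
For anyone not familiar with the Q31 format mentioned at the top, here is a minimal sketch of what the arithmetic looks like in plain C. This is only an illustration and is not taken from SYLT-FFT or CMSIS DSP: the helper names q31_from_double and q31_mul are hypothetical, and the multiply assumes the compiler does an arithmetic right shift on negative 64-bit values (true for the usual ARM toolchains, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Q31: signed 32-bit fixed point, 1 sign bit and 31 fraction bits,
 * representing values in [-1, 1). */
typedef int32_t q31_t;

/* Hypothetical helper: convert a real value in [-1, 1) to Q31
 * by scaling with 2^31 and truncating toward zero. */
static q31_t q31_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q31_t)(x * 2147483648.0);
}

/* Hypothetical helper: multiply two Q31 values.
 * The 64-bit product is in Q62, so shifting right by 31 gives Q31.
 * No rounding or saturation is done here: the one overflow case,
 * (-1) * (-1), is NOT handled, and the shift assumes an arithmetic
 * right shift for negative values. */
static q31_t q31_mul(q31_t a, q31_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (q31_t)(p >> 31);
}
```

With these, q31_mul(q31_from_double(0.5), q31_from_double(0.5)) gives 0x20000000, which is 0.25 in Q31. An FFT built on this kind of multiply also has to scale between stages to avoid overflow, which the sketch does not cover.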