I needed a Q31 in-place IFFT for a Kinetis K20 project with very limited flash and RAM.
The fast implementations I tried were too resource-hungry, and the low-footprint ones were just too slow.
So I ended up writing this from scratch: stg/SYLT-FFT · GitHub
While optimizing and comparing performance with CMSIS DSP, I was a bit surprised to see my code's performance creep closer and closer to it.
Eventually I reached break-even, and in its current state it seems to outperform CMSIS DSP.
I have only benchmarked fft_inverse and only for N=256 as this was really all I ever needed for my own project.
Perhaps comparing against the CMSIS DSP library that comes precompiled with µVision is not a fair method?
Is anyone interested in having a look at it, and perhaps letting me know where my comparison process has gone wrong?
Compiler and library versions, compilation flags, etc. are all documented in the README.md file.
General comments are more than welcome.
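For anyone unfamiliar with the Q31 format mentioned above: a Q31 value is a signed 32-bit integer interpreted as a fraction in [-1, 1), so multiplication needs a 64-bit intermediate and a shift back by 31 bits. The following is only a minimal sketch of that arithmetic, not code taken from SYLT-FFT or CMSIS DSP. The helper names (q31_from_double, q31_mul, q31_add_sat) are made up for illustration, and the right shift assumes the compiler uses an arithmetic shift for negative values, as GCC and ARMCC do.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q31_t; /* signed 32-bit fraction, range [-1, 1) */

/* Convert a double in [-1, 1) to Q31 (hypothetical helper, truncates). */
static q31_t q31_from_double(double x)
{
    assert(x >= -1.0 && x < 1.0);
    return (q31_t)(x * 2147483648.0); /* scale by 2^31 */
}

/* Q31 multiply: 64-bit product (Q62), shifted back to Q31, saturated. */
static q31_t q31_mul(q31_t a, q31_t b)
{
    int64_t p = ((int64_t)a * (int64_t)b) >> 31; /* assumes arithmetic shift */
    if (p > INT32_MAX) {
        p = INT32_MAX; /* only (-1) * (-1) can overflow */
    }
    return (q31_t)p;
}

/* Saturating Q31 add: clamp instead of wrapping around on overflow. */
static q31_t q31_add_sat(q31_t a, q31_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) {
        s = INT32_MAX;
    }
    if (s < INT32_MIN) {
        s = INT32_MIN;
    }
    return (q31_t)s;
}
```

On a Cortex-M4 such as the K20, a compiler will typically map the 64-bit product to a single long-multiply instruction, which is part of why fixed-point FFTs are attractive on this class of device.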
I think your post would be more useful if you could change the subject line to something more related to the issue at hand (it is called a subject line for a reason, after all, right? :smileywink:) and add some pertinent tags to help people researching this area.
As it stands, my contribution to this thread is a personal experience with the CMSIS DSP library: apparently the same code, compiled in CW 10.6, runs far faster with version 4.2 than with version 3.01.
my 0.019999... and HTH,
Thanks, I've never been good at this whole forum thing :smileyplain:
I've worked with and separately optimized the code for GCC (GNU Tools for ARM Embedded) and Keil ARMCC - but not CodeWarrior.
The library referenced for comparison is the arch-specific library that came precompiled with µVision, and it may well not be optimally compiled.
Still - here's hoping this might be of use to/help someone, somewhere at some point.
I'm still in the learning stage too :smileyhappy:
CW is only an IDE; I think one of the compilers that can be used with it is GCC.
I'm curious about your benchmarks, but I'm swamped right now, so it will have to wait until the end-of-year "holidays", when I'll have some time to look at it.
I'm sure it will be of use to someone.