I am running a simple filter algorithm on the K60 but this is taking too long and using up almost all of the processor's power (when using CodeWarrior).
As long as the filter length (number of taps) is a power of two such as 32 or 64, it is fast enough (because the divide by 64 is performed as a shift in the code), but as soon as this is not the case the divide time increases dramatically.
The divide is performed by a call to __FSL_s32_div_f(), but I was expecting the Cortex-M4 SDIV instruction to be used, since this is a signed 32-bit divide and SDIV takes about 12 clocks at most. The subroutine presently being called is presumably doing the work in software.
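For reference, the power-of-two case is cheap because a signed divide by 64 compiles down to an arithmetic shift plus a small bias (the bias is needed because a plain shift rounds toward negative infinity while C's '/' rounds toward zero). A minimal sketch of what the compiler effectively emits:

```c
/* Sketch of a compiler's shift-based signed divide by 64: add a bias
 * of 63 to negative inputs so the arithmetic shift matches C's
 * round-toward-zero '/' semantics. */
static int div64(int x)
{
    if (x < 0)
        x += 63;
    return x >> 6;   /* 2^6 == 64 */
}
```

This is a couple of cycles, which is why a non-power-of-two tap count (and hence a real divide) stands out so much.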
The build settings are for Cortex-M4 (with FP in SW), but I don't see any option that controls how a simple divide is generated.
Question - is this a (CW) settings issue, or does one have to write routines in assembler to make use of the instruction set's capabilities?
I encountered a quite similar problem with a loop-controller algorithm on a K70. Although my project was built using a GCC compiler, the solution may be the same.
Your first try should be the compiler options: try "-Ofast". This option alone got the compiler to use the SDIV and MUL instructions instead of the software implementations.
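One quick way to confirm whether the option takes effect is to grep the disassembly for the hardware divide (the file names here are placeholders for your own sources):

```shell
# hypothetical file names; -mcpu=cortex-m4 makes SDIV available to the compiler
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -Ofast -c filter.c -o filter.o
# SDIV should appear in the listing if the library call was replaced
arm-none-eabi-objdump -d filter.o | grep -i sdiv
```

If nothing matches, the compiler is still emitting the software divide call.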
A second option would be to turn on prefetching and caching to speed up the whole processor; they are not enabled by default.
Hope this helps.
The MUL instruction was being used (thankfully) but not SDIV. I build with -O4, but optimising for space rather than speed - the routines don't use loops and so are inherently unrolled.
In this instance I found that I could modify the algorithm to use a divide by 64 (and hence a shift, which is faster even than SDIV) by adding a single MUL near the end, so I haven't yet checked whether optimising for speed causes SDIV to be used.
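For non-power-of-two tap counts there is also the classic reciprocal-multiply trick: one MUL plus one shift instead of a divide. A sketch, with a hypothetical tap count of 48 (the 2^16 scale and the rounding bias are my assumptions; the result can be off by one for awkward inputs, which is usually acceptable when averaging filter accumulators):

```c
#include <stdint.h>

/* Replace acc / TAPS by a multiply and a shift: precompute
 * RECIP = round(2^16 / TAPS) at build time, then
 * acc / TAPS is approximately (acc * RECIP + 2^15) >> 16. */
#define TAPS  48                              /* hypothetical filter length */
#define RECIP (((1 << 16) + TAPS / 2) / TAPS) /* rounded reciprocal */

static int32_t div_taps(int64_t acc)
{
    return (int32_t)((acc * RECIP + (1 << 15)) >> 16);
}
```

The 64-bit intermediate keeps the multiply from overflowing for large accumulators.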
At the moment your remark about prefetching and caching has caught my eye, since according to the K60 manual these are already enabled by default (FMC_PFB0CR and FMC_PFB1CR default to speculation and cache enabled). Since I have used the K60N512 for some time I originally 'actively' disabled these due to errata E2647 and E2644, but with newer devices I leave them at their defaults.
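For reference, the errata-style workaround amounts to clearing the speculation/buffer/cache enable bits in those registers. A register-level sketch for bank 0 (the address and bit positions follow my reading of the K60 reference manual; verify them against your device's header and errata sheet before use):

```c
#include <stdint.h>

/* K60 FMC flash cache control register, bank 0 (FMC base 0x4001F000,
 * PFB0CR at offset 0x04) - check your device header for the exact map. */
#define FMC_PFB0CR (*(volatile uint32_t *)0x4001F004u)

#define B0SEBE (1u << 0)   /* single-entry buffer enable  */
#define B0IPE  (1u << 1)   /* instruction prefetch enable */
#define B0DPE  (1u << 2)   /* data prefetch enable        */
#define B0ICE  (1u << 3)   /* instruction cache enable    */
#define B0DCE  (1u << 4)   /* data cache enable           */

/* errata workaround: disable speculation and caching on bank 0 */
static void fmc_bank0_disable(void)
{
    FMC_PFB0CR &= ~(B0SEBE | B0IPE | B0DPE | B0ICE | B0DCE);
}
```

Re-enabling is the corresponding OR of the same bits, which should simply restore the documented reset state.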
Are you sure that they need to be "actively" enabled beyond the default settings?
When using the K70, turning on the prefetching by hand gave us a significant speed-up of an interrupt service routine (one we need to be really fast).
The current K70 user manual also states that prefetching is enabled by default, so I have to check the actual behaviour on one of our devices.
This may take one or two days.
With the errata workarounds (for the older K60 device) which 'actively' disable cache and speculation, I get 10.8us for my ADC interrupt processing (filters).
When I remove the errata workarounds (not actively disabling the cache and speculation, but leaving the register settings at their defaults - presumably ON with standard settings) I get 7.9us..9.2us (the jitter is presumably due to the cache state at each interrupt), showing a real speed improvement [since the board I am working with has a newer chip, this is possible as standard]. I do know that various cache settings are possible, which could presumably be tuned to match exact user/application requirements, but I don't think I will experiment with these.
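As an aside, one way to obtain such timings without external instrumentation is the Cortex-M4 DWT cycle counter (register addresses per the ARMv7-M architecture manual; divide the cycle delta by the core clock in MHz to get microseconds):

```c
#include <stdint.h>

/* ARMv7-M debug registers (fixed addresses in the system region) */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static inline void cyccnt_init(void)
{
    DEMCR      |= (1u << 24);  /* TRCENA: enable the DWT/ITM block */
    DWT_CYCCNT  = 0;
    DWT_CTRL   |= 1u;          /* CYCCNTENA: start the cycle counter */
}

/* usage in the ISR:
 *   uint32_t t0 = DWT_CYCCNT;
 *   ... filter work ...
 *   uint32_t cycles = DWT_CYCCNT - t0;  // us = cycles / F_CPU_MHZ
 */
```

The unsigned subtraction handles one counter wrap correctly, which is plenty for an ISR measured in microseconds.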
Basically I am happy that avoiding the division keeps it well away from the original 51us that I had...
P.S. CW now uses GCC, so I expect the behaviour you mentioned in your project is essentially the same in CW.
I am sorry for the delay. It took way more than 1 or 2 days.
I checked a quite old version of our software, which uses CW (the Freescale compiler, not GCC) for compiling and debugging. In this version caching and prefetching are turned off, although there are no code instructions doing so. Maybe the debugger or CW disables them before writing the flash.
With GCC and the Segger J-Link GDB Server they were always on.