Has anybody implemented CMSIS FFT library Successfully?

cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Has anybody implemented CMSIS FFT library Successfully?

Jump to solution
18,170 Views
Amit_Kumar1
Senior Contributor II

Hi

I want to use arm_cfft_radix4_init_q15() and arm_cfft_radix4_q15() function from CMSIS 3.2 library . I am unable to find any good resources or any examples. I have looked in the documents provided by ARM i.e CMSIS-DSP: Complex FFT Functions but I didn't find it useful. What are the inputs given to these functions I am still not clear with it. I tried writing the codes but it didn't worked

this is how I am calling the functions,

arm_cfft_radix4_init_q15(&arm_cfft_sR_q15_len256, 256, 0, 1);          // For init

arm_cfft_radix4_q15(&arm_cfft_sR_q15_len256, az);                      // For transformation where,


arm_cfft_radix4_instance_q15 arm_cfft_sR_q15_len256 = {256, 0, 1, twiddleCoef_256, armBitRevIndexTable256, 4U, 4U};

  and az[512] is an array where the data is stored. Just for testing 

float32_t az[512] = {1.2 ,0.0

                             1.3, 0.0

                              -------

                              --------

                              1.2, 0.0};

my programs hangs and I am not able to print the o/p. I am using Codewrrior 10.5 and want to implement this on FRDM-K20D50M. I posted the query in ARM's community also but I didn't get any response which makes me think if anybody is actually using this library. Please look into the matter.

Kind Regards

Amit Kumar

1 Solution
11,838 Views
martynhunt
NXP Employee
NXP Employee

Hi,

I have some good news. Your build of the library has the correct compiler options set, and your applications does as well. However, it appears you are linking the application to the pre-built library provided by CMSIS. This pre-built version does not have -ffunction-sections -fdata-sections set; therefore, the unused data will not be removed.

To link your rebuild of the library correctly into your project you need to make sure you have the following properties set:

ToolSettings_Linker_Libraries.PNG.png

and:

ToolSettings_Linker_Misc.PNG.png

Please let me know if this works for you.

Best regards,

Martyn

P.S. For reference the correct compiler settings for library and application are:

Library Compiler:

Library_Optimizations.PNG.png

Application compiler and linker options:

Application_Compiler_Optimizations.PNG.png

Applications_Linker_Options.PNG.png

View solution in original post

26 Replies
10,235 Views
mjbcswitzerland
Specialist V

Hi All

I am hoping to find some people who have used this with floating points and could maybe explain why I get these results?

Test code:

float fft_input_buffer[512];
float fft_magnitude_buffer[256];

fnInjectFloat(fft_input_buffer, 512);

arm_cfft_f32(&arm_cfft_sR_f32_len256, fft_input_buffer, 0, 1); // perform complex fast-fourier transform (the result is in the input buffer)
arm_cmplx_mag_f32(fft_input_buffer, fft_magnitude_buffer, 256); // calculate the magnitude of each frequency component (256 bins are now available) - 48kHz sampling rate gives 0..24kHz spectrum with 93Hz resolution per bin

The function fnInjectFloat() is putting a high resolution test signal, in this specific reference case a 7'000Hz sine wave, into the input buffer.

Before calling the FFT the input buffer can be displayed to have the expected input signal (75 cycles in 512 input buffer):

pastedImage_5.png

After performing the FFT and calculating the magnitude of the output bins one expects to find the energy at 7kHz. This is however the result in the magnitude buffer:

pastedImage_6.png

There is a nice peak at 7kHz but there is a second peak at 17kHz (24kHz - 7kHz) with three times the energy.

I have tested with various FFT lengths with the same result.
I have tested on various chips (m0+, m4 - with and without FPU) and in each case with the same result.

If the test frequency is increase, eg. to 8kHz the second component is at 16kHz (24kHz - 8 kHz). If a frequency greater than 12kHz is injected the unexpected bin energy is less that 12kHz.

Finally, if real sampled input is used - rather than preparing a signal in the input buffer - the same results are also obtained in each case.

I can't believe that the CMSIS FFT doesn't work (I tried version 3.2 and also the very latest V5.1) but the results are presently not usable.... The only code used is the CMSIS library code so I presently see no influences that can cause errors to be introduced.

Can anyone verify or explain what is happening?

Regards

Mark

0 Kudos
Reply
10,235 Views
mjbcswitzerland
Specialist V

Hi

I would like to add that when the ARM test input "arm_fft_bin_data.c" is used instead of other inputs or real signals the output does pass the ARM test case (it checks for maximum energy in bin 213 as criteria) but it seems that passing the test case doesn't prove that it actually works.... (???????)

Regards

Mark

0 Kudos
Reply
10,235 Views
mjbcswitzerland
Specialist V

Hi All

Here are a further simple comparison result.

Another example where we have taken a simple 16 point FFT with a windowed signal (32 samples of a simple test single).

pastedImage_1.png

The same 32 input samples into the same FFT in Matlab gives:

pastedImage_2.png

 

There are certain similarities and their is a feeling that something is inverted but no attempts to work out what needs to be moved around has been successful yet!

The present situation is than no one here (several engineers with DSP/signal processing experience) have any idea what is going on to give such results....

Regards

Mark

0 Kudos
Reply
10,235 Views
egoodii
Senior Contributor III

Without actually delving into any details myself(!), I will agree that this test makes it look like some array addressing is swapped 'end for end', shifting some results to the wrong end --- what was the complete resolution to the 11/11/14 'address reversal'  question/answer?

0 Kudos
Reply
10,235 Views
mjbcswitzerland
Specialist V

Earl

I have realised that the CMSIS CFFT requires that the inputs samples are complex. If I insert a 0 between each sample (representing a zero complex valid) I then get correct results. The ARM reference also has 0.0 at every second array location in its input, which is why it works.

There is also a RFFT version (real) which I was hoping would save having to put in the zeros but initial tests didn't confirm this. For the moment I am continuing with the CFFT and might retry the RFFT later again (it may be faster since it performs a CFFT of half the size and then extracts real value from it [somehow]).

Since I wasn't getting an responses here or at mbed I tried "DSPrelated" and quickly got some ideas that led to the resolution.

Regards

Mark

0 Kudos
Reply
10,235 Views
carstengroen
Senior Contributor II

I'm sorry to re-activate a 3 year old thread, but I wonder if any of you got the rfft to work ? I have a project where I get data from a sensor, initially I used the rfft function from CMSIS-DSP (vers 1.8.0), but I also see the spurious at the top of the bins. I switched over to cfft (and interleaved the real data with zeros) and everything works as expected.

As I'm a little tight on execution speed, I would love to get the rfft to work properly.

Any ideas, or is there something fishy with the rfft ?

0 Kudos
Reply
10,235 Views
mjbcswitzerland
Specialist V

Carsten

Personally I didn't revisit the RFFT and also moved to a different library due to the requirement to be able to do arbitrary length FFTs  (the CMSIS ones can only do power of two lengths and are also limit in maximum length) which means that I can't give any feedback about the RFFT operation or accuracy.

Sorry,

Mark
[uTasker project developer for Kinetis and i.MX RT]

10,235 Views
carstengroen
Senior Contributor II

Thanks mjbcswitzerland

I spent a couple days messing around with the rfft before I found your post above which saved my day Smiley Happy

Anyway, I will continue for now with the cfft, and maybe later, look once more at the rfft (I have seen a couple of demos that uses this function) when I have the time.

I still need to understand the scaling of the output from the cfft as there seems not to be a 1:1 relationship between input and output (any idea maybe ?)

My code (and comment) so far:

//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// Using cfft (and mag) functions, a 1.0 in input level yields 256.0 in the mag output level.
// Looking at the output of tempBuffer after the cfft (feeding a sine at 1.0 peakvalue, 
// the real part of the peak bin is -0.0012 and the img. part is -256)
// Process the data through the CFFT/CIFFT module
arm_cfft_f32(&varInstCfftF32, tempBuffer, 0, 1); 
// Convert the complex output from the cfft function to real numbers, we only do half the points as the second half is negative/mirrored data)
arm_cmplx_mag_f32(tempBuffer, freqBuffer, FFT_SIZE / 2);
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PS: Thanks for your many articles & videos on the RT series, very nice indeed !

0 Kudos
Reply
10,222 Views
mjbcswitzerland
Specialist V

Carsten

Although I have been using a different FFT library for product developments requiring more capabilities I have integrated the complex FFT in the uTasker project so that it can be used simply on (in-place) circular buffers (which is what tends to be needed in real-time applications).

There is a document at:
https://www.utasker.com/docs/uTasker/uTasker_DSP.pdf
And a video showing it in operation at:
https://www.youtube.com/watch?v=n-GABeILGV8&feature=youtu.be

This is the actual code for the calculation (after each part of the USB audio buffer has been filled by the received waveform) of the video reference on a K22F:

static void fnPerformFFT(int iBufferReference)
{
    int iInputSample;
    int iOutputSample = 0;
    int iInputOffset = 0;
    int iCopyLength = FFT_INPUT_SAMPLES;
    float fft_magnitude_buffer[(FFT_INPUT_SAMPLES/2)];                   // temporary fft frequency amplitude output
    signed short sRawInput[FFT_INPUT_SAMPLES];                           // temporary linear input buffer with original waveform
    switch (iBufferReference) {
    case EVENT_FFT_HALF_PONG:
        iInputOffset = (FFT_INPUT_SAMPLES/2);
        iInputSample = (FFT_INPUT_SAMPLES/4);
        break;
    case EVENT_FFT_PONG:
        iInputOffset = (FFT_INPUT_SAMPLES);
        iInputSample = (FFT_INPUT_SAMPLES/2);
        break;
    case EVENT_FFT_HALF_PING:
        iInputOffset = (2 *(3 * FFT_INPUT_SAMPLES/4));
        uMemcpy(&sRawInput[0], &fft_buffer[(3 * FFT_INPUT_SAMPLES / 4)], ((FFT_INPUT_SAMPLES / 2) * sizeof(signed short))); // handle the circular buffer wrap around in this case
        iOutputSample = (FFT_INPUT_SAMPLES / 2);
        iCopyLength = (FFT_INPUT_SAMPLES / 2);
        // Fall-through intentionally
        //
    case EVENT_FFT_PING:
    default:
        iInputSample = 0;
        break;
    }
    uMemcpy(&sRawInput[iOutputSample], &fft_buffer[iInputSample], (iCopyLength * sizeof(signed short)));
    fnFFT((void *)fft_buffer, (void *)fft_magnitude_buffer, FFT_INPUT_SAMPLES, iInputOffset, (FFT_INPUT_SAMPLES * sizeof(signed short)), windowing_buffer, window_conversionFactor, (FFT_INPUT_HALF_WORDS_SIGNED | FFT_OUTPUT_FLOATS | FFT_MAGNITUDE_RESULT)); // perform complex fast-fourier transform (the result is in the input buffer)
    #if defined BLAZE_K22
    fnDisplayFFT(fft_magnitude_buffer, sRawInput);                       // display FFT output on display
    #endif
}
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The magnitude compensation is included in the Windowing step, whereby a Blackmann Harris window is (typically) used.

window_conversionFactor = fnGenerateWindowFloat(windowing_buffer, FFT_INPUT_SAMPLES, BLACKMANN_HARRIS_WINDOW); // calculate a window for use by the FFT

This means that all of the dirty work is handled in fnFFT() so that it can be used simply.

The CMSIS coefficients code has been optimised in the uTasker project to make the sure that no unnecessary tables are in the code, otherwise these can unnecessarily use up code space. The range of samples is from 16, 32, 64,... 4k although recently the CMSIS developers showed (after being asked repeated during two years) how to extend to 8k, although they won't include it in the public version:
https://github.com/ARM-software/CMSIS_5/issues/393#issuecomment-546250633

I may add it at at some point.

As well as a few FFT references in the project (and the ability to simulate all operation and inject simulated sine waves) I also have included the CMSIS FFT test reference in order to check that the reference waveform detects the correct frequency and amplitude.

I have attached the DSP.c file and command line menu interface containing the FFT test code "fft" (debug.c) - search for fnTestFFT() so that you should be able to see all details.

uTasker project users select the CMSIS FFT support and configure which sample lengths are needed by the defines:

  //#define CMSIS_DSP_CFFT                                               // enable FFT support - details at http://www.utasker.com/docs/uTasker/uTasker_DSP.pdf
        #define CMSIS_DSP_CFFT_FLOAT
      //#define CMSIS_DSP_CFFT_Q15
      //#define CMSIS_DSP_FFT_16                                         // enable 16 point FFT
      //#define CMSIS_DSP_FFT_32                                         // enable 32 point FFT
      //#define CMSIS_DSP_FFT_64                                         // enable 64 point FFT
      //#define CMSIS_DSP_FFT_128                                        // enable 128 point FFT
      //#define CMSIS_DSP_FFT_256                                        // enable 256 point FFT
      //#define CMSIS_DSP_FFT_512                                        // enable 512 point FFT
      //#define CMSIS_DSP_FFT_1024                                       // enable 1024 point FFT
        #define CMSIS_DSP_FFT_2048                                       // enable 2048 point FFT
        #define CMSIS_DSP_FFT_4096                                       // enable 4096 point FFT
        #define CMSIS_DSP_FFT_8192                                       // enable 8192 point FFT
‍‍‍‍‍‍‍‍‍‍‍‍‍

Regards

Mark
[uTasker project developer for Kinetis and i.MX RT]

0 Kudos
Reply
10,217 Views
carstengroen
Senior Contributor II

mjbcswitzerland‌, 

thanks a lot!

I'm not sure I followed all your code to the last dot, but I didn't figure out what the relationship between input and output amplitudes are ?

When I do the cfft, if I input a sine with 1.0 in amplitude (peak value) I get 256.0 at correct bin at the output from the arm_cfft_f32() function. Now, this might be perfectly fine (and I can easily live with that), I would just like to understand why it is so. It seems that the rfft has the same scaling on the inputs and outputs, if you give the rfft 1.0 on the input, you will get 1.0 on the output. On the cfft, 1.0 on the input gives 256.0 on the output

//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// Using cfft (and mag) functions, a 1.0 in input level yields 256.0 in the mag output level.
// Looking at the output of tempBuffer after the cfft (feeding a sine at 1.0 peakvalue, 
// the real part of the peak bin is -0.0012 and the img. part is -256)
// Process the data through the CFFT/CIFFT module
arm_cfft_f32(&varInstCfftF32, tempBuffer, 0, 1); 
// Convert the complex output from the cfft function to real numbers, we only do half the points as the second half is negative/mirrored data)
arm_cmplx_mag_f32(tempBuffer, freqBuffer, FFT_SIZE / 2);
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍
0 Kudos
Reply
10,217 Views
mjbcswitzerland
Specialist V

Carsten

What is the FFT length being used in this case?

Regards

Mark

0 Kudos
Reply
10,217 Views
carstengroen
Senior Contributor II

Mark,

sorry, that didn't exactly show in my code :smileygrin:

Its 512

0 Kudos
Reply
10,217 Views
mjbcswitzerland
Specialist V

Carsten

You need to divide the output by (FFT_SIZE/2) so that 1.0 in gives 1.0 out (all energy in a single bin).

Regards

Mark
[uTasker project developer for Kinetis and i.MX RT]

0 Kudos
Reply
10,217 Views
carstengroen
Senior Contributor II

Mark,

yes I'm aware of that, I just wonder why it is so, the rfft function gives 1.0 out for 1.0 in (single tone).

Anyway, its no big deal, was just one of those "hmmm, why is it so" :smileyhappy:

Carsten

0 Kudos
Reply
10,217 Views
mjbcswitzerland
Specialist V

Carsten

I don't know either - there are usually implementation details that are different and if you can find CMSIS DSP library documents it may be explained there.

Regards

Mark
[uTasker project developer for Kinetis and i.MX RT]

0 Kudos
Reply
10,235 Views
egoodii
Senior Contributor III

Cool!  Glad to hear you're making forward progress again! They don't call it 'complex' for nothin'....

0 Kudos
Reply
10,235 Views
ibrahimerturk
Contributor I

Hi, I'm in trouble with the CMSIS DSP library bitreversal function and solved it the workaround described here. Hope this helps.

Regards

Ibrahim

10,235 Views
egoodii
Senior Contributor III

Are you saying that the Cortex-M 'bit reverse' single-instruction isn't working for this applications?

The IAR toolchain gives the programmer access to these 'special capabilities' via 'intrinsic functions' that let the compiler handle normal register allocation while inserting the particular instruction in the sequence like an 'inline function':

Syntax

     unsigned long      __RBIT(unsigned long);

Description

     Inserts an RBIT instruction, which reverses the bit order in a 32-bit register.

0 Kudos
Reply
10,235 Views
ibrahimerturk
Contributor I

It is not single-instruction. It is a function that reverses bit order of all elements of FFT output array after calling FFT over an array. The function is member of ARM CMSIS Math (DSP) Library.

0 Kudos
Reply
10,235 Views
martynhunt
NXP Employee
NXP Employee

Hi,

Here is the FFT bin example from an older version of CMSIS-DSP. Hopefully this helps you. It is for f32, but should work if you change the names to q15 versions.

Best regards,

Martyn