Hi All
My customer tried the ARM CMSIS DSP library and found that the RFFT function takes much longer to execute than on another brand's Cortex-M4 MCU.
The other brand's Cortex-M4 at 80MHz: execution time is around 207us.
i.MX RT1024 Cortex-M7 at 500MHz: execution time is 850us to 900us.
How can the execution time be improved on the i.MX RT1024?
The target execution time is 33us or less.
Below are the example code and the oscilloscope result.
Thanks.
Hi
I don't have a comparison for the RFFT but do have one for the floating point FFT, where the same code runs on a 120MHz Cortex-M4 (K64) and on the i.MX RT 1024 at 500MHz.
The values are for different FFT lengths. Each entry gives three times in us, separated by "/", one for each of the three steps in the process:
- converting the samples to a floating point buffer
- performing the in-place complex FFT (arm_cfft_f32())
- calculating the magnitude of the output vectors
FFT Length | 120MHz K64 (M4), SW floating point (us) | 120MHz K64 (M4), HW FPU (us) | 500MHz i.MX RT 1024 (M7), HW FPU, code in ITC (us) |
16 | 2.6/104/53 | 3.2/10.8/28 | 0.3/1.4/1.2 |
32 | 4.6/277/106 | 5.5/22.4/58 | 0.47/3.0/2.2 |
64 | 8.6/641/211 | 10.7/40.4/112 | 0.85/5.5/4.3 |
128 | 17/1712/421 | 22/108/218 | 1.6/13.9/10.3 |
256 | 33/4057/841 | 43/236/433 | 3.1/33.4/16.8 |
512 | 65/10540/1683 | 88/427/865 | 5.3/64/34.6 |
1024 | 131/21170/3360 | 173/1073/1730 | 10.3/155/66 |
2048 | 260/43110/- | 345/2439/- | 26.4/335/34.5 |
4096 | 516/63330/- | 686/4340/- | - |
This should give an idea of the performance improvement to expect. The use of the FPU is important: the SW implementation on the 120MHz K64 is some 20x slower than with its single-precision FPU (FFT step at 1024 points: 21170us vs. 1073us). The i.MX RT 1024 (at 500MHz) is about 13x faster again when performing a 1024 point transformation (all three steps together: 2976us vs. about 231us).
Based on this, I would expect that you can get your operation (207us on 80MHz M4) down to about 16us on the 500MHz i.MX RT 1024.
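The ratios behind this estimate can be reproduced from the 1024-point row of the table. The sketch below is only a check of the arithmetic; the array and function names are made up for illustration. The projection simply divides a measured time by the table ratio and does not account for the 80MHz vs. 120MHz clock difference between the two M4 parts, so treat it as a rough figure.

```c
/* Times in us from the 1024-point table row:
   { sample conversion, complex FFT, magnitude } */
static const double k64_sw[3] = { 131.0, 21170.0, 3360.0 }; /* K64, SW float */
static const double k64_hw[3] = { 173.0, 1073.0, 1730.0 };  /* K64, HW FPU   */
static const double rt1024[3] = { 10.3, 155.0, 66.0 };      /* i.MX RT 1024  */

/* Sum of the three step times. */
static double total_us(const double t[3])
{
    return t[0] + t[1] + t[2];
}

/* Gain from the FPU on the K64, FFT step only (about 20x). */
static double fpu_gain(void)
{
    return k64_sw[1] / k64_hw[1];
}

/* i.MX RT 1024 vs. K64 with FPU, all three steps together (about 13x). */
static double rt_speedup(void)
{
    return total_us(k64_hw) / total_us(rt1024);
}

/* Rough projection of a time measured on an M4 with FPU onto the i.MX RT 1024. */
static double projected_us(double measured_us)
{
    return measured_us / rt_speedup();
}
```

With these numbers, projected_us(207.0) comes out at roughly 16us, which is comfortably below the 33us target.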
Make sure that it is using its FPU and running the code in ITC (with the data in DTC) for optimal efficiency. See this video as a guide: https://www.youtube.com/watch?v=fnfLQ-nbscI
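One way to confirm that the build really uses the FPU is to check the compiler's predefined macros. The sketch below assumes a GCC or Clang based toolchain for Arm, where __SOFTFP__ is defined for software floating point builds and __ARM_FP is a bit mask of the hardware precisions (0x4 = single precision). The enum and function names are hypothetical, and other compilers may need a different check.

```c
typedef enum {
    FP_SOFTWARE,  /* floating point done by library calls      */
    FP_HW_SINGLE, /* hardware single-precision FPU in use      */
    FP_HW_OTHER,  /* hardware FP, but not single precision     */
    FP_UNKNOWN    /* macros not available (e.g. non-Arm build) */
} fp_mode_t;

/* Classify the build from the macro values (split out so it can be tested). */
static fp_mode_t classify_fp(int softfp, int has_arm_fp, unsigned arm_fp)
{
    if (softfp) {
        return FP_SOFTWARE;
    }
    if (has_arm_fp) {
        return (arm_fp & 0x4u) ? FP_HW_SINGLE : FP_HW_OTHER;
    }
    return FP_UNKNOWN;
}

/* Report how this translation unit was compiled. */
static fp_mode_t fp_build_mode(void)
{
#if defined(__SOFTFP__)
    const int softfp = 1;
#else
    const int softfp = 0;
#endif
#if defined(__ARM_FP)
    return classify_fp(softfp, 1, (unsigned)__ARM_FP);
#else
    return classify_fp(softfp, 0, 0u);
#endif
}
```

If fp_build_mode() returns FP_SOFTWARE on the target, the FFT is being computed in software, which by itself could explain a slowdown of the order seen in the SW column of the table. The library itself must be built with the same floating point options as the application.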
Regards
Mark
P.S.: I also have a comparison for some FFT lengths when run on the 48MHz KL27 (Cortex-M0+), which has no FPU:
FFT Length | 48MHz KL27 (M0+), SW floating point (us) |
16 | 11.3/510/236 |
32 | 20.9/1355/471 |
64 | 78.6/3202/942 |
128 | 152/8341/1883 |
For our discounted i.MX and Kinetis stock availability see https://www.utasker.com/Shop/semi.html
Hi Mark
Thanks for your reply.