<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic LPC4370 FFT Performance.  I have some numbers... in LPC Microcontrollers</title>
    <link>https://community.nxp.com/t5/LPC-Microcontrollers/LPC4370-FFT-Performance-I-have-some-numbers/m-p/577137#M19556</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;STRONG&gt;Content originally posted in LPCWare by emh203 on Wed Jul 09 07:27:13 MST 2014&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;Just posting this as it may be helpful to others.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;See the project log:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fhackaday.io%2Fproject%2F1620-The-Human-Connection-%253A-1st-Impression" rel="nofollow" target="_blank"&gt;http://hackaday.io/project/1620-The-Human-Connection-%3A-1st-Impression&lt;/A&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;---------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;A core part of the algorithms we will use is a complex-input FFT (a+jb). Before going too far, I wanted to evaluate the FFT performance of the LPC4370 M4 core. Now, an FPGA would rule the roost for FFT processing horsepower, BUT I am trying to keep this as low cost as possible. The 4370 on the LPC-Link2 is a place to start. FPGAs are great once you have everything worked out, but HDL can be unforgiving (and FPGAs are high cost!).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So, here are some assumptions:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;LPC4370 - Code running on the M4 core. Clock rate at 204 MHz. Execution from RAMLoc128 (0x10000000 - 0x10020000)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;ARM CMSIS DSP libraries V4.0.1. In particular, I am looking at the function arm_cfft_radix4_q15&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I am using fixed-point processing.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Input data is a 4096 q15_t array in RAM. (Note all processing is done in place... source data must be in RAM)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Optimizations are not turned on. I used the version of GCC included with LPCXpresso 7.2&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Now, I am targeting a 200 kHz system sample rate with a 4096 block size. (This matches the max radix4 block size allowed by CMSIS DSP.) This means I have a window of 20.48 ms (4096 samples / 200 kHz) to get all my processing done. In the background, new ADC data will be DMA'd into a buffer and data will be DMA'd from an output buffer to a DAC.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So.... drum roll. The algorithm arm_cfft_radix4_q15 takes 2.4 ms. So, I have a margin of roughly a factor of 8.5 (20.48 ms / 2.4 ms). Now, this will quickly get eaten up. I have to do a minimum of 2 FFTs (forward and reverse transform), then the magic scaling algorithms. Either way, this gives me a good amount of headroom. I always have 2 other cores ready to go :-)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I also profiled arm_cfft_radix2_q15. It is a bit slower at 2.9 ms.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Code is in the hc-1 GitHub repository.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Last notes:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The board support library sometimes crashes in Board_SystemInit() at bootup when running from RAM. I think a delay is needed when setting up the clock dividers or the crystal. If I single-step through the code, it works... Also, using the internal oscillator and PLLing up to 204 MHz is fine.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;These numbers would certainly get much worse if running from SPIFI flash. (The LPC4370 is flashless. You have to bootload from SPIFI flash into RAM or execute from SPIFI...) Maybe I can do that some other day.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 15 Jun 2016 19:02:05 GMT</pubDate>
    <dc:creator>lpcware</dc:creator>
    <dc:date>2016-06-15T19:02:05Z</dc:date>
    <item>
      <title>LPC4370 FFT Performance.  I have some numbers...</title>
      <link>https://community.nxp.com/t5/LPC-Microcontrollers/LPC4370-FFT-Performance-I-have-some-numbers/m-p/577137#M19556</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;STRONG&gt;Content originally posted in LPCWare by emh203 on Wed Jul 09 07:27:13 MST 2014&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;Just posting this as it may be helpful to others.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;See the project log:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&lt;A class="jive-link-external-small" href="https://community.nxp.com/external-link.jspa?url=http%3A%2F%2Fhackaday.io%2Fproject%2F1620-The-Human-Connection-%253A-1st-Impression" rel="nofollow" target="_blank"&gt;http://hackaday.io/project/1620-The-Human-Connection-%3A-1st-Impression&lt;/A&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;---------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;A core part of the algorithms we will use is a complex-input FFT (a+jb). Before going too far, I wanted to evaluate the FFT performance of the LPC4370 M4 core. Now, an FPGA would rule the roost for FFT processing horsepower, BUT I am trying to keep this as low cost as possible. The 4370 on the LPC-Link2 is a place to start. FPGAs are great once you have everything worked out, but HDL can be unforgiving (and FPGAs are high cost!).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So, here are some assumptions:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;LPC4370 - Code running on the M4 core. Clock rate at 204 MHz. Execution from RAMLoc128 (0x10000000 - 0x10020000)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;ARM CMSIS DSP libraries V4.0.1. In particular, I am looking at the function arm_cfft_radix4_q15&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I am using fixed-point processing.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Input data is a 4096 q15_t array in RAM. (Note all processing is done in place... source data must be in RAM)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Optimizations are not turned on. I used the version of GCC included with LPCXpresso 7.2&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Now, I am targeting a 200 kHz system sample rate with a 4096 block size. (This matches the max radix4 block size allowed by CMSIS DSP.) This means I have a window of 20.48 ms (4096 samples / 200 kHz) to get all my processing done. In the background, new ADC data will be DMA'd into a buffer and data will be DMA'd from an output buffer to a DAC.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;So.... drum roll. The algorithm arm_cfft_radix4_q15 takes 2.4 ms. So, I have a margin of roughly a factor of 8.5 (20.48 ms / 2.4 ms). Now, this will quickly get eaten up. I have to do a minimum of 2 FFTs (forward and reverse transform), then the magic scaling algorithms. Either way, this gives me a good amount of headroom. I always have 2 other cores ready to go :-)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I also profiled arm_cfft_radix2_q15. It is a bit slower at 2.9 ms.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Code is in the hc-1 GitHub repository.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Last notes:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The board support library sometimes crashes in Board_SystemInit() at bootup when running from RAM. I think a delay is needed when setting up the clock dividers or the crystal. If I single-step through the code, it works... Also, using the internal oscillator and PLLing up to 204 MHz is fine.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;These numbers would certainly get much worse if running from SPIFI flash. (The LPC4370 is flashless. You have to bootload from SPIFI flash into RAM or execute from SPIFI...) Maybe I can do that some other day.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 15 Jun 2016 19:02:05 GMT</pubDate>
      <guid>https://community.nxp.com/t5/LPC-Microcontrollers/LPC4370-FFT-Performance-I-have-some-numbers/m-p/577137#M19556</guid>
      <dc:creator>lpcware</dc:creator>
      <dc:date>2016-06-15T19:02:05Z</dc:date>
    </item>
  </channel>
</rss>

