<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Kinetis K60 FPU Benchmark in Kinetis Microcontrollers</title>
    <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294896#M12033</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I'm using the &lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;TWR-K60F120M Tower module. I have the FPU enabled ('FPU with hard vfp passing') and the 'c9x' model selected. I have code that performs 100,000 floating-point adds in a loop. The results are as follows:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Kinetis FPU enabled&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~120ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Kinetis No FPU (software)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~500ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Coldfire MCF52259CAG80&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~150ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Do these numbers seem reasonable? I guess I was expecting the Kinetis to be an order of magnitude better than the Coldfire (which does not even have an FPU). Can the Kinetis w/FPU be only marginally better than the Coldfire libraries, or do I have something not configured correctly?&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 12 Mar 2014 19:59:06 GMT</pubDate>
    <dc:creator>mon3al</dc:creator>
    <dc:date>2014-03-12T19:59:06Z</dc:date>
    <item>
      <title>Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294896#M12033</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I'm using the &lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;TWR-K60F120M Tower module. I have the FPU enabled ('FPU with hard vfp passing') and the 'c9x' model selected. I have code that performs 100,000 floating-point adds in a loop. The results are as follows:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Kinetis FPU enabled&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~120ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Kinetis No FPU (software)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~500ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Coldfire MCF52259CAG80&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ~150ms&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: arial,helvetica,sans-serif;"&gt;Do these numbers seem reasonable? I guess I was expecting the Kinetis to be an order of magnitude better than the Coldfire (which does not even have an FPU). Can the Kinetis w/FPU be only marginally better than the Coldfire libraries, or do I have something not configured correctly?&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 12 Mar 2014 19:59:06 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294896#M12033</guid>
      <dc:creator>mon3al</dc:creator>
      <dc:date>2014-03-12T19:59:06Z</dc:date>
    </item>
    <item>
      <title>Re: Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294897#M12034</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The ARM website claims that most floating-point operations, like MOST ARM instructions, require 1 clock (subject to a number of caveats, of course, relative to access modes, pipeline, etc.) except: divide/sqrt at 14, and some 'dual operation' at 3, etc.&amp;nbsp; I assume you are running at 120MHz, so your 1.2us/loop indicates 144 clocks per loop.&amp;nbsp; I would be curious what the assembly code of this loop looks like!&amp;nbsp; Certainly there should be a 'VADD' in there that takes 1 clock.&amp;nbsp; Note, of course, that the Cortex-M4F FPU is a 'single precision' only instruction set.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 13 Mar 2014 15:36:44 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294897#M12034</guid>
      <dc:creator>egoodii</dc:creator>
      <dc:date>2014-03-13T15:36:44Z</dc:date>
    </item>
    <item>
      <title>Re: Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294898#M12035</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here is the assembly listing....&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;for (i=0; i&amp;lt;100000; i++)&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; a:&amp;nbsp;&amp;nbsp;&amp;nbsp; f04f 0300&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mov.w&amp;nbsp;&amp;nbsp;&amp;nbsp; r3, #0&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&amp;nbsp; e:&amp;nbsp;&amp;nbsp;&amp;nbsp; 603b &lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt; str&amp;nbsp;&amp;nbsp;&amp;nbsp; r3, [r7, #0]&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp; 10:&amp;nbsp;&amp;nbsp;&amp;nbsp; e00b &lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt; b.n&amp;nbsp;&amp;nbsp;&amp;nbsp; 2a &amp;lt;main+0x2a&amp;gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt; f1 = f1 + 1.76f;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp; 12:&amp;nbsp;&amp;nbsp;&amp;nbsp; ed97 7a01&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vldr&amp;nbsp;&amp;nbsp;&amp;nbsp; s14, [r7, #4]&lt;/P&gt;&lt;P&gt;&amp;nbsp; 16:&amp;nbsp;&amp;nbsp;&amp;nbsp; eddf 7a09&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vldr&amp;nbsp;&amp;nbsp;&amp;nbsp; s15, [pc, #36]&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 3c &amp;lt;main+0x3c&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; 1a:&amp;nbsp;&amp;nbsp;&amp;nbsp; ee77 7a27&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vadd.f32&amp;nbsp;&amp;nbsp;&amp;nbsp; s15, s14, s15&lt;/P&gt;&lt;P&gt;&amp;nbsp; 1e:&amp;nbsp;&amp;nbsp;&amp;nbsp; edc7 7a01&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vstr&amp;nbsp;&amp;nbsp;&amp;nbsp; s15, [r7, #4]&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 14 Mar 2014 17:12:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294898#M12035</guid>
      <dc:creator>mon3al</dc:creator>
      <dc:date>2014-03-14T17:12:30Z</dc:date>
    </item>
    <item>
      <title>Re: Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294899#M12036</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't see the 'end of loop' but I assume it is 'right there'.&amp;nbsp; I have to agree that I see the 'proper' list of single-precision floating-point instructions there, which, counting clocks per ARM's figures, should come out to about 10 clocks per loop iteration -- so 100K iterations should take only a million clocks, or about 1/120th of a second here (8 [maybe 11 with overhead] milliseconds)!&amp;nbsp; I am baffled -- we should certainly see the 'order of magnitude' performance increase you were expecting!&amp;nbsp; Unfortunately, I don't have any 'F' Kinetis CPUs myself to play with...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 14 Mar 2014 19:42:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294899#M12036</guid>
      <dc:creator>egoodii</dc:creator>
      <dc:date>2014-03-14T19:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294900#M12037</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Found some hardware, did my own little 'test' summing an array of 10,000 single-precision floats using RAM code and IAR tools.&amp;nbsp; First, an 'optimized' software-loop took 1.7ms.&amp;nbsp; Then after manually enabling the FPU (what's up with THAT?) the same add of all elements in a 10,000-word array dropped to 0.5ms.&amp;nbsp; Not the 'factor of 10' you would dream of, but in such a loop the floating-point operations are now a 'smaller percentage' of the overall instruction count--even with the IAR optimization for some un-rolling that makes the loop look like this, where only 1/4 of the instructions are 'VADD' (8 per outer loop):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for( uint32_t i=sizeof(Farray)/sizeof(float);i&amp;gt;0;i--)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1958: 0x4638&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MOV&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R0, R7&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff195a: 0xf240 0x41e2&amp;nbsp; MOVW&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R1, #1250&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x4e2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; accum += Farray[i];&lt;/P&gt;&lt;P&gt;??main_2:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff195e: 0x19aa&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ADDS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R5, R6&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1960: 0xed92 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S0, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1964: 0xedd0 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R0]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1968: 
0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff196c: 0x1f02&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUBS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #4&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff196e: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1972: 0xf1a0 0x0208&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #8&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1976: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff197a: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff197e: 0xf1a0 0x020c&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #12&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0xc&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1982: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1986: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff198a: 0xf1a0 0x0210&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #16&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x10&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff198e: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1992: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff1996: 0xf1a0 0x0214&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, 
#20&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x14&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff199a: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff199e: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19a2: 0xf1a0 0x0218&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #24&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x18&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19a6: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19aa: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19ae: 0xf1a0 0x021c&amp;nbsp; SUB.W&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R0, #28&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x1c&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for( uint32_t i=sizeof(Farray)/sizeof(float);i&amp;gt;0;i--)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19b2: 0x3820&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUBS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R0, R0, #32&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x20&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19b4: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19b8: 0xedd2 0x0a00&amp;nbsp; VLDR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S1, [R2]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19bc: 0x19aa&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
ADDS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R2, R5, R6&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19be: 0x1e49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUBS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; R1, R1, #1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19c0: 0xee30 0x0a20&amp;nbsp; VADD.F32&amp;nbsp; S0, S0, S1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19c4: 0xed82 0x0a00&amp;nbsp; VSTR&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; S0, [R2, #0]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for( uint32_t i=sizeof(Farray)/sizeof(float);i&amp;gt;0;i--)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; 0x1fff19c8: 0xd1c9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BNE.N&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ??main_2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ; 0x1fff195e&lt;/P&gt;&lt;P&gt;The total instruction count in the loop is 30, and 1250 iterations is 37,500 total instructions.&amp;nbsp; At 120MHz, that would 'ideally' have taken 0.3ms assuming 1 clock each, so I suppose we can 'write off' a 40% 'clock overhead' in RAM access and pipeline stalls.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So the 'bottom line' is that a factor-of-four may indeed be about what improvement you can expect in an overall compute-intensive sequence -- and what THAT means is that the 'software library' is actually pretty darn good(!) -- like 10 to 15 clocks for the single-precision add.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Some other benchmarks with the same loop:&lt;/P&gt;&lt;P&gt;32-bit integers 0.9ms(???) -- Must be RAM code-fetch getting in the way???&amp;nbsp; From ROM = 0.3ms. 
SP float from ROM = 2.2ms.&lt;/P&gt;&lt;P&gt;And, not surprisingly, double-precision float (double) takes 5.5ms with or without the FPU, so apparently an SP FPU is of 'no help' in double-float math.&amp;nbsp; Double from ROM takes 3.2ms.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 20 Mar 2014 14:26:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294900#M12037</guid>
      <dc:creator>egoodii</dc:creator>
      <dc:date>2014-03-20T14:26:30Z</dc:date>
    </item>
    <item>
      <title>Re: Re: Kinetis K60 FPU Benchmark</title>
      <link>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294901#M12038</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;&amp;gt; after manually enabling the FPU (what's up with THAT?)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Try adding this define to your compiler preprocessor defines:&lt;/P&gt;&lt;P&gt;__VFPV4__&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was having problems compiling &amp;lt;cmath&amp;gt; and other issues, and doing that seems to have solved them... and it looks like __fp_init() is defined under that symbol in __arm_eabi_init.c&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Mar 2014 21:12:45 GMT</pubDate>
      <guid>https://community.nxp.com/t5/Kinetis-Microcontrollers/Kinetis-K60-FPU-Benchmark/m-p/294901#M12038</guid>
      <dc:creator>bowerymarc</dc:creator>
      <dc:date>2014-03-28T21:12:45Z</dc:date>
    </item>
  </channel>
</rss>

