<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic "NEON has same performance as C" in i.MX Processors</title>
    <link>https://community.nxp.com/t5/i-MX-Processors/NEON-has-same-performance-as-C/m-p/1030793#M152247</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I'm developing an image processing application on the NXP i.MX7 and I want to compare the performance of NEON instructions vs. pure C.&lt;/P&gt;&lt;P&gt;C: a, b, and c are float32 arrays. Takes 11 ms to run.&lt;/P&gt;&lt;P&gt;for(int pixIndex = 0; pixIndex &amp;lt; (640*480); pixIndex++)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;a[pixIndex] = (a[pixIndex] * b[pixIndex]) + c[pixIndex];&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;NEON: Takes 10 ms to run.&lt;/P&gt;&lt;P&gt;for(int pixIndex = 0; pixIndex &amp;lt; (640*480)/2; pixIndex++)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float32x2_t dVect1, dVect2, dVect3;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect1 = vld1_f32(a);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect2 = vld1_f32(b);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect3 = vld1_f32(c);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect1 = vmla_f32(dVect3, dVect1, dVect2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;vst1_f32(a, dVect1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;a += 2;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;b += 2;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;c += 2;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;Why is NEON only 1 ms faster than C? Am I missing something here?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 11 Dec 2019 00:41:42 GMT</pubDate>
    <dc:creator>alexandre_caron</dc:creator>
    <dc:date>2019-12-11T00:41:42Z</dc:date>
    <item>
      <title>NEON has same performance as C</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/NEON-has-same-performance-as-C/m-p/1030793#M152247</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I'm developing an image processing application on the NXP i.MX7 and I want to compare the performance of NEON instructions vs. pure C.&lt;/P&gt;&lt;P&gt;C: a, b, and c are float32 arrays. Takes 11 ms to run.&lt;/P&gt;&lt;P&gt;for(int pixIndex = 0; pixIndex &amp;lt; (640*480); pixIndex++)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;a[pixIndex] = (a[pixIndex] * b[pixIndex]) + c[pixIndex];&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;NEON: Takes 10 ms to run.&lt;/P&gt;&lt;P&gt;for(int pixIndex = 0; pixIndex &amp;lt; (640*480)/2; pixIndex++)&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;float32x2_t dVect1, dVect2, dVect3;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect1 = vld1_f32(a);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect2 = vld1_f32(b);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect3 = vld1_f32(c);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;dVect1 = vmla_f32(dVect3, dVect1, dVect2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;vst1_f32(a, dVect1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;a += 2;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;b += 2;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;c += 2;&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;Why is NEON only 1 ms faster than C? Am I missing something here?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 11 Dec 2019 00:41:42 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/NEON-has-same-performance-as-C/m-p/1030793#M152247</guid>
      <dc:creator>alexandre_caron</dc:creator>
      <dc:date>2019-12-11T00:41:42Z</dc:date>
    </item>
    <item>
      <title>Re: NEON has same performance as C</title>
      <link>https://community.nxp.com/t5/i-MX-Processors/NEON-has-same-performance-as-C/m-p/1030794#M152248</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hardware floating point is possibly turned on by default in the toolchain.&lt;BR /&gt;And this small difference may be caused by the way parameters are passed.&lt;BR /&gt;I mean that in the first case you pass parameters to both the multiply and the add operations, but in the second case you pass them only once, to MLA.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 11 Dec 2019 03:12:30 GMT</pubDate>
      <guid>https://community.nxp.com/t5/i-MX-Processors/NEON-has-same-performance-as-C/m-p/1030794#M152248</guid>
      <dc:creator>b36401</dc:creator>
      <dc:date>2019-12-11T03:12:30Z</dc:date>
    </item>
  </channel>
</rss>

