NEON has same performance as C



599 Views
alexandre_caron
Contributor II

Hi, I'm developing an image processing application on the NXP i.MX7 and I want to compare the performance of NEON instructions against pure C.

C: a, b and c are float32 arrays. Takes 11 ms to run.

for (int pixIndex = 0; pixIndex < (640 * 480); pixIndex++)
{
    a[pixIndex] = (a[pixIndex] * b[pixIndex]) + c[pixIndex];
}

NEON: Takes 10 ms to run.

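for (int pixIndex = 0; pixIndex < (640 * 480) / 2; pixIndex++)
{
    float32x2_t dVect1, dVect2, dVect3;

    dVect1 = vld1_f32(a);                       /* load two floats from a */
    dVect2 = vld1_f32(b);                       /* load two floats from b */
    dVect3 = vld1_f32(c);                       /* load two floats from c */
    dVect1 = vmla_f32(dVect3, dVect1, dVect2);  /* c + a * b, per lane */
    vst1_f32(a, dVect1);                        /* store the result back to a */

    a += 2;
    b += 2;
    c += 2;
}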

Why is NEON only 1 ms faster than C? Am I missing something here?
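For anyone reproducing the comparison, here is a minimal sketch that wraps the two loops above as functions, so they can be timed and checked against each other on identical data. The names mla_scalar and mla_neon are hypothetical and not from my application. The NEON path uses the same intrinsics as above, guarded so the file still builds (with a scalar fallback) on a target without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference: a[i] = a[i] * b[i] + c[i]. */
static void mla_scalar(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = (a[i] * b[i]) + c[i];
    }
}

/* Two-lane version using the same intrinsics as the loop in the post.
 * Assumption: n is even, as it is for a 640*480 image.
 * Falls back to the scalar loop when NEON is not available. */
static void mla_neon(float *a, const float *b, const float *c, size_t n)
{
    assert(n % 2 == 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (size_t i = 0; i < n; i += 2) {
        float32x2_t va = vld1_f32(a + i);   /* two floats from a */
        float32x2_t vb = vld1_f32(b + i);   /* two floats from b */
        float32x2_t vc = vld1_f32(c + i);   /* two floats from c */
        va = vmla_f32(vc, va, vb);          /* vc + va * vb, per lane */
        vst1_f32(a + i, va);                /* write the result back to a */
    }
#else
    mla_scalar(a, b, c, n);
#endif
}
```

Calling both on copies of the same input and comparing the outputs confirms that the two paths compute the same thing before any timing is taken.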

1 Reply

532 Views
b36401
NXP Employee

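Possibly hardware floating point is turned on by default in the toolchain, in which case the C loop already runs on the FPU rather than through a software floating-point library.
The small remaining difference may be caused by how the operands are passed.
I mean that in the first case operands are passed separately for the multiply and for the add, while in the second case they are passed only once, for the single MLA.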
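One way to check the first point is to look at the predefined macros of the compiler. The sketch below assumes a GCC-style ARM toolchain and uses the ACLE/GCC macros __ARM_NEON, __ARM_PCS_VFP and __SOFTFP__; the helper names has_neon and float_abi are hypothetical, and the mapping from macros to ABI names is my understanding of GCC's behaviour, so please verify it against your own toolchain.

```c
#include <stddef.h>

/* Returns 1 if the compiler was told it may emit NEON instructions. */
static int has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Reports the floating-point calling convention the code was built for.
 * Assumption: GCC-style predefined macros on a 32-bit ARM target. */
static const char *float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";           /* FP instructions, floats passed in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";           /* no FP instructions, library emulation */
#elif defined(__arm__)
    return "softfp";         /* FP instructions, floats passed in core registers */
#else
    return "not 32-bit ARM"; /* e.g. built on a host PC */
#endif
}
```

Printing the two return values from the same build that produced the 11 ms figure shows whether the plain C loop was already compiled for the FPU.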
