NEON has the same performance as C

alexandre_caron
Contributor II

Hi, I'm developing an image processing application on the NXP i.MX7 and I want to compare the performance of NEON instructions vs. pure C.

C version (a, b and c are float32 arrays), takes 11 ms to run:

for (int pixIndex = 0; pixIndex < (640 * 480); pixIndex++)
{
    /* one pixel per iteration: a = a * b + c */
    a[pixIndex] = (a[pixIndex] * b[pixIndex]) + c[pixIndex];
}
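To make the 11 ms vs. 10 ms comparison repeatable, here is a minimal timing sketch, assuming a POSIX system where clock_gettime(CLOCK_MONOTONIC, ...) is available (as on Linux running on the i.MX7). The names madd_scalar and elapsed_ms are hypothetical helpers, not part of any NXP or ARM API; madd_scalar performs the same per-pixel a = a * b + c as the loop above.

```c
/* Needed in strict C modes so that <time.h> declares struct timespec
   and clock_gettime. */
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: a[i] = a[i] * b[i] + c[i] over n elements,
   the same operation as the plain C loop. */
void madd_scalar(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] * b[i] + c[i];
    }
}

/* Hypothetical helper: milliseconds between two timestamps taken with
   clock_gettime(CLOCK_MONOTONIC, ...). */
double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Call clock_gettime(CLOCK_MONOTONIC, &t0) before madd_scalar(a, b, c, 640 * 480) and clock_gettime(CLOCK_MONOTONIC, &t1) after it, then print elapsed_ms(t0, t1). A single 640x480 pass is short, so averaging over many passes, and building both versions with the same optimization flags, gives a fairer comparison.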

NEON version, takes 10 ms to run:

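#include <arm_neon.h>

/* two pixels per iteration; 640*480 is even, so no pixel is left over */
for (int pixIndex = 0; pixIndex < (640 * 480) / 2; pixIndex++)
{
    float32x2_t dVect1, dVect2, dVect3;

    dVect1 = vld1_f32(a);                      /* load a[0], a[1] */
    dVect2 = vld1_f32(b);                      /* load b[0], b[1] */
    dVect3 = vld1_f32(c);                      /* load c[0], c[1] */
    dVect1 = vmla_f32(dVect3, dVect1, dVect2); /* c + a * b, per lane */
    vst1_f32(a, dVect1);                       /* store result back into a */

    a += 2;
    b += 2;
    c += 2;
}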
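The loop above uses float32x2_t, i.e. 64-bit D registers, so each iteration processes only 2 floats. A minimal sketch using the 128-bit float32x4_t variants (vld1q_f32, vmlaq_f32, vst1q_f32) processes 4 floats per iteration and finishes any leftover elements in scalar code. madd_neon_q is a hypothetical name, and the #if guard falls back to plain C when NEON is not enabled at compile time. Whether this is actually faster on the i.MX7 has not been measured here; if the loop is limited by memory bandwidth rather than arithmetic, wider vectors may not help much.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: a[i] = a[i] * b[i] + c[i] over n floats. */
void madd_neon_q(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Main loop: 4 floats per iteration in 128-bit Q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        /* vmlaq_f32(vc, va, vb) computes vc + va * vb in each lane. */
        vst1q_f32(a + i, vmlaq_f32(vc, va, vb));
    }
#endif
    /* Scalar tail, or the whole array when NEON is not available. */
    for (; i < n; i++) {
        a[i] = a[i] * b[i] + c[i];
    }
}
```

For the 640x480 case from the question, call madd_neon_q(a, b, c, 640 * 480); since 307200 is a multiple of 4, the scalar tail does no work there.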

Why is NEON only 1 ms faster than C? Am I missing something here?
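A rough arithmetic check on these timings, assuming every element has to come from and go back to DRAM (the three 640x480 float arrays take 3,686,400 bytes in total): each pixel reads 12 bytes (a, b, c) and writes 4 bytes (a), so one pass moves 640 × 480 × 16 = 4,915,200 bytes. That is about 447 MB/s at 11 ms and about 492 MB/s at 10 ms. If those figures are close to the sustained memory bandwidth measured on the board, both versions are likely waiting on memory and faster arithmetic cannot shorten the run much. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Bytes moved by one pass of a[i] = a[i] * b[i] + c[i]: three float
   loads (a, b, c) plus one float store (a) per pixel.
   Assumes nothing stays resident in cache between passes. */
size_t bytes_per_pass(size_t width, size_t height)
{
    const size_t bytes_per_pixel = 3 * sizeof(float) + sizeof(float);
    return width * height * bytes_per_pixel;
}

/* Effective throughput in MB/s (1 MB = 1e6 bytes) for a time in ms. */
double effective_mb_per_s(size_t bytes, double ms)
{
    return (double)bytes / (ms / 1e3) / 1e6;
}
```

With the numbers from this thread, bytes_per_pass(640, 480) gives 4,915,200, and effective_mb_per_s on that value gives roughly 446.8 for 11 ms and 491.5 for 10 ms.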

b36401
NXP Employee

Possibly hardware floating point is enabled by default in the toolchain, so the plain C loop already uses the FPU.
The small remaining difference may be caused by how the operands are passed: in the first case they are passed separately to the multiply and to the add, while in the second case they are passed only once, to a single MLA (multiply-accumulate) operation.
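One way to check the toolchain's default floating-point settings is to test the compiler's predefined macros. A minimal sketch for 32-bit ARM with GCC or Clang, assuming the ACLE macros __ARM_NEON and __ARM_PCS_VFP and GCC's __SOFTFP__ (the function names are hypothetical): __ARM_PCS_VFP indicates the hard-float calling convention, and __SOFTFP__ indicates that floating point is done entirely in software.

```c
/* Each hypothetical helper returns 1 if the corresponding predefined
   macro is set for this compilation, else 0. */
int has_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;   /* NEON intrinsics enabled (e.g. -mfpu=neon) */
#else
    return 0;
#endif
}

int uses_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;   /* float arguments passed in VFP registers */
#else
    return 0;
#endif
}

int uses_soft_float(void)
{
#if defined(__SOFTFP__)
    return 1;   /* no FPU instructions: float ops are library calls */
#else
    return 0;
#endif
}
```

The same information can be listed directly with gcc -dM -E - < /dev/null | grep -i arm. If __SOFTFP__ were defined, the multiply and add in the C loop would typically be library calls rather than VFP instructions, and the C version would be expected to lag NEON by far more than 1 ms.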
