NEON has same performance as C


1,265 views
alexandre_caron
Contributor II

Hi, I'm developing an image processing application on the NXP i.MX7 and I want to compare the performance of NEON instructions vs. pure C.

C: a, b, c are float32. Takes 11 ms to run.

for (int pixIndex = 0; pixIndex < (640 * 480); pixIndex++)
{
    a[pixIndex] = (a[pixIndex] * b[pixIndex]) + c[pixIndex];
}

NEON: Takes 10 ms to run.

for (int pixIndex = 0; pixIndex < (640 * 480) / 2; pixIndex++)
{
    float32x2_t dVect1, dVect2, dVect3;

    /* Load two floats from each array. */
    dVect1 = vld1_f32(a);
    dVect2 = vld1_f32(b);
    dVect3 = vld1_f32(c);
    /* dVect1 = dVect3 + dVect1 * dVect2 */
    dVect1 = vmla_f32(dVect3, dVect1, dVect2);
    vst1_f32(a, dVect1);
    a += 2;
    b += 2;
    c += 2;
}

Why is NEON only 1 ms faster than C? Am I missing something here?

1 Reply

1,198 views
b36401
NXP Employee

Possibly hardware floating point is turned on by default in the toolchain, so the plain C loop already runs on the FPU.
The small remaining difference may be caused by the way parameters are passed.
I mean that in the first case you pass the parameters to separate multiply and add operations, but in the second case you pass them only once, to the MLA.
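
A rough way to check both points is to ask the compiler what it was configured for and to confirm that the scalar and NEON loops really compute the same thing. The following is only a minimal sketch, assuming GCC or Clang and the ACLE predefined macros `__ARM_NEON` and `__ARM_PCS_VFP`. The function names (`mla_scalar`, `mla_neon`, `neon_enabled`, `hard_float_abi`) are made up for illustration and do not come from the original post. The NEON path here uses quad registers (four floats per iteration) instead of the two-float version above, and falls back to plain C when NEON is not enabled at build time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* not an Arm NEON build: scalar fallback only */
#endif

/* 1 if the compiler was invoked with NEON enabled, 0 otherwise. */
static int neon_enabled(void)
{
    return HAVE_NEON;
}

/* 1 if floats are passed in VFP registers (hard-float ABI), 0 otherwise. */
static int hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}

/* Scalar reference: a[i] = a[i] * b[i] + c[i]. */
static void mla_scalar(float *a, const float *b, const float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * b[i] + c[i];
}

/* Same computation, four floats per iteration when NEON is available. */
static void mla_neon(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;

    assert(a && b && c);
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        /* vc + va * vb, written back to a */
        vst1q_f32(a + i, vmlaq_f32(vc, va, vb));
    }
#endif
    /* Remaining elements (or everything, without NEON). */
    for (; i < n; i++)
        a[i] = a[i] * b[i] + c[i];
}
```

If `neon_enabled()` returns 0, the intrinsics are not being built at all and the build flags are worth checking (for GCC on 32-bit Arm, typically `-mfpu=neon`). If `hard_float_abi()` returns 1, the plain C loop is already using the floating-point hardware, which would fit the small gap between 11 ms and 10 ms. Both loops should be timed at the same optimization level for the comparison to mean anything.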
