Compiler not using VMLA.F32 FPU instruction. Any suggestions to direct the compiler?

Question asked by la_dsp on Mar 26, 2014
Latest reply on Apr 8, 2014 by la_dsp

Using TWR-K70 with CodeWarrior 10.5, mwccarm, and mwasmarm, when I compile the following C code:

        output = input * k2;
        output = (z * k1) + output;
        z = output;

I get the following disassembly:

;   74:         output = input * k2;
0x00000036  0x8A0AEE29             vmul.F32         s16,s18,s20

;   75:         output = (z * k1) + output;
0x0000003A  0x0AA9EE28             vmul.F32         s0,s17,s19
0x0000003E  0x8A00EE38             vadd.F32         s16,s16,s0

;   76:         z = output;
0x00000042  0x8A48EEF0             vmov.F32         s17,s16

I would have expected the compiler to use a multiply-accumulate for line 75 (like vmla.F32 s16,s17,s19) instead of doing the multiply and add separately. Does anyone have any ideas on how to get the compiler to use the FPU more efficiently?