Is the FPU working on the TWR-K60F120M?

dcantero
Contributor II

Hi everybody!

I ran some tests to evaluate the performance of the floating-point unit on the TWR-K60F120M. I'm using IAR 6.50 and I have configured the project to use VFPv4 in the General Options dialog, as shown in the image.

[Image: General_options.bmp]

I took a "hello world" example from MQX 4.1 and added a "for" loop containing a floating-point operation:

    while (1) {
        GPIO_Tgl(GPIOA, PIN_11);
        for (i = 0; i < 100000; i++) {
            d_aux = d_aux * 0.9012555 + 0.11152547;
        }
    }

d_aux is defined as double and initialized to zero. GPIO_Tgl is a macro that writes directly to the PTOR register of GPIOA pin 11, so the time spent in the loop can be measured on that pin.

I measured the elapsed time with an oscilloscope and obtained 125 ms for all 100000 operations, so approximately 1.25 us per iteration (approx. 150 clock cycles). This looks an order of magnitude bigger than expected (maybe due to "hidden" instructions not explicitly shown in the code above?). But when I disabled the FPU in the "General Options" dialog and repeated the same test, I obtained EXACTLY THE SAME RESULT! I have done many tests using divisions and "sqrt" and I have never found a difference when disabling the FPU.

So my questions are: Is the FPU actually working on the Tower system using MQX 4.1? If yes, is it possible to optimize the code to get closer to the theoretical performance of the FPU? If not, how can I get the FPU working? I spent some time searching the Freescale web site, the processor manuals and the communities, but the information about the FPU is scarce and hard to follow. Where can I get more information about this issue?

Thanks in advance!

David

carlos_neri
NXP Employee

David,

The FPU implements single-precision floating point only. Because you declared your variable as double, the compiler could not use the FPU and implemented your operations with its own software routines.

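You should also be careful with the types of integer and floating-point constants in C. An unsuffixed integer constant has type int when its value fits (32 bits on this target), and an unsuffixed floating-point constant has type double. Multiplying a float by a double constant therefore converts the whole expression to double.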
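To make the constant types concrete, here is a minimal sketch that can be built on a PC. It assumes a C11 compiler (for _Generic); TYPE_CODE and print_constant_types are names made up for this illustration, not part of MQX or IAR.

```c
#include <stdio.h>

/* Hypothetical helper: classify an expression by its type using C11 _Generic.
   'i' = int, 'f' = float, 'd' = double, '?' = anything else. */
#define TYPE_CODE(x) _Generic((x), int: 'i', float: 'f', double: 'd', default: '?')

/* Print the type code of a few constants and mixed expressions. */
void print_constant_types(void)
{
    float f = 1.0f;

    printf("100000         -> %c\n", TYPE_CODE(100000));         /* unsuffixed integer constant */
    printf("0.9012555      -> %c\n", TYPE_CODE(0.9012555));      /* unsuffixed floating constant */
    printf("0.9012555f     -> %c\n", TYPE_CODE(0.9012555f));     /* f suffix keeps it float */
    printf("f * 0.9012555  -> %c\n", TYPE_CODE(f * 0.9012555));  /* float operand is converted to double */
    printf("f * 0.9012555f -> %c\n", TYPE_CODE(f * 0.9012555f)); /* stays single precision */
}
```

Calling print_constant_types() from main should show that only the expressions built entirely from float operands keep the float type, which is the case the single-precision FPU can handle.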

Here's an example of what I'm talking about (using GCC), with d_aux declared as float:

d_aux = d_aux * 0.9012555 + 0.11152547;

  movw r3, #0
  movt r3, #0
  ldr r3, [r3, #0]
  mov r0, r3
  bl 0 <__aeabi_f2d>
  mov r2, r0
  mov r3, r1
  mov r0, r2
  mov r1, r3
  add r3, pc, #72
  ldrd r2, r3, [r3]
  bl 0 <__aeabi_dmul>
  mov r2, r0
  mov r3, r1
  mov r0, r2
  mov r1, r3
  add r3, pc, #64
  ldrd r2, r3, [r3]
  bl 0 <__aeabi_dadd>
  mov r2, r0
  mov r3, r1
  mov r0, r2
  mov r1, r3
  bl 0 <__aeabi_d2f>
  mov r2, r0
  movw r3, #0

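The calls to __aeabi_f2d, __aeabi_dmul and __aeabi_dadd show how the compiler converted the float to double and called the software double-precision routines to perform the arithmetic; the final call, __aeabi_d2f, converts the result from double back to float. This is the double promotion.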

Now, the same code but with casts:

d_aux = d_aux * (float)0.9012555 + (float)0.11152547;

  movw r3, #0
  movt r3, #0
  vldr s14, [r3]
  vldr s15, [pc, #48]
  vmul.f32 s14, s14, s15
  vldr s15, [pc, #44]
  vadd.f32 s15, s14, s15
  movw r3, #0
  movt r3, #0
  vstr s15, [r3]

As you can see, the compiler now used the FP instructions and registers.

Could you please try this in your environment? First declare d_aux as float and cast your constants, then check that the compiler uses the FPU instructions.
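As a rough sketch of that change, the loop body could look like the function below. run_filter is a name invented for this example, and the f suffix is used in place of the (float) casts; both give float constants. The GPIO toggling is left out so the function can also be checked on a PC.

```c
/* Hypothetical single-precision version of the test loop.
   Every operand is float, so no double-precision helper routine is needed. */
float run_filter(unsigned long iterations)
{
    float d_aux = 0.0f;                             /* float instead of double */
    unsigned long i;

    for (i = 0; i < iterations; i++) {
        d_aux = d_aux * 0.9012555f + 0.11152547f;   /* f suffix: float constants */
    }
    return d_aux;                                   /* returning the value keeps the loop from being optimized away */
}
```

On the target, the result should be stored somewhere visible (for example a global) so that the optimizer cannot drop the loop when timing it.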

Also, make sure to "tell" MQX that the FPU is in use. There should be a macro in your PSP or BSP config file that enables the kernel's FPU management on context switches.
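As an unverified sketch of what that configuration might look like: in MQX the kernel option is, as far as I recall, MQXCFG_ENABLE_FP in user_config.h, and tasks that use floating point carry the MQX_FLOATING_POINT_TASK attribute in their task template. Treat both names, and the task index, stack size and priority below, as assumptions to check against your own BSP and the MQX documentation.

```c
/* user_config.h (assumed option name; check your BSP's copy of this file) */
#define MQXCFG_ENABLE_FP 1   /* ask the kernel to save/restore FPU registers */

/* Task template list (values copied from a typical "hello" example;
   HELLO_TASK, hello_task and the sizes are placeholders). */
const TASK_TEMPLATE_STRUCT MQX_template_list[] =
{
    /* index,     function,   stack, prio, name,    attributes,                                   param, slice */
    { HELLO_TASK, hello_task, 1500,  8,    "hello", MQX_AUTO_START_TASK | MQX_FLOATING_POINT_TASK, 0,     0 },
    { 0 }
};
```

The MQX libraries normally have to be rebuilt after changing user_config.h for the option to take effect.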

dcantero
Contributor II

Thank you very much, Carlos!

I tested the changes you proposed and the processing time is now less than 10 ms for the whole loop (all 100000 iterations)!

If the FPU is disabled, the loop time increases to approx. 25 ms (so the improvement factor is around 2.5). Compared with the results obtained using doubles, the improvement factor is approximately 12 (from 125 ms to less than 10 ms). This is great for me because now I have enough time to do the processing tasks.
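For anyone who wants to redo the arithmetic behind these figures, here is a small PC-side sketch. The 120 MHz core clock is an assumption inferred from the "150 clock cycles" estimate earlier in the thread, and the function names are invented for the example.

```c
/* Assumed core clock: 120 MHz, inferred from the 150-cycle estimate in the thread. */
#define CORE_CLOCK_HZ 120000000.0
/* Loop count used in the test. */
#define ITERATIONS    100000.0

/* Average core cycles per loop iteration for a measured total time in milliseconds. */
double cycles_per_iteration(double total_ms)
{
    return (total_ms / 1000.0) / ITERATIONS * CORE_CLOCK_HZ;
}

/* Improvement factor between two measured times. */
double speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}
```

With the measurements reported here, 125 ms works out to about 150 cycles per iteration, 25 ms to about 30, and 10 ms to about 12. Since the float-with-FPU time was "less than 10 ms", the last figure and the factor of 12.5 computed from it are only bounds.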

Thanks again!!

David