Using native assembler instruction VSQRT.F64 instead of double sqrt(double x).

magro732 · ‎12-16-2021

I am using the MIMXRT1170-EVK board and MCUXpresso IDE and trying to use the assembler instruction for SQRT with double precision on the Cortex-M7. I can get this working for single-precision using:

float arm_sqrt_f32(float x)
{
    float returnValue;
    __asm__("VSQRT.F32 %0, %1" : "=t" (returnValue) : "t" (x));
    return returnValue;
}

Is the VSQRT.F64 instruction available on the iMXRT1170? When I use the __asm__ keyword like this:

double arm_sqrt_f64(double x)
{
    double returnValue;
    __asm__("VSQRT.F64 %0, %1" : "=w" (returnValue) : "w" (x));
    return returnValue;
}

I get the error message:
Error: invalid instruction shape -- `vsqrt.f64 s0,s0'

So the instruction seems to be available, but the asm keyword is not generating the right register names: the assembler is handed s0 where VSQRT.F64 needs a d register. I found an old report of this, Bug #1856486 “constraint “w” produces access to single precissio...” : Bugs : GNU Arm Embedded Toolc..., where it is filed as a bug in GCC, but that was several years ago.

How can I enable VSQRT.F64?

magro732 · ‎12-17-2021

I eventually got this working myself by defining my own C-callable assembler function.

It would still be nice to use GCC inline assembler instead of a separate .asm file, so if anyone knows how to do this, please post a reply.

jingpan · ‎12-20-2021

Hi @magro732 ,

You can force the function to be inlined.

__attribute__((always_inline)) inline float arm_sqrt_f32(float x)
{
    ...
}

Regards,

Jing

magro732 · ‎12-21-2021

Thanks for your reply, but I'm not having problems with the single-precision version; it is the double-precision version that I have problems with.

For now I have written a custom assembler function for double precision that looks like this:

.global arm_sqrt_f64
.section .text
.type arm_sqrt_f64,%function

arm_sqrt_f64:
.fnstart
vsqrt.f64 d0, d0
bx lr
.fnend

But this function cannot be inlined, so I still don't get optimal performance.

Is there a way for me to define a double-precision sqrt that can be inlined?

jingpan · ‎12-21-2021

Hi @magro732 ,

I modified your code; it now compiles without any problem.

Regards,

Jing
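
The modified code itself is not reproduced in this copy of the thread. Going by the later reply that credits the %P operand modifier, the change was presumably to print the "w"-constrained operands as d registers instead of s registers. The following is a minimal sketch under that assumption, not Jing's original code. It assumes GCC targeting a Cortex-M7 built with a double-precision FPU (for example -mfpu=fpv5-d16 -mfloat-abi=hard). The two non-ARM branches are assumptions added only so that the sketch also builds on a desktop host.

```c
/* Sketch only: double-precision square root via VSQRT.F64 in GCC inline asm. */
static inline __attribute__((always_inline)) double arm_sqrt_f64(double x)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x08)
    /* Bit 3 of __ARM_FP signals hardware double-precision support. */
    double result;
    /* "w" allocates a VFP register. A plain %0 prints its sN name, which
     * caused the "invalid instruction shape" error; %P0 prints the dN name. */
    __asm__("vsqrt.f64 %P0, %P1" : "=w" (result) : "w" (x));
    return result;
#elif defined(__x86_64__)
    /* Host-only stand-in so the sketch can be tried off-target. */
    double result;
    __asm__("sqrtsd %1, %0" : "=x" (result) : "x" (x));
    return result;
#else
    /* Any other target: let the compiler choose (may call the library sqrt). */
    return __builtin_sqrt(x);
#endif
}
```

Because the function is static inline, the compiler is free to place the single vsqrt.f64 at the call site, which the separate assembler file could not offer.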

Ahlan · ‎11-24-2023

Hi,

We had the same problem trying to use double-precision floats with the inline assembler for ARM, and your solution of using %P worked. So many thanks for your post. Where on Earth did you find out about %P? Try as I might, I can't find this wonderful secret documented anywhere.

magro732 · ‎01-02-2022

Thanks! This works for me.

Do you know why the standard sqrt from math.h isn't using this assembler instruction? The single-precision version, sqrtf, uses the assembler instruction, but the double-precision version does not.

jingpan · ‎01-03-2022

Hi @magro732 ,

Sorry, I can't find any related information.

Regards,

Jing

magro732 · ‎01-04-2022

Don't you NXP people think this is a flaw in the library support? If I had not stumbled upon this, I would just have used the math.h version of sqrt, which does not use the assembler instruction. And the CMSIS-DSP library does not include a 64-bit version of sqrt either.

Should it really be necessary to write your own inline assembler to get full performance for 64-bit sqrt from the M7 processor?

jingpan · ‎01-04-2022

Hi @magro732 ,

Yes, I agree with you. CMSIS-DSP is released by ARM, but if ARM doesn't add this feature, we can make a patch. I'll escalate your suggestion.

Regards,

Jing