arm_rms_q15

emilien · ‎07-12-2012

Hi,

I'm in fixed-point trouble... I'm using the DSP on a TWR-K60N512. Here is what I wrote:

// define some values:

#define VECTOR_SIZE 40

float32_t inputU[VECTOR_SIZE] = {0.909663738089 , 0.169819655916 , 0.268918462228 , 0.405608980659 , 0.694552034332 , 0.213238658369 , 0.746090714595 , 0.146591804815 , 0.581591611035 , 0.00824751271313 , 0.815848907608 , 0.28001570888 , 0.459783555723 , 0.251290291806 , 0.677502043376 , 0.80253906042 , 0.817930529136 , 0.968525553535 , 0.428420247333 , 0.256734208324 , 0.62360547369 , 0.294399769527 , 0.56524282032 , 0.228892907349 , 0.00193085195605 , 0.906498765728 , 0.754109326089 , 0.347481953721 , 0.616133190588 , 0.7399958052 , 0.898959525768 , 0.849186403099 , 0.219555894784 , 0.635015634272 , 0.0762971785054 , 0.0306477419895 , 0.490306877931 , 0.954647274336 , 0.765811989471};
q15_t qinputU[VECTOR_SIZE];
arm_float_to_q15(inputU, qinputU, VECTOR_SIZE);

Then, I used the arm_mult_q15 and arm_mean_q15 functions and the results were correct, but when I try:

q15_t urms;

arm_rms_q15(inputU, VECTOR_SIZE, &urms);

float32_t display;

arm_q15_to_float(&urms, &display, 1);
printf("urms = %f\n", display);

It outputs urms = 0.277112, but it should be around 0.588939 (according to my Python script, which I'm quite sure is correct).

What do you think? Is there a problem with the arm_rms_q15 function, or am I doing something wrong?

Thanks,

--

Emilien

emilien · ‎07-12-2012

Well, if I do this instead:

    arm_mult_q15(inputU, inputU, temp, VECTOR_SIZE);
    q15_t ums;
    arm_mean_q15(temp, VECTOR_SIZE, &ums);
    arm_sqrt_q15(ums, &urms);

then it works... but I would prefer to use the provided rms function. If someone figures out what's wrong with it, please let me know : )

admin · ‎07-12-2012

Try aligning your inputU array. Your working method doesn't need alignment, but your preferred method does, since the rms function uses __SIMD32 followed by __SMLALD to speed things up.

#pragma data_alignment=8

q15_t inputU[VECTOR_SIZE];
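
The pragma above is toolchain-specific (as far as I know it is the IAR form). As a rough, portable sketch of the same idea, assuming a C11 compiler, you can request the alignment with alignas and check an address at run time. The names aligned_input and is_aligned are made up for this illustration and are not part of CMSIS:

```c
#include <stdalign.h>
#include <stdint.h>

#define VECTOR_SIZE 40

/* Hypothetical buffer: int16_t stands in for q15_t, aligned to 8 bytes. */
static alignas(8) int16_t aligned_input[VECTOR_SIZE];

/* Returns 1 if p is a multiple of `alignment` bytes, 0 otherwise. */
static int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Calling is_aligned(aligned_input, 8) before the rms call is a cheap way to rule alignment in or out as the cause.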

emilien · ‎07-13-2012

Hi,

Thanks for the help. My arrays are 8-byte aligned with or without the pragma, but arm_rms_q15 still gives an erroneous result : / Any ideas left?

I also have another question about the arm_mult_q15 function: I can see it outputs a vector of q15_t values, but what if I want a vector of q31_t values instead? (Multiplying two q1.15 values should give a q2.30 value, shouldn't it?)

admin · ‎07-13-2012

The mixed bag of Q-formatted outputs is one of the reasons the CMSIS is a bit lacking. Most DSP cores contain both a fixed-point multiply that recovers the radix point (Q31 = (Q15 x Q15) << 1) and an integer multiply that does not need to recover the radix. If you need Q31 results, you have to either convert your data to Q31 values (which wastes memory, or cycles moving data) or write your own version of the function, which is rather easy but at times takes some tweaking to get the required performance. arm_mult_q15 extracts the Q15 value from a Q2.30 value with saturation, but the rms function doesn't extract and saturate until it prepares for the mean divide before the square root.

Aligning all the radix points is something I've had to adjust to with the CMSIS. But the FFT functions are probably the biggest hurdle I've run across so far, since they lack an optimized 128-point transform. The lack of complex filters is another big missing feature.
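
To make the radix-point bookkeeping concrete, here is a minimal sketch in plain C of the two ways to scale a Q15 x Q15 product. It uses no CMSIS calls, so it runs on a host; the function names are invented for this example, and int16_t/int32_t stand in for q15_t/q31_t:

```c
#include <stdint.h>

/* Q15 x Q15 -> Q31: the raw product is Q2.30, so doubling it recovers
 * the radix point. Only (-1.0) * (-1.0) overflows, hence the clamp. */
static int32_t mul_q15_to_q31(int16_t a, int16_t b)
{
    int64_t p = ((int64_t)a * b) * 2;
    if (p > INT32_MAX) p = INT32_MAX;
    return (int32_t)p;
}

/* Q15 x Q15 -> Q15: drop 15 fractional bits from the Q2.30 product and
 * saturate to 16 bits. Assumes >> on a negative value is an arithmetic
 * shift, which is implementation-defined in C but usual in practice. */
static int16_t mul_q15_to_q15(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;
    if (p > 32767) p = 32767;
    return (int16_t)p;
}
```

For example, 0.5 x 0.5 (16384 x 16384) gives 8192 in Q15 and 536870912 in Q31, both of which represent 0.25.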

If I get a chance, I'll put your data through a test and see if anything jumps out at me.

admin · ‎07-13-2012

OK, there is a bug in arm_rms_q15: it saturates the sum before dividing for the mean...

    /* Truncating and saturating the accumulator to 1.15 format */
    sum = __SSAT((q31_t) (sum >> 15), 16);

    in1 = (q15_t) (sum / blockSize);

    /* Store the result in the destination */
    arm_sqrt_q15(in1, pResult);

and

    /* Truncating and saturating the accumulator to 1.15 format */
    sum = __SSAT((q31_t) (sum >> 15), 16);

    in = (q15_t) (sum / blockSize);

    /* Store the result in the destination */
    arm_sqrt_q15(in, pResult);

should be changed to this:

    /* Truncating and saturating the accumulator to 1.15 format */
    in = (q31_t) (sum >> 15);
    in1 = __SSAT((in / blockSize), 16);

    /* Store the result in the destination */
    arm_sqrt_q15(in1, pResult);

Also, I noticed your vector is one value short of 40 values.
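
To see why the order of the saturation and the divide matters, here is a small host-side sketch in plain C. It is not the CMSIS source: ssat16 is a hand-written stand-in for __SSAT(x, 16), the function names are invented for this illustration, and the square root step is left out so only the mean-square value is compared:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for __SSAT(x, 16): clamp to the signed 16-bit (Q15) range. */
static int32_t ssat16(int64_t x)
{
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return (int32_t)x;
}

/* Accumulate the Q2.30 squares of Q15 samples in a 64-bit sum. */
static int64_t sum_of_squares_q15(const int16_t *src, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)src[i] * src[i];
    return sum;
}

/* Buggy order: saturate the whole sum to Q15, then divide by n. */
static int16_t mean_square_saturate_first(const int16_t *src, size_t n)
{
    int32_t sat = ssat16(sum_of_squares_q15(src, n) >> 15);
    return (int16_t)(sat / (int32_t)n);
}

/* Fixed order: divide the scaled sum by n, then saturate the mean. */
static int16_t mean_square_divide_first(const int16_t *src, size_t n)
{
    int64_t scaled = sum_of_squares_q15(src, n) >> 15;
    return (int16_t)ssat16(scaled / (int64_t)n);
}
```

With 40 samples all equal to 0.5 (16384 in Q15), the mean square should be 0.25, i.e. 8192. The divide-first version returns 8192, while the saturate-first version clamps the sum to 32767 and returns 819, roughly 0.025.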

emilien · ‎07-16-2012

Hi,

It works now. I also rewrote the mult and rms functions so that they can output a q1.31 result. For example, I did this, which seems to give a correct result:

q31_t in;

in = __SSAT(((sum / blockSize) << 1), 32);
arm_sqrt_q31(in, pResult);

And the rms function really is faster than using mult, then mean, then sqrt : )

Many thanks for the bug fix.