Kirk Humphries

GNU HC12 & floating point math + trig functions

Discussion created by Kirk Humphries Employee on Jan 28, 2006

This message contains an entire topic ported from a separate forum. The original message and all replies are in this single message. We have seeded this new forum with selected information that we expect will be of value to you as you search for answers to your questions.

 

Posted: Mon Oct 17, 2005  3:48 pm

 

Hi All,

I see that using floating point math in any code bloats the program a bit. I am assuming that attempting trig functions will only make that worse. I was wondering, is there any way to write the floating point math and trig functions in assembler and be able to include and call them from C source code compiled under the GNU HC12 Ver 3.01 compiler?

 

Thanks,

 


 

Posted: Mon Oct 17, 2005  4:28 pm

 

> I was wondering is there any way to write the floating point math and

> trig functions in assembler and be able to include and call them from

> C source code compiled under the GNUHC12 Ver 3.01 compiler???

 

You think they take up too much space because they are written in something besides assembler? Clever people have been tweaking runtime library and math functions to optimize for speed, space, accuracy, sometimes combinations, since the 50s. What's your complaint? You need more rom space? Need more execution speed? Willing to give up some accuracy?

 


 

Posted: Mon Oct 17, 2005  8:16 pm

 

> You think they take up too much space because they are written in

> something besides assembler? Clever people have been tweaking

> runtime library and math functions to optimize for speed, space,

> accuracy, sometimes combinations since the 50s. What's your

> complaint? You need more rom space? Need more execution

> speed? Willing to give up some accuracy?

 

The problem might be what I see in GCC... The generic math functions were written for the HC11, and they are still the same when I compile for the 9S12 (well, at least it is backwards compatible). It needs a bit more work to make an optimized port for the 9S12.

 


 

Posted: Mon Oct 17, 2005  8:35 pm

 

> The problem might be what I see in GCC... The generic math functions

> were written for HC11, and still the same when I compile for 9S12

> (well, at least it is backwards compatible). It needs a bit more work

> to make an optimized port for 9S12.

 

To explain better, the floating point routines are not optimized. You can find a download of a partially complete optimized floating point package:

 

http://groups.yahoo.com/group/gnu-m68hc11/files/

"float.s"

 


 

Posted: Mon Oct 17, 2005  6:33 pm

 

CORDIC is very compact and fast, though fixed point. The hassle of adjusting the point position is not so onerous if the number of operations isn't too high and your range is known. You can find C examples on the net.

 

Regards,

 


 

Posted: Mon Oct 17, 2005  8:53 pm

 

> the floating point routines are not optimized.

 

I think imagecraft has optimized the fp routines for the hc12 to use the hw mult and div... dl the free 45 day eval and compile your prog with it.

 


 

Posted: Tue Oct 18, 2005  9:13 am

 

BTW: your quoting style makes it very hard to distinguish the quoted text from your text. Couldn't you do it like other people?

 

> I think imagecraft has optimized the fp routines for the hc12 to use the hw

> mult and div... dl the free 45 day eval and compile your prog with it.

 

This sounds like you suggest to steal the FP routines from ICC. Are you kidding, or did I misunderstand?

 

Besides this, it would take a lot of work to port the ICC code.

 


 

Posted: Tue Oct 18, 2005  11:50 am

 

> I think imagecraft has optimized the fp routines for the hc12 to use the

> hw mult and div... dl the free 45 day eval and compile your prog with it.

 

OB:

 

This sounds like you suggest to steal the FP routines from ICC. Are you kidding, or did I misunderstand?

 

===================================

Steal? Come on. The guy is complaining that his free compiler has some 20 year old freeware fp routines that aren't optimized for the HC12... I said my HC12 icc compiler uses the hw mul and div instructions in the fp routines... you actually get some value when you buy this commercial product. I think the guy that wrote the original HC11 fp stuff consulted to imagecraft to upgrade the fp routines... he got paid! Pay for results is such a complex concept to sw users.... everything has to be 1) free 2) state of the art. Where does the idea of something for nothing come from anyway?

 


 

Posted: Tue Oct 18, 2005  6:15 pm

 

You see, it sounded like you were saying "compile your project in GCC with the libs taken from ICC". But now I see you were suggesting to switch from GCC to ICC. That sounds like it may be difficult.

 

It was not at all a suggestion by Oliver or anyone else :smileyhappy: However, I *have* suggested to make use of the Open Source by improving the source which is optimized but not complete.

 

What's missing in the new float.s is mostly that it does 'float' only and not 'double'. It's downloadable from the GCC-m68hc1x site as I posted.

 

You see the idea is that we each contribute a little (within our means), and we get paid in abundance with everyone else's work. In that way we all get paid much more for Open Source than we do from the old "intellectual property" system. Sure we still own the work we did on the project. It's just that we don't choose to all starve afterward.

 


 

Posted: Tue Oct 18, 2005  1:43 pm

 

I am looking for size and speed. I was just thinking about the size/speed/accuracy of Gordon Ds' stuff written in assembler for the hc11. I have used it in both the hc11 and ported it in assembler to the hc12. The total size of the fp +, -, /, * functions and the trig stuff is about 2.2K of code space. The execution times that were originally published in that code were adequate for the hc11 project I worked on. That was with a 2 MHz system clock. I did a simple check of compiling the GNU stuff with just the startup code. It compiles in about 500 bytes. I then started adding floating point math statements to see how they affected the final compile. When I finished including fp +, -, /, * statements it added about 9K of code to the baseline. So like I said I was just wondering if there is a way to write the fp math/trig stuff in assembler and use it in a GNU HC12 compile. I had originally written some routines that take accelerometer measurements and convert them to an angle of inclination. It compiles in CW in about 5K of code with fp math and trig functions being exercised.

 


 

Date: Tue Oct 18, 2005  6:43 pm

 

Why don't you go ahead now and take a look at the "float.s" to which I provided a link earlier, and you will see that ".s" means that it is indeed written in Asm.

 


 

Date: Tue Oct 18, 2005  2:27 pm

 

> code to the baseline. So like I said I was just wondering if there is a way

> to write the fp math/trig stuff in assembler and use it in a GNU HC12

> compile.

 

Yes. I needed FP on an Atmel AVR project (w/ GCC) recently (not the transcendental functions, just basic arithmetic) and one can write assembly functions that replace the FP that comes with GCC. You can gain very significant speed (and size) improvements even if you do not cheat and round the proper IEEE way. If you write all the FP routines that GCC calls and link your library before the gcc library, then your routines will be linked in and the GCC ones will not be pulled from the gcc lib. You will not get link errors; the ELF object format allows doing that.

 

> I had originally written some routines that take accelerometer

> measurements and convert them to an angle of inclination. It compiles in CW

> in about 5 K of code with fp math and trig functions being exercised.

 

If you have a 2-axis accelerometer and you want to calculate the vector magnitude and angle, you might be better off using fixed point and CORDIC. It is fast and simple. A single CORDIC operation takes around twice as much time as a division with the same number of bits without a HW div instruction (i.e. with subtract and shift).

 

If you use a PWM output accelerometer, then with 7 CORDIC runs you can 1) calculate the duty cycle on both axes, 2) compensate for gain and offset errors as well as for a possible tilt of the mounting, then 3) calculate a magnitude and angle and finally 4) scale the result to your favorite units (e.g. 0.01g and 0.1 degree per LSB).
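
As an illustration of the magnitude-and-angle step only, here is a minimal sketch of a vectoring-mode CORDIC in plain integer C. It is not the poster's code. The function names, the 12 iterations, the 4 guard bits and the angle unit (65536 counts per full turn) are all assumptions chosen for the example.

```c
#include <stdint.h>

#define CORDIC_ITERS 12   /* assumed iteration count for this sketch */

/* atan(2^-i) for i = 0..11, in binary angle units (65536 = one full turn). */
static const uint16_t cordic_atan[CORDIC_ITERS] = {
    8192, 4836, 2555, 1297, 651, 326, 163, 81, 41, 20, 10, 5
};

/* Arithmetic shift right that does not rely on how the compiler
   shifts negative values (rounds toward zero). */
int32_t asr32(int32_t v, unsigned n)
{
    return (v < 0) ? -((-v) >> n) : (v >> n);
}

/* Vectoring-mode CORDIC: converts (xin, yin) to magnitude and angle.
   The angle is returned in binary angle units, 0..65535 for 0..360 degrees. */
void cordic_polar(int16_t xin, int16_t yin, uint16_t *mag, uint16_t *angle)
{
    int32_t x = (int32_t)xin * 16;   /* 4 guard bits against truncation */
    int32_t y = (int32_t)yin * 16;
    uint16_t z = 0;
    unsigned i;

    if (x < 0) {                     /* fold the left half plane */
        x = -x;
        y = -y;
        z = 32768u;                  /* half a turn */
    }
    for (i = 0; i < CORDIC_ITERS; i++) {
        int32_t dx = asr32(y, i);    /* both shifts use the old x and y */
        int32_t dy = asr32(x, i);
        if (y > 0) { x += dx; y -= dy; z = (uint16_t)(z + cordic_atan[i]); }
        else       { x -= dx; y += dy; z = (uint16_t)(z - cordic_atan[i]); }
    }
    /* Remove the CORDIC gain (about 1.6468) and the 4 guard bits:
       2487/4096 is about 0.6072, so shift by 12 + 4 = 16 with rounding. */
    *mag = (uint16_t)(((uint32_t)x * 2487u + 32768u) >> 16);
    *angle = z;
}
```

The loop body is only shifts, adds and a table lookup, which is why the cost is comparable to a shift-and-subtract division. Calling cordic_polar(3000, 4000, &mag, &angle) should give a magnitude close to 5000 and an angle close to 53 degrees (about 9670 counts), within a few counts of rounding error.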

 

I'm not sure what the economics of your project are, but several alternatives occur to me.

 

1) Bite the bullet, and use a part with larger flash. If you fill up an MC9S12DP512, the 9k of floating point won't be much of the problem.

 

2) Consider other compilers. They cost more, but the reduced code size may be worth the cost in requiring less flash and saving time in coding around GCC peculiarities. Most vendors give a 15 to 30 day free trial, so once your code is more or less written, you can use the trial period to find out what the effect on code size is.

 

3) Doing the math in fixed point has also been suggested. My experience is that if the math is at all complicated, getting the scaling right takes a lot of software engineering and debugging time. This extra cost and schedule time makes using floating point in a larger or faster part look much more attractive.

 

If you do end up with fixed point math, I recommend Jack Crenshaw's "MATH Toolkit for REAL-TIME Programming" (ISBN 1-929629-09-5). It covers fixed point elementary and trig functions thoroughly, and it's well written.

 


 

Date: Thu Oct 20, 2005  9:30 pm

 

 [...]

 

> 2) Consider other compilers. They cost more, but the reduced code size may

 

Ack.

 

[...]

 

> 3) Doing the math in fixed point has also been suggested. My experience is

 

but also fixed point math is tricky to implement. I observed noticeable improvements over the past years in the Cosmic libraries. Hard to beat, IMO.

 

[...]

 

> If you do end up with fixed point math, I recommend Jack Crenshaw's "MATH

> Toolkit for REAL-TIME Programming" (ISBN 1-929629-09-5). It covers fixed

> point elementary and trig functions in fixed point thoroughly, and its

> well-written.

 

I guess it doesn't contain highly optimized HC(S)12 code?

 


 

Date: Thu Oct 20, 2005  11:13 pm

 

Some more points about math:

 

When doing math routines for a compiler library, there is strong pressure to ensure the best accuracy that the word length will allow, which typically costs some extra code and a lot of extra debugging of the library functions (sometimes by users).

 

When doing math for a particular embedded application, you may already know that the input data is only good to 6 or 10 bits, and you don't need to worry about getting the full precision.

 

Some more notes below:

 

>[...]

>

 

> > 2) Consider other compilers. They cost more, but the reduced code size

> > may

>

 

>Ack.

>

 

>[...]

>

 

> > 3) Doing the math in fixed point has also been suggested. My experience is

>

 

>but also fixed point math is tricky to implement.

 

I agree!

 

I'd say it a little differently. It's tricky to implement CORRECTLY.

 

It's deceptively easy to write the expressions as if the math was floating point. You can then spend some time inspecting the code and dealing with obvious scaling problems. Unfortunately the problems become much clearer when debugging, so you spend a lot more time debugging the code.

 

> I observed noticeable improvements over the past years in the Cosmic

> libraries. Hard to beat, IMO.

>[...]

>

 

> > If you do end up with fixed point math, I recommend Jack Crenshaw's "MATH

> > Toolkit for REAL-TIME Programming" (ISBN 1-929629-09-5). It covers fixed

> > point elementary and trig functions in fixed point thoroughly, and its

> > well-written.

>

 

>I guess it doesn't contain highly optimized HC(S)12 code?

 

It does contain a lot of integer C code that a moderately good compiler should execute much faster than a software floating point library written in assembly.

 

It also has plenty of discussion about errors, so it's fairly easy to tailor the functions to be faster and just barely produce the needed accuracy.

 

The compilers probably will not handle the combinations of scaling and multiplying as cleverly as an experienced assembler programmer.

 

I'm not sure that I could justify using assembler instead of C for any project that hasn't DEMONSTRATED that it's out of time or space, and a simple alternative like a faster or bigger part isn't acceptable.

 

C code is portable and pretty fast and relatively easy to debug.

 

Assembler is not portable, takes longer to write and has more bugs per program. How can you justify assembler without a proven specific need?

 

Even if you have a proven need now, the chances are the next year's part will solve your problems faster than you can debug the assembly code.

 


 

Date: Fri Oct 21, 2005  10:43 am

 

[...]

 

> > > 3) Doing the math in fixed point has also been suggested. My

> > experience is but also fixed point math is tricky to implement.

>

 

> I agree!

>

 

> I'd say it a little differently. Its tricky to implement CORRECTLY.

 

...correctly and fast and compact.

 

[...]

 

> > > If you do end up with fixed point math, I recommend Jack Crenshaw's

> > > "MATH Toolkit for REAL-TIME Programming" (ISBN 1-929629-09-5).

> > > It covers fixed point elementary and trig functions in fixed point thoroughly,

> > > and its well-written.

> >

> >I guess it doesn't contain highly optimized HC(S)12 code?

>

 

> It does contain a lot of integer C code that a moderately good compiler

> should execute much faster than a software floating point library written

> in assembly.

 

I disagree. Using fixed point math ends up in using 32 bit math in many cases (at least for intermediate results).

 

In most cases the HC(S)12 does 32 bit integer math no faster than single precision float.

 

And it's tricky to make nontrivial 32 bit integer operations (div, square root) fast and compact. Maybe even more difficult than single precision float.

 

> The compilers probably will not handle the combinations of scaling and

> multiplying as cleverly as a experienced assembler programmer.

 

You can make the Cosmic compiler use emul and ediv if you multiply two 16 bit values into a 32 bit intermediate result and divide that by a 16 bit value. That's great for scaling!
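
A minimal sketch of the C expression shape this refers to. The function name is hypothetical, and whether a given compiler actually maps it onto EMUL and EDIV depends on the compiler and its options; check the generated listing.

```c
#include <stdint.h>

/* Scale a 16-bit value by the ratio num/den using a 32-bit intermediate.
   The caller must make sure den is non-zero and that the result fits in
   16 bits, i.e. value * num / den <= 65535. */
uint16_t scale_u16(uint16_t value, uint16_t num, uint16_t den)
{
    uint32_t product = (uint32_t)value * num;  /* 16 x 16 -> 32 bit multiply */
    return (uint16_t)(product / den);          /* 32 / 16 -> 16 bit divide   */
}
```

For example, scale_u16(512, 5000, 1023) evaluates 2560000 / 1023 and returns 2502. The cast on the first operand matters: without it, a 16-bit int target would do the multiply in 16 bits and overflow.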

 

> I'm not sure that I could justify using assembler instead of C for any

> project that hasn't DEMONSTRATED that its out of time or space, and a

> simple alternative like a faster or bigger part isn't acceptable.

 

In most cases it's sufficient to do some small part in assembler (if even necessary).

 

IMO assembler is justified only in very specific cases, such as mass production of extremely cheap stuff or specific requirements.

 


 

Date: Fri Oct 21, 2005  1:40 pm

 

> > It does contain a lot of integer C code that a moderately good compiler

> > should execute much faster than a software floating point library written

> > in assembly.

>

 

> I disagree. Using fixed point math ends up in using 32 bit math in

> many cases (at least for intermediate results).

>

 

> The HC(S)12 does 32 bit integer not faster than single precision

> float in most cases.

 

That might be true for mul, but certainly not true for add/sub. 32 bit add is very simple and very fast in integer, but a float add is quite complex - you have to decompose the floats, work out the exponent difference, shift one of the mantissas into place, do a 24-bit add, normalise the result and pack it back to IEEE format. Furthermore, you have to round the result, which, if you want to do it properly (i.e. as per the standard), is not trivial. Also, the general purpose float routines have to cater for denormalised numbers, infinities, NaNs, signed zeroes and that all takes time. A lot more time than the two 16-bit adds, which the compiler will invariably use for longs.
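
To make "two 16-bit adds" concrete, here is a sketch of a 32-bit add built from 16-bit halves with a carry, written out by hand in C. The struct and function names are made up for the illustration; a real compiler does this internally for long operands.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves (hypothetical type). */
typedef struct {
    uint16_t hi;
    uint16_t lo;
} u32_pair;

/* 32-bit addition as two 16-bit additions plus a carry. */
u32_pair add32(u32_pair a, u32_pair b)
{
    u32_pair r;
    uint16_t carry;

    r.lo = (uint16_t)(a.lo + b.lo);      /* low halves, wraps modulo 65536 */
    carry = (uint16_t)(r.lo < a.lo);     /* wrapped result means carry out */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}
```

Compare that with the float add described above: there is no unpacking, alignment, normalisation or rounding, just the two adds and the carry.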

 

A float multiply is pretty similar to a long mul. You gain a little on multiplying only 24x24 but you lose it all (and usually more) on the unpack/repack, NaN, inf and rounding business. Same goes for div.

 

> And it's tricky to make nontrivial 32 bit integer operations (div,

> square root) fast and compact. Maybe even more difficult than single

> precision float.

 

I tend to disagree. Float is compact because everything is in a library. In fixpoint you do all the scaling yourself which will result in larger code. On the other hand, if you design your algorithm well, you will only scale when you really have to and you will only use 32-bit intermediaries when you really need to. You can speed things up by choosing the multiplicative scaling factors such that the ones used for divisions are powers of two: mul is fast and cheap but div is expensive, so you shift instead.
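
A small sketch of the power-of-two trick, assuming the scale factor is stored as a 16-bit fraction of 65536. The function name and the example factor are hypothetical.

```c
#include <stdint.h>

/* Multiply x by the fraction k/65536. The divisor is a power of two,
   so the "division" is a 16-bit shift of the 32-bit product (on a
   16-bit CPU that is just taking the upper word). The result is
   truncated, not rounded. */
uint16_t scale_q16(uint16_t x, uint16_t k)
{
    return (uint16_t)(((uint32_t)x * k) >> 16);
}
```

To scale by 0.3, for instance, precompute k = 19661 (0.3 * 65536, rounded); scale_q16(1000, 19661) then returns 300 with one multiply and no divide.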

 

If you need to compare things, integer compare is fast, float compare is (comparatively) slow. This is because even though IEEE float numbers, apart from the sign bit, form an ordered binary sequence, the fact that you have to deal with infinity and NaNs makes it a lot more complex than an integer compare. Just because two floats have the same binary image, they may not be equal (NaN == NaN is false even though both numbers are 0x7fc00000).
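
The NaN point is easy to check on any machine with IEEE 754 single precision floats. The sketch below assumes a 32-bit IEEE float and a compiler not running in a "fast math" mode; the helper names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from a raw 32-bit image (assumes 32-bit IEEE 754 float). */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Integer-style comparison: are the binary images identical? */
int same_bits(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Float comparison as the language defines it. */
int float_equal(float a, float b)
{
    return a == b;
}
```

With a = b = float_from_bits(0x7fc00000), same_bits(a, b) is true but float_equal(a, b) is false. The signed zeroes go the other way: float_equal(0.0f, -0.0f) is true although the two binary images differ.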

 

The integer square root of a 32-bit unsigned long can be calculated in 16 iterations of a shift-add core operation, for a float you need at least 26 rounds of pretty much the same thing, because in float you must generate all 24 mantissa bits, plus at least 2 more for the rounding (and you also need to work out if there would be any more non-zero bits if you continued the calculation to infinite precision).
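
For reference, a sketch of the well-known bit-by-bit integer square root, which produces one result bit per pass and therefore needs exactly 16 passes for a 32-bit input. This is a textbook version, not code from the thread.

```c
#include <stdint.h>

/* Integer square root of a 32-bit value: returns floor(sqrt(n)).
   Each pass is a compare, a conditional subtract and shifts. */
uint16_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1UL << 30;   /* highest power of four in 32 bits */
    int i;

    for (i = 0; i < 16; i++) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}
```

For example, isqrt32(1000000) returns 1000 and isqrt32(15) returns 3.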

 

Transcendental functions in float cost you a lot, because you have to cater for ranges, calculate the result in fairly high precision and so on. An integer CORDIC sin/cos takes only a few times more than a software division (the subtract/shift one) of the same number of bits.

 

Since a particular application is likely to have a pretty limited input range, it is very unlikely that a float cos() would ever need to reduce a big input modulo 2*PI, but the library function will still check whether your number is indeed less than 2*PI. In your integer domain you can actually use your knowledge that the input range is limited by the sensors and omit any checking. In any case an integer modulus is incomparably simpler than a properly done float modulus (the sin(x-2*PI*(int)(x/(2*PI))) is *NOT* proper).
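
One common way to get the integer modulus for free is to store angles as binary fractions of a turn, so that unsigned overflow does the range reduction. This is a sketch of that idea, not something the poster spelled out; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Binary angle: 65536 counts per full turn, so 0x4000 is 90 degrees. */
typedef uint16_t bam16;

/* Adding or subtracting angles wraps modulo one turn automatically. */
bam16 bam_add(bam16 a, bam16 b)
{
    return (bam16)(a + b);
}

bam16 bam_sub(bam16 a, bam16 b)
{
    return (bam16)(a - b);
}

/* Quadrant 0..3 is simply the top two bits. */
unsigned bam_quadrant(bam16 a)
{
    return (unsigned)(a >> 14);
}
```

For example, bam_add(0xC000, 0x8000), i.e. 270 degrees plus 180 degrees, returns 0x4000 (90 degrees) with no explicit modulus at all.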

 

So, I do believe that integer (fixpoint) math has its role when the code memory is relatively cheap but actual calculation time is an issue. You will use more code space but you will still be faster, because you will spend a lot less time within loops or in the middle of the same few complex library functions called several times.

 

Admittedly, maintenance of such code is a pain in the neck. If any of your sensors change, then your cleverly crafted scaling factors are out of the window and you can start playing the game from the start. With a float implementation it is only a matter of a #define.

 

One other major drawback of fixpoint is long term maintenance. If the code is not commented very well, including the detailed description of the whole algorithm down to the last bit with the relevant calculations of limits, scaling factors and all, then you have no chance to maintain the code. So, fixpoint things are viable only if whoever writes the routines is a good engineer (meaning that (s)he had to fix code that (s)he had written 5+ years before and thus learnt the importance of meaningful comments) or if the boss has a handy cattle-prod which he applies on the engineer every evening if he can't understand the daily algorithmic output on first read-through.

 

On the other hand, assuming a comment-literate engineer, if the frequent change of environment is not a threat but runtime is, then I think it is worth putting the effort into a thoroughly analysed fixpoint version. It is possible that it turns out that floats are the better solution but one should not a priori discard fixpoint because "it is too hard" or "it is too complex".

 


 

Date: Fri Oct 21, 2005  1:23 pm

 

Since integer arithmetic of 32 bit and larger variables is discussed, I would like to point out that the new S12X family has nice improvements to make such fixed-point calculations of 32-bit or larger variables easier to implement than on HC12 and HCS12 parts.

 

On the S12X a 32 bit addition is made of 2 instructions:

 

ADDD #$32BIT_ADD_NUMBER_LSB

ADEX #$32BIT_ADD_NUMBER_MSB

 

The equivalent computation on the HC12 and HCS12 is more complex and requires moving the intermediate LSB 16-bit result out of register D, to use D again for the MSB calculation. Also add with carry exists only in 8 bit data instructions on HC12 and HCS12 devices (the ADCA and ADCB instructions) but not for 16-bit data.

 

The case is similar for 32-bit or larger subtractions. The S12X is again more efficient and faster.

 

Another very nice improvement of the S12X is that 32 bit or larger compares are also much easier than on HC12 and HCS12. This uses the new CPED, CPEX, CPEY and CPES instructions. A 32 bit compare with a conditional branch then becomes:

 

CPD #32BIT_COMPARE_NUMBER_LSB

CPEX #32BIT_COMPARE_NUMBER_MSB

Bxx (branch conditional, like if 0, if larger, smaller, etc, etc)

 


 

Date: Fri Oct 21, 2005  4:11 pm

 

Do you know which compilers will use these new S12X 32 bit add and cp instructions without me having to insert them in asm?

 


 

Date: Fri Oct 21, 2005  4:31 pm

 

I don't know whether, or which, C compilers use the new S12X arithmetic instructions to generate more efficient 32 bit math. I haven't tried writing 32 bit math in C for the S12X yet, only in assembler so far.

 

I invite Cosmic and Metrowerks to comment on the subject, and enlighten us if they like.

 

I think other compilers don't support the S12X (yet?).

 


 

Date: Sat Oct 22, 2005  1:37 am

 

We will support S12X when we upgrade ICC12 to V7 technology. No time frame yet, but possibly the first few months of 2006.

 


 

Date: Thu Oct 27, 2005  3:03 pm

 

CodeWarrior for HCS12(X) V4.0 and V4.1 were released mainly to provide a new compiler and a new library upgraded to take advantage of the S12X core's extended instruction set. Anyone who has one of these releases can get more details by searching for occurrences of the "__HCS12X__" define in the library. This define is also set when compiling with the "-CpuHCS12X" option (set automatically by the project wizard when creating a project for an S12X part); the option tells the compiler back end to generate code for the S12X instruction set and its capabilities.
