floating point characterization

Zap · ‎04-17-2009

Dear freescalers,

I am looking for documents that help to understand and characterize the floating-point support on the 8-bit HC(S)08 family of microcontrollers.

I found this nice one for the HC11 family: AN974 floating-point Package. I was wondering whether anything similar has been written for HC(S)08 microcontrollers.

I need it because I would like to evaluate the worst case when using a floating-point implementation of some algorithm. I have read in the forum that a floating-point addition or multiplication can take some thousand times the execution time of an integer one; however, I need a more accurate estimate.

Thank you very much

Piero

kef · ‎04-17-2009

There are two very slow floating point add/sub cases: 1) when one addend is 2^(mantissa_bits-1) times larger than the other; 2) when you subtract very close numbers, so that the result is 2^(mantissa_bits-1) times smaller than one of the addends, but not zero. Case 1 spends cycles denormalizing (shifting right) the smaller addend so that it lines up with the larger one. Case 2 spends cycles normalizing (shifting left) the result of the addition. For example, single precision 1.0 + 0.0000001 should be almost the slowest case. 1.0 - 0.9999999 should also be very slow. 1.0 + 1.0 should be almost the fastest case. Adding zero or a very small number (more than 2^(mantissa_bits-1) times smaller) should also be fast, and could be even faster than 1+1.
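To pick test operands for the two slow cases, it can help to estimate how many bit shifts each pair would force. The following is only a sketch: it assumes `float` is a 32-bit IEEE-754 single (the format used by your HC(S)08 compiler may differ), and the names `alignment_shift` and `normalization_shift` are made up for illustration. It runs on a PC, not on the target, and it says nothing about actual cycle counts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Biased exponent field of an IEEE-754 single (assumed 32-bit float).
   Zeros and subnormals return 0 and are not handled specially. */
static int exponent_field(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 23) & 0xFFu);
}

/* Case 1: bits the smaller addend must be shifted right
   to line up with the larger one. */
int alignment_shift(float a, float b)
{
    int ea = exponent_field(a);
    int eb = exponent_field(b);
    return ea > eb ? ea - eb : eb - ea;
}

/* Case 2: bits the result of a + b must be shifted left
   after cancellation. Returns 0 for an exact zero result
   or when no left shift is needed. */
int normalization_shift(float a, float b)
{
    volatile float sum = a + b;   /* volatile forces rounding to float */
    int ea = exponent_field(a);
    int eb = exponent_field(b);
    int emax = ea > eb ? ea : eb;
    int shift;
    if (sum == 0.0f) {
        return 0;
    }
    shift = emax - exponent_field(sum);
    return shift > 0 ? shift : 0;
}
```

Pairs with a large `alignment_shift` (such as 1.0 and 0.0000001) are candidates for case 1, and pairs with a large `normalization_shift` (such as 1.0 and -0.9999999) are candidates for case 2. A library may shift by whole bytes first or exit early, so the real timing still has to be measured.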

The cycles spent by an FP multiply should depend very little on the arguments. Zeros and overflow are special cases and should be faster than usual.

Now, keeping the above in mind, simply use a hardware timer ticking at the bus clock rate or at a prescaled bus clock. Read the timer counter before the FP add/mul and again after it; the difference between the two readings gives the number of bus clock cycles spent. This way you can characterise not only FP add/mul but also your own custom routines. Good luck.
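A minimal sketch of that measurement in C, under some assumptions: the timer is a free-running 16-bit counter, and `read_timer` is a hypothetical function you would supply yourself to return the counter register of whichever timer module your derivative has. The names `elapsed_ticks` and `time_fp_add` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook: returns the current value of a free-running
   16-bit timer counter. On the target this would read the timer
   counter register of your particular derivative. */
typedef uint16_t (*timer_read_fn)(void);

/* Ticks between two readings. The unsigned 16-bit subtraction is
   modulo 2^16, so it stays correct if the counter wrapped once. */
uint16_t elapsed_ticks(uint16_t start, uint16_t stop)
{
    return (uint16_t)(stop - start);
}

/* Times a single float addition. The result goes through a volatile
   pointer so the compiler cannot optimise the operation away. */
uint16_t time_fp_add(timer_read_fn read_timer, float a, float b,
                     volatile float *result)
{
    uint16_t t0;
    uint16_t t1;
    assert(read_timer != 0 && result != 0);
    t0 = read_timer();
    *result = a + b;
    t1 = read_timer();
    return elapsed_ticks(t0, t1);
}
```

The count includes the overhead of the two timer reads and the store, so it is worth timing an empty region once and subtracting that. If a prescaler is used, multiply the tick count by the prescale factor to get bus cycles, and keep the measured region shorter than one full counter period.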

Zap · ‎04-18-2009

Dear kef,

thank you very much. I was doing something like what you suggested but couldn't find a worst case.

For my project, I am interested in the number of cycles needed to perform some operations and routines (so that I can also take the clock speed into account in my evaluations), so I am testing with Full Chip Simulation (FCS) and checking the number of cycles using a set of breakpoints.

Could you also tell me what the worst case would be for division, square root and exp?

That would be very helpful too.

Thanks again

Piero
