Howard Heid

Coldfire V1 runtime LongLongCF.C - slow & bug

Discussion created by Howard Heid on Oct 24, 2012
Latest reply on Nov 15, 2012 by TomE

Has anyone found a usable and faster runtime library, or user-includable code,
generally available for the Coldfire V1?

 

Conversion of an old 08GP32 project to a Coldfire V1 (the MCF51JM128VLK)
has resulted in disappointingly slow performance (the GP32, not to mention the
GT60, outperforms the V1).  The V1 runtime library is optimized for the
smaller-memory, higher-speed Coldfires and is just way too inefficient for the
V1s.  Searching the web found no other runtime library or user code available to
replace these libraries, which were originally written for the 68K.

 

The simplest runtime module to rewrite was LongLongCF.C, as it is one third
asm CInt64 routines, one third C CInt64 division routines, and one third C
FP convert routines.  It is also a worthwhile target, as the compiler seems to
do its 32-bit math predominantly through the LongLongCF.C CInt64 routines.
Also, the compiler allows the LongLongCF.C routines to be overridden by the
user's own versions just by having them in the project files.
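
For anyone wanting to try the same thing, here is a minimal sketch of what
replacement routines compute, written as a portable C model rather than
ColdFire asm.  The CInt64Model struct, its hi/lo field layout, and the model_
names are my assumptions for illustration; a real override would need the exact
__rt_ name, the ABI_SPEC calling convention, and the runtime's own CInt64 type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's CInt64: two 32-bit halves,
   high word first.  The real type's layout is an assumption here. */
typedef struct { uint32_t hi; uint32_t lo; } CInt64Model;

/* Model of __rt_ultoi64: zero-extend an unsigned long into 64 bits. */
CInt64Model *model_ultoi64(CInt64Model *dst, uint32_t v)
{
    dst->hi = 0u;
    dst->lo = v;
    return dst;
}

/* Model of __rt_sltoi64: sign-extend a signed long into 64 bits. */
CInt64Model *model_sltoi64(CInt64Model *dst, int32_t v)
{
    dst->lo = (uint32_t)v;
    dst->hi = (v < 0) ? 0xFFFFFFFFu : 0u;
    return dst;
}

/* Model of __rt_neg64: two's complement negate (invert, then add one). */
CInt64Model *model_neg64(CInt64Model *dst, CInt64Model a)
{
    dst->lo = ~a.lo + 1u;
    /* The +1 carries into the high word only when the low word wraps to 0. */
    dst->hi = ~a.hi + (dst->lo == 0u ? 1u : 0u);
    return dst;
}

/* Small self-check showing the intended use. */
void model_self_check(void)
{
    CInt64Model r, one = { 0u, 1u };
    model_neg64(&r, one);              /* -1 is all ones */
    assert(r.hi == 0xFFFFFFFFu && r.lo == 0xFFFFFFFFu);
    model_sltoi64(&r, -2);             /* sign bits fill the high word */
    assert(r.hi == 0xFFFFFFFFu && r.lo == 0xFFFFFFFEu);
}
```

A model like this is mainly useful as a reference to check a hand-written asm
version against before dropping it into the project.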

 

The result of replacing just the asm routines (none of the C routines) was a
33% improvement in the project's critically timed code section (760+ to 510+
µs).  Also, the __rt_rotr64() routine was discovered to be in error (the first
BRA.S instruction goes to the third loop label when it should have gone to the
fourth label).  (Searching the web indicates that the bug possibly dates as far
back as 1996 in the original 68K code.)
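
Since the library's __rt_rotr64() cannot be trusted, a reference model is handy
for checking any replacement.  The following is only a sketch in portable C,
not the proprietary asm: the CInt64Model struct and the function names are
hypothetical, and taking the count modulo 64 is my assumption about the
intended behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's CInt64 (high word first). */
typedef struct { uint32_t hi; uint32_t lo; } CInt64Model;

/* Rotate right by n bits using only 32-bit operations.
   Assumption: the count is taken modulo 64. */
CInt64Model *model_rotr64(CInt64Model *dst, CInt64Model a, short n)
{
    unsigned s = (unsigned)n & 63u;
    uint32_t hi = a.hi, lo = a.lo;

    if (s >= 32u) {            /* rotating by 32 just swaps the halves */
        uint32_t t = hi; hi = lo; lo = t;
        s -= 32u;
    }
    if (s != 0u) {             /* avoid an undefined shift by 32 */
        uint32_t new_hi = (hi >> s) | (lo << (32u - s));
        uint32_t new_lo = (lo >> s) | (hi << (32u - s));
        hi = new_hi;
        lo = new_lo;
    }
    dst->hi = hi;
    dst->lo = lo;
    return dst;
}

/* Plain 64-bit rotate, used only as the expected value in the check. */
static uint64_t ref_rotr64(uint64_t v, unsigned s)
{
    s &= 63u;
    return s ? (v >> s) | (v << (64u - s)) : v;
}

/* Compare the two-word model against the plain rotate for every count. */
void rotr64_self_check(void)
{
    const uint64_t v = 0x123456789ABCDEF0ull;
    CInt64Model a = { (uint32_t)(v >> 32), (uint32_t)v }, r;
    short n;

    for (n = 0; n < 64; ++n) {
        uint64_t want = ref_rotr64(v, (unsigned)n);
        model_rotr64(&r, a, n);
        assert(r.hi == (uint32_t)(want >> 32) && r.lo == (uint32_t)want);
    }
}
```

The self-check is meant to be run on a host PC; sweeping all 64 counts is the
kind of test that would have caught a branch to the wrong loop label.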

 

The new code is proprietary, but the performance benefits are as follows (left
column: original library routine; right column: new routine):

  extern asm ABI_SPEC short __rt_cmpu64( CInt64, CInt64 );
    Size  : 36                30
    Speed : 15,23; 22; 13,22  20; 18; 18
    Rating: 828               600

  extern asm ABI_SPEC CInt64* __rt_eor64( CInt64*, CInt64, CInt64 );
    Size  : 30                32
    Speed : 21                19
    Rating: 630               608

  extern asm ABI_SPEC CInt64* __rt_mul64( CInt64*, CInt64, CInt64 );
    Size  : 80                160
    Speed : 608               167
    Rating: 48640             26720

  extern asm ABI_SPEC CInt64* __rt_neg64( CInt64*, CInt64 );
    Size  : 26                24
    Speed : 17                15
    Rating: 442               360

  extern asm ABI_SPEC CInt64* __rt_rotl64( CInt64*, CInt64, short );
    Size  : 50                76
    Speed : 592               38
    Rating: 29600             2888

  extern asm ABI_SPEC CInt64* __rt_rotr64( CInt64*, CInt64, short );
    Size  : 68 (ERROR)        76
    Speed : 655 (ERROR)       38
    Rating: 44540 (ERROR)     2888

  extern asm ABI_SPEC CInt64* __rt_shl64( CInt64*, CInt64, short );
    Size  : 46                64
    Speed : 403               31
    Rating: 18538             1984

  extern asm ABI_SPEC CInt64* __rt_shrs64( CInt64*, CInt64, short );
    Size  : 52                66
    Speed : 592               32
    Rating: 30784             2112

  extern asm ABI_SPEC CInt64* __rt_shru64( CInt64*, CInt64, short );
    Size  : 52                64
    Speed : 592               31
    Rating: 30784             1984

  extern asm ABI_SPEC CInt64* __rt_sltoi64( CInt64*, signed long );
    Size  : 30                20
    Speed : 16                13
    Rating: 480               260

  extern asm ABI_SPEC CInt64* __rt_ultoi64( CInt64*, unsigned long );
    Size  : 16                16
    Speed : 12                11
    Rating: 192               176

Speed is the worst case of a routine's possibly multiple timings.  Rating is the
product of size and speed, treating them as equally critical resources (as in
golf, the lower the score the better).
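
To make the scoring concrete, here is a small sketch of the Rating calculation
applied to the __rt_cmpu64 row.  The function names are made up for
illustration, and I am assuming size is in bytes and speed is in cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Worst case (largest value) among a routine's timings. */
uint32_t worst_case(const uint32_t *timings, unsigned count)
{
    uint32_t worst = 0u;
    unsigned i;
    for (i = 0u; i < count; ++i) {
        if (timings[i] > worst) {
            worst = timings[i];
        }
    }
    return worst;
}

/* Rating = size * worst-case speed; lower is better. */
uint32_t rating(uint32_t size, uint32_t worst_speed)
{
    return size * worst_speed;
}

/* Reproduce the __rt_cmpu64 row of the table. */
void rating_self_check(void)
{
    const uint32_t old_timings[] = { 15u, 23u, 22u, 13u, 22u };
    const uint32_t new_timings[] = { 20u, 18u, 18u };

    assert(rating(36u, worst_case(old_timings, 5u)) == 828u);  /* 36 * 23 */
    assert(rating(30u, worst_case(new_timings, 3u)) == 600u);  /* 30 * 20 */
}
```

Weighting size and speed equally is just one choice; a project that is short
on flash but not on cycles would want to weight the size term more heavily.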


The new __rt_mul64() is NOT the best possible (it can be further improved by
possibly 20 bytes and 10 cycles).

 

The above is presented as a challenge to get the improved Coldfire V1 runtime
libraries that are needed, since no alternatives seem to be available.  (NO
IMPROVEMENTS SINCE 1996?)
