Urgently needed - 40 bit / 16 bit in 8 bit MCU (HCS08)

rparekh · ‎08-28-2008

Hi,

I urgently need routine for 40 bit / 16 bit in 8 bit MCU. Fully unsigned operation.

Routine needs to be optimized for speed. Any - C or assembly - is ok.

CodeWarrior does not support a number more than 32 bit for 8 bit MCU.

Thanks in advance.

Regards,

RParekh

bigmac · ‎08-29-2008

Hello,

It is possible to consider a 40-bit value as a sequence of three word values (48 bits), perhaps expressed as a 3-element array (big endian format assumed). It is then possible to implement a "long division" process, in three stages, where the remainder from each stage is carried through to the next stage. However, the process will be slow. The following test code demonstrates this process.

// Global variables:

word invalue[3] = {
0x00FE, 0xDCBA, 0x9876
};
word outvalue[3];

/********************************************/
/* Divide function 48-bit/16-bit unsigned */

void divide48( word divisor)
{
   dword temp;

   if (divisor == 0) {
      outvalue[0] = 0xFFFF;
      outvalue[1] = 0xFFFF;
      outvalue[2] = 0xFFFF;

      return;
   }
   temp = invalue[0];
   outvalue[0] = temp / divisor;
   temp = (temp % divisor) * 0x10000 + invalue[1];
   outvalue[1] = temp / divisor;
   temp = (temp % divisor) * 0x10000 + invalue[2];
   outvalue[2] = temp / divisor;
}

The process may be sped up somewhat if the remainder is explicitly calculated, and avoiding use of the % operator, but is still relatively slow.

void divide48( word divisor)
{
dword temp;

   word rem;

   if (divisor == 0) {
      outvalue[0] = 0xFFFF;
      outvalue[1] = 0xFFFF;
      outvalue[2] = 0xFFFF;

      return;
   }
   temp = invalue[0];
   outvalue[0] = temp / divisor;
   rem = temp - (outvalue[0] * divisor);
   temp = (rem * 0x10000) + invalue[1];
   outvalue[1] = temp / divisor;
   rem = temp - (outvalue[1] * divisor);
   temp = (rem * 0x10000) + invalue[2];
   outvalue[2] = temp / divisor;

}

For the special case, with the dividend value limited to 40 bits, and with a minimum allowable divisor value of 256, the result should not exceed 32-bits. This might give the following simplification.

dword divide48( word divisor)
{
dword temp;

word rem;

   word outval[2];

   if (divisor < 256)

return 0xFFFFFFFF;

   temp = (invalue[0] * 0x10000) + invalue[1];
   outval[0] = temp / divisor;
   rem = temp - (outvalue[0] * divisor);
   temp = (rem * 0x10000) + invalue[2];
   outval[1] = temp / divisor;

return ((outval[0] * 0x10000) + outval[1]);

}

Regards,

Mac

rparekh · ‎08-29-2008

Hi,

Thanks a lot.

It is working.

But it is taking too much of time.

Any tip to optimize the same.

I need to finish this task in 200 usec @ 20 MHz bus speed in S08 MCU.

Regards,

RParekh

bigmac · ‎09-01-2008

Hello RP,

So you have a maximum of 4000 cycles to implement the division process. I have attempted a standard division algorithm (related to the "restoring division algorithm", without the need for restoral). The basic function code shown below seems to work correctly. The dividend and the result use the same global 5-byte array variable quotient. Another global word variable is used for the remainder associated with the calculation.

// Global variables:
byte quotient[5] = {
0xFE, 0xDC, 0xBA, 0x98, 0x76 // Dividend value
};
word remainder;

void divide40( word divisor)
{
byte i;

   if (divisor == 0) {
      quotient[0] = 0xFF;
      quotient[1] = 0xFF;
      quotient[2] = 0xFF;
      quotient[3] = 0xFF;
      quotient[4] = 0xFF;
      return;
   }
   remainder = 0;
   for (i = 0; i < 40; i++) {
      __asm {
         ldhx @quotient
         lsl 4,x
         rol 3,x
         rol 2,x
         rol 1,x
         rol ,x

         ldhx @remainder
         rol 1,x
         rol ,x
      }
      if (remainder > divisor) {
         remainder -= divisor;
         quotient[4] |= 1;
      }
   }
}

The number of cycles required for execution will depend on the number of 1's in the calculated quotient. For the HCS08, the formula is 2719 + 33*N cycles, where N is the number of 1's. This could be borderline to your limit of 4000 cycles.

If we now assume that the divisor value is never less than 256, so that the result can never exceed 32 bits length, it is possible to reduce the number of processing loops from 40 down to 32, with subsequent fewer cycles. The modified code follows.

void divide40( word divisor)
{
byte i;

   if (divisor < 256) {
      quotient[0] = 0xFF;
      quotient[1] = 0xFF;
      quotient[2] = 0xFF;
      quotient[3] = 0xFF;
      quotient[4] = 0xFF;
      return;
   }
   remainder = (word)quotient[0];
   quotient[0] = quotient[1];
   quotient[1] = quotient[2];
   quotient[2] = quotient[3];
   quotient[3] = quotient[4];
   quotient[4] = 0;

   for (i = 0; i < 32; i++) {
      __asm {
         ldhx @quotient
         lsl 4,x
         rol 3,x
         rol 2,x
         rol 1,x
         rol ,x

         ldhx @remainder
         rol 1,x
         rol ,x
      }
      if (remainder > divisor) {
         remainder -= divisor;
         quotient[4] |= 1;
      }
   }
}

For this version, the formula is 2217 + N*33 cycles, a saving of about 500 cycles.

Neither functions include the loading of the dividend value into the quotient array, that will require some additional cycles. With the second version, a few cycles could be saved if the remainder and quotient variables can be preloaded, to already incorporate the 8-bit shift.

A very brief description of the division algorithms is contained in the document attached.

Regards,

Mac

Urgently needed - 40 bit / 16 bit in 8 bit MCU (HCS08)

Urgently needed - 40 bit / 16 bit in 8 bit MCU (HCS08)

General