In my application, I need to divide a 4-byte value by a 2-byte value, as follows:
dword a; /* a is type of unsigned long , 4 bytes */
word b; /* b is type of unsigned integer, 2 bytes */
word c;
c = a / b;
If I code it in C, c = a / b takes more than 200 CPU cycles, which is too long for my application, so I am considering implementing it in assembly language. My CPU is an HCS08, which has only one division instruction:
DIV /* A <- (H:A) / (X); H <- remainder; 6 CPU cycles; 16-bit by 8-bit divide instruction */
How can I implement a fast division of a 4-byte variable by a 2-byte variable in assembly? Thanks.
Hello CASEYKEVIN,
You won't get faster results from a 9S08 CPU; more likely it will be much slower for certain operand values.
Look at the sources of the compiler's division routine.
Use a better CPU (Coldfire) or try to solve the task with a multiplication.
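
As one possible reading of the multiplication suggestion, here is a minimal C sketch that assumes the divisor stays fixed across many divisions, so its reciprocal can be computed once. The names recip32 and div_by_recip are hypothetical, and the 64-bit multiply is itself costly on an 8-bit CPU, so this only illustrates the idea and is not a tuned HCS08 routine.

```c
#include <stdint.h>

/* Hypothetical helper: precompute floor(2^32 / b) once for a fixed divisor.
   Assumes b != 0. For b == 1 the result is 2^32, hence the 64-bit type. */
uint64_t recip32(uint16_t b)
{
    return ((uint64_t)1 << 32) / b;
}

/* Divide a by b using the precomputed reciprocal m = recip32(b). */
uint32_t div_by_recip(uint32_t a, uint16_t b, uint64_t m)
{
    /* The estimate is never larger than the true quotient. */
    uint32_t q = (uint32_t)(((uint64_t)a * m) >> 32);
    uint32_t r = a - q * b;

    /* Correct the estimate if it came out too low. */
    while (r >= b) {
        q++;
        r -= b;
    }
    return q;
}
```

The correction loop is there because the truncated reciprocal can leave the estimate slightly too low.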
Oliver
Hello,
If you are currently achieving the 32/16 integer division in about 200 cycles, this would seem quite fast. Normally I would expect about 10 times this amount, even if written directly in assembly code. It will be interesting to see what the execution time of the two code snippets turns out to be.
The hardware divide for the HCS08 handles only an 8-bit divisor, and this is the limiting factor in its use - a 32/8 bit division would give fast code. Increasing the size of the divisor to 16 bits necessarily results in a slower software division process.
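
To illustrate why the 8-bit divisor case is fast, here is a minimal C model of chaining the hardware divide, assuming the remainder left in H is carried into the next byte as the DIV description above implies. The names div_step and div32by8 are hypothetical, and the divisor must be non-zero.

```c
#include <stdint.h>

/* Model of one HCS08 DIV step: (H:A) / X, quotient to A, remainder to H.
   Valid when *h < x, so the quotient fits in 8 bits. */
static uint8_t div_step(uint8_t *h, uint8_t a, uint8_t x)
{
    uint16_t n = (uint16_t)(((uint16_t)*h << 8) | a);
    *h = (uint8_t)(n % x);
    return (uint8_t)(n / x);
}

/* Hypothetical helper: 32-bit dividend by 8-bit divisor in four chained steps,
   processing the dividend from the most significant byte down. */
uint32_t div32by8(uint32_t a, uint8_t x, uint8_t *rem)
{
    uint8_t h = 0;      /* remainder carried between steps */
    uint32_t q = 0;

    for (int shift = 24; shift >= 0; shift -= 8) {
        uint8_t byte = (uint8_t)(a >> shift);
        q = (q << 8) | div_step(&h, byte, x);
    }
    *rem = h;
    return q;
}
```

Each step's remainder is smaller than the divisor, so the next step's quotient always fits in 8 bits. That is what lets four DIV instructions cover the whole 32-bit dividend.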
To achieve substantially faster calculation would require an alternative MCU containing hardware to support a 16-bit divisor - probably a 16-bit or 32-bit device.
Regards,
Mac
And here is an old routine of mine, originally written for the 68HC05.
It is a 24-bit by 16-bit divide, but it is easily expanded to 32-bit by 16-bit. It could also be optimized for the S08.
However, I'm not sure either of these two routines (Don's or mine) will do better than 200 cycles.
;
; Divide 24 by 16
;
; Divides a 24 bit, unsigned integer by a 16 bit unsigned integer.
; Enter with the dividend in the Pseudo-Accumulator and the divisor
; in the X:A register. Exits with the 24 bit Quotient in the P.Acc.
; and the 16 bit remainder in X:A.
;
Div24x16:
        STXA    .MULT.          ;put the divisor someplace safe
        ST24    TEMP            ;move dividend to TEMP
        CLR24   ,,              ;zero the low 24 bits
        ldx     #24             ;number of times through the loop
;
; Main Loop.
; Rotate dividend into A, one bit at a time,
; and check if a subtract is needed.
;
; shift 32 bit pseudo-accumulator around left one position.
;
d24_1:  asl     TEMP+2          ;start with byte 0
        rol     TEMP+1          ;into byte 1
        rol     TEMP            ;and into byte 2
        rol     ACCUM0          ;then around into byte 0 of p-acc
        rol     ACCUM1          ;and finally into byte 1 of p-acc
;
        bcs     d24_2           ;do a subtract if hi-bit went into carry
        CMP16   .MULT.          ;is it worth a subtract?
        bcs     d24_3           ;skip if no subtract needed here
;                               ;leave a zero in lo-bit of quotient
;
; Do the subtract and put a 1 into lo-bit of quotient.
;
d24_2:  SUB16   .MULT.          ;subtract
        bset    0,TEMP+2        ;set new lowest bit in quotient
;
; decrement the loop count and continue if not zero
;
d24_3:  decx                    ;decrement loop counter
        bne     d24_1           ;and loop until all bits are done
;
; all bits are done. Quotient is in TEMP:TEMP+1:TEMP+2
; and remainder is in ACCUM1:0. we move them.
;
        lda     TEMP            ;get high byte of quotient
        sta     ACCUM2          ;put it where it belongs
        ldx     ACCUM1          ;put high byte of remainder in X
        lda     TEMP+1          ;get mid byte of quotient
        sta     ACCUM1          ;put it where it belongs
        lda     ACCUM0          ;get low byte of remainder
        sta     .MULT.          ;put aside
        lda     TEMP+2          ;get low byte of quotient
        sta     ACCUM0          ;put where belongs
        lda     .MULT.          ;get last of remainder back
        rts                     ;return with answers
;
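
For checking an assembly port against known answers, the same shift-and-subtract idea can be modelled in C. This is only a sketch widened to 32 bits by 16 bits. The name div32by16 is hypothetical, it does not reproduce the macros or register usage of the routine above, and the caller is assumed to pass a non-zero divisor.

```c
#include <stdint.h>

/* Hypothetical reference model: restoring (shift-and-subtract) division of a
   32-bit dividend by a 16-bit divisor, producing one quotient bit per pass. */
uint32_t div32by16(uint32_t dividend, uint16_t divisor, uint16_t *rem)
{
    uint32_t quot = dividend;  /* dividend bits leave at the top, quotient bits enter at bit 0 */
    uint32_t r = 0;            /* running remainder; can need 17 bits right after the shift */

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (quot >> 31);   /* shift the next dividend bit into the remainder */
        quot <<= 1;
        if (r >= divisor) {            /* is it worth a subtract? */
            r -= divisor;
            quot |= 1u;                /* set the new lowest quotient bit */
        }
    }
    *rem = (uint16_t)r;
    return quot;
}
```

If the quotient is known to fit in 16 bits, that is, the high word of the dividend is smaller than the divisor, the loop can start with that high word as the remainder and run only 16 times.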
Here is MC68xx microprocessor code for a 4-byte / 4-byte division; I have used it for years.
You can edit it for a 2-byte dividend.
You could probably make it faster by using stack indexing...