Signed multiplication using MUL


JohanF
Contributor I
Hello out there,
I need fast-executing assembly code for signed multiplication (8-bit x 8-bit) using the MUL instruction.
/Johan 
29 Replies

Curt
Contributor IV
Thanks to all for the interesting discussion.
 
Are there any tools commonly available to calculate raw cycles for a given asm code fragment and a given processor, or is this done by hand in each case?
 
And just out of curiosity:  what kind of performance does "C" deliver on this problem?
 
Regards,
Curt
 
bigmac
Specialist III
Hello Curt,
 
The method I used to determine bus cycles consumed was to run the code in the debugger under full chip simulation, and then single step through the code, whilst noting the cycle count.  But you still have to be aware of whether conditional branches are taken, or not, to ascertain the worst case.  By single stepping, you can immediately see the number of cycles for each instruction.
 
This method worked fine for the signed multiplication routines, which did not loop.  If the code makes a considerable number of loops in normal operation, you might single step the first time through the loop, and then set a breakpoint to exit the loop.
 
Similar methods can also be used for C code.
 
Regards,
Mac
 


Message Edited by bigmac on 2007-06-19 02:15 AM

CompilerGuru
NXP Employee
The debugger has a RESETCYCLES command, which resets the start cycle count to 0; that is what I used.

Also the trace window is quite nice, open it ("open trace" in the command window), choose enable trace in the context menu and also go to Instruction mode (also context menu).
Then step or run through the function and you see where the cycles have been spent.

Alternatively, both the compiler and the decoder can annotate the cycle counts in their listing files.
If you are using the assembler, the decoder can be used. This saves you from looking up the instruction details, but it does not add up the cycles spent.

rocco
Senior Contributor II
Hi, Johan:

You did not mention which processor you were using, but Mac mentioned that his cycle count was based on the S08. Could your cycle counts be based on the HC08? There is a small difference in cycles between the two versions for some instructions. I did not count cycles myself, as I bet you and Mac are both correct.

As for negating a sixteen-bit number, here is the code I have been using since the early eighties:
*
* Negate the 16 bit signed integer in the pseudo-accumulator.
*
NEG16 MACRO
        COM    ACCUM1     ; 4
        LDA    ACCUM0     ; 3
        COMA              ; 1
        ADD    #1         ; 2
        STA    ACCUM0     ; 3
        LDA    ACCUM1     ; 3
        ADC    #0         ; 2
        STA    ACCUM1     ; 3
      ENDM     ;          =21
*

This is a macro that negates the low sixteen bits of my 32-bit pseudo-accumulator, but it should be easy to adapt. It takes 21 cycles on an HC08 (I don't use any S08s yet).
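For readers who want to check the macro's logic off-target, the same complement, add 1, propagate-carry sequence can be sketched in C. This is only an illustrative model, not rocco's code: the function name `neg16_bytes` is hypothetical, and it assumes `ACCUM0` is the low byte and `ACCUM1` the next byte up, as the `ADD`/`ADC` order in the macro suggests.

```c
#include <stdint.h>

/* Hypothetical byte-wise model of the NEG16 idea:
   complement both bytes, add 1 to the low byte, then add the carry
   into the high byte. 'hi' stands in for ACCUM1, 'lo' for ACCUM0. */
uint16_t neg16_bytes(uint8_t hi, uint8_t lo)
{
    /* COMA ; ADD #1 on the low byte, keeping the carry in bit 8 */
    uint16_t low_sum = (uint16_t)((uint8_t)~lo + 1u);
    uint8_t  carry   = (uint8_t)(low_sum >> 8);

    /* COM on the high byte, then ADC #0 to absorb the carry */
    uint8_t new_hi = (uint8_t)((uint8_t)~hi + carry);
    uint8_t new_lo = (uint8_t)low_sum;

    return (uint16_t)(((uint16_t)new_hi << 8) | new_lo);
}
```

The carry only occurs when the low byte is zero, which is why the assembly version needs the `ADC #0` step at all.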

If you find a method that is faster, please post it, as everything I do is math-intensive.
 

CompilerGuru
NXP Employee
I think a "24bit += signed16bit" is also slower than a "24bit += unsigned16bit" or "24bit -= unsigned16bit" operation, so using an unsigned 16 bit multiplication helps even more than "just" for the multiplication.
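One plausible reason for this, offered here as an assumption rather than a measured result, is that a signed 16-bit addend has to be sign-extended before it reaches the top byte of a 24-bit accumulator, whereas an unsigned addend only needs the carry. The C sketch below models a byte-wise accumulator to show the extra step; the type `acc24_t` and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 24-bit accumulator held as three bytes, b2 most significant. */
typedef struct { uint8_t b2, b1, b0; } acc24_t;

/* acc += unsigned 16-bit value: only the carry reaches the top byte. */
acc24_t acc24_add_u16(acc24_t acc, uint16_t v)
{
    uint16_t s0 = (uint16_t)(acc.b0 + (v & 0xFFu));
    uint16_t s1 = (uint16_t)(acc.b1 + (v >> 8) + (s0 >> 8));
    acc.b0 = (uint8_t)s0;
    acc.b1 = (uint8_t)s1;
    acc.b2 = (uint8_t)(acc.b2 + (s1 >> 8));
    return acc;
}

/* acc += signed 16-bit value: the top byte also needs a sign-extension
   byte (0x00 or 0xFF), which costs an extra test or load in assembly. */
acc24_t acc24_add_s16(acc24_t acc, int16_t v)
{
    uint16_t uv  = (uint16_t)v;
    uint8_t  ext = (v < 0) ? 0xFFu : 0x00u;
    uint16_t s0  = (uint16_t)(acc.b0 + (uv & 0xFFu));
    uint16_t s1  = (uint16_t)(acc.b1 + (uv >> 8) + (s0 >> 8));
    acc.b0 = (uint8_t)s0;
    acc.b1 = (uint8_t)s1;
    acc.b2 = (uint8_t)(acc.b2 + ext + (s1 >> 8));
    return acc;
}

/* Helper to read the accumulator back as a plain integer. */
uint32_t acc24_value(acc24_t acc)
{
    return ((uint32_t)acc.b2 << 16) | ((uint32_t)acc.b1 << 8) | acc.b0;
}
```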

Daniel

CompilerGuru
NXP Employee
When manually counting the bytes in _BMULS I get 26 bytes too. However, when the compiler compiles it, it performs some frame optimizations (replacing SP accesses with H:X-relative ones), so the compiler really does emit 23 bytes; I did count correctly :)
I think the frame optimization is the only optimization done for HLI by default; it can be disabled with -onx.

BTW, the TonyP version can save a byte by combining the inca and the coma into a nega :)

Daniel

CompilerGuru
NXP Employee
Do you need a 8 or a 16 bit result?
In particular, the 8x8=8 bit signed multiplication is simple: it's the same as the unsigned one :)
For the 8x8=16 bit signed multiplication, the CW _BMULS runtime routine in lib\hc08c\src\rtshc08.c does just that: it first multiplies unsigned and then adapts the result if the operands were negative.
And can both of your operands be negative, or is one known to be positive?
What do you need it for?
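The two points above (the low byte is identical for signed and unsigned, and the 16-bit result can be fixed up after an unsigned multiply) can be illustrated with a short C sketch. This is one common way to do the fix-up and is not taken from the `_BMULS` source; the names `smul8_corrected` and `mul8_low` are hypothetical.

```c
#include <stdint.h>

/* 8x8=8: the low byte of the product does not depend on signedness. */
uint8_t mul8_low(uint8_t a, uint8_t b)
{
    return (uint8_t)(a * b);
}

/* 8x8=16 signed, via one unsigned multiply plus corrections. */
int16_t smul8_corrected(int8_t a, int8_t b)
{
    uint8_t  ua = (uint8_t)a;
    uint8_t  ub = (uint8_t)b;
    uint16_t p  = (uint16_t)(ua * ub);   /* what an unsigned MUL would produce */

    /* A negative operand was read as (value + 256), so the unsigned product
       is too large by 256 * (other operand). Subtracting the other operand
       from the high byte removes that excess (arithmetic is modulo 65536). */
    if (a < 0) p = (uint16_t)(p - ((uint16_t)ub << 8));
    if (b < 0) p = (uint16_t)(p - ((uint16_t)ua << 8));

    /* Conversion to int16_t assumes the usual two's-complement behaviour. */
    return (int16_t)p;
}
```

If one operand is known to be non-negative, as in Johan's case, the corresponding `if` is never taken and could be dropped.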

Daniel


Message Edited by CompilerGuru on 2007-06-14 11:21 PM

JohanF
Contributor I
Thanks,
 
I need an 8-bit x 8-bit multiplication with a 16-bit result. I already know that one of the factors is positive. The other factor could be either positive or negative.
 
 
bigmac
Specialist III
Hello Johan, and welcome to the forum.
 
As mentioned by Daniel, an approach to 8-bit signed multiply, with 16-bit result, is to convert any negative input value to positive, multiply, then process the signs separately, and convert the result to a negative value, if required.
 
The following assembly routine appears to work - and assumes ACC and X contain the 8-bit signed values.  On exit, X:A should contain the signed result.
 
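SMUL8:    AIS    #-1            ; Reserve one stack byte for the sign flag
          CLR    1,SP           ; Sign calculation on stack
          TSTA
          BPL    SM1            ; Branch if ACC is positive
          NEGA                  ; Make ACC positive
          COM    1,SP           ; Toggle sign flag
SM1:      TSTX
          BPL    SM2            ; Branch if X is positive
          NEGX                  ; Make X positive
          COM    1,SP           ; Toggle sign flag
SM2:      MUL                   ; Unsigned multiply, X:A = X * A
          TST    1,SP           ; Test sign of result
          BPL    SM3            ; Exit if positive
          COMA                  ; Negate the 16-bit result in X:A
          COMX
          ADD    #1
          BCC    SM3            ; No carry into the high byte
          INCX
SM3:      AIS    #1             ; Adjust stack pointer
          RTS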
 
As shown, the routine will take between 43 and 65 cycles on an HCS08 device.  Using a temporary zero page RAM location, in lieu of the stack, will save a few cycles.
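As a cross-check of the approach (make both inputs positive, multiply unsigned, negate the result if the signs differed), here is a small C model of the same steps. It is a sketch for host-side testing only, not a translation verified against the assembler output; the function name `smul8_signmag` and the variable `negate` (standing in for the stack byte) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical C model of the sign-magnitude approach used by SMUL8. */
int16_t smul8_signmag(int8_t a, int8_t b)
{
    uint8_t negate = 0;              /* stands in for the sign byte on the stack */
    uint8_t ua = (uint8_t)a;
    uint8_t ub = (uint8_t)b;

    /* NEGA / NEGX: -128 stays 0x80, which is 128 when read as unsigned */
    if (a < 0) { ua = (uint8_t)(0u - ua); negate ^= 1u; }
    if (b < 0) { ub = (uint8_t)(0u - ub); negate ^= 1u; }

    uint16_t p = (uint16_t)(ua * ub);          /* MUL: unsigned 8x8 -> 16 */

    if (negate) p = (uint16_t)(0u - p);        /* COMA, COMX, add 1 with carry */

    /* Conversion to int16_t assumes the usual two's-complement behaviour. */
    return (int16_t)p;
}
```

Running this over all 65,536 input pairs on a PC is a quick way to confirm the edge cases, such as -128 x -128, before counting cycles on the target.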
 
Regards,
Mac
 