I looked over all of my code going back 30 years and, unfortunately, it seems I have always used a pseudo-accumulator on 8-bit micros. This routine is the closest fit, even though it's about 15 years old (early HC08). If you replace the early references to the pseudo-accumulator (the ones that read the multiplicand) with your constant as immediate operands, and then replace the remaining references with the location of your result, you should have what you need: no pseudo-accumulator, and only one temporary byte on the stack. Between this and Mac's code, you should be able to put something together.
;
; The 32 bit pseudo-accumulator
;
ACCUM3: ds.b 1 ;Most significant byte
ACCUM2: ds.b 1
ACCUM1: ds.b 1
ACCUM0: ds.b 1 ;Least significant byte
;
;
; Multiply a 16-bit unsigned integer in the pseudo-accumulator
; (multiplicand, ACCUM1:ACCUM0) by a 16-bit unsigned integer in X:A (multiplier).
; Exits with a 32-bit unsigned integer product in the pseudo-accumulator.
; Uses one byte of stack space for temporary storage.
;
M16x16: PSHA ;don't lose the low 8 bits of multiplier
; and reserve a byte on the stack
STX ACCUM2 ;or the high 8 bits of multiplier either
LDX ACCUM0 ;get low byte of multiplicand into X
MUL ;multiply lo multiplier with lo-byte multiplicand
STX ACCUM3 ;temporary store mid-lo-byte of partial product
LDX ACCUM0 ;get low byte of multiplicand into X, last time
STA ACCUM0 ;and store lo-byte of product in Pseudo-accumulator
LDA ACCUM2 ;get high byte of multiplier
MUL ;multiply high multiplier with lo multiplicand
ADD ACCUM3 ;add previous mid-lo part.prod to new mid-lo part.prod
STA ACCUM3 ;and replace partial product temporarily
TXA ;put mid-hi partial product in A
ADC #0 ;put carry from previous ADD in
TAX ;put mid-hi with carry back in X
LDA 1,SP ;get the low byte of multiplier again, last time
STX 1,SP ;put mid-hi partial product aside
LDX ACCUM1 ;get the high byte of multiplicand
MUL ;multiply low byte multiplier with high byte multiplicand
ADD ACCUM3 ;add previous mid-lo partial product to last mid-lo piece
STA ACCUM3 ;mid-lo is now complete, but misplaced
TXA ;get latest mid-hi partial product
ADC 1,SP ;add carry and previous mid-hi part
STA 1,SP ;put mid-hi aside again
LDX ACCUM1 ;get high byte of multiplicand, last time
LDA #0 ;LDA leaves the carry from the ADC above alone
ADC #0 ;A = that carry (0 or 1), which the next ADD would overwrite
STA ACCUM1 ;park it; ACCUM1 is free now that X holds its byte
LDA ACCUM2 ;get high byte of multiplier, last time
MUL ;multiply high byte with high byte
ADD 1,SP ;add previous mid-hi byte to new mid-hi byte
STA ACCUM2 ;store where mid-hi is supposed to be
TXA ;get highest byte
ADC ACCUM1 ;add carry from previous add plus the parked carry
LDX ACCUM3 ;get complete but misplaced mid-lo byte
STX ACCUM1 ;and place it correctly
STA ACCUM3 ;and store to make things complete
PULA ;clean the stack
RTS ;and return with 32 bits of product
;
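To illustrate the substitution described at the top, here is an untested sketch of what the constant-multiplicand version might look like. The names MULK16, KHI, KLO and RES3..RES0 are made up for the example, the constant $1234 is arbitrary, and the equ syntax may need adjusting for your assembler. It expects the 16-bit variable in X:A and leaves the 32-bit product in RES3..RES0. It is not a purely mechanical substitution: since the constant no longer occupies any RAM, byte 1 of the result can be stored directly, and RES3 is then free to hold the carry out of the mid-hi addition until the top byte is computed.

```asm
KHI:    equ     $12         ;high byte of the constant (example value)
KLO:    equ     $34         ;low byte of the constant (example value)
;
RES3:   ds.b    1           ;Most significant byte of product
RES2:   ds.b    1
RES1:   ds.b    1
RES0:   ds.b    1           ;Least significant byte of product
;
MULK16: PSHA                ;save low byte of variable, reserve a stack byte
        STX     RES2        ;save high byte of variable
        LDX     #KLO
        MUL                 ;variable-lo times constant-lo
        STX     RES3        ;park mid-lo partial product
        STA     RES0        ;byte 0 of the product is complete
        LDA     RES2        ;high byte of variable
        LDX     #KLO
        MUL                 ;variable-hi times constant-lo
        ADD     RES3        ;add previous mid-lo part
        STA     RES3
        TXA
        ADC     #0          ;fold carry into mid-hi partial product
        TAX
        LDA     1,SP        ;low byte of variable, last time
        STX     1,SP        ;put mid-hi partial product aside
        LDX     #KHI
        MUL                 ;variable-lo times constant-hi
        ADD     RES3        ;add previous mid-lo part
        STA     RES1        ;byte 1 of the product is complete
        TXA
        ADC     1,SP        ;add carry and previous mid-hi part
        STA     1,SP        ;put mid-hi aside again
        LDA     #0          ;LDA does not disturb the carry
        ADC     #0          ;A = carry out of the mid-hi add
        STA     RES3        ;park it, RES3 is free at this point
        LDA     RES2        ;high byte of variable, last time
        LDX     #KHI
        MUL                 ;variable-hi times constant-hi
        ADD     1,SP        ;add previous mid-hi part
        STA     RES2        ;byte 2 of the product is complete
        TXA
        ADC     RES3        ;add carry from the ADD plus the parked carry
        STA     RES3        ;byte 3 of the product is complete
        PULA                ;clean the stack
        RTS
```

A quick hand check worth doing before trusting it: with the variable and the constant both set to $FFFF, the result should come out as $FFFE0001. That case exercises the carry into the top byte.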