Hi Don,
Assuming a general routine with all of the 16-bit variables in page zero rather than in registers, it would most likely look like this:
;
; Allocate the variables in the zero page.
;
bsct
;
Variable1: ds.w 1 ;minuend
;
Variable2: ds.w 1 ;subtrahend
;
Variable3: ds.w 1 ;difference
;
;
; Code section
;
psct
;
lda Variable1+1 ;get low byte of minuend
sub Variable2+1 ;subtract low byte of subtrahend
sta Variable3+1 ;store low byte of difference
;
lda Variable1 ;get high byte of minuend
sbc Variable2 ;subtract high byte of subtrahend and carry
sta Variable3 ;store high byte of difference
;
Each of the six instructions takes 3 cycles, for a total of 18. With a 40 MHz clock, it would execute in under a microsecond. It could be optimized for special cases, such as when the minuend or subtrahend is a constant, or when a variable lives in a register.
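If it helps to check the borrow handling away from the target, here is a rough C model of the same two-step subtraction. It is only a sketch: the function name sub16_bytewise is made up for illustration, and it assumes the high byte sits at the lower address, as in the listing above. The low bytes are subtracted first, and the borrow from that step is taken out of the high-byte subtraction, which is what the sub/sbc pair does.

```c
#include <stdint.h>

/* Hypothetical helper, not part of any library: models the sub/sbc
 * sequence by subtracting one byte at a time and passing the borrow along. */
static uint16_t sub16_bytewise(uint16_t minuend, uint16_t subtrahend)
{
    uint8_t m_hi = (uint8_t)(minuend >> 8);
    uint8_t m_lo = (uint8_t)(minuend & 0xFF);
    uint8_t s_hi = (uint8_t)(subtrahend >> 8);
    uint8_t s_lo = (uint8_t)(subtrahend & 0xFF);

    /* Low bytes first (the "sub" step); a borrow is needed when the
     * low byte of the minuend is smaller than that of the subtrahend. */
    uint8_t d_lo   = (uint8_t)(m_lo - s_lo);
    uint8_t borrow = (m_lo < s_lo) ? 1u : 0u;

    /* High bytes next (the "sbc" step), minus the borrow from above. */
    uint8_t d_hi = (uint8_t)(m_hi - s_hi - borrow);

    return (uint16_t)(((uint16_t)d_hi << 8) | d_lo);
}
```

For example, sub16_bytewise(0x1234, 0x00FF) gives 0x1135: the low-byte step yields 0x35 with a borrow, and the high-byte step yields 0x12 - 0x00 - 1 = 0x11.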
I never write out math code like this, as I have a macro library to generate the code. PM me if you would like to see it. It handles mixes of 8-, 16-, 24- and 32-bit numbers, based on a 32-bit pseudo-accumulator.