Hello Ming Yee,
For the bit-bang version, my approach would probably be a little different than yours. I would tend to use the stack, rather than create RAM variables. Additionally, the shift process could be tackled as a single 16-bit word, rather than two separate bytes. This should make the code a little less complex.
The following code demonstrates this approach. I have also created macros for the hardware interface, to make the code a little easier to follow. It also means that, should the hardware configuration ever alter, only the macros need to be changed to suit the new configuration.
SYNC_LOW:  macro
        bclr  SYNCDAC,PTBD
        endm
SYNC_HIGH: macro
        bset  SYNCDAC,PTBD
        endm
CLK_LOW:   macro
        bclr  SCLDAC,PTBD
        endm
CLK_HIGH:  macro
        bset  SCLDAC,PTBD
        endm
SDO_LOW:   macro
        bclr  SDADAC,PTBD
        endm
SDO_HIGH:  macro
        bset  SDADAC,PTBD
        endm
; Entry: H:X = 16-bit word to send (MS bit first). A is overwritten.
TXDAC:  lda   #16         ; Bit count
        psha              ; (3,sp)
        pshx              ; LS byte (2,sp)
        pshh              ; MS byte (1,sp)
        CLK_LOW           ; Initialise clock state
        SYNC_LOW          ; SYNC low to enable send
TXD1:   CLK_LOW           ;[5] Strobe current data
        ; Left-shift data word
        lsl   2,sp        ;[6] LS byte
        rol   1,sp        ;[6] MS byte
        ; Set serial data output bit state
        bcs   TXD2        ;[3]
        SDO_LOW           ;[5] SDO = 0
        bra   TXD3        ;[3] Branch always
TXD2:   SDO_HIGH          ;[5] SDO = 1
TXD3:
        CLK_HIGH          ;[5] Return serial clock high
        dbnz  3,sp,TXD1   ;[8] Loop for next bit
        CLK_LOW           ;[5] Strobe last bit
        SYNC_HIGH         ; SYNC high to end send
        ais   #3          ; Adjust stack pointer
        rts
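Note that the clock period is 38 cycles when a 1 is sent (41 cycles for a 0, because of the extra bra), so the communications will be quite slow: about 244 to 263 kHz for a 10 MHz bus frequency. The data has at least 18 cycles to settle before the falling clock edge strobes it.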
Regards,
Mac