Hello Zenta,
Since you refer to a sub-routine, I assume that you are using assembly code. Is this correct? In general, lookup table methods are faster than algorithmic methods. You do not say whether the multiplication result needs to be in packed BCD format, or whether a binary value would suffice. You also do not indicate the minimum speed required by your application.
One possible method is to convert the two packed BCD quantities to binary, do the multiplication (using hardware multiply), and then optionally convert the binary result back to packed BCD. Using this method, the assembly sub-routines that I tried required the following resources:
Binary result: 150 bytes and 302 cycles
Packed BCD result: Add 74 bytes and 235 cycles to the above figures.
If these figures are adequate for your application, I can post the code that I tested.
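For illustration only, here is a minimal C sketch of the convert, multiply, convert-back idea. It is not the assembly code referred to above, and it assumes that each operand is a single packed BCD byte (two digits, 0x00 to 0x99), which may not match your data size. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: one packed BCD byte per operand (two digits, 0x00..0x99).
   All function names here are hypothetical, for illustration only. */

/* Convert a two-digit packed BCD byte to binary (0..99). */
uint8_t bcd_to_bin(uint8_t bcd)
{
    return (uint8_t)((bcd >> 4) * 10u + (bcd & 0x0Fu));
}

/* Convert a binary value (0..9999) to four-digit packed BCD. */
uint16_t bin_to_bcd16(uint16_t bin)
{
    uint16_t bcd = 0;
    uint8_t shift;

    /* Peel off decimal digits, least significant first,
       and place each one in its own 4-bit nibble. */
    for (shift = 0; shift < 16; shift += 4) {
        bcd |= (uint16_t)((bin % 10u) << shift);
        bin /= 10u;
    }
    return bcd;
}

/* Multiply two packed BCD bytes and return a packed BCD result.
   The largest product is 99 * 99 = 9801, which fits in four digits. */
uint16_t bcd_mul(uint8_t a, uint8_t b)
{
    uint16_t product = (uint16_t)(bcd_to_bin(a) * bcd_to_bin(b));
    return bin_to_bcd16(product);
}
```

For example, bcd_mul(0x12, 0x34) returns 0x0408, since 12 x 34 = 408. If a binary result is sufficient, the call to bin_to_bcd16() can simply be omitted.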
Regards,
Mac