What is the difference between mulu.l and muls.l on the ColdFire V2?
Since the result is truncated, signed and unsigned multiplication should yield the same result.
Something I'm missing?
Obetz wrote: ...Since the result is truncated, signed and unsigned multiplication should yield the same result.
That's an interesting observation. I didn't try to put together a mathematical proof, but a few test cases do suggest that this is the case.
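A short argument backs the observation. A 32-bit pattern read as signed differs from the same pattern read as unsigned by either 0 or 2^32. The two full products therefore differ by a multiple of 2^32, so their low 32 bits are the same. Below is a minimal Python sketch of that arithmetic. It models only the math, not the ColdFire hardware, and the helper names (to_signed, mulu_l, muls_l) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mulu_l(a, b):
    """Model of an unsigned 32x32 multiply truncated to 32 bits."""
    return ((a & MASK32) * (b & MASK32)) & MASK32

def muls_l(a, b):
    """Model of a signed 32x32 multiply truncated to 32 bits."""
    return (to_signed(a) * to_signed(b)) & MASK32

pairs = [
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0x80000000, 0x00000001),
    (0x80000000, 0x00000002),
    (0x12345678, 0xFEDCBA98),
]
for a, b in pairs:
    # Print both truncated results side by side for comparison.
    print(f"{a:#010x} * {b:#010x}: mulu={mulu_l(a, b):#010x} muls={muls_l(a, b):#010x}")
```

In this model both columns match for every pair, so any real difference between the instructions would have to be in the flags.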
Anyway, the existence of separate instructions is still justified if only for the sake of symmetry.
The difference probably lies in the status flag 'N'. For MULU, the result cannot be negative (in contrast to the description of the N-flag's value after the operation: "Set if result is negative; cleared otherwise"), so the flag should always be 0.
For MULS the result can be negative in some calculations.
I didn't test this myself, but you could try the following value pairs:
0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001
MULU: 0x00000001, N=0
MULS: 0x00000001, N=0

0x80000000 * 0x00000001 = 0x0000000080000000
MULU: 0x80000000, N=0
MULS: 0x80000000, N=1
It gets interesting when 32-bit overflow comes into play, like this:
0x80000000 * 0x00000002 = 0x0000000100000000
MULU: 0x00000000, N=0
MULS: 0x00000000, N=1?
I'm unsure what the processor does in this case. If the N-flag simply represents bit 31 of the 32-bit result, N will become zero. If the N-flag is computed as the XOR of bit 31 of the two 32-bit input values, N will become 1.
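The two hypotheses can be compared on paper before touching hardware. The Python sketch below only evaluates both candidate rules for the value pairs mentioned above. It does not tell us which rule the processor implements, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def n_from_result(a, b):
    """Hypothesis 1: N is bit 31 of the truncated 32-bit product."""
    result = ((a & MASK32) * (b & MASK32)) & MASK32
    return (result >> 31) & 1

def n_from_operand_signs(a, b):
    """Hypothesis 2: N is the XOR of bit 31 of the two operands."""
    return (((a & MASK32) >> 31) ^ ((b & MASK32) >> 31)) & 1

cases = [
    (0xFFFFFFFF, 0xFFFFFFFF),
    (0x80000000, 0x00000001),
    (0x80000000, 0x00000002),
]
for a, b in cases:
    # Show the N value each hypothesis would predict for this pair.
    print(f"{a:#010x} * {b:#010x}: "
          f"N(bit 31 of result)={n_from_result(a, b)} "
          f"N(xor of signs)={n_from_operand_signs(a, b)}")
```

Only the overflowing pair, 0x80000000 * 0x00000002, separates the two rules, so that is the case worth running on a real part.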
I would assume that the multiplier unit will consider all values to be unsigned, so the results will be identical in all cases. Creating the two's complement of all negative inputs (negating all bits and adding 1), and then potentially doing the same for the result, is simply too expensive (time and transistors). I seem to remember that at university we once proved that doing so gives identical results in all applicable value ranges, but I'm not sure anymore.
Nice try, Johan, but fruitless.
The CPU reference manual states: "N - Negative. Set if the most significant bit of the result is set; otherwise cleared.". That's also true for mul*.
Does anybody know of a difference?
I simply don't want to believe Freescale wasted an opcode.
An online 68000 data sheet implies that the difference is that the signed one can set the overflow bit.
The ColdFire doesn't make this distinction. Neither MULU nor MULS sets overflow.
Another reference for the 68000 states:
"Two multiply and divide instructions are available: signed (MULS and DIVS) for single-precision instructions and unsigned (MULU and DIVU) for multiple-precision instructions."
"The two versions are unsigned (MULU and DIVU) and signed (MULS and DIVS) instructions; these versions interpret their operands as one's-complement and two's-complement numbers, respectively."
Does that ring any bells for anyone?
How about this one?
"However, there are a few subtle cases where the ColdFire instruction is not exactly the same as its 680x0 counterpart. The most important of these is that multiply instructions (MULU and MULS) do not set the overflow bit. This means that a 680x0 code sequence which checks for overflow on multiply may assemble and run under ColdFire, but give incorrect results."
I think that nails it. The 68000 instructions were made so that code could make use of the overflow bit. This was left out in the ColdFire, but the instructions remain for easy code conversion.
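To make the overflow point concrete, here is a hedged Python sketch of how an overflow condition could differ between the signed and unsigned views even though the low 32 bits agree. It assumes the usual definitions (unsigned overflow: the full product does not fit in 32 bits; signed overflow: the full product is outside the signed 32-bit range). That matches my reading of the 680x0 long multiply but should be checked against the manual. The function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mulu_overflows(a, b):
    """Assumed rule: the unsigned product needs more than 32 bits."""
    return (a & MASK32) * (b & MASK32) > MASK32

def muls_overflows(a, b):
    """Assumed rule: the signed product is outside the int32 range."""
    product = to_signed(a) * to_signed(b)
    return not (-(1 << 31) <= product <= (1 << 31) - 1)

cases = [
    (0xFFFFFFFF, 0xFFFFFFFF),  # (-1) * (-1) fits signed, not unsigned
    (0x40000000, 0x00000002),  # 2^31 fits unsigned, not signed
    (0x80000000, 0x00000001),  # fits in both views
    (0x80000000, 0x00000002),  # overflows in both views
]
for a, b in cases:
    print(f"{a:#010x} * {b:#010x}: "
          f"unsigned overflow={mulu_overflows(a, b)} "
          f"signed overflow={muls_overflows(a, b)}")
```

Under these assumptions the first two pairs give different overflow answers for the same 32-bit result, which is exactly the information a ColdFire port loses.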
Some architectures allow "short immediates" like the MOVEQ or single-byte variables. Signed and unsigned would matter in this case due to the sign extension before the multiply. Doesn't apply in the ColdFire case.
It shouldn't take too long to code up a loop multiplying all 64k values by each other, signed and unsigned, and comparing the results. Anyone want to try? How about the exhaustive 32-bit by 32-bit case?
Binary (opcode) compatibility for incompatible functions? Hmm... Could be.
Regarding "It shouldn't take too long to code up a loop multiplying all 64k values by each other": The 16x16->32 multiplication is out of the question; in that case the difference between signed and unsigned is obvious. Only the truncated 32x32->32 multiplication yields the same result, and it would take very, very long to check the roughly 2E19 (2^64) combinations.
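Both halves of that statement can be illustrated without the full 2^64 loop. The Python sketch below shows one 16x16->32 pair where signed and unsigned differ, checks a scaled-down 8x8->8 truncated multiply exhaustively, and samples the 32x32->32 case at random. It is a software model only, the helper names are hypothetical, and the sampling is not a proof.

```python
import random

def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def products_differ(a, b, in_bits, out_bits):
    """True if signed and unsigned products differ in the low out_bits bits."""
    in_mask = (1 << in_bits) - 1
    out_mask = (1 << out_bits) - 1
    unsigned = ((a & in_mask) * (b & in_mask)) & out_mask
    signed = (to_signed(a, in_bits) * to_signed(b, in_bits)) & out_mask
    return unsigned != signed

# Widening 16x16->32: 0xFFFF * 0xFFFF is 0xFFFE0001 unsigned, 1 signed.
widening_example = products_differ(0xFFFF, 0xFFFF, 16, 32)

# Scaled-down truncated case, small enough to check every pair.
mismatches_8 = sum(
    products_differ(a, b, 8, 8) for a in range(256) for b in range(256)
)

# Random sample of the truncated 32x32->32 case (fixed seed for repeatability).
rng = random.Random(0)
mismatches_32 = sum(
    products_differ(rng.getrandbits(32), rng.getrandbits(32), 32, 32)
    for _ in range(100_000)
)

print("16x16->32 differs for 0xFFFF * 0xFFFF:", widening_example)
print("8x8->8 mismatches (exhaustive):", mismatches_8)
print("32x32->32 mismatches (sampled):", mismatches_32)
```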