> Also first 68K had 2 words depth instruction prefetch queue.
No, the SECOND one had the prefetch queue - the 68010. The FIRST one (68000, 68008, 68HC00, 68EC00, 68SEC00) had no prefetch queue (unless you're calling the 68000 the zeroth one :-).
The 68000 actually had a 16-bit ALU, matching its 16-bit data bus. This is obvious from the instruction timings, where things like the ADD instructions take an extra two clocks for 32-bit over 16-bit. With the exception of MUL, DIV and some of the more complicated addressing modes, the 68000 was very much bus limited: each read or write took 4 clock cycles (plus any wait states) for each 16 bits of instruction or data. As an example, the worst-case MOVE instruction is a 32-bit move from memory to memory with both operands given as 32-bit absolute addresses. It takes seven 16-bit reads and two 16-bit writes, and so 36 clock cycles to execute, or 3.6 microseconds at a typical 10 MHz. That's a lot, but MULU/MULS and DIVS take maximum times of 70 and 158 clocks, so they were best avoided if possible.
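To make the arithmetic explicit, here is a rough sketch in Python of the bus-limited estimate. The function names are made up for illustration, and the model assumes the instruction's time is nothing but bus accesses at 4 clocks each, which only holds for instructions like MOVE and not for MUL or DIV.

```python
# Rough model: on the 68000 every 16-bit bus access costs 4 clocks,
# plus any wait states. Names here are illustrative, not from any tool.
CLOCKS_PER_BUS_CYCLE = 4

def bus_limited_clocks(reads, writes, wait_states=0):
    """Clocks for an instruction whose time is entirely bus accesses."""
    return (reads + writes) * (CLOCKS_PER_BUS_CYCLE + wait_states)

def microseconds(clocks, mhz):
    """Execution time in microseconds at the given clock rate in MHz."""
    return clocks / mhz

# 32-bit memory-to-memory MOVE with two 32-bit absolute addresses:
#   1 opcode word + 2 source address words + 2 destination address words
#   + 2 source data words = 7 reads, then 2 destination data words = 2 writes.
move_clocks = bus_limited_clocks(reads=7, writes=2)
print(move_clocks, microseconds(move_clocks, 10))  # 36 3.6
```

Passing wait_states=1 shows how quickly slow memory hurts: the same instruction goes from 36 to 45 clocks.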
Here's the evolution across the family, taken from a paper on optimising for the different models:
http://www.freescale.com/files/32bit/doc/reports_presentations/MC680X0OPTAPP.txt
The following table summarizes the characteristics of the different members
in the 68000 family:
PROC   CACHE    RADD  MADD  MUL  INDEX  BRA   UACC  HWFP
68000  0/0      6     18    40   18     10/6  no    no
68020  256/0    2     6     28   9      6/4   yes   68881/2
68030  256/256  2     5     28   8      6/4   yes   68881/2
CPU32  0/0      2     9     26   12     8/4   no    no
68040  4K/4K    1     1     16   3      2/3   yes   yes
68060  8K/8K    1     1     2    1      0/1   yes   yes
RAdd: Register to register 32 bit add (add.l d0,d1).
MAdd: Absolute long address to register add (add.l _mem,d1).
Mul: 16x16 multiplication (max. time) (mulu.w d0,d1).
Index: Indexed addressing mode (move.l 2(a0,d0),d1).
Bra: Byte conditional branch taken/not taken (bne.b label).
UAcc: Unaligned access allowed (move.l 0xffff0001,d1).
HWFP: Hardware floating point support.
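For a quick feel for the numbers, the clock counts in the table can be turned into clock-for-clock ratios against the 68000. This is only a sketch: the dictionary and the speedup function are my own illustration, the figures are copied straight from the table, and the ratios ignore the higher clock rates of the later parts.

```python
# Clock counts copied from the table above (numeric columns only).
CLOCKS = {
    "68000": {"RADD": 6, "MADD": 18, "MUL": 40, "INDEX": 18},
    "68020": {"RADD": 2, "MADD": 6,  "MUL": 28, "INDEX": 9},
    "68030": {"RADD": 2, "MADD": 5,  "MUL": 28, "INDEX": 8},
    "CPU32": {"RADD": 2, "MADD": 9,  "MUL": 26, "INDEX": 12},
    "68040": {"RADD": 1, "MADD": 1,  "MUL": 16, "INDEX": 3},
    "68060": {"RADD": 1, "MADD": 1,  "MUL": 2,  "INDEX": 1},
}

def speedup(proc, op, base="68000"):
    """How many times fewer clocks proc needs than base for op."""
    return CLOCKS[base][op] / CLOCKS[proc][op]

for proc in CLOCKS:
    ratios = ", ".join(f"{op} x{speedup(proc, op):.1f}" for op in CLOCKS[proc])
    print(f"{proc}: {ratios}")
```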
Tom