Why is moveq better than move.l?

tupdegrove · 07-10-2009

(Did some searches but couldn't find answers to these basic questions.)

move.l #$0,d0

moveq #$0,d0

1. The Coldfire reference manual lists the execution times for both instructions as one cycle yet I know moveq is quicker. Why is that? I'm assuming it is because the move.l instruction length is 48-bits and moveq is only 16-bits. Would someone be kind enough to explain what is going on here?

2. What is the relation of a one cycle execution time to the number of core clocks? I thought it was one to one but a 16-bit instruction must take more than one clock to make it through the Instruction Fetch Pipeline (IFP) and Operand Execution Pipeline (OEP). Is the key here that the IFP time is not counted due to execution time assumption #1 (OEP loaded with opword & extension words)?

3. How many clocks does the 48-bit instruction "move.l #$0, d0" really take?

4. I would have expected the time difference between these two instructions to be listed in a table somewhere but couldn't find such a table in the programmer's or device's reference manuals. Did I miss it?

Tim

admin · 07-15-2009

> move.l #$0,d0
>
> moveq #$0,d0
>
> 1. The Coldfire reference manual lists the execution times for both instructions as one cycle yet I know moveq is quicker. Why is that? I'm assuming it is because the move.l instruction length is 48-bits and moveq is only 16-bits. Would someone be kind enough to explain what is going on here?

The instruction "move.l #$0,d0" occupies three 16-bit words in program memory: the opword plus two extension words that hold the 32-bit immediate.

The instruction "moveq #$0,d0" needs no extension words; its 8-bit immediate is encoded in the opword itself.

Thus the Instruction Fetch Pipeline (IFP), and especially the program memory behind it, has two fewer 16-bit words to deliver for MOVEQ. In other words, MOVEQ takes one third of the program memory that "move.l #$0,d0" does.

The sustained delay in the Operand Execution Pipeline (OEP) is the same for both instructions.
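A minimal sketch of the size difference, assuming the standard 68K/ColdFire opcode formats. The hex in the comments is the encoding those formats give; the last two lines (a negative immediate and a value outside the MOVEQ range) are extra examples added for illustration and were not part of the original question. Compare against your own assembler's listing file before relying on it.

```asm
        moveq   #$0,d0          ; 7000            2 bytes: immediate sits in the low byte of the opword
        move.l  #$0,d0          ; 203C 0000 0000  6 bytes: opword + two 16-bit extension words

        moveq   #-1,d0          ; 70FF            8-bit immediate is sign-extended, so d0 = $FFFFFFFF
        move.l  #$12345678,d0   ; 203C 1234 5678  values outside -128..127 need the long form
```

Both forms write all 32 bits of the register, so for constants in the range -128..127 MOVEQ is a drop-in replacement. Clearing four data registers this way costs 8 bytes of code instead of 24.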

> 2. What is the relation of a one cycle execution time to the number of core clocks? I thought it was one to one but a 16-bit instruction must take more than one clock to make it through the Instruction Fetch Pipeline (IFP) and Operand Execution Pipeline (OEP). Is the key here that the IFP time is not counted due to execution time assumption #1 (OEP loaded with opword & extension words)?

For any single instruction, the full path (system memory -> instruction and data pipelines -> execution unit -> system memory) can take tens or even hundreds of core clocks.

But, thanks to pipelining, instructions can complete at a rate of up to one per core clock in the best case.

On average, an instruction that occupies less program memory executes faster than one that occupies more, because it places less load on the Instruction Fetch Pipeline (IFP) and (especially) on the program memory.
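A back-of-the-envelope version of this argument, as a rough sketch only. It assumes (this is an illustrative model, not a figure taken from the reference manual) that the fetch path delivers at most W bytes per core clock and that the OEP completes at most one instruction per clock. For a long run of instructions that are each s bytes long, the sustained cost is then bounded by whichever stage is slower:

```latex
\[
  \text{clocks per instruction} \;\ge\; \max\!\left(1,\ \frac{s}{W}\right)
\]
\[
  W = 4:\qquad
  \texttt{moveq}\ (s = 2):\ \max(1,\ 0.5) = 1,
  \qquad
  \texttt{move.l \#imm}\ (s = 6):\ \max(1,\ 1.5) = 1.5
\]
```

The value W = 4 is only an example. The effective fetch rate depends on the core version and on whether the code runs from cache, SRAM, or flash with wait states, so the real difference is best measured on the target hardware.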
