Why is moveq better than move.l?

2,965 Views
tupdegrove
Contributor III

(Did some searches but couldn't find answers to these basic questions.)

 

move.l #$0,d0

moveq #$0,d0

 

1. The Coldfire reference manual lists the execution times for both instructions as one cycle yet I know moveq is quicker.  Why is that?  I'm assuming it is because the move.l instruction length is 48-bits and moveq is only 16-bits. Would someone be kind enough to explain what is going on here?

 

2. What is the relation of a one cycle execution time to the number of core clocks?  I thought it was one to one but a 16-bit instruction must take more than one clock to make it through the Instruction Fetch Pipeline (IFP) and Operand Execution Pipeline (OEP).  Is the key here that the IFP time is not counted due to execution time assumption #1 (OEP loaded with opword & extension words)?

 

3. How many clocks does the 48-bit instruction "move.l #$0, d0" really take?

 

4. I would have expected the time difference between these two instructions to be listed in a table somewhere but couldn't find such a table in the programmer's or device's reference manuals.  Did I miss it?

 

Tim

0 Kudos
1 Solution
818 Views
admin
Specialist II

> move.l #$0,d0
> moveq #$0,d0
>
> 1. The Coldfire reference manual lists the execution times for both instructions as one cycle yet I know moveq is quicker.  Why is that?  I'm assuming it is because the move.l instruction length is 48-bits and moveq is only 16-bits. Would someone be kind enough to explain what is going on here?

 

Instruction "move.l #$0,d0" occupies one opword plus two (16-bit) extension words in program memory, 48 bits in total.

Instruction "moveq #$0,d0" has no extension words: its 8-bit immediate is encoded inside the 16-bit opword itself.

Thus, the MOVEQ instruction puts two fewer (16-bit) words through the Instruction Fetch Pipeline (IFP) and, more importantly, through the program memory. In other words, MOVEQ occupies one third of the program memory space that "move.l #$0,d0" does.

The sustained delay in the Operand Execution Pipeline (OEP) is the same for both instructions.
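To make the size difference concrete, here is a minimal Python sketch that builds the machine words for the two instructions. The bit layouts follow the standard 68k/ColdFire encodings for MOVEQ and for MOVE.L with an immediate source and a data register destination. The function names are made up for this illustration and are not part of any assembler or toolchain.

```python
from typing import List


def encode_moveq(imm: int, dn: int) -> List[int]:
    """MOVEQ #imm,Dn: single opword 0111 rrr0 dddddddd (signed 8-bit immediate)."""
    if not -128 <= imm <= 127:
        raise ValueError("MOVEQ immediate must fit in a signed 8-bit value")
    if not 0 <= dn <= 7:
        raise ValueError("data register must be D0..D7")
    return [0x7000 | (dn << 9) | (imm & 0xFF)]


def encode_move_l_imm(imm: int, dn: int) -> List[int]:
    """MOVE.L #imm,Dn: opword 0x203C with Dn in bits 11-9, then two extension words."""
    if not 0 <= dn <= 7:
        raise ValueError("data register must be D0..D7")
    imm &= 0xFFFFFFFF
    return [0x203C | (dn << 9), imm >> 16, imm & 0xFFFF]


if __name__ == "__main__":
    for text, words in (
        ("moveq  #$0,d0", encode_moveq(0, 0)),
        ("move.l #$0,d0", encode_move_l_imm(0, 0)),
    ):
        encoding = " ".join(f"{w:04X}" for w in words)
        print(f"{text}: {encoding} ({2 * len(words)} bytes)")
```

Both encodings load the same value into d0; the only difference visible here is that one takes a single 16-bit word and the other takes three.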

 

 

> 2. What is the relation of a one cycle execution time to the number of core clocks?  I thought it was one to one but a 16-bit instruction must take more than one clock to make it through the Instruction Fetch Pipeline (IFP) and Operand Execution Pipeline (OEP).  Is the key here that the IFP time is not counted due to execution time assumption #1 (OEP loaded with opword & extension words)?

 

For any single instruction, the full path (system memory -> instruction and data pipelines -> execution unit -> system memory) takes tens or even hundreds of core clocks.

But, thanks to pipelining, instructions can be executed at a rate as high as one instruction per core clock (in the best case).

Statistically, an instruction that occupies less program memory is executed faster than one that occupies more. This is due to less overhead in the Instruction Fetch Pipeline (IFP) and (especially) in the program memory.
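As a rough illustration of that fetch overhead, the sketch below computes a lower bound on the clocks needed for a straight run of identical instructions. It is a toy model, not a description of any particular ColdFire core: the 4-byte fetch width, the clocks-per-fetch figure, and the function name are all assumptions chosen for the example, and the model ignores instruction buffering, caches, branches and data accesses.

```python
import math


def min_clocks(n_instr: int, instr_bytes: int,
               fetch_bytes: int = 4, clocks_per_fetch: int = 1) -> int:
    """Lower bound on clocks for n_instr identical instructions.

    Assumptions (hypothetical, for illustration only):
      - the OEP retires at most one instruction per clock;
      - program memory delivers fetch_bytes every clocks_per_fetch clocks.
    The slower of the two rates sets the bound.
    """
    fetches = math.ceil(n_instr * instr_bytes / fetch_bytes)
    return max(n_instr, fetches * clocks_per_fetch)


if __name__ == "__main__":
    for label, size in (("moveq  (2 bytes)", 2), ("move.l (6 bytes)", 6)):
        fast = min_clocks(100, size)
        slow = min_clocks(100, size, clocks_per_fetch=2)
        print(f"{label}: {fast} clocks, or {slow} with a slower program memory")
```

Under these assumptions the 2-byte instruction stays limited by the OEP at one per clock, while the 6-byte instruction becomes limited by the fetch side, and the gap widens as the program memory gets slower. Real numbers depend on the device and memory configuration.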

 


0 Kudos
2 Replies
818 Views
tupdegrove
Contributor III

Thank you, Yevgenit, for taking the time to answer these detailed technical questions.  I assumed shorter instructions were better and that the IFP gave a big boost in performance, but I wanted to make sure I was understanding things correctly.

 

Tim

0 Kudos