INC and DEC timings in direct addressing mode


fabio
Contributor IV
Hello,

Maybe this is a silly question, but while researching code optimization I noticed that the timings for the INC and DEC instructions in direct addressing mode look strange compared to other addressing modes and instructions.

According to the HCS08 Reference Manual and several (all) device data sheets, the INC opr8a instruction executes in 5 BUSCLK cycles.

Looking at other instructions that use DIR addressing mode, we find lower timings: ADC, ADD, AND, BIT, CMP, EOR, ORA, etc. all execute in 3 BUSCLK cycles in DIR mode.

Nevertheless, INC and DEC (and the shifts too) execute in 5 BUSCLK cycles in DIR mode.

In fact, it is possible to replace the INC opr8a instruction with a two-instruction sequence such as:

LDA #1       ; 2 cycles
ADD opr8a  ; 3 cycles

Which runs in the same time!

My question is: why does a two-instruction sequence run in the same time as a single instruction, which is supposed to be faster?

Looking at the presumed micro-operation sequence, it is difficult to understand why INC opr8a needs 2 more cycles than ADD opr8a, as they use the hardware (ALU) in almost the same way.

Of course, I am not neglecting the fact that INC needs to load one input of the ALU with 1 and the other with the operand read from memory, but that could surely be done in parallel with the operand fetch.

Does anyone else feel there is some room for improvement? Maybe we could get a deeper explanation of why it is not possible to run INC opr8a in 3 BUSCLK cycles.

Maybe I (probably!) missed something ...

Best regards,

peg
Senior Contributor IV
Hello Fabio,

Whilst I don't know the exact reason, it is because INC and DEC are two of the instructions that had a cycle added to them in the HC08 to S08 change and the others you quote remained the same.
For a clearer picture see the tables early in AN2717.


bigmac
Specialist III
Hello Fabio,
 
My guess is that these read-modify-write instructions have increased by one cycle over the previous HC908 because the HCS08 has fewer oscillator cycle subdivisions within each bus cycle, and I presume that the extra bus cycle was needed for internal processing.
 
The categories for each cycle are identified as rfwpp for the HCS08.  I would expect the first three cycles to cover the read-modify-write process; however, I am not sure what happens during the final two cycles, which represent "program byte access".  I could not find a similar categorisation for the HC908 for comparison.
 
A similar thing seems to apply to the other read-modify-write instructions in direct addressing mode, including the shift and rotate instructions, as you mentioned, plus the CLR, COM, NEG, BSET and BCLR instructions.
 
Regards,
Mac
 


Message Edited by bigmac on 2008-09-17 02:24 PM

fabio
Contributor IV
Hello Guys,

Thank you for your time!

Daniel,

You are right! I actually missed the store operation! I am working on a PIC18 project and I got used to the ADDWF x,F notation!

ADD opr8a leaves the result in A, so I really do need a store operation to place the result in my destination variable.

My point here was that most instructions using DIR addressing mode are faster than INC and DEC (and others, as Mac also stated).

But INC and DEC are R-M-W instructions that modify the destination address directly (without the need for a store operation), whereas ADD, AND, etc. only modify the accumulator (A). That explains the need for additional BUSCLK cycles.

As I said, I really missed the store operation needed for the ADD comparison!

Peg,

I am aware of that AN. I also asked FS (some time ago) about the increased cycle counts of some instructions (compared to the same ones on the HC08). Their answer was that the faster clock speed and the two-byte instruction queue made it difficult to route some signals within the CPU as they were routed on the HC08. That resulted in a new design for the instruction decoder and the micro-operation sequence.

Mac,

Yes, that makes sense. Also, as far as I know, the p cycles are actually fetch cycles and they are used to fill the two-byte instruction queue.

Thank you guys! For some reason I missed the store operation, and that led me to misinterpret the operation flow of INC and the LDA/ADD/STA comparison.

Best regards,

CompilerGuru
NXP Employee
I am not sure why INC takes 5 cycles on the S08 when it takes just 4 on the HC08. There must be some advantage (my guess is silicon size) that justifies the extra cycle on the newer S08.

>My question is: why does a two-instruction sequence run in the same time as a single instruction,
> which is supposed to be faster?

The "LDA #1;ADD opr8a" sequence does not do the same thing as the INC: as a read-modify-write instruction, INC also stores the result. The equivalent is therefore more like "LDA #1;ADD opr8a;STA opr8a", and that is obviously slower than an INC.

Daniel
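
The comparison discussed above can be laid out side by side. This is an illustrative sketch only: `counter` is a hypothetical direct-page variable, the INC, LDA and ADD cycle counts are the ones quoted in this thread, and the 3-cycle figure for STA in direct mode is an assumption that should be checked against the instruction table for your device.

```asm
; Option 1: read-modify-write in a single instruction
        INC   counter     ; 5 cycles (rfwpp), result written back to counter

; Option 2: the load/add/store sequence that does the same job
        LDA   #1          ; 2 cycles
        ADD   counter     ; 3 cycles, result is left in A only
        STA   counter     ; 3 cycles (assumed), writes the result back
                          ; total: 8 cycles, and A is overwritten
```

Under these assumptions the three-instruction sequence costs 8 cycles against 5 for INC. It also overwrites A, and ADD updates the carry flag while INC leaves it unchanged, so the two options are not interchangeable in code that relies on either.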