HCS08: Why is incrementing a 32 bit integer so slow?

Superberti
Contributor I

Hi!

My target is an MC9S08QG8, and I'm wondering why incrementing a 32-bit integer is so expensive.

All measurements were done in the simulator.

For example:

...
UINT32 Counter = 0;
UINT16 lc;

for (lc = 0; lc < 30000; lc++)
  Counter++;  // this line takes about 156 CPU cycles!

So incrementing Counter takes about 156 CPU cycles, which is MUCH more than incrementing a 16-bit integer. Clearly a 32-bit increment takes more work than a 16-bit one, but the following is still much faster:

// This function takes 37 CPU cycles
void IncUINT32(UINT32 * li)
{
  ((UINT16 *)li)[1]++;
  if (!((UINT16 *)li)[1])
    ((UINT16 *)li)[0]++;
}

// This macro takes only 18 CPU cycles but it does not fit in every situation, e.g. for loops
#define INC2_LI(li) ((UINT16 *)&li)[1]++; if (!((UINT16 *)&li)[1]) ((UINT16 *)&li)[0]++;
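
The pointer casts above rely on the HCS08 being big-endian (index [1] is the low word), and the two-statement macro cannot be used where a single expression is required. A minimal sketch of one possible alternative follows, assuming the counter may be stored as two explicit 16-bit halves. `SplitCounter`, `SPLIT_INC`, `split_value` and `count_iterations` are hypothetical names, `<stdint.h>` types stand in for the UINT16/UINT32 typedefs, and no cycle counts were measured for it.

```c
#include <stdint.h>

/* Hypothetical type: a 32-bit counter kept as two explicit 16-bit halves,
   so no pointer cast and no byte-order assumption is needed. */
typedef struct {
    uint16_t hi;  /* upper 16 bits */
    uint16_t lo;  /* lower 16 bits */
} SplitCounter;

/* Expression-style increment: bump the low half; only when it wraps to 0
   does the right-hand side of || run and bump the high half.
   Because it is a single expression, it can sit in a for-loop header or
   in an unbraced if/else. The argument appears twice, so pass a plain
   variable without side effects. */
#define SPLIT_INC(c) ((void)(++(c).lo != 0 || ++(c).hi))

/* Reassemble the 32-bit value from the two halves. */
uint32_t split_value(const SplitCounter *c)
{
    return ((uint32_t)c->hi << 16) | c->lo;
}

/* Usage sketch: run n iterations with the macro in the for header. */
uint32_t count_iterations(uint32_t start, uint16_t n)
{
    SplitCounter c;
    uint16_t i;

    c.hi = (uint16_t)(start >> 16);
    c.lo = (uint16_t)start;
    for (i = 0; i < n; i++, SPLIT_INC(c)) {
        /* loop body would go here */
    }
    return split_value(&c);
}
```

The trade-off is that every read of the full 32-bit value has to go through a helper such as `split_value`.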
 

So, any idea why the compiler produces such slow code for the ++ operator? In the debugger I see that the function "_LINC" in RTSHC08.c is called.

Bye,

tonyp
Senior Contributor II
The function _LINC simply increments a 32-bit number on the stack.  Including the JSR/BSR it consumes 36 cycles (43 for the 9S08).  So, there must be more code that you include in your cycle measurement.

 

One possibility is you're measuring the complete code for a whole FOR loop iteration, which should be about right.

 

Another possibility is that as you step over instructions in the simulator some ISR is executed messing up your cycle count.

 

Try single-stepping in the simulator with assembly code (not source code) displayed so you can see what instructions are actually executed/counted.

 

tonyp
Senior Contributor II
On second look, _LINC starts with ENTER_UNARY and ends with EXIT_UNARY, which are setup and exit code, respectively.  This brings the total cycle count for _LINC to 109 (118 for 9S08).  A few cycles are still unaccounted for, but if you look closely enough (possibly right before the call to _LINC) I'm sure you'll find them.  So the problem is not the actual 32-bit increment, but all the overhead of setting up the stack, pointers, and return address, and of making the call to the _LINC routine.
Message Edited by tonyp on 2010-01-13 02:18 PM

Superberti
Contributor I

Hi tonyp,

 

I step with F10 to the Counter++ line in the loop, reset the cycle counter, and press F10 once. After that the CPU cycle counter reads 156.

Nevertheless, it is still not clear why my own C function "IncUINT32" is so much faster. It also has to set up the stack, save registers, and so on.

 

Thanks and bye,

CompilerGuru
NXP Employee

_LINC and IncUINT32 do not perform the same operation.

_LINC does not increment a long in place; it is a general "add 1" function that places the result on the stack. Therefore it can be (and is) used for code like "long src,dest; dest= src+1;" or for "f(src+1)" as well.

For the "src++;" case the compiler generates "LDHX  @src; JSR   _LINC; JSR   _POP32;", which is not especially big for a long operation, but I agree it is slow.
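
To illustrate the difference described above, here is a rough C model of the two shapes of code. It is a sketch only, not the actual RTSHC08.c source: `add_one_general` and `inc_in_place` are hypothetical names, and the 4-byte arrays assume the big-endian layout of the HCS08 (index 3 is the least significant byte).

```c
#include <stdint.h>

/* Model of the general path: compute src + 1 into a temporary (standing in
   for the result placed on the stack), then copy the temporary to the
   destination (standing in for the separate store step). All four bytes
   are always processed, and src may differ from dest. */
void add_one_general(const uint8_t src[4], uint8_t dest[4])
{
    uint8_t tmp[4];
    unsigned carry = 1;  /* the "+ 1" enters as an initial carry */
    int i;

    for (i = 3; i >= 0; --i) {
        unsigned sum = (unsigned)src[i] + carry;
        tmp[i] = (uint8_t)(sum & 0xFFu);
        carry = sum >> 8;
    }
    for (i = 0; i < 4; ++i) {
        dest[i] = tmp[i];
    }
}

/* Model of an in-place increment: bump the lowest byte and move to the
   next higher byte only when the current one wraps to 0. In the common
   case only one byte is touched. */
void inc_in_place(uint8_t v[4])
{
    int i;

    for (i = 3; i >= 0; --i) {
        if (++v[i] != 0) {
            break;  /* no carry out of this byte, so we are done */
        }
    }
}
```

Both functions produce the same value when source and destination coincide; the in-place form simply skips the temporary and the copy, which is the same reason the hand-written IncUINT32 can get away with less work.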

 

So I would suggest you file a service request asking that the compiler be extended to handle the in-place long increment/decrement as a special case.

 

Daniel

 

 

TurboBob
Contributor IV

Would ++Src compile any differently?

 

 

CompilerGuru
NXP Employee

> Would ++Src compile any differently?

No.

 
