[HC08 CW] LDHX is converted to a series of ops


4,538 Views
ARG_Raiker
Contributor I
Since the HC08 CPU has a 16-by-8 DIV instruction and the CW compiler isn't smart enough to use it (it calls _IDIVU_8 instead), I was trying to use __asm to load the 3 bytes, do a DIV and save the result. The problem is that when I disassemble the __asm part, it gives me this:
  154:    __asm{
  155:          LDHX    kmh_constant:1
  0019 ce0001   [4]             LDX   kmh_constant:1
  001c 89       [2]             PSHX 
  001d 8a       [2]             PULH 
  001e ce0002   [4]             LDX   kmh_constant:2
  156:          LDA     kmh_constant
  0021 c60000   [4]             LDA   kmh_constant
  157:          LDX     kmhtimer
  0024 ce0000   [4]             LDX   kmhtimer
  158:          STA     kmh
  0027 c70000   [4]             STA   kmh
  159:    }
Why it replaces my LDHX with a full load of the word kmh_constant, I don't know. If I understand the asm syntax correctly, kmh_constant:1 refers to the highest-order byte of the 2, so I'm loading the high byte into the high part of the H:X register, and the compiler tries to load a 3rd byte, which I think would crash the uC because kmh_constant has no 3rd byte...
How do I force the compiler to use MY instruction and not what it thinks is correct?
15 Replies

1,239 Views
CompilerGuru
NXP Employee
The compiler must not use the DIV instruction for a 16/8-bit division, because when the 8-bit result overflows, the result of the hardware 16/8-bit DIV instruction is not what the ANSI C standard mandates. (Note that the ANSI division is conceptually done with 16 bits and therefore does not overflow.)
If you check what the _IDIVU_8 function actually does, it basically uses a DIV, and in the case of an 8 bit overflow, it computes the correct answer on its own.
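
To make the overflow case concrete, here is a minimal C sketch of the difference. It is a software model written for illustration, not the code of _IDIVU_8: the function names are hypothetical, and the model assumes that DIV takes the dividend in H:A and the divisor in X, and that it signals divide-by-zero or a quotient larger than 8 bits through the carry flag, leaving the results unusable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model of the HC08 DIV instruction:
   dividend in H:A, divisor in X, quotient to A, remainder to H.
   Returns false where the hardware would flag an error
   (divide by zero, or a quotient that does not fit in 8 bits). */
static bool hc08_div_model(uint8_t h, uint8_t a, uint8_t x,
                           uint8_t *quot, uint8_t *rem)
{
    uint16_t dividend = (uint16_t)((h << 8) | a);

    /* The quotient needs more than 8 bits exactly when H >= X. */
    if (x == 0 || h >= x) {
        return false;
    }
    *quot = (uint8_t)(dividend / x);
    *rem  = (uint8_t)(dividend % x);
    return true;
}

/* What C requires for an unsigned 16/8 division:
   a full 16-bit quotient, so no overflow is possible. */
static uint16_t c_div_u16_u8(uint16_t n, uint8_t d)
{
    return (uint16_t)(n / d);
}
```

With a dividend of 14400 ($3840), a divisor of 100 fits the hardware instruction, while a divisor of 10 does not: C still expects 1440, which is why a library routine has to handle that case separately.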

Daniel

1,239 Views
bigmac
Specialist III
Hello,
 
I think that possible overflow can be prevented by treating the division process like a conventional "long division".  Perhaps the following assembly code might work for any non-zero divisor value.
 
__asm {
   ldx kmhtimer     // Divisor
   clrh
   lda kmh_constant : 0
   div
   sta kmh : 0
   lda kmh_constant : 1
   div
   sta kmh : 1
}
 
The remainder of each division is carried in H.  In this case, the result would be a 16-bit value.
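
The same two-step idea can be checked on a PC with a small C model. This is only a sketch of the arithmetic, with a hypothetical function name; it assumes a non-zero divisor, as stated above.

```c
#include <stdint.h>

/* Model of the two-DIV sequence: the remainder of the first
   division stays in H and becomes the high byte of the second
   dividend, exactly like pencil-and-paper long division. */
static uint16_t long_div_u16_u8(uint16_t n, uint8_t d)
{
    uint8_t  h = 0;                                        /* clrh            */
    uint16_t part = (uint16_t)((h << 8) | (n >> 8));       /* H:A = 0:high    */
    uint8_t  q_hi = (uint8_t)(part / d);                   /* first div       */

    h = (uint8_t)(part % d);                               /* remainder in H  */
    part = (uint16_t)((h << 8) | (n & 0xFFu));             /* H:A = rem:low   */

    /* Because h < d here, this quotient always fits in 8 bits. */
    uint8_t  q_lo = (uint8_t)(part / d);                   /* second div      */

    return (uint16_t)((q_hi << 8) | q_lo);
}
```

Since the remainder left in H is always smaller than the divisor, the second DIV cannot overflow, which is the point of splitting the division.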
 
Regards,
Mac
 
 

1,239 Views
ARG_Raiker
Contributor I
I'm not concerned with overflows or divide-by-zero, since kmh is kilometers per hour and it's kind of hard to go over 256 km/h, and I check for timer=0 beforehand.
I didn't read the full description of LDHX: it actually loads the entire H:X register, not only the H byte. It should work anyway, because according to the .map file in my project, kmhtimer is stored at 6A-6B. In any case, I think this is better for my purposes (I took a piece of CW's code and some from here):
__ASM{
             LDX kmh_constant
             PSHX
             PULH
             LDX kmhtimer
             LDA kmh_constant:1
}
and it compiles to:
  160:    __asm{
  161:               LDX kmh_constant
  0019 ce0000   [4]             LDX   kmh_constant
  162:               PSHX
  001c 89       [2]             PSHX 
  163:               PULH
  001d 8a       [2]             PULH 
  164:               LDX kmhtimer
  001e ce0000   [4]             LDX   kmhtimer
  165:               LDA kmh_constant:1
  0021 c60001   [4]             LDA   kmh_constant:1
  166:  }

Just like I wanted. My only worry is: does the label kmh_constant refer to the higher byte or the lower byte? Being big-endian, it should refer to the lower one since it comes first, in which case I'm doing it right; otherwise I could be swapping bytes (too lazy to run the debugger).
Thanks for all the help.
PS: I forgot to add the STA kmh; done.


Message Edited by ARG_Raiker on 2008-06-11 02:43 PM

1,239 Views
peg
Senior Contributor IV
Hello,

Yes, it is big-endian here, but this means that the big end (the high byte) comes first, and an unqualified label refers to that location.


1,239 Views
ARG_Raiker
Contributor I
Ah, wait, so if I say kmh_constant, what am I referring to? And if I say kmh_constant:0, does that mean the lower or the higher byte?
Just now I saw one of the other posts: I have to use :0. Problem solved (at least until I actually start testing this code on the board).



Message Edited by ARG_Raiker on 2008-06-11 08:20 PM

EDIT2: I forgot to put the DIV instruction in that code (sigh), which missed the whole point. The correct code follows:

__asm{
             LDX kmh_constant : 0
             PSHX
             PULH
             LDX kmhtimer

             LDA kmh_constant : 1
             DIV
             STA kmh

}



Message Edited by ARG_Raiker on 2008-06-11 08:35 PM

1,239 Views
bigmac
Specialist III
Hello,
 
To further clarify big endian format, the high byte (of a 16-bit value) will be stored at a lower address than the low byte.  For inline assembler code, the address of a 16-bit word is identical to the address of its high byte.
 
Therefore, the variable kmh_constant could refer either to the address of the high byte or to the start address of the whole word, depending on the context of its use.  However, to make it clearer that the inline code was actually referring to only the high byte, kmh_constant:0 was optionally used, i.e. with an offset of zero.  The low byte of the word is referred to as kmh_constant:1, with an offset of 1.
 
Within an .asm file (assembly only), the colon character is replaced by '+', which makes the expression more intuitive.
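
The byte layout can be illustrated with a short C sketch. It builds the big-endian layout explicitly in a two-byte array, so it behaves the same on any host; the function name is hypothetical and the value $3840 is only an example.

```c
#include <stdint.h>

/* Lay out a 16-bit value the way a big-endian CPU such as the
   HC08 stores it: high byte at the lower address. */
static void store_be16(uint8_t mem[2], uint16_t value)
{
    mem[0] = (uint8_t)(value >> 8);     /* offset 0: label or label:0, high byte */
    mem[1] = (uint8_t)(value & 0xFFu);  /* offset 1: label:1, low byte           */
}
```

For a word holding $3840, offset 0 yields $38 and offset 1 yields $40.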
 
With respect to your actual project, I would assume that you are measuring the time between odometer pulses, in order to derive the speed of a vehicle.  Using only 8-bit resolution for the time measurement will likely give unsatisfactory results.  If 16-bit resolution is used, the results should be satisfactory, but inline code using the DIV instruction will not be appropriate.
 
When calculating the speed from a period value, there will be a minimum speed value, below which no measurement can take place.  This is represented by the overflow value of the time measurement.  For 8-bit resolution, this value is 255.
 
Now, I assume that the maximum speed you wish to measure will be at least ten times the minimum measurement value.  If this is the case, the maximum speed would be represented by a measurement value of 25.  Let's assume this represents 150 kph, for a measurable speed down to 15 kph.
 
What speed does a measurement value of 26 represent?  I would estimate 150*25/26 = 144 kph, giving a speed resolution of 6 kph near maximum speed.  This will result in large jumps in the indicated speed, at higher speeds.
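
The estimate can be reproduced with a few lines of C. This is just the arithmetic from the example above (150 kph at a count of 25), with a hypothetical function name; speed is taken to be inversely proportional to the measured period.

```c
/* Speed for a given period count, scaled from one known point
   (ref_kph at ref_count). */
static double speed_from_count(double ref_kph, double ref_count, double count)
{
    return ref_kph * ref_count / count;
}
```

Going from a count of 25 to 26 drops the result from 150 kph to about 144.2 kph, a step of nearly 6 kph from a single count.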
 
Regards,
Mac
 

1,239 Views
ARG_Raiker
Contributor I
I use a timer with periodic interrupts at 0.25 ms intervals, counted in a 16-bit variable. When I have to calculate, I check whether the interval is less than 255 counts (255 * 0.25 ms), and if so I use only the lower byte of the timer and do the calculation with DIV. All of this is because the normal 16-by-16 division took nearly 400 or 500 cycles in the debugger, and I'm an optimization freak. Above 60 km/h the timer will always measure less than 255, so optimizing the calculation this way is a good gain. I have 1 kph resolution, +/- 1 kph, without taking rounding errors into consideration; even then I doubt it gets to 2 kph.
At first I wanted to do floating-point operations, then I tried fixed point (I didn't get around to implementing it), and I finally arrived at this calculation, which is a simplified version.
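
A rough C sketch of that two-path selection follows. It is not the actual project code: the constant 14400 is an assumption (1 m travelled per measured interval with 0.25 ms ticks, so kph = 14400 / ticks), the function name is hypothetical, and saturating at 255 is a choice made only for this illustration.

```c
#include <stdint.h>

/* Assumption: 1 m per measured interval, 0.25 ms per tick,
   so kph = (1 m * 3.6) / (ticks * 0.00025 s) = 14400 / ticks. */
#define KMH_CONSTANT 14400u

static uint8_t kmh_from_ticks(uint16_t ticks)
{
    uint8_t high = (uint8_t)(KMH_CONSTANT >> 8);

    if (ticks == 0) {
        return 0;                       /* caller is expected to check this */
    }
    if (ticks <= 255u && high < ticks) {
        /* Fast path: divisor fits in 8 bits and the quotient fits
           in 8 bits, the case a single 16/8 DIV can handle. */
        return (uint8_t)(KMH_CONSTANT / (uint8_t)ticks);
    }
    /* Slow path: general 16-bit division, saturated to 8 bits. */
    uint16_t q = (uint16_t)(KMH_CONSTANT / ticks);
    return (q > 255u) ? 255u : (uint8_t)q;
}
```

Under these assumptions the fast path covers tick counts from 57 to 255, which corresponds to roughly 56 to 252 kph.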

1,239 Views
bigmac
Specialist III
Hello,
 
I don't think you are achieving very much with the use of two different calculation algorithms.  Even if the 16-bit method takes 500 cycles, for a 4 MHz bus frequency, this would amount to only 125 us.  For a vehicle speed of say 180 kph, each measurement period would be 20 ms.  So the use of the longer calculation period would seem a trivial issue.
 
A potentially more important issue would be the amount of time required to execute the periodic ISR, that occurs every 250 microseconds.  This may well occupy a much greater proportion of the time.
 
Regards,
Mac
 

1,239 Views
ARG_Raiker
Contributor I
In the ISR I only have 2 lines, kmhtimer++ and rpmtimer++, both of which disassemble to LDHX, INC, BNE, INC, so I assume a maximum of, say, 20 cycles for the entire ISR every 0.25 ms.
What I don't know is how much time the context switch AND the Processor Expert HAL take (one of the reasons why, as soon as I get this project working, I'm rewriting the code to take PE out of my projects).
I made a spreadsheet and calculated the errors with different resolutions, and I decided on 0.25 ms. Once I have the project working I can try to slow it down, but that means greater error at high speeds...

1,239 Views
bigmac
Specialist III
Hello,
 
If you have timed interrupts at 250 us intervals, your sampling  method is a little unclear - presumably the pulse width from the odometer exceeds this period - but you would still need to sample the input state to determine whether a timer increment should occur, or not.  Actually, the overhead to enter and exit an ISR will be at least 20 cycles.  This must be added to the "visible" code.
 
However, I might have thought a simpler method would be to use the input capture facility associated with the TIM module, to directly measure the period, and requiring only one interrupt per odometer pulse.  You could setup the prescaler so that the TIM input clock period was about 250 us, or a sub-multiple of this.
 
You might also consider using another channel of the TIM for output compare operation (software only mode), to indicate when the period measurement exceeds that for the minimum allowable measured speed.  This would require that the maximum interval between pulses be less than the TIM overflow period (as determined by the prescaler setting).
 
From the figures you have previously given, it would seem that the odometer outputs one pulse for every metre of travel.  Is this so?
 
Regards,
Mac
 


Message Edited by bigmac on 2008-06-18 01:28 AM

1,239 Views
ARG_Raiker
Contributor I
The system works like this:

KBI IRQ module: when a kmh IRQ is served, kmhtimer=0.
When N IRQs have passed, copy kmhtimer and set the "ready to calculate" flag.
N is the number of pulses per revolution of the wheel. Since I can buy sensors with 2, 4, 6 or 8 pulses per revolution, this simplifies the calculation, because I only use the entire perimeter of the wheel and not a fraction. So, in essence, I take the time the wheel needs to complete 1 turn and do distance / time to get my speed. (I'm sorry, I'm certain I'm using the wrong terms, but English is not my 1st language.)
kmh_calculate is only called before I update the screen, so I'm not calculating all the time.
According to some figures I just did, at 7.3728 MHz I have about 1845 ops per 0.25 ms, so wasting 50 isn't that critical, I think (feel free to correct me, I'm about to leave and don't have time to verify)...
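
The cycle budget can be double-checked with a small C sketch. The function names are hypothetical, and the 50-cycle ISR cost is only the rough figure mentioned above, not a measured value.

```c
/* Bus cycles available between two periodic interrupts. */
static double cycles_per_tick(double bus_hz, double tick_s)
{
    return bus_hz * tick_s;
}

/* Fraction of the CPU consumed by an ISR of the given total cost
   (visible code plus interrupt entry and exit overhead). */
static double isr_load(double isr_cycles, double bus_hz, double tick_s)
{
    return isr_cycles / cycles_per_tick(bus_hz, tick_s);
}
```

At 7.3728 MHz and 0.25 ms this gives about 1843 bus cycles per tick, so a 50-cycle ISR would use roughly 2.7% of the CPU.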

1,239 Views
bigmac
Specialist III
Hello,
 
I assume that you are currently using the TIM module to generate the periodic interrupts.  However, the input capture mode of the TIM provides hardware that is specifically suited to period measurement, with only a single interrupt required per measurement cycle.  This would be the method that I would advocate, and should give simpler code than you currently have.
 
If you were to use a wheel sensor that gave 8 pulses per revolution, I would estimate that the distance covered between pulses might lie within the range 200-250 mm.  For the upper limit of this range, and assuming 10 kph is the minimum measured speed that is required, this would give 90 ms between successive pulses.  At 160 kph the period between successive pulses would be 5.6 ms.
 
With a bus frequency of 7.3728 MHz, and a prescale division of 16, the overflow period for the TIM would be 142.2 ms.  So input capture mode should work well for the periods you are attempting to measure.  If you were to reduce the number of pulses per revolution, the prescale value should be increased accordingly, but the measurement update period for very low speeds might then be considered too long.
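
These figures can be reproduced with the following C sketch. The function names are hypothetical, and the overflow calculation assumes a 16-bit free-running counter that wraps after 65536 counts.

```c
/* Time between pulses, in ms, for a given distance per pulse and speed. */
static double pulse_period_ms(double dist_m, double kph)
{
    double metres_per_second = kph / 3.6;
    return dist_m / metres_per_second * 1000.0;
}

/* Overflow period, in ms, of a 16-bit counter clocked at bus_hz / prescale. */
static double tim_overflow_ms(double bus_hz, double prescale)
{
    return 65536.0 * prescale / bus_hz * 1000.0;
}
```

For 250 mm per pulse this gives 90 ms at 10 kph and about 5.6 ms at 160 kph, both below the 142.2 ms overflow period.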
 
The following code example should illustrate the requirements of the input capture method -
 
 
#define  KPH_CAL  829425L  // Assumes 250mm distance per pulse
                           // with TIM clock rate 460.8 kHz (prescale 16)
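// Global variables (tcount and flag are shared between the ISRs and main,
// so they are declared volatile):
volatile word tcount;
word tprev;
word kph = 0;
volatile byte flag = 0;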
 
/**************************************************************************/
void main( void)
{
   /* Initialisation stuff */
 
   EnableInterrupts;
   for ( ; ; ) {           // Main polling loop
      __RESET_WATCHDOG();
      if (flag) {          // Test for new reading
         if (2 == flag)    // Overflow occurred
            kph = 0;
         else {            // Normal operation
            kph = (word)(KPH_CAL / tcount);  // Result is kph * 2
            kph = (kph + 1) / 2;             // Rounding adjustment
         }
         flag = 0;
 
         /* Update KPH display */
 
      }
   }  // Loop always
}
 
/**************************************************************************/
// Input capture operation for TIM channel 0 (wheel sensor signal)
 
interrupt void capture_ISR( void)
{
   tcount = TCH0 - tprev;  // Measured pulse period
   tprev = TCH0;
   TCH1 = TCH0 - 1;        // Next compare slightly less than O/F period
   flag = 1;               // Flag new reading
   TSC0_CH0F = 0;          // Clear interrupt flag for capture channel
}
 
/******************************************************************************/
// Output compare operation for TIM channel 1 (overflow detect)
 
interrupt void compare_ISR( void)
{
   flag = 2;               // Flag period overflow
   TSC1_CH1F = 0;          // Clear interrupt flag for compare channel
}
 
The KPH calculation may take about 0.5 ms to complete, but this is still much less than the minimum period between readings.  A rounding adjustment has been applied to the calculation.
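
For reference, the calibration constant and the rounding step can be worked through in portable C. This is a sketch with hypothetical function names, using the stated assumptions (250 mm per pulse, 460.8 kHz TIM clock). Under those assumptions the arithmetic gives 829440, which is very close to the 829425 used in the code above; the difference is far too small to change a displayed kph value.

```c
#include <stdint.h>

/* Calibration constant for "kph * 2":
   2 * (dist_mm / 1000) m * tim_hz * 3.6 = dist_mm * tim_hz * 72 / 10000 */
static uint32_t kph_cal(uint32_t tim_hz, uint32_t dist_mm)
{
    return (uint32_t)(((uint64_t)dist_mm * tim_hz * 72u) / 10000u);
}

/* Divide to get twice the speed, then add 1 and halve so the
   result is rounded to the nearest kph instead of truncated.
   Assumes tcount is non-zero and large enough for the
   intermediate value to fit in 16 bits. */
static uint16_t kph_rounded(uint32_t cal, uint16_t tcount)
{
    uint16_t twice = (uint16_t)(cal / tcount);
    return (uint16_t)((twice + 1u) / 2u);
}
```

For example, a count of 5000 corresponds to about 82.9 kph, which the rounding step reports as 83 rather than 82.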
 
It would be possible, and not too complex, to also apply a moving average algorithm to the readings, particularly for higher speeds, to smooth out fluctuations of the individual readings.  You might need to reduce the sample size at lower speeds, to improve the response.
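
A minimal moving average could look like the following C sketch. The window length of 4 and all names are hypothetical, and the structure must be zero-initialised before the first call.

```c
#include <stdint.h>

#define AVG_SIZE 4u   /* hypothetical window length */

typedef struct {
    uint16_t samples[AVG_SIZE];  /* circular buffer of recent readings */
    uint32_t sum;                /* running sum of the stored readings */
    uint8_t  index;              /* next slot to overwrite             */
    uint8_t  count;              /* number of valid readings so far    */
} moving_avg_t;

/* Add one reading and return the average of the last
   AVG_SIZE readings (or of all readings, until the buffer fills). */
static uint16_t moving_avg_update(moving_avg_t *m, uint16_t sample)
{
    if (m->count == AVG_SIZE) {
        m->sum -= m->samples[m->index];   /* drop the oldest reading */
    } else {
        m->count++;
    }
    m->samples[m->index] = sample;
    m->sum += sample;
    m->index = (uint8_t)((m->index + 1u) % AVG_SIZE);
    return (uint16_t)(m->sum / m->count);
}
```

Keeping a running sum avoids re-adding the whole buffer on every reading, which matters on an 8-bit CPU. A smaller window at low speeds, as suggested above, would need a variable window length instead of the fixed AVG_SIZE used here.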
 
Regards,
Mac
 


Message Edited by bigmac on 2008-06-18 06:58 PM

1,239 Views
ARG_Raiker
Contributor I
Bigmac, I appreciate the code, but the reason I used periodic interrupts is that I have to measure 3 input pulses (for now): RPM, speed and gas flow, and I have only 2 timers. If I see that it doesn't work, or that this method wastes too much CPU time, I will try to use input capture. I would also have to change the prescaler for every wheel sensor I use (not now, but in the future I plan to give this to my car club to build and install in their own cars)...


1,239 Views
bigmac
Specialist III
Hello,
 
The compiler has actually done what you told it to do.  As Kef explained, the LDHX instruction for the HC908 only works with immediate or direct addressing modes.
 
The instruction  LDHX kmh_constant:1  suggests that the variable kmh_constant is longer than 16 bits.  If this is not the case, the instruction perhaps should be  LDHX kmh_constant.
 
However, to use the DIV instruction, the 8-bit divisor should be in X, and the 16-bit value in H:A.  Perhaps the following code might apply -
 
__asm {
   lda kmh_constant : 0
   psha
   pulh
   lda kmh_constant : 1
   ldx kmhtimer
   div
   sta kmh
}
 
The use of the DIV instruction is not appropriate if the length of the divisor is more than 8 bits.
 
Regards,
Mac
 


Message Edited by bigmac on 2008-06-11 02:53 PM

1,239 Views
kef
Specialist I
Are you using the old HC08 and not the HCS08? The old HC08 can load the H:X pair with a single LDHX instruction only from the direct addressing range $00-$FF. That is probably why the inline assembler replaces your single instruction with a series of instructions. Try compiling this snippet for the HCS08, and I believe it will be compiled using a single LDHX.