MC68HC908QY4 Assembly Help Please

Rickrandom · ‎06-23-2007

A couple of years ago, I was involved at work in a project using the MC68HC908QY4 8-bit microcontroller. It was a very simple project, little more than a timing circuit, with the code all written in Assembly, but I can see another use for the same circuit with similar code, but using more of the device's functionality. My aim is to at least create a prototype using a spare existing unit and then present it to my colleagues to prove the concept.

I have to admit my lack of much experience or education on microcontrollers or the use of Assembly.

Some areas I think I have now got to grips with, such as the commands to get the ADC to sample, but there are several areas I am unsure how to do. Is there a tutorial on the techniques to do the basics, or a recommended book? The sorts of things I am unsure of are:

I want to average several ADC readings, but how do I add 8-bit numbers together, when the result will probably be more than 8 bits? Is there a standard technique for this in assembly? Do I have to look at the carry bit and "manually" increase a bit of some other register each time? I think I understand that I can use the H:X register to manipulate a 16-bit number, but I can only see the AIX command, to add an immediate value, not a command to add another variable such as an ADC reading.

If I take say 8 readings, and have managed to add them together, how can I right-shift the total to divide by 8? I can't see how I right-shift a word across a byte boundary, I can only see how to right-shift a byte.

When I take an ADC reading, which takes several clock cycles, I understand that I can do other work, e.g. update my total using the most recent value, and then use an interrupt to know that the next value is ready. Is there an example of this technique that I could try to follow?

Because the previous project was in Assembly, and all the documentation supports it, I would prefer to stick with Assembly for this new project.

If I'm completely out of order expecting to do 16-bit work with an 8-bit device or expecting answers to these sort of things, I apologise, but I'd appreciate being steered in a useful direction.

I believe that this microcontroller is soon to be obsolete (although I may be wrong) but because the hardware exists in the current design, I would like to prove (or disprove) the idea with what we have. In the longer run, I guess we'd pick a newer device.

Any help would be greatly appreciated.

Rickrandom · ‎06-25-2007

Using the algorithm

filtered value = A1 * previous filtered value + B0 * new unfiltered value

with A1<1, B0<1, A1+B0=1,

rearranging gives

filtered value = previous filtered value + B0 * (new unfiltered value - previous filtered value)

i.e. you add on a proportion (B0) of the difference. (This might be a little easier to implement in Assembly with B0 as a fraction of 256.)

When using 8-bit ADC resolution, the multiplication gives a 16-bit result, and the useful part is the upper byte. If B0 is say 16, then any difference of 15 or less will not change the filtered value because the upper byte will be zero. The "insensitivity" will be 256/B0, and it will not converge once it is that close.

I guess the programmer needs to consider this, and if low values are likely and important then perhaps always add 1 (even if the multiplication result upper byte is zero) for new unfiltered > previous, subtract 1 for new unfiltered < previous, although this could converge too quickly, or do some trick such as scaling the hardware gain and/or offset to avoid small values, or subtracting the value from 255, so always working at the top end of the scale.
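The dead band described above can be checked on a PC with a small C model. This is only a sketch, not HC08 code: the function names are made up for illustration, B0 is assumed to be scaled to a fraction of 256, and the sign of the difference is handled separately so that only the upper byte of an unsigned 8x8 product is ever used.

```c
#include <stdint.h>

/* One update in the "add a proportion of the difference" form.
   b0 is the coefficient as a fraction of 256 (b0 = 16 means 16/256).
   Only the upper byte of the 8x8 product is applied to the result. */
uint8_t filter_step(uint8_t prev, uint8_t sample, uint8_t b0)
{
    if (sample >= prev) {
        uint16_t product = (uint16_t)((sample - prev) * b0);
        return (uint8_t)(prev + (product >> 8));
    } else {
        uint16_t product = (uint16_t)((prev - sample) * b0);
        return (uint8_t)(prev - (product >> 8));
    }
}

/* Feed a constant input until the output stops changing. */
uint8_t settle(uint8_t start, uint8_t sample, uint8_t b0)
{
    uint8_t prev = start;
    for (;;) {
        uint8_t next = filter_step(prev, sample, b0);
        if (next == prev) {
            return prev;
        }
        prev = next;
    }
}
```

With b0 = 16, settle(0, 255, 16) stops at 240 and settle(255, 0, 16) stops at 15, so in this model the output stalls 15 counts short of the input in either direction.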

Going back to the original reason for my post, I've now got a spare unit to make my proof of concept, so I should get on with the detailed coding in a couple of days - expect more questions.

Rickrandom · ‎06-25-2007

So much to learn...

Now I can see how to do the right-shift and carry, for cases where that is the best method, and how (and when) to do a divide.

From simulating it in a spreadsheet, the filter may not respond for very low input values when using integer maths. How low will depend on the two coefficients and the resolution, so I guess any limitations should be calculated case by case. I'll try to think if there is a general calculation.

The filter will also be "slow" to start if the mean value is initialised at zero, so perhaps it should be initialised at the same as the first unfiltered value?

Rocco - sorry, I hadn't realised your filter was the same principle as what I suggested, as I hadn't got as far as understanding that part of your code - I read slowly!

Again thanks everyone for the interest and inputs.

bigmac · ‎06-25-2007

Hello Rick,

With the simple methods (per my sample code), the resolution of the result should be the same as the resolution of the sample.

With the rolling average approach, since you are simulating an analog filter, there will be a "rise time" for the result. If the initial value is likely to be other than zero, you can initialise the result (MEAN) to the expected value, or take a single ADC reading during initialisation, and make this the initial value. This should speed up the settling process.
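The effect of seeding can be seen with a small C model run on a PC. It is a sketch only: it assumes the update MEAN = ((MEAN * 15) + SAMPLE) / 16 with the remainder discarded, and the function names are invented for illustration.

```c
#include <stdint.h>

/* One update of MEAN = ((MEAN * 15) + SAMPLE) / 16, remainder discarded. */
uint8_t mean_update(uint8_t mean, uint8_t sample)
{
    return (uint8_t)(((uint16_t)mean * 15u + sample) / 16u);
}

/* Run the filter for n identical samples, starting from the given seed. */
uint8_t mean_after(uint8_t seed, uint8_t sample, unsigned n)
{
    uint8_t mean = seed;
    while (n--) {
        mean = mean_update(mean, sample);
    }
    return mean;
}
```

For a steady input of 200, mean_after(200, 200, 16) is already 200, whereas mean_after(0, 200, 16) is still well short of 200 after the same 16 samples.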

If you need to obtain a "good" result over the first eight (or 16) samples, perhaps the simple "average of N samples" would be more suitable for this application, than the rolling average.

Regards,

Mac

peg · ‎06-25-2007

Hi Mac,

This "seeding" of the average based on the first non-zero sample was actually my first thought before, but I needed to speed the step response from non-zero levels as well which is why I changed to true rolling average (or last n samples) from the -mean+sample method.

bigmac · ‎06-25-2007

Hello Rick and Peg,

The alternative approach of averaging the last N samples, actually appears more complex to implement than the "digital filter". Firstly, the method would require considerably more RAM because of the need to maintain a circular buffer for each reading, and a pointer to the current location. This will probably restrict practical use to 8 or 16 samples in many instances (particularly for devices such as the 'QY4). Also needed would be a SUM register and a MEAN register for the result.

To provide a further example of assembly programming, the following untested code snippet is what I think might be involved to implement the process -

; RAM usage:

MEAN ds 1 ; Result byte

SUM ds 2 ; Word value

PNTR ds 1 ; Pointer to oldest reading in buffer

BUF ds 8 ; Circular buffer - 8 samples assumed

CLRA

PSHA ; Initial MS byte

LDX PNTR

LDA ADRL ; Latest sample

PSHA ; For later use

SUB BUF,X ; Subtract oldest reading

BCC SKIP1 ; Skip next if difference positive

DEC 2,SP ; Sign extend MS byte

SKIP1: PSHA

LDA 2,SP ; Latest sample

STA BUF,X ; Replaces oldest sample

; Increment pointer

LDA PNTR

INCA

AND #$07 ; Ensure range 0-7

STA PNTR ; Updated value

PULA ; Difference LS byte

ADD SUM+1

STA SUM+1

PSHA

LDA SUM

ADC 3,SP ; Difference MS byte

STA SUM

PSHA

PULH ; H = MS byte

PULA ; ACC = LS byte

AIS #2 ; Adjust stack pointer

DIV

STA MEAN

I hope my stack pointer calculations are correct. Of course, the first 8 results would need to be ignored, until the buffer is filled.
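For comparing results against a simulator, the same "subtract the oldest, add the newest" idea can be modelled in C on a PC. This is a sketch under assumptions rather than a translation of the routine above: the struct and function names are hypothetical, the buffer starts zeroed, and the divide by 8 is written out explicitly.

```c
#include <stdint.h>

#define NSAMPLES 8u   /* buffer length; a power of two so the index can be masked */

typedef struct {
    uint8_t  buf[NSAMPLES]; /* circular buffer of the last 8 readings */
    uint8_t  pntr;          /* index of the oldest reading */
    uint16_t sum;           /* running total of everything in buf */
    uint8_t  mean;          /* latest result */
} boxcar_t;

/* Replace the oldest reading with the new one and refresh the mean. */
uint8_t boxcar_update(boxcar_t *f, uint8_t sample)
{
    f->sum = (uint16_t)(f->sum - f->buf[f->pntr] + sample); /* drop oldest, add newest */
    f->buf[f->pntr] = sample;                               /* newest replaces oldest */
    f->pntr = (uint8_t)((f->pntr + 1u) & (NSAMPLES - 1u));  /* keep index in range 0-7 */
    f->mean = (uint8_t)(f->sum / NSAMPLES);
    return f->mean;
}
```

Starting from a zeroed boxcar_t, the first eight results are only partial averages, which matches the note above about ignoring them until the buffer is filled.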

Regards,

Mac

peg · ‎06-25-2007

Hi Mac,

Yeah, I didn't do it this way because it was easier to implement. It looks like it works to me BTW.

I did it quite differently though. When I revisit this I will have to improve on what I have done as it only just worked and the next version will need to be better as it is faster.

bigmac · ‎06-25-2007

Oops -

I left out the instruction LDX #8 immediately prior to the DIV instruction.

In this case we need to divide by 8.

Regards,

Mac

Rickrandom · ‎06-24-2007

Thanks again for all the inputs, and it's nice to see a friendly debate!

I now see how to do the accumulation to more than 8 bits.

I suppose I was rather "leading the witness" with my question about right-shifting to do the divide. I had expected there was a way to do as many right-shifts as wanted in one operation, but that was perhaps overly optimistic.

I had been led to believe that division was slow at this level, but looking at the operations available, it seems to me that I may as well use the DIV in this case, because if I take 8 samples, then I need to do 3 LSRs and 3 RORs, each of which is 4 cycles, so a total of 24 cycles, whereas I think (please tell me if this won't work):

LDHX accumulated total (2 bytes)
LDA lower byte of accumulated total
LDX number of samples (so number to divide by)
DIV

then this will be 4+3+3+7 = 17 cycles.

(I've based the above on the data sheet which describes DIV as A<-(H:A)/(X) but I may have misunderstood.)

This also allows more flexibility of the number of samples, e.g. I'm not restricted to 2, 4, 8, 16, ... as I would be if I was right-shifting.

In terms of dither, I think that in my real world application of reading sensors, they will be changing every sample, so dither won't be an issue. I am happy enough to get an average of however many samples.

My device is battery-operated, and away from the mains, so I don't think mains hum will be an issue.

Just to try to add something to the filter discussion, I have seen use of the following algorithm:

Constant A1 = exp(-sample time/filter time constant)
Constant B0 = 1 - A1

filtered value = A1 * previous filtered value + B0 * new unfiltered value

which is apparently from "Introduction to Dynamics and Control", Power and Simpson, McGraw Hill.

For interest, is this algorithm any good, or is it junk? Would this be implemented in Assembly by scaling A1 and B0 to a proportion of 256, then multiplying and adding, then just taking the upper byte?

I'll carry on trying to make sense of all that everyone has offered.

peg · ‎06-24-2007

Hi Rick,

My post, rather than being the best code to do the job, was an attempt to combine a tutorial with some code to produce the result.

Actually the divide is even more efficient than you indicate.

To convert my example:

LDHX ACCUM2 ;4

TXA ;1

LDX #8 ;2

DIV ;7

----------------------------

14 cycles

The quotient is in A, with the remainder in H, and the divisor remains untouched in X for further divides if required.

In order to duplicate the shift method (if required) you need to put the result back into RAM

STA ACCUM1 ;3

Hi Rocco, not sure about that rating, maybe it's a six (the register rolled over). I fixed it the best I can.

rocco · ‎06-24-2007

Hi Rick:

You are absolutely correct with respect to the divide instruction. Depending on the number of shifts you wish to make and the number of bytes in the value, the DIV instruction is often faster and certainly more flexible. But keep in mind that it only gives an 8-bit quotient, so it would not work if you used the 10-bit mode of the ADC.
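The 8-bit quotient limit can be illustrated with a small C model of DIV for use on a PC. It is a sketch based on the data-sheet description A <- (H:A)/(X) with the remainder left in H; the carry behaviour on overflow or divide-by-zero is my understanding of the CPU08 documentation, and the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t a;      /* quotient, meaningful only when carry is false */
    uint8_t h;      /* remainder, meaningful only when carry is false */
    bool    carry;  /* true on divide-by-zero or a quotient above 255 */
} div_result_t;

/* Model of DIV: A <- (H:A) / X, H <- remainder. */
div_result_t hc08_div(uint8_t h, uint8_t a, uint8_t x)
{
    div_result_t r = { a, h, true };   /* model choice: leave inputs in place on failure */
    uint16_t dividend = (uint16_t)(((uint16_t)h << 8) | a);

    if (x != 0 && (dividend / x) <= 0xFFu) {
        r.a = (uint8_t)(dividend / x);
        r.h = (uint8_t)(dividend % x);
        r.carry = false;
    }
    return r;
}
```

Eight 8-bit readings of 255 total $07F8, and hc08_div(0x07, 0xF8, 8) gives 255 with no carry. Eight 10-bit readings of 1023 total $1FF8, and hc08_div(0x1F, 0xF8, 8) flags an overflow because the quotient would be 1023.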

I often use the ADC to read potentiometers in user-interfaces, most often as speed controls. So response time is not very important, but dither is unacceptable.

That filter that you describe is the one that I have implemented in "LOGfilter". Note that your constant coefficients, A1 and B0, are both fractions, and that A1+B0=1. Fractions are not very suitable for an 8-bit microcontroller (but are perfect for a DSP).

Just like you suggested, my implementation scales by a power of two, and then uses only the high byte for the result. But instead of doing any multiplying, I do a subtract to get the difference between the previous filtered value and the new value. I then scale that difference and add it into the integer+fractional result. If you do the algebra, you will find that the algorithm's Multiply/Multiply/Add method is mathematically identical to my Subtract/Shift/Add method, with the restriction that the coefficients need to be fractional powers of two (B0 and A1 being 1/2 & 1/2, 1/4 & 3/4, 1/8 & 7/8, 1/16 & 15/16, . . .to 1/256 & 255/256)
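A C model of this Subtract/Shift/Add form may help when checking behaviour on a PC. It is a sketch of my reading of the description, not the LOGfilter source: the state is assumed to be a 16-bit value with the integer result in the high byte and the fraction in the low byte, and the function name and the shifts parameter are invented for illustration.

```c
#include <stdint.h>

/* state is 8.8 fixed point: high byte = integer result, low byte = fraction.
   shifts = 1..8 selects B0 = 1/2, 1/4, ... 1/256. */
uint16_t exp_filter_update(uint16_t state, uint8_t sample, unsigned shifts)
{
    /* Signed difference between the new sample and the integer result. */
    int32_t diff = (int32_t)sample - (int32_t)(state >> 8);

    /* Scale to 8.8 and apply B0: diff * 256 / 2^shifts. */
    int32_t scaled = diff * (int32_t)(1L << (8u - shifts));

    return (uint16_t)((int32_t)state + scaled);
}
```

With shifts = 8 the scaled difference is just the difference itself, which is the "no shifts at all" case. Because the fraction is kept between samples, a difference of 1 still nudges the state, so in this model the integer part does reach a steady input rather than stalling short of it.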

-----------
Hmmm . . . obviously, someone didn't like my previous post. I wish they would say why . . .

Message Edited by rocco on 2007-06-24 02:59 PM

bigmac · ‎06-25-2007

Hello,

I have previously implemented a rolling average process (in this case over 16 samples), and the method does seem to be similar to Rocco's LOGfilter routine - but perhaps viewed from a slightly different perspective. Based on a shift method, I implemented the formula -

MEAN = ((MEAN << IVAL) - MEAN + SAMPLE) >> IVAL

when IVAL = 4, this would represent 16 samples.

To use multiplication and division, and assuming 16 samples, the above formula would become -

MEAN = ((MEAN * 15) + SAMPLE) / 16

Both algorithms imply discarding the fraction/remainder following the shift right or division. This seems to differ from Rocco's algorithm, where the fraction value is retained between samples. During simulation of my code (and I also set up a spreadsheet for this purpose), I could see only a small difference in the shape of the step response curve, for either alternative, so I elected to restrict MEAN to an 8-bit value. Can anyone throw light on the possible effects of one or other of these alternatives?

As Rocco has already said, the response is closely related to a first order low-pass filter, and for the 16-sample case, the equivalent time constant would be the period of about 15 samples. I take exception that the use of 10-bit samples from the ADC would preclude the use of the DIV instruction, since the divisor is unlikely to exceed the value 256 (the value 256 is a special case handled differently, as above). For the 16-sample case, the result of the multiplication and addition will not exceed a 16-bit value, even with a 10-bit reading, so the division should remain straightforward.
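Where the quotient itself is wider than 8 bits, as it would be for a 10-bit MEAN, one way the division might still be built from DIV is two chained steps: divide the high byte first, then divide the remainder together with the low byte. The C model below is a sketch of that arithmetic for checking on a PC; the function name is hypothetical and it assumes a non-zero divisor.

```c
#include <stdint.h>

/* 16-bit dividend / 8-bit divisor -> 16-bit quotient, using two steps that
   each fit the DIV pattern A <- (H:A) / X with an 8-bit quotient.
   The divisor must be non-zero. */
uint16_t div16by8(uint16_t dividend, uint8_t divisor, uint8_t *remainder)
{
    uint8_t hi = (uint8_t)(dividend >> 8);
    uint8_t lo = (uint8_t)(dividend & 0xFFu);

    /* Step 1: H = 0, A = high byte. Quotient is the high byte of the result. */
    uint8_t q_hi = (uint8_t)(hi / divisor);
    uint8_t r    = (uint8_t)(hi % divisor);

    /* Step 2: H = remainder, A = low byte. Since r < divisor, the quotient
       of this step always fits in 8 bits. */
    uint16_t partial = (uint16_t)(((uint16_t)r << 8) | lo);
    uint8_t q_lo = (uint8_t)(partial / divisor);
    *remainder   = (uint8_t)(partial % divisor);

    return (uint16_t)(((uint16_t)q_hi << 8) | q_lo);
}
```

For example, 1023 * 16 = 16368 divided by 16 comes back as 1023 with no remainder.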

For the purists out there, I have used the word "mean" to represent the same as the word "average" - the former has fewer characters.

Regards,

Mac

Message Edited by bigmac on 2007-06-25 02:58 PM

peg · ‎06-25-2007

Well this is a little spooky how this conversation is becoming very like a recent experience of mine.

I had an averaging filter not unlike Mac's most recent reply. It worked well in the machine under simulated tests; however, in real life its step response was too slow, as Rick just pointed out, mainly from zero. As the machine was already in production at my clients' client, I had to fix it quickly. After much studying of the real step response and knowing that I had it responding well to the normal steady state, I decided to just change it to full exact averaging. Basically I just kept all the samples, then threw away the oldest and added in the newest. This was enough to do the trick but only just. In a couple of months I will have a fresh machine again to re-visit this and improve it some more.

bigmac · ‎06-25-2007

Hello,

The edit time limit for my previous post expired -

MEAN = ((MEAN * 15) + SAMPLE) / 16

The following is my attempt to implement the above algorithm in assembly code, requiring 36 cycles for a HC08 -

LDA MEAN ; Previous 8-bit result

LDX #15

MUL

ADD ADRL ; 8-bit sample assumed

PSHA

TXA

ADC #0

PSHA

PULH

PULA

LDX #16

DIV

STA MEAN ; Updated value

Regards,

Mac

Rickrandom · ‎06-24-2007

Thanks for all the replies, very quick and helpful. I need to digest them all, which will take me a while! I expect I'll be back with more questions.....

peg · ‎06-24-2007

Hi Rick,

Here is some code to illustrate the basic concepts of what you want to do.

I included a 16-bit version as well to illustrate the ADC (add with carry) instruction.

Code:

*******************************************
; Average of 8 x 8-bit samples
*******************************************

CLR ACCUM1
CLR ACCUM2

;next section goes through 8 times with fresh sample

    LDA    SAMPLE
    ADD    ACCUM1
    STA    ACCUM1
    BCC    NOROLL
    INC    ACCUM2

NOROLL
NOP

;when you have 8 do the next bit to change the total in ACCUM2:ACCUM1 to the average

AVG
    LSR    ACCUM2
    ROR    ACCUM1
    LSR    ACCUM2
    ROR    ACCUM1
    LSR    ACCUM2
    ROR    ACCUM1

;now the average is in ACCUM1 (ACCUM2 must be 0)

*******************************************
; Average of 8 x 16-bit samples
*******************************************

    CLR    ACCUM1
    CLR    ACCUM2
    CLR    ACCUM3

;next section goes through 8 times with fresh sample

    LDA    SAMPLO
    ADD    ACCUM1
    STA    ACCUM1
    LDA    SAMPHI
    ADC    ACCUM2
    STA    ACCUM2
    BCC    NOROLL16
    INC    ACCUM3

NOROLL16
NOP

;when you have 8 do the next bit to change the total in ACCUM3:ACCUM2:ACCUM1 to the average

AVG16
    LSR    ACCUM3
    ROR    ACCUM2
    ROR    ACCUM1
    LSR    ACCUM3
    ROR    ACCUM2
    ROR    ACCUM1
    LSR    ACCUM3
    ROR    ACCUM2
    ROR    ACCUM1

;now the average is in ACCUM2:ACCUM1 (ACCUM3 must be 0)

This simple technique only updates the average every 8 samples (using the last 8)

The next step up would be to create a rolling average which updates average every sample (based on the last 8)
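The carry handling in the 8-bit version can be checked on a PC with a C model that works one byte at a time, the way the assembly does. It is only a sketch: the struct and function names are made up, with lo standing in for ACCUM1 and hi for ACCUM2.

```c
#include <stdint.h>

typedef struct {
    uint8_t lo;   /* stands in for ACCUM1 */
    uint8_t hi;   /* stands in for ACCUM2 */
} acc16_t;

/* ADD ACCUM1 / BCC NOROLL / INC ACCUM2 */
void acc_add(acc16_t *acc, uint8_t sample)
{
    uint16_t t = (uint16_t)(acc->lo + sample);
    acc->lo = (uint8_t)t;
    if (t > 0xFFu) {
        acc->hi++;                 /* carry out of the low byte */
    }
}

/* LSR ACCUM2 / ROR ACCUM1: one right shift of the 16-bit pair */
void acc_shift_right(acc16_t *acc)
{
    uint8_t carry = (uint8_t)(acc->hi & 1u);   /* bit shifted out of the high byte */
    acc->hi >>= 1;                              /* LSR: zero enters bit 7 */
    acc->lo = (uint8_t)((acc->lo >> 1) | (carry << 7)); /* ROR: carry enters bit 7 */
}

/* Accumulate 8 samples, then shift right three times to divide by 8. */
uint8_t average8(const uint8_t samples[8])
{
    acc16_t acc = { 0, 0 };
    for (int i = 0; i < 8; i++) {
        acc_add(&acc, samples[i]);
    }
    for (int i = 0; i < 3; i++) {
        acc_shift_right(&acc);
    }
    return acc.lo;
}
```

For the samples 250, 251, 252, 253, 254, 255, 255, 250 the total is 2020, and three shifts leave 252 in the low byte with the fractional part discarded.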

Message Edited by peg on 2007-06-24 11:26 AM

rocco · ‎06-23-2007

Hi Rick,

So many questions, so few brain cells . . . Where to start . . .

I guess that I'm responding because I have recently done all of those things. They are all pretty easy. Let us tackle the math questions first.

Just because it is an 8-bit processor doesn't mean it can't do higher precision math efficiently. I do a lot of mixed-mode math, meaning combinations of 8-bit, 16-bit, 24-bit and 32 bit math. The HC08 instruction set contains some instructions specifically for multi-precision. Look at "Add-with-carry" (ADC) and "Subtract-with-carry" (SBC), and for shifting, look at the "Arithmetic-shift" (ASL, ASR) coupled with the "Rotate" (ROL, ROR) instructions. As an example, here is a 32-bit shift macro:

** Shift the 32 bit pseudo-accumulator left one position (multiply by 2).
*
ASL32:  MACRO
         ASL    ACCUM0
         ROL    ACCUM1
         ROL    ACCUM2
         ROL    ACCUM3
        ENDM

With regards to your ADC samples, it sounds like what you want to do is a FIR (finite impulse response) filter. If you intend to take the most recent N samples, add them together and then divide by N (arriving at an average), that is a type of FIR filter that is often referred to as a "boxcar" filter. Though it may work well, it might give you problems with dithering, unless N is large. I can suggest better filters (less processing and less memory, with better results), specifically the "exponential smoothing forecast" filter (which takes only two bytes of memory, and processing requires only one 8-bit subtract, some right shifts and a 16-bit add per sample). This is how I handle the ADC.

If dithering is a real concern, as it was for me in my application, I have an additional "detent" algorithm that causes the filter to favor the midpoint (.5) value of the current integer result. This has the effect of preventing dithering if the analog value is near the border of two ADC integer values.

As for operating the ADC, I run it interrupt-driven, either at full speed or "throttled" down. For full speed, I enable the ADC interrupts, and start a new conversion as soon as the previous one is complete. This gives me a sample every 17 microseconds on the GP32. In some applications, this is too fast. When I want a slower rate, I use the Time-Base Module (TBM) instead to provide the interrupts. I then start a new conversion on each TBM interrupt, giving me a precise sample rate.
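The general shape of the full-speed scheme can be sketched in C and exercised on a PC. Everything hardware-related here is a stand-in: mock_adc_data and mock_adc_start() are hypothetical placeholders for the ADC data register and for whatever write starts a conversion, and adc_isr() is called by hand instead of by a real interrupt.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hardware (not real register names). */
static volatile uint8_t mock_adc_data;      /* pretend ADC result register */
static unsigned conversions_started;        /* counts simulated conversion starts */

static void mock_adc_start(void)
{
    conversions_started++;
}

/* Shared between the interrupt handler and the main loop. */
static volatile uint8_t latest_sample;
static volatile bool sample_ready;

/* Would be the conversion-complete interrupt handler on a real part. */
void adc_isr(void)
{
    latest_sample = mock_adc_data;  /* grab the finished result */
    sample_ready = true;            /* tell the main loop about it */
    mock_adc_start();               /* start the next conversion at once */
}

/* Called from the main loop between other work. Returns true and fills
   *out when a new sample has arrived since the last call. On real
   hardware the read and the flag clear would need protecting from the
   interrupt. */
bool take_sample(uint8_t *out)
{
    if (!sample_ready) {
        return false;
    }
    *out = latest_sample;
    sample_ready = false;
    return true;
}
```

The main loop can then update its running total whenever take_sample() returns true and carry on with other work otherwise.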

All of the above is implemented in the attached file. There are equates at the front that define the number of channels to convert, the interrupt method (ADC or TBM), the sample rate if using the TBM, and the ADC clock.

Message Edited by rocco on 2007-06-23 02:23 PM

bigmac · ‎06-24-2007

Hello Rick,

I think some additional comments are needed about the code posted by Rocco -

The sample code is written for an unspecified assembler, and would need some changes to be suitable for CW assembler. In particular PSCT and BSCT directives - perhaps use SECTION directive if your project uses relocatable assembly, or ORG directive if an absolute assembly project. Additionally, the QUE directive or macro (not sure which) is undefined.

The code assumes a 'GP32 device. The 'QY4 device does not contain a time base module, so the code would need to use a TIM channel (output compare mode, interrupt only) to achieve timed conversions.

If you use the 'QY4A device, you have the choice of 8-bit or 10-bit conversions. You will need to decide whether you need to average 10-bit or 8-bit readings. But perhaps you may want to initially stick with 8-bit conversions.

If my understanding of the sample code is correct, the "LOGfilter" routine effectively provides a rolling average, but averaged over 256 samples. The results from this routine are then slightly modified by "threshold" routine.

With the average of so many samples, the processed result will change very slowly, perhaps too slowly for your application. A reduction of the number of samples, should be possible, but will require significant alteration to the routine. It also appears that the "threshold" routine might be omitted if you are not trying to "squeeze" additional bits from the fractional part.

A further comment about the timing of the ADC samples -

If the reduction of the effect of low frequency (50/60 Hz) hum is of consideration with the requirement to average multiple samples, there should be at least two samples per cycle of the hum signal, preferably many more. Additionally, the averaging period of N samples should correspond to an integral number of hum cycles. The sample timing might be adjusted to suit this situation.
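As a worked example of that timing rule, the sample interval that makes N samples span a whole number of hum cycles can be calculated as below. This is a sketch in C for a PC; the function name and parameters are invented for illustration and the result is truncated to whole microseconds.

```c
#include <stdint.h>

/* Interval between samples, in microseconds, so that n_samples cover
   exactly hum_cycles cycles of a hum signal at hum_hz. */
uint32_t sample_interval_us(uint32_t hum_hz, uint32_t hum_cycles, uint32_t n_samples)
{
    return (hum_cycles * 1000000u) / (hum_hz * n_samples);
}
```

For 8 samples over one 50 Hz cycle this gives 2500 microseconds per sample. For 60 Hz the exact figure is 2083.3 microseconds, so spreading the 8 samples over three cycles (6250 microseconds each) avoids the rounding while still keeping more than two samples per cycle.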

Regards,

Mac

rocco · ‎06-24-2007

Hi All,

bigmac wrote:
The sample code is written for an unspecified assembler, and would need some changes to be suitable for CW assembler. In particular PSCT and BSCT directives - perhaps use SECTION directive if your project uses relocatable assembly, or ORG directive if an absolute assembly project. Additionally, the QUE directive or macro (not sure which) is undefined.

Oops, yes. The assembler is MCUez, Motorola's predecessor to CodeWarrior. Mac is correct about the SECTION directive, PSCT and BSCT are just pre-canned program and zero-page sections. QUE is a macro, which just does a bit-set on a byte belonging to my task scheduler in order to run the named task.

bigmac wrote:
If my understanding of the sample code is correct, the "LOGfilter" routine effectively provides a rolling average, but averaged over 256 samples. The results from this routine are then slightly modified by "threshold" routine.

With the average of so many samples, the processed result will change very slowly, perhaps too slowly for your application. A reduction of the number of samples, should be possible, but will require significant alteration to the routine. It also appears that the "threshold" routine might be omitted if you are not trying to "squeeze" additional bits from the fractional part.

Well, it appears to be over 256 samples, but in reality it is an exponential curve. Its behavior is identical to an RC filter, with a time constant of 256 samples. Yes, it may be slow, but the time constant is determined by the number of shifts, and I cheated by making the shift-count 8, requiring no shifts at all. Sorry. If you need a better example, I can dig-up an older version, which actually went through the trouble of doing the shifts.

The "Threshold" routine is the routine that prevents the dithering by biasing toward .5 in the fractional portion.

bigmac wrote:
A further comment about the timing of the ADC samples -
If the reduction of the effect of low frequency (50/60 Hz) hum is of consideration with the requirement to average multiple samples, there should be at least two samples per cycle of the hum signal, preferably many more. Additionally, the averaging period of N samples should correspond to an integral number of hum cycles. The sample timing might be adjusted to suit this situation.

A very good, often neglected point. But with the above filter, it won't help much, because no two samples have the same weight. The weight of each sample decreases with time, and the AC hum will affect the most recent samples more than the older samples. Although it is actually an IIR filter, it behaves much like a FIR filter with a ramp (half-triangle) for a window function.
