Implementing Guard Bits for EMACS Instruction

davekellogg
Contributor II

The EMACS instruction has no natively implemented guard bits to accommodate overflow when accumulating the 32-bit result.

 

Has anyone worked out the logic to implement a "manual" guard byte?

I'm interested in the fastest method possible, since my EMACS instruction is in the "hot spot" in a FIR filter loop.

 

It seems that the basic strategy is to check the overflow flag after the EMACS, and increment (or decrement) the guard byte.  The EMACS accumulator in memory also needs to be adjusted correspondingly.

davekellogg
Contributor II

I should have mentioned that I'm using the S12X, and speed is a critical issue.  I considered using a plain EMULS and doing the addition "by hand", but that seems to cost more cycles than building on the EMACS (which does 32 bits of the addition for a cost of only 8 cycles).

 

After thinking more about the accumulator overflow, I've come up with something like the code below.  When I carry or borrow into the high-order guard byte, I also adjust the EMACS accumulator by 0x8000:0000. 

 

1.  I believe that the same adjustment of 0x8000:0000 is needed for both overflow and underflow, i.e., I don't think I need to use 0x7FFF:FFFF in the underflow case.

 

2. I believe that doing the bit set (or bit clear) is the same as subtracting 0x8000:0000 from the accumulator in memory, but much faster.  Does anyone see a problem with this?

 

3. Does anyone see a better (faster) way to do the branching?  It seems that I need to determine separately whether overflow occurred, and then in which direction.

 

Dave Kellogg

 

Loop: 
    ???                     ; Adjust IX for next multiplier
    ???                     ; Adjust IY for next multiplicand
    EMACS   accum           ; Sets the overflow flag.
    BVC     Loop            ; [3/1] Branch if no overflow

    ; There was an overflow.  Determine direction of overflow.
    BMI     IsNeg           ; [3/1] Branch if overflowed in negative direction

    ; Overflow direction was positive.
    INC     guard           ; [4] Overflowed in the positive direction.
    BCLR    accum,0x80      ; [4] Clearing the MSB is same as subtracting 0x8000:0000 from accum, but faster.
    BRA     Loop            ; [3]

    ; Overflow direction was negative (i.e., underflow)
IsNeg:
    DEC     guard           ; [4] Underflowed in the negative direction.
    BSET    accum,0x80      ; [4] Setting the MSB is the same as adding 0x8000:0000 to accum, but faster.
    BRA     Loop            ; [3]

    ; Worst-case time after the EMACS instruction for the guard adjust is [15] cycles.

kef
Specialist I

CompilerGuru ,

 

Yes, adjusting the accumulator on overflow is absolutely necessary. When adding +ve to +ve overflows to -ve, the accumulator should be adjusted and kept +ve; otherwise the next time you add +ve to -ve (which should be +ve but is -ve due to the previous overflow), the overflow won't happen when it should, and the guard count will be lost.

 

davekellogg ,

 

Interesting idea, and it looks like it works on paper.

 

1. Yes, adding 0x8000:0000 should work in both cases. -0x60... - 0x20... = -0x80... is not an overflow case, whereas 0x60 + 0x20 = -0x80 is an overflow case.

 

2. Yes, bset/bclr #0x80 can replace adding 0x8000:0000. But I see them reversed in your code. In the IsNeg case you should clear the MSB (because it is already set), and in the +ve case you should set it.
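To make the two cases in item 1 concrete, here is a minimal sketch that models a signed 8-bit add and the resulting V and N flags. The function name `add8` is hypothetical and the model is plain Python, not S12X code; it only assumes the usual two's-complement rule that V is set when both operands share a sign and the result does not.

```python
def add8(a, b):
    """Add two signed 8-bit values; return (wrapped result, V flag, N flag)."""
    raw = (a + b) & 0xFF                          # keep the low 8 bits, as the ALU would
    result = raw - 0x100 if raw & 0x80 else raw   # reinterpret the bits as signed
    same_sign_operands = (a >= 0) == (b >= 0)
    sign_changed = (result >= 0) != (a >= 0)
    v = same_sign_operands and sign_changed       # signed overflow
    n = result < 0                                # sign of the wrapped result
    return result, v, n

print(add8(-0x60, -0x20))  # (-128, False, True): reaches -0x80 without overflow
print(add8(0x60, 0x20))    # (-128, True, True): wraps to -0x80, V is set
```

Note that after a positive overflow the wrapped result is negative (N set), which is the reason for the reversal pointed out in item 2.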

 

 

On the S12X, the EMULS code is a bit faster. For a 48-bit accumulator, with the X register not touched:

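    LDD    data
    LDY    coef
    EMULS            ; signed 16x16 multiply, 32-bit product in Y:D

    ADDD  acclo      ; add low word of product
    STD    acclo

    TFR    Y,D    ; 16->32 bit SEX source is D reg only

    ADEY  accmid     ; add high word of product plus carry
    STY    accmid

    SEX    D,Y   ; sign extend product

    ADEY  acchi      ; add sign extension plus carry
    STY    acchi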
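As a rough cross-check of the sequence above, the following sketch models the same idea in Python: a 32-bit signed product is split into two 16-bit words plus a sign-extension word, and each is added to a 48-bit accumulator with carry propagation. The names `mac48` and `to_signed48` and the list layout `[lo, mid, hi]` are assumptions made for illustration only.

```python
MASK16 = 0xFFFF

def mac48(acc, data, coef):
    """acc is [lo, mid, hi] as 16-bit words; data and coef are signed 16-bit ints."""
    product = (data * coef) & 0xFFFFFFFF            # raw bits of the 32-bit signed product
    p_lo = product & MASK16
    p_hi = (product >> 16) & MASK16
    p_ext = MASK16 if product & 0x80000000 else 0   # sign-extension word

    total = acc[0] + p_lo                           # low word, no carry in
    acc[0] = total & MASK16
    total = acc[1] + p_hi + (total >> 16)           # middle word plus carry
    acc[1] = total & MASK16
    total = acc[2] + p_ext + (total >> 16)          # high word plus carry
    acc[2] = total & MASK16
    return acc

def to_signed48(acc):
    """Reassemble the three words into a signed Python integer."""
    raw = acc[0] | (acc[1] << 16) | (acc[2] << 32)
    return raw - (1 << 48) if raw & (1 << 47) else raw

acc = [0, 0, 0]
pairs = [(32767, 32767)] * 4 + [(-32768, 32767)] * 9
for d, c in pairs:
    mac48(acc, d, c)
print(to_signed48(acc) == sum(d * c for d, c in pairs))
```

The running sum here leaves the 32-bit signed range in both directions, so the high word is what keeps the result exact.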

Message Edited by kef on 2009-11-20 11:12 AM

CompilerGuru
NXP Employee

Hi Kef,

 

> Yes, adjusting the accumulator on overflow is absolutely necessary.

> When adding +ve to +ve overflows to -ve, the accumulator should be adjusted and kept +ve;

> otherwise the next time you add +ve to -ve (which should be +ve but is -ve due to the previous overflow),

> the overflow won't happen when it should, and the guard count will be lost.

 

I cannot follow, what is +ve? Some huge value which overflows when added?

I take the guard to count the number of overflows out of the 32-bit signed range. It gets incremented when the addition results in a value >= 2^31, and it gets decremented when the result is < -2^31. When the 32-bit accumulator crosses 0 from -ve to +ve, no overflow takes place, which seems expected and OK to me.

 

The guard count is not just some additional bits for the sum, instead the value is

val = sval32 + 2^32*guard

with sval32 the signed accumulator and guard the overflow count (incremented for overflow, decremented for underflow, initially 0).

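For example, in a 16-bit analogue (val = sval16 + 2^16*guard), val = 40000 gives sval16 = -25536 and guard == 1. The guard is not the same as the bits above the accumulator (bits 32..39 of the sum in the 32-bit case); it is off by one if the signed accumulator is < 0.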

So apart from having to treat the guard properly at the end, when comparing the final result with a 40-bit value, I don't see why the adjustment is necessary.
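The relation val = sval32 + 2^32*guard can be checked with a small model. This is only a sketch in Python, assuming each term fits in a signed 32-bit value; the function names `to_signed32` and `accumulate_with_guard` are made up for the illustration, and the accumulator is never adjusted on overflow.

```python
def to_signed32(raw):
    """Reinterpret 32 raw bits as a signed value."""
    return raw - (1 << 32) if raw & 0x80000000 else raw

def accumulate_with_guard(terms):
    """Sum signed terms in a wrapping 32-bit accumulator, counting signed overflows."""
    sval32, guard = 0, 0
    for t in terms:
        wrapped = to_signed32((sval32 + t) & 0xFFFFFFFF)
        if sval32 >= 0 and t >= 0 and wrapped < 0:
            guard += 1      # positive overflow: V set, result negative
        elif sval32 < 0 and t < 0 and wrapped >= 0:
            guard -= 1      # negative overflow: V set, result positive
        sval32 = wrapped    # accumulator left as-is, no 0x8000:0000 adjustment
    return sval32, guard

terms = [0x40000000] * 3
sval32, guard = accumulate_with_guard(terms)
print(sval32, guard)                          # -1073741824 1
print(sval32 + (guard << 32) == sum(terms))   # True
```

In this run the true sum is 3*2^30, whose bits 32 and up are all zero, yet guard is 1 because sval32 is negative. That is the off-by-one mentioned above.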

 

Still wondering whether I'm missing something here.

kef
Specialist I

CompilerGuru,

 

thank you very much, I understand signed overflow features better now.

 

Sorry for +ve and -ve. They mean positive (+ve) and negative (-ve). I saw this in some (probably non-Freescale) datasheet and thought everyone was familiar with it.

The oVerflow bit is set in two cases: a) when adding a positive number to a positive number gives a negative result (+ve + +ve = -ve); b) when adding a negative number to a negative number gives a positive result (-ve + -ve = +ve). I was concerned that the V bit might be falsely triggered, or falsely not triggered, because on overflow the sign of the accumulator is sort of lost. In fact it is much simpler. The sign of the result isn't lost; it is encoded in two places: in the sign of the accumulator and in the guard count. That concern is why I used to think that the sign of the accumulator should be preserved by adding or subtracting 0x80000000 on each overflow.

 

Summarizing, please correct me if I'm wrong:

 

1. On overflow, in case

    result is negative: increment guard count because addend was positive

    result is positive:  decrement guard count because addend was negative

 

2. To convert the accumulator and guard count to a wider number, first sign-extend the accumulator, then add the guard count to the higher-order bits.

 

 

For example for signed octets: 0x70 + 0x70 = 0xE0, V = 1, N = 1, Guard+1 = 1

16-bit result: 1) sign-extending 0xE0 to 16 bits gives negative 0xFFE0. 2) 0xFFE0 + 2^8*Guard = 0x00E0
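The two conversion steps in the octet example can be written out as a short sketch. This is an illustrative Python model only; `combine` is a hypothetical helper name, and the 8-bit accumulator with a 16-bit result mirrors the example rather than the real 32-bit EMACS accumulator.

```python
def combine(acc8, guard):
    """Widen an 8-bit accumulator plus overflow count to a signed 16-bit value."""
    signed = acc8 - 0x100 if acc8 & 0x80 else acc8    # step 1: sign-extend the accumulator
    raw16 = (signed + (guard << 8)) & 0xFFFF          # step 2: add 2^8 * guard, keep 16 bits
    return raw16 - 0x10000 if raw16 & 0x8000 else raw16

print(hex(combine(0xE0, 1)))   # 0xe0, i.e. 0x70 + 0x70 = 224
```

The same helper covers the negative direction: -0x70 + -0x70 leaves 0x20 in the accumulator with Guard = -1, and `combine(0x20, -1)` gives -224.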

 

Thank You

CompilerGuru
NXP Employee

Is the adjustment of the accumulator in the overflow case necessary? Isn't it sufficient to just increment/decrement the guard (considering the guard to be bits 32..39 of the value) and to take the guard into account as well when reading the final answer?

 

You could also store the guard value in the A or B registers if those are free.

 

Then the loop can be rotated so it starts with IsNeg. The advantage is that both "worst cases" now need the same number of branches and the worst case is therefore a bit better.

 

Daniel

 

    CLRA
IsNeg:
    DECA    ;guard          ; [1] Underflowed in the negative direction.
Loop
    EMACS   accum           ; [] Sets the overflow flag.
    BVC     Loop            ; [3/1] Branch if no overflow
    ; There was an overflow.  Determine direction of overflow.
    BMI     IsNeg           ; [3/1] Branch if overflowed in negative direction
    ; Overflow direction was positive.
    INCA    ;guard          ; [1] Overflowed in the positive direction.
    BRA     Loop            ; [3]

 

 

 

kef
Specialist I

It is not as simple as incrementing/decrementing the high bits on signed oVerflow. After the accumulator overflows, the next EMACS can still leave the accumulator overflowed in the + or - direction without the V bit being set. For example, with overflow at +-100, you add 80 + 80 and get V=1 and -40. Next time you add -40 + 80 and get V=0 and +40, but you still have to increment the high bits...

 

I think it would be simpler to just use the EMULS instruction, sign-extend the product, and add it to the accumulator. For a 40-bit accumulator and non-S12X parts it should be something like this:

 

 

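    LDD    data
    LDY    coef
    EMULS            ; signed 16x16 multiply, 32-bit product in Y:D

    EXG    D,Y
    SEX    A,X   ; sign extend product to X
    EXG    D,Y

    ADDD  acclo      ; add low word of product
    STD    acclo
    TFR    Y,D
    ADCB  accmid:1   ; add high word of product plus carry, byte by byte
    ADCA  accmid
    STD    accmid
    TFR    X,D
    ADCA  acchi      ; add sign extension byte plus carry
    STAA  acchi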

   
