Understanding CodeWarrior Disassembly

Nevo
Contributor I
I am having some trouble understanding the disassembly of some of my code in CodeWarrior and am hoping some of y'all can help me out.
 
I have an interrupt that needs to occur every 65 us.  To make it as fast as possible, I'm writing it in assembly. A snippet of my code looks like:
 
Code:
    CPX m_ChannelDimValue:8;
    BNE testCh9;
    ORA #1;
  testCh9:
    CPX m_ChannelDimValue:9;
    BNE testCh10;
    ORA #2;
  testCh10:
    CPX m_ChannelDimValue:10;
    BNE testCh11;
    ORA #4;

 (Yes, I'm dimming lights.)  To determine how many clock cycles my routine will take, I right click on the code and select "Disassemble."  The output looks like this:
 
Code:
  778:      CPX m_ChannelDimValue:8;
  003f c30008   [4]             CPX   m_ChannelDimValue:8
  779:      BNE testCh9;
  0042 2602     [3]             BNE   L46 ;abs = 0046
  780:      ORA #1;
  0044 aa01     [2]             ORA   #1
  0046          [5]     L46:
  781:    testCh9:
  782:      CPX m_ChannelDimValue:9;
  0046 c30009   [4]             CPX   m_ChannelDimValue:9
  783:      BNE testCh10;
  0049 2602     [3]             BNE   L4D ;abs = 004d
  784:      ORA #2;
  004b aa02     [2]             ORA   #2
  004d          [5]     L4D:
  785:    testCh10:
  786:      CPX m_ChannelDimValue:10;
  004d c3000a   [4]             CPX   m_ChannelDimValue:10
  787:      BNE testCh11;
  0050 2602     [3]             BNE   L54 ;abs = 0054
  788:      ORA #4;
  0052 aa04     [2]             ORA   #4
  0054          [5]     L54:

 

First off, the formatting of this output is very difficult to read. Is there a way to clean up the formatting of this output?
 
Second, and more importantly, what are the numbers in brackets? I think it's the number of clock cycles each instruction takes. But if that's the case, what is the [5] next to the labels?  It doesn't take 5 clock cycles to 'execute' a label... there's no instruction there to execute!  I'd really like to understand this output.
 
Thanks!
 
-Nevo
rocco
Senior Contributor II
Hi, Nevo:

All of those cycle counts are correct, except for the "[5]"s on the labels, which shouldn't be there at all. It is just another CW bug, and you can ignore them.

But there is another bug you should be concerned about. The assembler seems to be using the wrong version of the CPX instruction. The offsets 8, 9 and 10 are clearly less than 256, but the assembler is using the 16-bit offset instruction rather than an 8-bit offset instruction. It will work, but it's burning unnecessary cycles and memory.

Nevo
Contributor I
I won't even ask how you were able to spot that, rocco. I work with witches like you! :smileyhappy:
 
So is there any way for me to specify the correct opcode and get CodeWarrior to obey me? With this interrupt occurring every 65us (and that's just one of several interrupt routines), I'm eager to squeeze every last clock cycle out of this particular routine.
Nevo
Contributor I
Ah, I think I understand after looking at the family reference manual.
 
In order to ensure these instructions can use the 8-bit offset, I'd need to ensure that m_ChannelDimValue (a 32-element array of type byte) is entirely in the first 256 bytes of RAM, right?
 
The first 128 bytes of RAM are direct page registers, so I have 128 bytes there I can use.
 
I confess I'm new to CodeWarrior... is there any compiler directive I can use when I declare the variable to ensure it's in the first 256 bytes of RAM?
CompilerGuru
NXP Employee
To locate a variable in the zero page, use
#pragma DATA_SEG SHORT _DATA_ZEROPAGE
and make sure that _DATA_ZEROPAGE is explicitly allocated in the first 256 bytes (otherwise you get a link-time error).

The code above does not actually show that m_ChannelDimValue was allocated in the zero page. It is a pre-link listing, in which m_ChannelDimValue does not have an address yet. Without explicitly placing it in a SHORT section, the compiler has to use 16-bit offsets in the small memory model. The offsets 8, 9 and 10 are just offsets; they are not the final link-time addresses.
To configure the listing file, there is the -Lasmc compiler option, which can be used to selectively disable parts of the output.

Daniel
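
A minimal sketch of the declaration Daniel describes, assuming the table is the 32-element byte array mentioned earlier in the thread. The closing `#pragma DATA_SEG DEFAULT` line is an addition for illustration, so that later definitions go back to the default data section. The linker parameter file must still map `_DATA_ZEROPAGE` into the first 256 bytes, which is not shown here. Compilers other than CodeWarrior will typically ignore or warn about these pragmas.

```c
/* Place the following definitions in a section that is accessed with
   8-bit (direct) addresses, as suggested above. */
#pragma DATA_SEG SHORT _DATA_ZEROPAGE
unsigned char m_ChannelDimValue[32];  /* dim level per channel (32 bytes, per the thread) */

/* Assumption: switch back to the default data section so that later
   definitions are not placed in the zero page by accident. */
#pragma DATA_SEG DEFAULT
```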
rocco
Senior Contributor II
But page zero has nothing to do with it . . . :smileyhappy:

In the HC08 Processor Reference Manual, the particular 8-bit offset version of the instruction that the compiler should be using is referred to as "IX1". It is described as "16-bit indexed with 8-bit offset". It can be used anywhere in the 16-bit address space.

Whenever you need a positive offset from the H:X register that is from 1 through 255, this instruction will work, regardless of where in the address space the H:X register points (which is never known at link time).
CompilerGuru
NXP Employee
Hmm, not so sure :smileywink:
CPX   m_ChannelDimValue:9

is not indexed at all; it is extended (EXT).
If m_ChannelDimValue were in the zero page, the code could use direct (DIR) instead. IX2/IX1/IX do not work here: for those, the code would have to load the address of the m_ChannelDimValue table into H:X first, but since the value to be compared is already in X, H:X is not available.
It's quite possible that there are more efficient patterns for the complete task of the interrupt routine, but in the little snippet that was shown, the zero page would help a bit.

I guess what confused you is the format of the compiler-generated listing file. The assembler would have printed some X's in the code where fixups will affect it at link time; the compiler does not do this. Instead, you see how the fixup addend is encoded in the code.

Daniel
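
To make the addressing-mode point concrete, here is a small lookup sketch in C. The opcode bytes are recalled from the HC08 instruction set and should be checked against the CPU reference manual. The `CpxForm` type and the `cpx_lookup` function are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* One encoding of the HC08 CPX instruction. */
typedef struct {
    unsigned char opcode;         /* first byte of the instruction */
    const char   *mode;           /* addressing-mode mnemonic */
    int           operand_bytes;  /* bytes that follow the opcode */
} CpxForm;

/* Assumption: opcode values as recalled from the HC08 manual. */
static const CpxForm cpx_forms[] = {
    { 0xA3, "IMM", 1 },   /* CPX #opr   */
    { 0xB3, "DIR", 1 },   /* CPX opr8   */
    { 0xC3, "EXT", 2 },   /* CPX opr16  */
    { 0xD3, "IX2", 2 },   /* CPX opr16,X */
    { 0xE3, "IX1", 1 },   /* CPX opr8,X  */
    { 0xF3, "IX",  0 },   /* CPX ,X      */
};

/* Return the entry for an opcode byte, or NULL if it is not a CPX form. */
const CpxForm *cpx_lookup(unsigned char opcode)
{
    size_t i;
    for (i = 0; i < sizeof cpx_forms / sizeof cpx_forms[0]; i++) {
        if (cpx_forms[i].opcode == opcode)
            return &cpx_forms[i];
    }
    return NULL;
}
```

Read this way, the listing bytes `c30008` are opcode `C3` (EXT) followed by the 16-bit operand `0008`, which is the fixup addend rather than a final address.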
rocco
Senior Contributor II
Hi, CG and Nevo

CompilerGuru wrote:
Hmm, not so sure :smileywink:
CPX   m_ChannelDimValue:9

is not indexed at all, it is extended (EXT).


OOPS! My mistake! Boy, am I embarrassed.

I mistook the ":9" syntax to be equivalent to the "offset,X" syntax with the normal (non-C) assembler. Never mind . . .

CompilerGuru is correct (as always).
rocco
Senior Contributor II
Hi again, Nevo:

Here is a face-saving exercise, in my typical cycle-counting style.

I counted the cycles in your routine, and found that it takes 63 cycles most of the time, and an additional 2 cycles when a triac triggers.
        LDX     m_DimmerCount        ; 3
        CLRA                         ; 1
        CPX     m_ChannelDimValue:0  ; 4
        BNE     testCh1              ; 3
        ORA     #1                   ;     2
testCh1:
        CPX     m_ChannelDimValue:1  ; 4
        BNE     testCh2              ; 3
        ORA     #2                   ;     2
testCh2:
        CPX     m_ChannelDimValue:2  ; 4
        BNE     testCh3              ; 3
        ORA     #4                   ;     2
testCh3:
        CPX     m_ChannelDimValue:3  ; 4
        BNE     testCh4              ; 3
        ORA     #8                   ;     2
testCh4:
        CPX     m_ChannelDimValue:4  ; 4
        BNE     testCh5              ; 3
        ORA     #16                  ;     2
testCh5:
        CPX     m_ChannelDimValue:5  ; 4
        BNE     testCh6              ; 3
        ORA     #32                  ;     2
testCh6:
        CPX     m_ChannelDimValue:6  ; 4
        BNE     testCh7              ; 3
        ORA     #64                  ;     2
testCh7:
        CPX     m_ChannelDimValue:7  ; 4
        BNE     writePortA           ; 3
        ORA     #128                 ;     2
writePortA:
        STA     _PTAD                ; 3
;                                    ;--
;                                    =63 cycles + 2 cycles for each light that comes on

I have an uglier version, which does it in 38 cycles most of the time, but takes 7 additional cycles when the triac triggers.
;
; Index through the table, checking each value.
;
        LDA     m_DimmerCount       ; 3
        LDHX    m_ChannelDimValue   ; 3
;
        CBEQ    X+,on1              ; 4
do2:    CBEQ    X+,on2              ; 4
do3:    CBEQ    X+,on3              ; 4
do4:    CBEQ    X+,on4              ; 4
do5:    CBEQ    X+,on5              ; 4
do6:    CBEQ    X+,on6              ; 4
do7:    CBEQ    X+,on7              ; 4
do8:    CBEQ    X+,on8              ; 4
done:   RTN ?
;                                   ;--
;                                   =38 cycles + 7 cycles for each light that comes on
;
; Each one of these turns a light on.
;
on1:    BSET    0,_PTAD             ; 4
        BRA     do2                 ; 3
;
on2:    BSET    1,_PTAD             ; 4
        BRA     do3                 ; 3
;
on3:    BSET    2,_PTAD             ; 4
        BRA     do4                 ; 3
;
on4:    BSET    3,_PTAD             ; 4
        BRA     do5                 ; 3
;
on5:    BSET    4,_PTAD             ; 4
        BRA     do6                 ; 3
;
on6:    BSET    5,_PTAD             ; 4
        BRA     do7                 ; 3
;
on7:    BSET    6,_PTAD             ; 4
        BRA     do8                 ; 3
;
on8:    BSET    7,_PTAD             ; 4
        BRA     done                ; 3


This one does use indexed addressing, so the table can be anywhere in memory. This is what I mistakenly thought you were doing originally.
rocco
Senior Contributor II
Well, it is only fifteen minutes later, but too late to edit my post. So here is a correction:

The second code line:
        LDHX    m_ChannelDimValue   ; 3
should be:
        LDHX    #m_ChannelDimValue  ; 3
We want to load the ADDRESS of "m_ChannelDimValue" into the H:X register (immediate addressing), and not the 16-bit value at "m_ChannelDimValue" (direct addressing). The cycle count was correct, as I knew what I wanted, but I neglected the number-sign, as I often do in real life.
CompilerGuru
NXP Employee
Note that the current "ugly" version only sets the PTAD bits; it does not clear them. So it should probably work on a directly allocated, initially cleared variable instead and move the result to PTAD at the end.

To make the ugly version a bit more complex, you could start with 0xFF in a direct location and then use BCLR to clear the bits of the channels which are not equal. This would avoid the need for the BRAs and therefore make the worst case quicker (but it makes the normal case slower...).

LDA m_DimmerCount
LDHX #m_ChannelDimValue
MOV #0xFF, directTemp
;
CBEQ X+,on1
BCLR 0,directTemp
on1: CBEQ X+,on2
....

Also, whether and how much the "ugly" version pays off depends on whether this is for an HC08. On an HCS08, CBEQ and BCLR are both one cycle slower.
So the question of which version to pick comes down to whether the application needs the best possible average case or the fastest worst case.
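
The average-versus-worst-case trade-off can be put into numbers with a small cost model. This is only a sketch that uses the HC08 cycle counts exactly as posted earlier in the thread (63 + 2 per triggered channel for the CPX/BNE/ORA version, 38 + 7 per triggered channel for the CBEQ version). The function names are made up for the illustration, and HCS08 timings would differ.

```c
/* Cycle counts as posted in this thread (HC08 timings). */
enum {
    LINEAR_BASE  = 63, LINEAR_PER_HIT  = 2,  /* CPX/BNE/ORA version  */
    INDEXED_BASE = 38, INDEXED_PER_HIT = 7   /* CBEQ/BSET/BRA version */
};

/* Cycles for one pass when `hits` channels (0..8) match in that pass. */
int linear_cycles(int hits)
{
    return LINEAR_BASE + LINEAR_PER_HIT * hits;
}

int indexed_cycles(int hits)
{
    return INDEXED_BASE + INDEXED_PER_HIT * hits;
}
```

With these figures the indexed version is faster as long as fewer than five channels trigger in the same interrupt, the two tie at five (73 cycles each), and the worst case with all eight channels triggering is 94 cycles against 79.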
rocco
Senior Contributor II
Right again, CG. I guess that's how you became Guru. :smileywink:

I overlooked the clearing of the bits. When I did triac driving, I would set the bit, and leave it set until the end of the AC cycle. Different hardware, I guess.

I think that the fastest, simplest fix is to put this as the first line:
        CLR         _PTAD     ; 3
That would add 3 cycles, making it 41 cycles per iteration when no triacs are triggered.

Knowing that each triac only fires once per 8333 microseconds, my approach was to minimize cycles when the triacs DON'T fire, at the expense of cycles when they do. If there are 256 dim levels, meaning 256 iterations of the routine for each AC half-cycle, then the 7 cycles that it takes to trigger the triac averages to:

7 cycles times 8 triacs, divided by 256 iterations, or .22 cycles per iteration.

If there are 100 dim-levels, that is still only .56 cycles per iteration, on average.

Also, I did use cycles for the HC08. I'm not using the S08, so I didn't realize it was actually slower with some instructions.
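
The averaging argument above can be written as a one-line helper. This is a sketch under the same assumption rocco states, namely that each triac fires exactly once per AC half-cycle; the function name and parameters are hypothetical.

```c
/* Average extra cycles per interrupt spent on triggering, assuming every
   channel fires exactly once per AC half-cycle of `dim_levels` interrupts. */
double avg_trigger_cycles(int cycles_per_trigger, int channels, int dim_levels)
{
    return (double)(cycles_per_trigger * channels) / (double)dim_levels;
}
```

For 7 cycles, 8 triacs and 256 dim levels this gives 56 / 256, about 0.22 cycles per iteration; with 100 dim levels it gives 0.56.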
Nevo
Contributor I
Thanks to both of you!  Like I said, I didn't expect anyone to write my code for me, but I find optimization fun and challenging and it appears others do, too. :smileyhappy:
 
Rocco's reasoning mirrors mine: the TRIACs will trigger only once per AC half-cycle, so optimizing for not triggering would minimize time spent in the interrupt even if the triggering case takes longer.
Nevo
Contributor I
(Newbie warning... I may have this all wrong...)
 
Rocco, I'm not so sure I could use IX1 addressing.
 
That addressing mode uses the H:X 16-bit register as the relative address.  But the value I want to compare against is in the X register.  The X register cannot simultaneously hold the offset to the memory location and the value I want to compare against.
 
Or am I overlooking something obvious?
Nevo
Contributor I
D'oh! Compiler Guru already said the same thing I said.
 
(On the other hand, it's a confidence-builder that I came to the same conclusion!)
Nevo
Contributor I
I certainly don't expect anyone else to write my code. But since optimization was brought up, I'll go ahead and throw out my code and if anyone would like to optimize it, I'd love to see what you can come up with.
 
The basics: The interrupt routine dims 8 lights on port A, synched to the mains AC. m_DimmerCount is incremented every interrupt, and the desired dim value (m_ChannelDimValue) for each channel is compared to the current DimmerCount. If they are equal, the port pin should go high to trigger a TRIAC. Otherwise the pin should be low. 
 
My hand-crafted asm code is:
 
Code:
    LDX m_DimmerCount;
    CLRA;
    CPX m_ChannelDimValue:0;
    BNE testCh1;
    ORA #1;
  testCh1:
    CPX m_ChannelDimValue:1;
    BNE testCh2;
    ORA #2;
  testCh2:
    CPX m_ChannelDimValue:2;
    BNE testCh3;
    ORA #4;
  testCh3:
    CPX m_ChannelDimValue:3;
    BNE testCh4;
    ORA #8;
  testCh4:
    CPX m_ChannelDimValue:4;
    BNE testCh5;
    ORA #16;
  testCh5:
    CPX m_ChannelDimValue:5;
    BNE testCh6;
    ORA #32;
  testCh6:
    CPX m_ChannelDimValue:6;
    BNE testCh7;
    ORA #64;
  testCh7:
    CPX m_ChannelDimValue:7;
    BNE writePortA;
    ORA #128;
  writePortA:
    STA _PTAD;

 I hope the code is easy enough to follow that I don't need to comment it.
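
For readers who find C easier to follow, the behaviour described above can be modelled roughly as below. This is a reference sketch of the logic only, not a replacement for the hand-written assembly; the function name and the idea of returning the port value instead of writing `_PTAD` directly are assumptions made so the logic can be checked off-target.

```c
/* Compute the port A value for one interrupt: bit n is set when channel n's
   dim value equals the current counter, otherwise it stays clear. */
unsigned char dimmer_port_value(unsigned char dimmer_count,
                                const unsigned char channel_dim_value[8])
{
    unsigned char port = 0;                         /* CLRA */
    int ch;

    for (ch = 0; ch < 8; ch++) {
        if (channel_dim_value[ch] == dimmer_count)  /* CPX / BNE */
            port |= (unsigned char)(1u << ch);      /* ORA #(1 << ch) */
    }
    return port;                                    /* value stored to _PTAD */
}
```

The assembly unrolls this loop so that each channel costs one compare, one branch and, when it matches, one OR.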