MCF52233 Bit banging

rlstraney · ‎03-26-2009

I have written the following code and captured the output on a logic analyzer.
Basically I am just sending 32 1s and then the value in "data1" bit by bit.

It works, but slower than I hoped. Look at the line with the comment "// rising edge of clock". If I capture that rising edge of the clock on my logic analyzer and measure the time between consecutive rising edges, I get

1170 ns.

The 32 1s have a period of 460ns. I would love to speed that up too if possible.

I found that
if(+( (data1 << j) & 0x80000000))

runs faster than
bit = ( (data1 << j) & 0x80000000);
if (bit == 0x80000000)

The latter had a 1230 ns period

1170 ns is still very slow considering that the receiving device can handle a period of 80ns.

I don't expect to hit 80ns but I do hope to beat 1170 ns.

Does anyone have any suggestion to speed up this parallel to serial bit banging?
Is there any sample assembly for the MCF5223X that does this?

void phy_write(unsigned int phy, unsigned int reg, unsigned int data)
{
    unsigned int i, data1, data2, j, bit;

    data1 = 0x50000000;
    data1 = data1 + (phy << 23);
    data1 = data1 + (reg << 18);
    data1 = data1 + 0x00020000;
    data1 = data1 + data;
    // send 32 1s
    for (i = 0; i < 32; i = i + 1)
    {
        MCF_GPIO_PORTTC = 0x02;
        MCF_GPIO_PORTTC = 0x03;
    }
    // send data
    for (j = 0; j <= 31; j = j + 1)
    {
        //bit = ( (data1 << j) & 0x80000000);
        //if (bit == 0x80000000)
        if (+((data1 << j) & 0x80000000))
        {
            MCF_GPIO_PORTTC = 0x02;
        }
        else
        {
            MCF_GPIO_PORTTC = 0x00;
        }
        MCF_GPIO_PORTTC = MCF_GPIO_PORTTC + 0x01; // rising edge of clock
        MCF_GPIO_PORTTC = MCF_GPIO_PORTTC & 0x02; // falling edge of clock
    }
}
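
For reference, the constants in phy_write above appear to match the 32-bit write frame normally used on an MDIO management interface: start bits 01, write opcode 01, a 5-bit PHY address, a 5-bit register address, turnaround bits 10, and 16 data bits. A minimal sketch of the same packing with explicit masks is shown here. The function name mdio_write_frame and the masking of out-of-range arguments are assumptions for illustration, not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: packs a write frame the same way phy_write does,
 * but masks each field so an out-of-range argument cannot spill into
 * a neighbouring field. */
static uint32_t mdio_write_frame(uint32_t phy, uint32_t reg, uint32_t data)
{
    uint32_t frame = 0x50000000u;    /* start bits 01, write opcode 01 */
    frame |= (phy & 0x1Fu) << 23;    /* 5-bit PHY address, bits 27..23 */
    frame |= (reg & 0x1Fu) << 18;    /* 5-bit register address, bits 22..18 */
    frame |= 0x00020000u;            /* turnaround bits 10, bits 17..16 */
    frame |= data & 0xFFFFu;         /* 16 data bits, bits 15..0 */
    return frame;
}
```

For in-range arguments this gives the same value as the chain of additions, since the fields do not overlap.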

rlstraney · ‎03-27-2009

I am trying to verify the frequency of the system clock. MFD is set to 4 and RFD is set to 0. According to the equation in the data sheet, that implies Fsys is Fref * 12. My crystal is 25 MHz, but there is a PLL predivider in figure 7-1 of MCF52235, and I cannot figure out what it is set to.

For some reason I thought it was set to divide by 5, but I was expecting a 60 MHz system clock, so I may simply have assumed that value.

Maybe it is right in front of me but I don't see where they explicitly say what this divider is set to.

Does anyone have any idea?

Just FYI: I am using the M52233demo board; the crystal appears to be 25 MHz and the board has the MCF52233 on it.

Thank you

Renee

RichTestardi · ‎03-30-2009

This is how I set my 52233 to run at 60MHz with a 25MHz crystal:

    // we use the 25MHz crystal divided by 5
    MCF_CLOCK_CCHR = 4;
    // and multiply by 12 to get 60MHz
    MCF_CLOCK_SYNCR = MCF_CLOCK_SYNCR_MFD(4)|MCF_CLOCK_SYNCR_CLKSRC|MCF_CLOCK_SYNCR_PLLMODE|MCF_CLOCK_SYNCR_PLLEN;

    // no USB
    cpu_frequency = 60000000;
    bus_frequency = cpu_frequency/2;
    oscillator_frequency = 25000000;

I believe CCHR is your pre-divider in that picture.

-- Rich
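
The arithmetic behind those settings can be written out as a small sketch. It assumes the relationships described in this thread: the predivider divides the crystal by CCHR + 1 (so CCHR = 4 gives divide by 5), and the PLL multiplies the reference by 2 * (MFD + 2) (so MFD = 4 gives times 12) and then divides by 2^RFD. The function name pll_fsys_hz is hypothetical, and the formula should be checked against the reference manual before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: system clock from crystal frequency and the
 * CCHR, MFD and RFD field values, under the assumptions stated above. */
static uint32_t pll_fsys_hz(uint32_t xtal_hz, unsigned cchr, unsigned mfd, unsigned rfd)
{
    uint32_t fref = xtal_hz / (cchr + 1u);     /* 25 MHz / 5 = 5 MHz */
    uint32_t fout = fref * 2u * (mfd + 2u);    /* 5 MHz * 12 = 60 MHz */
    return fout >> rfd;                        /* RFD = 0 leaves it unchanged */
}
```

With a 25 MHz crystal, CCHR = 4, MFD = 4 and RFD = 0, this works out to 60 MHz, which matches the value assigned to cpu_frequency above.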

rlstraney · ‎03-26-2009

I am writing to the management interface of an ethernet phy (MDIO and MDC - data and clock respectively).

This is my first time working with Freescale. I just wanted to see how fast I could get this to work. Didn't know if I should change the board before it is built and have the Phy management bus go to the FPGA instead of the MCF52233. I know I can get the FPGA to run at this speed but it is so much easier doing this in the software because it is quicker to change.

I couldn't right click and get disassembly but I could go to data-view mixed and get the following:

{
20000900: 4E560000        link     a6,#0
20000904: 2F07            move.l   d7,-(a7)
20000906: 2F06            move.l   d6,-(a7)
20000908: 2C2E0008        move.l   8(a6),d6
2000090C: 2E2E000C        move.l   12(a6),d7
20000910: 242E0010        move.l   16(a6),d2
unsigned    int i,data1,j,bit;

    data1 = 0x50000000;
20000914: 203C50000000    move.l   #1342177280,d0
    data1 = data1 + (phy << 23);
2000091A: 7217            moveq    #23,d1
2000091C: E3AE            lsl.l    d1,d6
2000091E: D086            add.l    d6,d0
    data1 = data1 + (reg << 18);
20000920: 7212            moveq    #18,d1
20000922: E3AF            lsl.l    d1,d7
20000924: D087            add.l    d7,d0
    data1 = data1 + 0x00020000;
20000926: 068000020000    addi.l   #131072,d0
    data1 = data1 + data;
2000092C: D082            add.l    d2,d0
// send 32 1s
    for(i=0;i<32;i=i+1)
2000092E: 7E00            moveq    #0,d7
    {
        MCF_GPIO_PORTTC = 0x02;
20000930: 7402            moveq    #2,d2
           MCF_GPIO_PORTTC = 0x03;
20000932: 7203            moveq    #3,d1
20000934: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
2000093A: 5887            addq.l   #4,d7
2000093C: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
20000942: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000948: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
2000094E: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000954: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
2000095A: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000960: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
20000966: 0C8700000020    cmpi.l   #32,d7
2000096C: 65C6            bcs.s    phy_write+0x34 (0x20000934); 0x20000934
    }
// send data
bit = data1;
    for(j=0;j<=31;j=j+1)
2000096E: 7E00            moveq    #0,d7
    {

       // if(+( (data1 << j) & 0x80000000))        // try 1
        if(bit&0x80000000)                     // try 2
        {
            MCF_GPIO_PORTTC = 0x02;
20000970: 7402            moveq    #2,d2
        }
        else
        {
               MCF_GPIO_PORTTC = 0x00;
        }
        bit <<=1;                               // try 2
         MCF_GPIO_SETTC = 0x01; // rising edge of clock
20000972: 7201            moveq    #1,d1
20000974: 0800001F        btst     #31,d0
20000978: 6708            beq.s    phy_write+0x82 (0x20000982); 0x20000982
2000097A: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000980: 6006            bra.s    phy_write+0x88 (0x20000988); 0x20000988
20000982: 42394010000F    clr.b    0x4010000F (0x4010000f)
20000988: E388            lsl.l    #1,d0
2000098A: 13C14010003F    move.b   d1,0x4010003F (0x4010003f)
    }
20000990: 5287            addq.l   #1,d7
20000992: 0C870000001F    cmpi.l   #31,d7
20000998: 63DA            bls.s    phy_write+0x74 (0x20000974); 0x20000974
}
2000099A: 2C1F            move.l   (a7)+,d6
2000099C: 2E1F            move.l   (a7)+,d7
2000099E: 4E5E            unlk     a6
200009A0: 4E75            rts
200009A2: 51FC            trapf

RichTestardi · ‎03-26-2009

Oh, never mind on the SPI comment, then -- you've got bidirectional signals...

It seems you might be able to do a bit better if you could get the compiler to put your register addresses into registers...

Maybe something like:

register vuint8 *settc = &MCF_GPIO_SETTC;

register vuint8 *porttc = &MCF_GPIO_PORTTC;

And then manipulate *settc and *porttc instead of the original macros... But that's just a shot in the dark.
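
A rough sketch of what the data loop might look like with the register addresses passed in as pointers is shown here. The function name mdio_shift_out is hypothetical, uint8_t stands in for the vuint8 type, and whether the compiler actually keeps both pointers in address registers would have to be confirmed in the disassembly.

```c
#include <stdint.h>

/* Hypothetical helper: porttc and settc stand in for &MCF_GPIO_PORTTC
 * and &MCF_GPIO_SETTC. Data is on bit 1 and the clock on bit 0. */
static void mdio_shift_out(volatile uint8_t *porttc, volatile uint8_t *settc, uint32_t bits)
{
    int i;
    for (i = 0; i < 32; i++) {
        /* Drive the data bit with the clock low. */
        *porttc = (bits & 0x80000000u) ? 0x02 : 0x00;
        /* Rising clock edge through the set register. */
        *settc = 0x01;
        bits <<= 1;
    }
}
```

On the target the call would be something like mdio_shift_out(&MCF_GPIO_PORTTC, &MCF_GPIO_SETTC, data1). Passing pointers also makes it possible to exercise the loop on a PC against two ordinary variables.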

I'll ask the obvious question... Is your clock running at full speed?

You might be able to go a bit faster if you copied that routine to RAM to run it.

rlstraney · ‎03-26-2009

I have Optimize for "faster execution speed"

set compiler to Level 4

All three of these are checked:

Register coloring

Instruction scheduling

peephole

A6 Stack Frames is checked.

(not sure what all of the above are for - I will look them up)

I single stepped into the phy_write function, here is the assembly:

20000900: 4E560000        link     a6,#0
20000904: 2F07            move.l   d7,-(a7)
20000906: 2F06            move.l   d6,-(a7)
20000908: 2C2E0008        move.l   8(a6),d6
2000090C: 2E2E000C        move.l   12(a6),d7
20000910: 242E0010        move.l   16(a6),d2
20000914: 203C50000000    move.l   #1342177280,d0
2000091A: 7217            moveq    #23,d1
2000091C: E3AE            lsl.l    d1,d6
2000091E: D086            add.l    d6,d0
20000920: 7212            moveq    #18,d1
20000922: E3AF            lsl.l    d1,d7
20000924: D087            add.l    d7,d0
20000926: 068000020000    addi.l   #131072,d0
2000092C: D082            add.l    d2,d0
2000092E: 7E00            moveq    #0,d7
20000930: 7402            moveq    #2,d2
20000932: 7203            moveq    #3,d1
20000934: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
2000093A: 5887            addq.l   #4,d7
2000093C: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
20000942: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000948: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
2000094E: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000954: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
2000095A: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000960: 13C14010000F    move.b   d1,0x4010000F (0x4010000f)
20000966: 0C8700000020    cmpi.l   #32,d7
2000096C: 65C6            bcs.s    phy_write+0x34 (0x20000934); 0x20000934
2000096E: 7E00            moveq    #0,d7
20000970: 7402            moveq    #2,d2
20000972: 7201            moveq    #1,d1
20000974: 0800001F        btst     #31,d0
20000978: 6708            beq.s    phy_write+0x82 (0x20000982); 0x20000982
2000097A: 13C24010000F    move.b   d2,0x4010000F (0x4010000f)
20000980: 6006            bra.s    phy_write+0x88 (0x20000988); 0x20000988
20000982: 42394010000F    clr.b    0x4010000F (0x4010000f)
20000988: E388            lsl.l    #1,d0
2000098A: 13C14010003F    move.b   d1,0x4010003F (0x4010003f)
20000990: 5287            addq.l   #1,d7
20000992: 0C870000001F    cmpi.l   #31,d7
20000998: 63DA            bls.s    phy_write+0x74 (0x20000974); 0x20000974
2000099A: 2C1F            move.l   (a7)+,d6
2000099C: 2E1F            move.l   (a7)+,d7
2000099E: 4E5E            unlk     a6
200009A0: 4E75            rts
200009A2: 51FC            trapf

Thanks

Renee

RichTestardi · ‎03-26-2009

Can you right-click on your .c file and select Disassemble?

Then search for your function in the resulting output.

That will give you C intermixed with assembly, so you can see why the compiler is doing what it is doing.

Actually, the code does not look too bad at all -- one of your loops got unrolled and the other is from X974 to X998.

You might be able to eke out another factor of 2 here, but not much more I believe.

Maybe we should step back and see what you are trying to do? What kind of device are you trying to talk to? If you're generating clock and data, you might have better luck using the built in SPI peripheral.

RichTestardi · ‎03-26-2009

I'm likely missing the forest for the trees here, but a simple thing you can do is replace these lines:

MCF_GPIO_PORTTC = MCF_GPIO_PORTTC + 0x01; // rising edge of clock
MCF_GPIO_PORTTC = MCF_GPIO_PORTTC & 0x02; // falling edge of clock

With:

MCF_GPIO_SETTC = 0x01; // rising edge of clock
MCF_GPIO_CLRTC = ~0x01; // falling edge of clock

That should cut your expensive register accesses, if I followed what you are doing correctly.
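
The saving comes from replacing a read, a modify and a write of the port with a single write. The sketch below is only a software model, assuming the usual ColdFire GPIO behaviour: writing 1s to the set register sets those port bits, and writing 0s to the clear register clears those port bits. The function names are hypothetical and this is not code for the target. Under that assumption, taking the clock on bit 0 low needs a value whose bit 0 is 0, such as ~0x01.

```c
#include <stdint.h>

/* Model of a write to the set register: 1s set bits, 0s have no effect. */
static uint8_t model_write_set(uint8_t port, uint8_t value)
{
    return (uint8_t)(port | value);
}

/* Model of a write to the clear register: 0s clear bits, 1s have no effect. */
static uint8_t model_write_clr(uint8_t port, uint8_t value)
{
    return (uint8_t)(port & value);
}
```

In this model, starting from a port value of 0x02, a set write of 0x01 gives 0x03 and a clear write of ~0x01 returns it to 0x02, the same sequence the original two lines produce.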

The next thing I'd do is look at the assembly code -- you are shifting data1 over and over again for each loop iteration... How about shifting it in place and looking at the high bit each time, like:

bits = data1;

for (i = 0; i < 32; i++) {
    if (bits & 0x80000000) {
        ...
    } else {
        ...
    }
    ...
    bits <<= 1;
}

Hopefully the compiler will take advantage of the sign bit for the "&" operation.

-- Rich
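
As a quick check that the two approaches send the same bits, here is a small host-side sketch comparing them. Both function names are hypothetical and nothing here touches the GPIO registers.

```c
#include <stdint.h>

/* Bit j, counted from the MSB, using a variable shift as in the original loop. */
static int bit_by_variable_shift(uint32_t word, unsigned j)
{
    return ((word << j) & 0x80000000u) != 0;
}

/* Fill out[0..31], MSB first, by shifting a working copy one place per pass. */
static void bits_by_shift_in_place(uint32_t word, int out[32])
{
    unsigned i;
    for (i = 0; i < 32; i++) {
        out[i] = (word & 0x80000000u) != 0;
        word <<= 1;
    }
}
```

The two give the same sequence for any 32-bit word, so any difference between them is purely in how much work the compiler generates per iteration.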

rlstraney · ‎03-26-2009

Thank you for the suggestion. By changing the clock as you suggested I had a little over 700 ns period (about 470ns reduction!). Then I reduced it even further by taking the clock low when I clock data out and now my period is about 538 ns.

I believe you were implying that shifting the variable "bit" by _one_ each iteration, instead of shifting my variable "data1" by whatever _j_ is (which could be between 0 and 31), should make the code faster? It didn't turn out that way. In fact it didn't make any difference at all to the clock period.

Below I have commented out the line that says "try 1" and added the lines that say "try 2", resulting in what I believe you were suggesting. Both of those tries yield a 536 ns clock.

It is interesting to note that by changing the compiler optimization from off to level 4 I have a 500 ns period, saving about 38 ns. (I tried this setting with both try1 and try2 code)

void phy_write(unsigned int phy, unsigned int reg, unsigned int data)
{
    unsigned int i, data1, j, bit;

    data1 = 0x50000000;
    data1 = data1 + (phy << 23);
    data1 = data1 + (reg << 18);
    data1 = data1 + 0x00020000;
    data1 = data1 + data;
    // send 32 1s
    for (i = 0; i < 32; i = i + 1)
    {
        MCF_GPIO_PORTTC = 0x02;
        MCF_GPIO_PORTTC = 0x03;
    }
    // send data
    bit = data1;
    for (j = 0; j <= 31; j = j + 1)
    {
        // if(+( (data1 << j) & 0x80000000))   // try 1
        if (bit & 0x80000000)                  // try 2
        {
            MCF_GPIO_PORTTC = 0x02;
        }
        else
        {
            MCF_GPIO_PORTTC = 0x00;
        }
        bit <<= 1;                             // try 2
        MCF_GPIO_SETTC = 0x01; // rising edge of clock
    }
}
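
To put the measured periods in perspective, they can be converted into system clock cycles per bit. The sketch below assumes a 60 MHz system clock, which is the value discussed in this thread but has not been confirmed for this board. The function name cycles_per_period is hypothetical.

```c
/* Number of system clock cycles that fit into one bit period. */
static double cycles_per_period(double period_ns, double fsys_hz)
{
    return period_ns * 1e-9 * fsys_hz;
}
```

Under that assumption, a 500 ns period corresponds to about 30 cycles per bit, while the 80 ns the receiving device can handle would leave fewer than 5 cycles per bit.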

RichTestardi · ‎03-26-2009

Hi,

A multi-bit shift should be the same speed as a single-bit, if a barrel shifter is in use.

The place it should help is you're maintaining less state when you shift by a constant than by a register.

Have you turned up your optimizations? In particular, you want Level 4, optimize for speed, and register coloring, peephole optimizations, and instruction scheduling all turned on. In general, you can turn off A6 stack frames if you want maximum speed and minimum code size, but that won't help you in a leaf function.

Can you post the disassembly? Right click -> Disassemble.

The compiler should be using the sign bit.

-- Rich
