Abysmally-slow IO toggling

2,422 Views
dave408
Senior Contributor II

I'm sure I'm doing something wrong here, I just don't know what it is.

I have a test KSDK + PEx project for a FRDM-K22F and I am just tight looping, toggling a single GPIO.  I have set my clock configuration to 1 (maximum) and yet my output frequency is just under 780kHz.  Is there something I'm missing here?

My test app is literally Create New -> FRDM-K22F -> KSDK + PEx -> add fsl_gpio -> bitbang in while loop.

With default "max speed" clock settings:

FRDM-K22F: 780kHz

FRDM-K64F: 760kHz

  • In this particular case, I configured UART1 TX on PTC4 and output characters with UART_DRV_SendData().  My best guess from the smallest pulse width when sending 0x55 is 3.68MHz.  Baud rate was set to max value of 7.5MHz

under mbed (just as a quick comparison)

FRDM-K64F: 640kHz

1 Solution
1,902 Views
egoodii
Senior Contributor III

I am not intimately familiar with the K22F clock-settings; the closest I come is K20, and I do everything 'bare metal' to be in complete control.  So I can't give any particular 'hints' as to your supposed factor-of-2, which I think we can trust from your numbers!

So this guy:

Confirming K22F Clock Frequency

was making a bit-banged SPI, and I showed where an IAR-chain high-optimization loop could net 16 half-word instructions per bit in&out.  If your 'inner loop' can look anything like this, you should see it cycle about 20 clocks per bit (or better!).  As discussed at the end of the thread, a 'dirt simple' sequence of PORTx_PTOR 'toggles' can certainly toggle at full core-clock speed.

15 Replies
1,902 Views
egoodii
Senior Contributor III

Same story we seem to go through here:  First, concoct a means to confirm that your system/bus clock is exactly as expected -- UART0 bit rate is a good way.  Then, you have to look at the complete assembly-instruction-level sequence of your 'while loop'.

1,902 Views
dave408
Senior Contributor II

Oh, one more thing -- can you give me an idea of what settings you had changed in PEx (or via the KSDK) to get the correct clock configuration?

1,902 Views
egoodii
Senior Contributor III

Unfortunately, I don't use either of those tools, so I can't be any direct help in that fashion!

We don't care WHAT the UART clock rate IS, just that we know the exact source AND divisor.  UART0 (and 1) should have the particular advantage of being derived directly from the Bus Clock, so there is 'one less unknown' in the path.
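As an illustration of that check, here is a minimal sketch of the arithmetic, assuming the UART runs with 16x oversampling and no fractional fine adjust, so that bit rate = module clock / (16 * divisor). The function name and the divisor value are hypothetical; the 3.68MHz figure is the bit time measured earlier in this thread.

```c
/* Infer the UART module clock from a bit rate measured on a scope.
 * Assumption: bit_rate = module_clock / (16 * divisor), i.e. 16x
 * oversampling with the fine-adjust field left at zero. */
static double uart_clock_from_bitrate(double bit_rate_hz, unsigned divisor)
{
    return bit_rate_hz * 16.0 * (double)divisor;
}
```

With a divisor of 1 (assumed here because 120MHz / 16 gives the 7.5MHz maximum quoted above), a measured 3.68MHz implies a module clock of about 58.9MHz, roughly half of the 120MHz that was expected.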

1,902 Views
dave408
Senior Contributor II

Hi egoodii, I still need to read up on the clock configuration in the K22F and K64F to get a better understanding of things.  But yesterday, my test of setting UART1 to max baud rate and then sending 0x55 in a tight loop yielded a bitrate of 3.68MHz, which is about half of the expected speed.  I suppose one way of looking at this is that it's at least not off by a couple of orders of magnitude.  :smileyhappy:

I did look over the assembly code, and the KSDK methods for setting and clearing bits cost about 36 instructions each!  120MHz / (36 * 2) means I should see a toggling frequency of about 1.67MHz.  Again, my actual frequency of about 760kHz is roughly half of the expected, and at least the instruction count would explain the huge difference between the actual speed and what I originally expected.  Thanks again for suggesting that I dig into the assembly -- it would have taken a bit more convincing on my own before I would have reached that conclusion.

With this information, would you guess that there is an incorrect prescaler setting somewhere?  In addition, I take it you wrote an optimized function for doing GPIO operations?  Do you think my process outlined here is valid?
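The estimate above can be written out as a small helper. This is only a sketch: it assumes every instruction costs one core clock, which is optimistic for loads, stores and branches, and the function name is made up for illustration.

```c
/* Expected square-wave frequency when each output edge costs a fixed
 * number of core clocks. One full period needs two edges (set + clear).
 * Assumption: one instruction per clock, so clocks == instructions. */
static double toggle_freq_hz(double core_clock_hz, unsigned clocks_per_edge)
{
    return core_clock_hz / (2.0 * (double)clocks_per_edge);
}
```

For a 120MHz core and 36 clocks per edge this gives about 1.67MHz; if the core were actually running at 60MHz, the same loop would give about 833kHz, which is in the neighborhood of the 760-780kHz measured.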

1,902 Views
dave408
Senior Contributor II

Thanks, Earl.  I feel the same way about everything you have said here.  I'll likely eventually go the "dirt simple" route and for now I think it makes sense to mark yours as the correct answer!

1,902 Views
xiangjun_rong
NXP TechSupport

Hi, Dave,

I do not know if your question has been answered. Let's go back to your original question: why is the GPIO toggling slower than expected? The root cause is that a GPIO toggle is not completed in a single assembly instruction.

For example, take this GPIO toggling statement:

GPIOC_PTOR|=0x200;

    asm("nop");

After compiling, it becomes the following instructions:

GPIOC_PTOR|=0x200;
   0x1fff0e74: 0x4836     LDR.N R0, [PC, #0xd8]     ; [0x1fff0f50] 0x400ff08c (1074786444)
   0x1fff0e76: 0x6800     LDR   R0, [R0]
   0x1fff0e78: 0xf450 0x7000  ORRS.W R0, R0, #512        ; 0x200
   0x1fff0e7c: 0x4934     LDR.N R1, [PC, #0xd0]     ; [0x1fff0f50] 0x400ff08c (1074786444)
   0x1fff0e7e: 0x6008     STR   R0, [R1]
asm("nop");

That is why the GPIO toggling is slower than you expected.

Hope it can help you.

BR

XiangJun Rong

1,902 Views
egoodii
Senior Contributor III

THAT detail is 'fundamentally flawed' by the |= operation!  These direct-output-control "registers" are intentionally WRITE ONLY -- in that a 'read' ALWAYS returns '0', so as you can imagine an 'or' operation is particularly pointless, and will MASSIVELY slow things down waiting for the read-op to process (not to mention, that code is completely unoptimized -- even reloads the GPIO register address for the second-half).  I am truly surprised by how many Freescale examples make this error.

As you can see in that aforementioned thread, Mark and I prove that when the port-address is in a register (as they should be in a fully optimized loop!), each 'write' op completes in ONE core clock.
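A minimal sketch of the write-only style described here, with a plain assignment instead of '|='. The helper name is hypothetical; the register is passed in as a pointer so the same code can be exercised on a host with an ordinary variable standing in for the port.

```c
#include <stdint.h>

/* Toggle the pins selected by 'mask' by writing the mask straight to a
 * toggle register. There is no read-modify-write: the store alone flips
 * the selected pins, so no load from the peripheral is needed. */
static inline void gpio_toggle(volatile uint32_t *ptor, uint32_t mask)
{
    *ptor = mask;   /* plain '=', not '|=' */
}
```

On the target this might be called as gpio_toggle((volatile uint32_t *)0x400FF08CUL, 0x200u), using the GPIOC_PTOR address that appears in the disassembly above. Whether the compiler keeps that address in a register across a loop depends on the optimization level, so the generated assembly is still worth checking.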

1,902 Views
dave408
Senior Contributor II

Thanks, Earl.  I will follow up on this information!

1,902 Views
egoodii
Senior Contributor III

So, did you ever get anything like 4 to 5Mb/s 'bit banged SPI' operation?

1,902 Views
dave408
Senior Contributor II

I've got so many fish to fry that I haven't even reached the optimization phase of my project.  :smileyhappy:  I'll definitely update this post with my results when I get around to it!  Thanks for checking in.

1,902 Views
dave408
Senior Contributor II

Thanks, XiangJun.  I haven't yet gotten around to converting my bitbang SPI library to not use the KSDK functions, but the info you have provided is very useful as well.  Thanks!

1,902 Views
egoodii
Senior Contributor III

You should be able to use something 'very close' to the bit-banged SPI I compacted in that other thread.

1,902 Views
dave408
Senior Contributor II

Ok, I understand.  I'll just reassign pins as/if necessary!  Thanks!

1,902 Views
dave408
Senior Contributor II

Thanks for the info and suggestion, Earl!  So if I understand correctly, configure UART0 (or maybe UART2 since that's clearly available on the FRDM-K22F) for max baud rate at that clock configuration setting, and then have it output a bit pattern and check the frequency with the scope?
