Confirming K22F Clock Frequency

tharonhall
Contributor IV

Does anyone know a surefire way to determine with absolute certainty that a K22F is, in fact, clocking at 120 MHz? We are seeing some curious timing, and one possible explanation is that the system is running much slower than it should, so we want to be certain we are at the correct clock speed.

Thanks!

19 Replies

mjbcswitzerland
Specialist V

Tharon

To verify the system clock, you can configure UART0 (or UART1) and check that its baud rate is as expected.

This works because these UARTs are clocked directly by the system clock, so a correct UART speed gives an indirect confirmation of it.
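As a rough sketch of that check: the functions below compute the UART divider (integer SBR plus 5-bit fine-adjust BRFA) for the clock you believe you are running at, and then the baud rate those same divider values would actually produce if the real clock were different. They assume the usual Kinetis UART relation baud = clock / (16 × (SBR + BRFA/32)); confirm the formula and field widths in the K22F reference manual. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Divider settings for a Kinetis-style UART (assumed relation:
 * baud = clk / (16 * (SBR + BRFA/32)), i.e. baud = 2*clk / (32*SBR + BRFA)). */
typedef struct {
    uint32_t sbr;   /* integer divider */
    uint32_t brfa;  /* fractional fine-adjust, in 1/32 steps (0..31) */
} uart_div_t;

/* SBR/BRFA for the clock we *believe* the UART module is running from. */
static uart_div_t uart_divider(uint32_t assumed_clk_hz, uint32_t baud)
{
    /* Total divisor in 1/32 units: 32 * clk / (16 * baud) = 2 * clk / baud, rounded. */
    uint64_t div32 = ((uint64_t)assumed_clk_hz * 2u + baud / 2u) / baud;
    uart_div_t d = { (uint32_t)(div32 / 32u), (uint32_t)(div32 % 32u) };
    return d;
}

/* Baud rate those divider values actually produce at the real clock. */
static uint32_t uart_actual_baud(uint32_t real_clk_hz, uart_div_t d)
{
    return (uint32_t)(((uint64_t)real_clk_hz * 2u) / (32u * d.sbr + d.brfa));
}
```

For 115200 baud at an assumed 120 MHz this gives SBR = 65, BRFA = 3 (about 115218 baud). If the core were actually running at roughly 21 MHz, the same settings would produce only about 20 kbaud and the terminal would show garbage, which is why a clean UART link is a useful coarse check.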

Regards

Mark

Kinetis: µTasker Kinetis support

K22: µTasker Kinetis FRDM-K22F support / µTasker Kinetis TWR-K22F120M support

For the complete "out-of-the-box" Kinetis experience and faster time to market

tharonhall
Contributor IV

For what it's worth, I looked through the MCG and SIM registers with a fine-tooth comb and everything looks perfect, except that the FlexBus clock frequency is too high; since we don't have an external bus, I don't think that matters anyway.
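One way to make that register review more mechanical is to decode SIM_CLKDIV1 directly. The sketch below assumes the K-series layout (OUTDIV1 in bits 31:28 for core/system, OUTDIV2 in 27:24 for bus, OUTDIV3 in 23:20 for FlexBus, OUTDIV4 in 19:16 for flash), with each clock being MCGOUTCLK / (OUTDIVx + 1). It takes MCGOUTCLK as an input rather than deriving it from the MCG registers; check both assumptions against the K22F reference manual. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t core_hz;     /* OUTDIV1: core/system clock */
    uint32_t bus_hz;      /* OUTDIV2: bus clock */
    uint32_t flexbus_hz;  /* OUTDIV3: FlexBus clock */
    uint32_t flash_hz;    /* OUTDIV4: flash clock */
} k22_clocks_t;

/* Extract one 4-bit OUTDIV field and apply it as a divide-by-(N+1). */
static uint32_t outdiv(uint32_t mcgout_hz, uint32_t clkdiv1, unsigned shift)
{
    return mcgout_hz / (((clkdiv1 >> shift) & 0xFu) + 1u);
}

/* Decode a raw SIM_CLKDIV1 value (e.g. read in the debugger). */
static k22_clocks_t decode_clkdiv1(uint32_t mcgout_hz, uint32_t clkdiv1)
{
    k22_clocks_t c;
    c.core_hz    = outdiv(mcgout_hz, clkdiv1, 28);
    c.bus_hz     = outdiv(mcgout_hz, clkdiv1, 24);
    c.flexbus_hz = outdiv(mcgout_hz, clkdiv1, 20);
    c.flash_hz   = outdiv(mcgout_hz, clkdiv1, 16);
    return c;
}
```

For example, a raw value of 0x01140000 with MCGOUTCLK = 120 MHz decodes to 120 MHz core, 60 MHz bus, 60 MHz FlexBus and 24 MHz flash.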

mjbcswitzerland
Specialist V

Hi

Try toggling a pin with

GPIOx_PTOR = pin_bit; // e.g. GPIOA_PTOR = 0x00000008 for PTA3
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;

You should see roughly 50 to 60 MHz on the pin.

If you do, the internals are running at the expected speed and you will need to study the code generated by the compiler as suggested by Earl (make sure that you have optimisation enabled of course).
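The arithmetic behind that expectation: each PTOR write flips the pin once, so one full output period takes two writes. Assuming each store retires in a fixed number of core cycles (one in the ideal case; in practice bus arbitration and write buffering can change this), the expected frequency, and the core clock implied by a measurement, can be sketched as follows. The function names are illustrative only.

```c
#include <stdint.h>

/* Expected square-wave frequency from back-to-back PTOR writes:
 * one toggle per write, two toggles per output period. */
static uint32_t toggle_freq_hz(uint32_t core_hz, uint32_t cycles_per_write)
{
    return core_hz / (2u * cycles_per_write);
}

/* Inverse: the core clock implied by a measured toggle frequency. */
static uint32_t implied_core_hz(uint32_t measured_hz, uint32_t cycles_per_write)
{
    return measured_hz * 2u * cycles_per_write;
}
```

With one cycle per write, 120 MHz gives 60 MHz and 96 MHz gives 48 MHz; at two cycles per write the frequency halves, so treat the scope reading as a ball-park check rather than an exact measurement.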

Regards

Mark

egoodii
Senior Contributor III

Nothing is quite 'that straightforward' in a complex processor with prefetch, arbiters, pipelines, write buffers, and the like.  I put in a list of 5 toggles, and the whole group of 5 32-bit instruction words runs in 125 ns (12 clocks at 96 MHz, so 2 cycles each plus 2 'delay'), BUT not in a 'regular' way!

[Attachment: PTOR.bmp]

        GPIOC_PTOR = 1<<16;
       0x64dc: 0xf8df 0x0550  LDR.W     R0, ??DataTable18_17    ; GPIOC_PTOR
       0x64e0: 0xf8c0 0xb000  STR.W     R11, [R0]
        GPIOC_PTOR = 1<<16;
       0x64e4: 0xf8c0 0xb000  STR.W     R11, [R0]
        GPIOC_PTOR = 1<<16;
       0x64e8: 0xf8c0 0xb000  STR.W     R11, [R0]
        GPIOC_PTOR = 1<<16;
       0x64ec: 0xf8c0 0xb000  STR.W     R11, [R0]
        GPIOC_PTOR = 1<<16;
       0x64f0: 0xf8c0 0xb000  STR.W     R11, [R0]

mjbcswitzerland
Specialist V

Earl

See also the following: Re: Fast GPIO on Kinetis KF22

I tested once with 100 toggles and then it was very regular.

The idea was, however, to get another 'confirmation' of the general ballpark to ensure that there is no large unexpected deviation. With caching and such, it is not necessarily a reliable way to generate a 50/50 square wave.

Regards

Mark

egoodii
Senior Contributor III

Agreed -- when I get the compiler to create single-instruction writes, I get a 48MHz square wave.

So the outstanding question is whether this user can get the GCC tools to produce a software-SPI loop anything like the one I get from IAR, and whether THAT runs at 'multi-megahertz' speeds.

dave408
Senior Contributor II

Earl, thanks for sharing your solution for bit-banged SPI.  I'll be revisiting that subject once I get through some other tasks.  I'll post my results for the K64 and K22 then, in my thread Abysmally-slow IO toggling.

tharonhall
Contributor IV

Well, that's encouraging, since I am generating both 9600 and 38400 baud. What is really weird to us is that we have some bit-bang code implementing a software SPI master, but the clock rate we are seeing is incredibly slow. At 120 MHz we might expect to have to slow it down, but it is only running at about 330 kHz. We would be happy with a few MHz, really, and the slower clock results in a visible effect for the end user.
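Putting numbers on that gap: assuming the core really is at 120 MHz, the observed SCK rate says how many core cycles each bit of the bit-banged loop is consuming, which is a more useful figure than the raw frequency when deciding whether the clock or the code is at fault. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Core cycles spent per SCK period (integer division, rounded down). */
static uint32_t cycles_per_sck(uint32_t core_hz, uint32_t sck_hz)
{
    return core_hz / sck_hz;
}

/* SCK period in nanoseconds (rounded down). */
static uint32_t sck_period_ns(uint32_t sck_hz)
{
    return 1000000000u / sck_hz;
}
```

At 330 kHz this comes to a period of about 3030 ns and roughly 363 core cycles per bit, far more than a handful of GPIO writes should need, which suggests looking at the generated code before suspecting the clock.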

tharonhall
Contributor IV

For what it's worth, here is the code:

//Keep old settings, then optimize our code
#pragma GCC push_options
#pragma GCC optimize ("O3")

uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
  uint8_t counter;
  uint16_t pin_sck;
  uint16_t pin_mosi;
  uint16_t pin_miso;
  uint8_t receiveddata = 0;
  uint8_t miso_value;

  /*determine which gpio pins to use */
  switch( head)
  {
     case SPI_LEFT:
      pin_sck =  LEFT_SCK;
      pin_mosi = LEFT_MOSI;
      pin_miso = LEFT_MISO;
      break;
     case SPI_LEFT_CTR:
      pin_sck =  LEFT_CTR_SCK;
      pin_mosi = LEFT_CTR_MOSI;
      pin_miso = LEFT_CTR_MISO;
      break;
     case SPI_RIGHT_CTR:
      pin_sck =  RIGHT_CTR_SCK;
      pin_mosi = RIGHT_CTR_MOSI;
      pin_miso = RIGHT_CTR_MISO;
      break;
     case SPI_RIGHT:
      pin_sck =  RIGHT_SCK;
      pin_mosi = RIGHT_MOSI;
      pin_miso = RIGHT_MISO;
      break;
     default:
        return 0;
  }

  //Loop through each bit
  for(counter = 8; counter; counter--)
  {
    if (data  & 0x80)
    {
      /*MOSI = 1;*/
      GPIO_DRV_SetPinOutput(pin_mosi);
    }
    else
    {
      /*MOSI = 0;*/
      GPIO_DRV_ClearPinOutput(pin_mosi);
    }
    data <<= 1;
    /*SCK = 1;  a slave latches input data bit */
    GPIO_DRV_SetPinOutput(pin_sck);
    /* read MISO - assigned to a GPIO pin in processor expert */
    miso_value = GPIO_DRV_ReadPinInput( pin_miso);
    if (miso_value)
    {
      data |= 0x01;
    }
    /* SCK = 0;  a slave shifts out next output data bit */
    GPIO_DRV_ClearPinOutput(pin_sck);
  }/* end of for */

  /* return the received data */
  return(data);
}

//Restore all GCC options (pop the saved set rather than forcing O0 on the rest of the file)
#pragma GCC pop_options

egoodii
Senior Contributor III

When you are trying to hit performance targets, it is important to look carefully at the assembly code for your loop.  Your 'pin...' vars may help or hinder the access time: 32-bit constant loading on Cortex takes several cycles, but if the compiler holds each of these in fixed registers that would be a plus.  And I HOPE your GPIO_DRV_Set/Clear... items are simple macros that equate directly to GPIOx_PSOR and GPIOx_PCOR writes.  You don't want CALL overhead here (OR the compiler's assumptions about register-killing)!  You WILL want to make 'counter' a 32-bit item; this is a 32-bit processor after all!  You might also get some benefit from making 'data' 32 bits and shifting it left 24 to start, so the CPU can use the branch-on-plus/minus instructions (BPL/BMI) directly for your MSB test (now 0x80000000) to decide what to put out.
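A host-side sketch of those suggestions, with the GPIO accesses replaced by hypothetical callbacks so the shifting logic can be checked on a PC: the byte is left-justified into a 32-bit word so the MSB test becomes a test of bit 31 (which a compiler can typically implement with the sign flag), the counter is a native 32-bit integer, and each received MISO bit is shifted in at the bottom. `spi_pins_t`, the callbacks and the loopback helpers are illustrative names, not part of any Kinetis SDK; on the target the calls through pointers would be replaced by direct PSOR/PCOR/PDIR accesses.

```c
#include <stdint.h>

/* Hypothetical pin-access hooks; on hardware these would be inline register accesses. */
typedef struct {
    void (*set_mosi)(void *ctx, int level);
    void (*set_sck)(void *ctx, int level);
    int  (*read_miso)(void *ctx);
    void *ctx;
} spi_pins_t;

/* MSB-first bit-banged transfer of one byte (SPI mode 0 style). */
static uint8_t spi_xfer_byte(const spi_pins_t *p, uint8_t out)
{
    uint32_t data32 = (uint32_t)out << 24;        /* left-justify: MSB is now bit 31 */
    for (uint32_t counter = 8; counter; counter--) {
        p->set_mosi(p->ctx, (data32 & 0x80000000u) != 0);
        data32 <<= 1;
        p->set_sck(p->ctx, 1);                    /* slave latches MOSI */
        if (p->read_miso(p->ctx))
            data32 |= 1u;                         /* received bit enters at bit 0 */
        p->set_sck(p->ctx, 0);                    /* slave shifts out its next bit */
    }
    return (uint8_t)data32;                       /* low 8 bits hold the received byte */
}

/* Loopback helpers for testing on a PC: MISO reads back the last MOSI level. */
typedef struct { int mosi; int sck_edges; } loopback_t;
static void lb_set_mosi(void *ctx, int level) { ((loopback_t *)ctx)->mosi = level; }
static void lb_set_sck(void *ctx, int level)  { if (level) ((loopback_t *)ctx)->sck_edges++; }
static int  lb_read_miso(void *ctx)           { return ((loopback_t *)ctx)->mosi; }
```

With MISO looped back to MOSI, the byte returned should equal the byte sent, and eight rising SCK edges should be produced per byte.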

tharonhall
Contributor IV

Thanks. Even if we could optimize the code, I am struggling to understand how each clock could take ~3 µs on a 120 MHz clock. BTW, I also can't get the hardware SPI above ~2.1 MHz, which makes me wonder whether that is related to the same issue.

egoodii
Senior Contributor III

You can't ignore instruction efficiency.  If these 'bit ops' are truly function calls, I can see each costing upwards of 50 cycles between the actual call overhead and the compiler's inability to optimize registers through the calls, so together there are 'several hundred' cycles right there.  You say you've confirmed a bus clock of 60 MHz, so there is NO hardware reason the DSPI can't clock at 30 MHz.  I use a 48 MHz bus, and run 24 MHz at startup to external SPI memory and 8 MHz all the time to peripherals.  With proper FIFO usage, the SPI clock can be continuous.  If you're worried about flash-access penalties, you might want to 'force' the flash-fetch optimizations 'on', or tell your linker to copy and run this routine from SRAM (as a test at least).

Do you mean you haven't seen the DSPI SPI_CLK output faster than 2.1MHz, OR that you haven't seen >262Kbytes/s total transfer rate?
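As a sanity check on the 30 MHz figure: on Kinetis DSPI modules the SCK rate is set by the CTAR fields as SCK = (f_bus / PBR) × (1 + DBR) / BR, with PBR one of {2, 3, 5, 7} and BR taken from a fixed table (2, 4, 6, 8, 16, 32, ... 32768). The sketch below brute-forces those fields for the fastest SCK not exceeding a target. The tables are reproduced from the general Kinetis DSPI description, so verify them against the K22F reference manual; the type and function names are hypothetical.

```c
#include <stdint.h>

static const uint32_t pbr_div[4] = { 2, 3, 5, 7 };
static const uint32_t br_div[16] = { 2, 4, 6, 8, 16, 32, 64, 128, 256, 512,
                                     1024, 2048, 4096, 8192, 16384, 32768 };

typedef struct {
    uint32_t pbr;     /* CTAR[PBR] field value (0..3) */
    uint32_t br;      /* CTAR[BR] field value (0..15) */
    uint32_t dbr;     /* CTAR[DBR] (0 or 1) */
    uint32_t sck_hz;  /* resulting SCK frequency, 0 if nothing fits */
} dspi_baud_t;

/* Fastest SCK <= target_hz reachable from the given bus clock. */
static dspi_baud_t dspi_best_baud(uint32_t bus_hz, uint32_t target_hz)
{
    dspi_baud_t best = { 0, 0, 0, 0 };
    for (uint32_t p = 0; p < 4; p++)
        for (uint32_t b = 0; b < 16; b++)
            for (uint32_t d = 0; d < 2; d++) {
                uint32_t sck = (uint32_t)(((uint64_t)bus_hz * (1u + d)) /
                                          ((uint64_t)pbr_div[p] * br_div[b]));
                if (sck <= target_hz && sck > best.sck_hz) {
                    best.pbr = p; best.br = b; best.dbr = d; best.sck_hz = sck;
                }
            }
    return best;
}
```

With a 60 MHz bus, this reports 30 MHz (divide-by-2 prescaler, divide-by-2 scaler, DBR = 1) as the ceiling, and an exact 10 MHz is also reachable, so a ~2.1 MHz limit would point at configuration rather than hardware.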

egoodii
Senior Contributor III

So if I make your routine more direct (substituting 'simple constants' for all your undefined defines, and assuming GPIO port 'B'), I get this code:

uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
  uint32_t data32;
  uint32_t counter;
  uint32_t pin_sck;
  uint32_t pin_mosi;
  uint32_t pin_miso;
  uint32_t miso_value;

  //determine which gpio pins to use
  switch( head)
  {
     case 0:
      pin_sck =  1<<1;
      pin_mosi = 1<<2;
      pin_miso = 1<<3;
      break;
     case 1:
      pin_sck =  1<<4;
      pin_mosi = 1<<5;
      pin_miso = 1<<6;
      break;
     case 2:
      pin_sck =  1<<8;
      pin_mosi = 1<<9;
      pin_miso = 1<<10;
      break;
     case 3:
      pin_sck =  1<<11;
      pin_mosi = 1<<12;
      pin_miso = 1<<13;
      break;
     default:
        return 0;
  }

   data32 = (uint32_t)data << 24;  //Justify to the top! (cast keeps the shift unsigned)

  //Loop through each bit
  for(counter = 8; counter; counter--)
  {
    if (data32  & 0x80000000)
    {
      /*MOSI = 1;*/
      GPIOB_PSOR = pin_mosi;
    }
    else
    {
      /*MOSI = 0;*/
      GPIOB_PCOR = pin_mosi;
    }
    data32 <<= 1;
    /*SCK = 1;  a slave latches input data bit */
    GPIOB_PSOR = pin_sck;
    /* read MISO - assigned to a GPIO pin in processor expert */
    miso_value = GPIOB_PDIR & pin_miso;
    if (miso_value)
    {
      data32 |= 0x01;
    }
    /* SCK = 0;  a slave shifts out next output data bit */
    GPIOB_PCOR = pin_sck;
  }/* end of for */

  /* return the received data */
  return((uint8_t)(data32 & 0xFF));
}

That compiles, with IAR at 'full optimize', into this code with a 16-instruction main loop (the ??SPI_send_byte_gpio_6 and ??SPI_send_byte_gpio_7 groups).  Allowing 'some' extra cycles for the actual reads and writes, it should take fewer than 30 cycles per bit, or about 4 MHz at 120 MHz.

0x646e: 0xbd30    POP  {R4, R5, PC}
uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
SPI_send_byte_gpio:
0x6470: 0xb570    PUSH {R4-R6, LR}
  switch( head)
0x6472: 0xb128    CBZ  R0, ??SPI_send_byte_gpio_0 ; 0x6480
0x6474: 0x2802    CMP  R0, #2
0x6476: 0xd013    BEQ.N  ??SPI_send_byte_gpio_1  ; 0x64a0
0x6478: 0xd30e    BCC.N  ??SPI_send_byte_gpio_2  ; 0x6498
0x647a: 0x2803    CMP  R0, #3
0x647c: 0xd017    BEQ.N  ??SPI_send_byte_gpio_3  ; 0x64ae
0x647e: 0xe01d    B.N  ??SPI_send_byte_gpio_4  ; 0x64bc
pin_sck =  1<<1;
??SPI_send_byte_gpio_0:
0x6480: 0x2202    MOVS R2, #2
pin_mosi = 1<<2;
0x6482: 0x2304    MOVS R3, #4
pin_miso = 1<<3;
0x6484: 0x2408    MOVS R4, #8
   data32 = data << 24;  //Justify to the top!
??SPI_send_byte_gpio_5:
0x6486: 0x0608    LSLS R0, R1, #24
  for(counter = 8; counter; counter--)
0x6488: 0x2108    MOVS R1, #8
0x648a: 0xf8df 0x55ec  LDR.W  R5, ??DataTable19_14 ; GPIOB_PSOR
if (data32  & 0x80000000)
??SPI_send_byte_gpio_6:
0x648e: 0x2800    CMP  R0, #0
0x6490: 0xbf4c    ITE  MI
0x6492: 0x602b    STRMI  R3, [R5]
0x6494: 0x606b    STRPL  R3, [R5, #0x4]
GPIOB_PSOR = pin_mosi;
0x6496: 0xe013    B.N  ??SPI_send_byte_gpio_7  ; 0x64c0
pin_sck =  1<<4;
??SPI_send_byte_gpio_2:
0x6498: 0x2210    MOVS R2, #16            ; 0x10
pin_mosi = 1<<5;
0x649a: 0x2320    MOVS R3, #32            ; 0x20
pin_miso = 1<<6;
0x649c: 0x2440    MOVS R4, #64            ; 0x40
break;
0x649e: 0xe7f2    B.N  ??SPI_send_byte_gpio_5  ; 0x6486
pin_sck =  1<<8;
??SPI_send_byte_gpio_1:
0x64a0: 0xf44f 0x7280  MOV.W  R2, #256           ; 0x100
pin_mosi = 1<<9;
0x64a4: 0xf44f 0x7300  MOV.W  R3, #512           ; 0x200
pin_miso = 1<<10;
0x64a8: 0xf44f 0x6480  MOV.W  R4, #1024          ; 0x400
break;
0x64ac: 0xe7eb    B.N  ??SPI_send_byte_gpio_5  ; 0x6486
pin_sck =  1<<11;
??SPI_send_byte_gpio_3:
0x64ae: 0xf44f 0x6200  MOV.W  R2, #2048          ; 0x800
pin_mosi = 1<<12;
0x64b2: 0xf44f 0x5380  MOV.W  R3, #4096          ; 0x1000
pin_miso = 1<<13;
0x64b6: 0xf44f 0x5400  MOV.W  R4, #8192          ; 0x2000
break;
0x64ba: 0xe7e4    B.N  ??SPI_send_byte_gpio_5  ; 0x6486
return 0;
??SPI_send_byte_gpio_4:
0x64bc: 0x2000    MOVS R0, #0
0x64be: 0xbd70    POP  {R4-R6, PC}
GPIOB_PSOR = pin_sck;
??SPI_send_byte_gpio_7:
0x64c0: 0x602a    STR  R2, [R5]
miso_value = GPIOB_PDIR & pin_miso;
0x64c2: 0x68ee    LDR  R6, [R5, #0xc]
0x64c4: 0x0040    LSLS R0, R0, #1
if (miso_value)
0x64c6: 0xb2b6    UXTH R6, R6
0x64c8: 0x4226    TST  R6, R4
0x64ca: 0xbf18    IT   NE
0x64cc: 0xf040 0x0001  ORRNE.W   R0, R0, #1
GPIOB_PCOR = pin_sck;
0x64d0: 0x606a    STR  R2, [R5, #0x4]
  for(counter = 8; counter; counter--)
0x64d2: 0x1e49    SUBS R1, R1, #1
  for(counter = 8; counter; counter--)
0x64d4: 0xd1db    BNE.N  ??SPI_send_byte_gpio_6  ; 0x648e
  return((uint8_t)(data32 & 0xFF));
0x64d6: 0xb2c0    UXTB R0, R0
0x64d8: 0xbd70    POP  {R4-R6, PC}
0x64da: 0x0000    MOVS R0, R0
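Earl's estimate can be written as a small cost model: one cycle per instruction in the loop plus an extra wait per GPIO access. The per-access penalty used here is a placeholder assumption, not a measured K22F figure, and branch refill costs are ignored; the function names are hypothetical.

```c
#include <stdint.h>

/* Very rough per-bit cost model for a bit-banged loop: one cycle per instruction
 * plus an ASSUMED extra wait per peripheral access (replace with measured values). */
static uint32_t loop_cycles_per_bit(uint32_t n_instructions, uint32_t n_gpio_accesses,
                                    uint32_t extra_cycles_per_access)
{
    return n_instructions + n_gpio_accesses * extra_cycles_per_access;
}

/* SCK frequency the loop would produce at a given core clock (rounded down). */
static uint32_t sck_from_cycles(uint32_t core_hz, uint32_t cycles_per_bit)
{
    return core_hz / cycles_per_bit;
}
```

With the 16-instruction loop, four GPIO accesses per bit (MOSI write, SCK high, PDIR read, SCK low) and a guessed 3-cycle penalty each, the model gives 28 cycles per bit, or roughly 4.3 MHz at 120 MHz, in the same range as the "under 30 cycles, about 4 MHz" estimate.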

tharonhall
Contributor IV

Thank you for the detailed feedback. I did some manual optimizations and was able to eliminate one call. It improved speed but not by much. I clearly have more work to do.

BTW, I answered in a different thread, but I was able to crank the SPI Flash up to 10 MHz and could go higher. The problem was that PEx didn't like some of the defined clock settings at the SPI frequency I selected even though I was only actually using one of the defined clock configurations.

egoodii
Senior Contributor III

Can you afford (within the 'bigger picture') to add the keyword 'inline' to the GPIO_DRV functions (OR change to Macros!) so they can be optimized directly in the loop?
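A sketch of what that change might look like, with hypothetical names: the set/clear/read helpers become `static inline` functions (or equivalent macros) that take the PSOR/PCOR/PDIR register address and a pin mask, so an optimising compiler can reduce each call to a single load or store inside the loop (worth confirming in the disassembly). The register pointers are parameters so the code can be exercised on a PC; on the K22F they would be the GPIOx_PSOR/PCOR/PDIR addresses from the device header.

```c
#include <stdint.h>

/* Inline helpers: with optimisation on, each should reduce to one store or load. */
static inline void gpio_set(volatile uint32_t *psor, uint32_t mask)   { *psor = mask; }
static inline void gpio_clear(volatile uint32_t *pcor, uint32_t mask) { *pcor = mask; }
static inline uint32_t gpio_read(const volatile uint32_t *pdir, uint32_t mask)
{
    return *pdir & mask;
}

/* Macro equivalents, for cases where inlining cannot be relied on. */
#define GPIO_SET(psor, mask)   (*(psor) = (mask))
#define GPIO_CLEAR(pcor, mask) (*(pcor) = (mask))
```

Because the helpers take a register address and mask rather than a pin identifier, there is no per-call lookup left for the compiler to preserve.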

ee-quipment_com
Contributor II

I always bring CLKOUT to a test point for just this reason. With the K22, you can't bring out the core clock itself, but you can bring out the Flash Clock which you have configured with a known-certain divider, probably divide-by-5. The attachment shows how I have my K22 configured for 120 MHz operation driving the 24 MHz flash clk on CLKOUT.

Sorry for the attachment, but the forum seems to insist on html which ruins my formatting.
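With that setup the check reduces to one multiplication: the measured CLKOUT frequency times the flash-clock divider (OUTDIV4 + 1) gives MCGOUTCLK, and dividing by (OUTDIV1 + 1) gives the core clock. A minimal sketch, assuming the flash clock is what has been routed to CLKOUT and both dividers come from SIM_CLKDIV1; the function names are hypothetical.

```c
#include <stdint.h>

/* Core clock implied by a CLKOUT measurement of the flash clock, assuming
 * flash = MCGOUTCLK / (OUTDIV4 + 1) and core = MCGOUTCLK / (OUTDIV1 + 1). */
static uint32_t core_from_flash_clkout(uint32_t clkout_hz, uint32_t outdiv1, uint32_t outdiv4)
{
    uint64_t mcgout = (uint64_t)clkout_hz * (outdiv4 + 1u);
    return (uint32_t)(mcgout / (outdiv1 + 1u));
}

/* Absolute-tolerance check: |measured - expected| <= tol_hz. */
static int clock_within(uint32_t measured_hz, uint32_t expected_hz, uint32_t tol_hz)
{
    uint32_t diff = measured_hz > expected_hz ? measured_hz - expected_hz
                                              : expected_hz - measured_hz;
    return diff <= tol_hz;
}
```

A 24 MHz reading with OUTDIV1 = 0 and OUTDIV4 = 4 (divide-by-5) maps back to 120 MHz; a reading near 4.2 MHz would instead imply a core clock of only about 21 MHz.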

tharonhall
Contributor IV

Now, this may warrant a new thread, but one thing I can't make sense of is the flash settings. Perhaps the debugger has changed something? I haven't been able to identify anything in PEx, or in the code, that changes the values from the defaults. However, according to EmbSys, the registers have changed from their reset values, and they are all suspiciously the same number. Flash settings have to be changed from a routine running in RAM, so changing them is non-trivial. See below:

[Attachment: FMC registers.png]

Any clue what the different "masters" are? Obviously the most critical would be the Cortex-M4 core from a performance standpoint. It is less clear to me what other masters would be accessing the Flash.

tharonhall
Contributor IV

PS - I noticed after posting this that during another debug session the values were different and matched, or were closer to, the defaults, which casts doubt on how meaningful the previous snapshot is.

tharonhall
Contributor IV

So, a dumb question: the Cortex-M runs in Thumb mode, so it wants 16-bit opcodes, yes? Is it set up so that the flash reads multiple opcodes over a wide bus so the core doesn't stall, or is 120 MHz more theoretical than practical? What is the sustained clock rate of the K22F core under nominal conditions, assuming no branching, context switches, etc.?
