Confirming K22F Clock Frequency

tharonhall · ‎04-21-2015

Does anyone know a surefire way to determine with absolute certainty that a K22F is, in fact, clocking at 120 MHz? We are seeing some curious timing, and one possible explanation is that the system is running much slower than it should. We want to be certain we are at the correct clock speed.

Thanks!

mjbcswitzerland · ‎04-21-2015

Tharon

To verify the system clock you can configure UART0 (or 1) and check that its Baud is as expected.

This is because these UARTs are clocked directly by the system clock, so the UART speed gives an indirect confirmation of it.
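As a rough illustration of the UART check, the sketch below rearranges the Kinetis UART baud equation, baud = clock / (16 * (SBR + BRFA/32)), to recover the module clock from a baud rate measured on a scope. The function name is hypothetical, and it assumes you know the SBR and BRFA values your firmware actually programmed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: infer the UART module clock from a measured baud rate.
 * Kinetis UART: baud = clock / (16 * (SBR + BRFA/32)).
 * sbr and brfa are the divider values the firmware wrote (assumed known). */
static uint32_t uart_clock_from_baud(uint32_t measured_baud,
                                     uint32_t sbr, uint32_t brfa)
{
    assert(brfa < 32u);
    /* 16 * (SBR + BRFA/32) is the same as (32*SBR + BRFA) / 2 */
    uint64_t divisor_x2 = 32ull * sbr + brfa;
    return (uint32_t)((uint64_t)measured_baud * divisor_x2 / 2u);
}
```

With SBR = 195 and BRFA = 10 (a divider of 195.3125), a measured 38400 baud corresponds to 120 MHz; if the scope showed 19200 baud with the same settings, the UART clock would be only 60 MHz.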

Regards

Mark

Kinetis: µTasker Kinetis support

K22: µTasker Kinetis FRDM-K22F support / µTasker Kinetis TWR-K22F120M support

For the complete "out-of-the-box" Kinetis experience and faster time to market

tharonhall · ‎04-22-2015

For what it's worth, I looked through the MCG and SIM registers with a fine-tooth comb and everything looks perfect, with the exception that the FlexBus clock frequency is too high. Since we don't have an external bus, I don't think that matters anyway.

mjbcswitzerland · ‎04-22-2015

Hi

Try toggling a pin with

GPIOx_PTOR = pin_bit; // eg. GPIOA_PTOR = 0x00000008 for PTA3

GPIOx_PTOR = pin_bit;

- you should get 50..60MHz generated.

If you do, the internals are running at the expected speed and you will need to study the code generated by the compiler as suggested by Earl (make sure that you have optimisation enabled of course).
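The expected figure can be worked out with a small sketch. It assumes each PTOR store retires in a fixed number of core cycles (one in the best case), which is only a ballpark on a pipelined part; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two PTOR writes make one full output period (pin high, then pin low). */
static uint32_t toggle_freq_hz(uint32_t core_hz, uint32_t cycles_per_write)
{
    assert(cycles_per_write != 0u);
    return core_hz / (2u * cycles_per_write);
}

/* Inverse: estimate the core clock from the frequency seen on the scope. */
static uint32_t core_hz_from_toggle(uint32_t measured_hz,
                                    uint32_t cycles_per_write)
{
    return measured_hz * 2u * cycles_per_write;
}
```

At 120 MHz with single-cycle stores this gives 60 MHz, the top of the 50..60 MHz range quoted above. A reading near 5 MHz, for example, would point to a core clock far below 120 MHz or to stores that are not single instructions.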

Regards

Mark

egoodii · ‎04-22-2015

Nothing is quite 'that straightforward' in a complex processor with pre-fetch, arbiters, pipelines, write-buffers, and the like. I put in a list of 5, and the 'whole following group' of 5 32-bit instruction-words runs 125ns (12 clocks at 96MHz, so 2 cycles each plus 2 'delay'), BUT not in a 'regular' way!

GPIOC_PTOR = 1<<16;

0x64dc: 0xf8df 0x0550 LDR.W R0, ??DataTable18_17 ; GPIOC_PTOR

0x64e0: 0xf8c0 0xb000 STR.W R11, [R0]

GPIOC_PTOR = 1<<16;

0x64e4: 0xf8c0 0xb000 STR.W R11, [R0]

GPIOC_PTOR = 1<<16;

0x64e8: 0xf8c0 0xb000 STR.W R11, [R0]

GPIOC_PTOR = 1<<16;

0x64ec: 0xf8c0 0xb000 STR.W R11, [R0]

GPIOC_PTOR = 1<<16;

0x64f0: 0xf8c0 0xb000 STR.W R11, [R0]

mjbcswitzerland · ‎04-22-2015

Earl

See also the following: Re: Fast GPIO on Kinetis KF22

I tested once with 100 toggles and then it was very regular.

The idea was however to get another 'confirmation' of the general ball-park to ensure that there is no large unexpected deviation. With caching and such it is not necessarily the way to generate a 50/50 square wave.

Regards

Mark

egoodii · ‎04-23-2015

Agreed -- when I get the compiler to create single-instruction writes, I get a 48MHz square wave.

So the outstanding question is whether this user can get the GCC tools to put out a software-SPI loop anything like the one I get from IAR, and if THAT runs at 'multi megahertz' speeds?

dave408 · ‎05-04-2015

Earl, thanks for sharing your solution for bitbanged SPI. I'll be revisiting that subject once I get through some other tasks. I'll post up my results for the K64 and K22 then in my thread Abysmally-slow IO toggling

tharonhall · ‎04-22-2015

Well, that's encouraging, since I am generating both 9600 and 38400 baud. What is really weird to us is that we have some bit-bang code that implements a soft SPI master, but the clock rate we are seeing is incredibly slow. At 120 MHz we thought we might have to slow it down, but it is only running at about 330 kHz. We would be happy with a few MHz really, and the slow clock results in a visible effect for the end user.
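To put the 330 kHz figure in perspective, this small sketch converts an observed SPI clock into a per-bit cycle budget. It assumes the core really is at 120 MHz (as the UART result suggests) and that the bit-bang loop makes four GPIO driver calls per bit, as the routine posted in this thread does; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles spent per SPI clock period (one transferred bit). */
static uint32_t core_cycles_per_sck(uint32_t core_hz, uint32_t sck_hz)
{
    assert(sck_hz != 0u);
    return core_hz / sck_hz;
}

/* Average cycles per GPIO driver call, ignoring loop overhead. */
static uint32_t cycles_per_gpio_call(uint32_t cycles_per_bit,
                                     uint32_t calls_per_bit)
{
    assert(calls_per_bit != 0u);
    return cycles_per_bit / calls_per_bit;
}
```

At 120 MHz and 330 kHz that is about 363 cycles per bit, or roughly 90 cycles per driver call. That is far more than a single register store needs, so the generated code is a more likely suspect than the clock setup.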

tharonhall · ‎04-22-2015

For what it's worth, here is the code:

//Keep old settings, then optimize our code
//#pragma GCC push_options
#pragma GCC optimize ("O3")

uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
    uint8_t counter;
    uint16_t pin_sck;
    uint16_t pin_mosi;
    uint16_t pin_miso;
    uint8_t receiveddata = 0;
    uint8_t miso_value;

    /*determine which gpio pins to use */
    switch( head)
    {
    case SPI_LEFT:
        pin_sck = LEFT_SCK;
        pin_mosi = LEFT_MOSI;
        pin_miso = LEFT_MISO;
        break;
    case SPI_LEFT_CTR:
        pin_sck = LEFT_CTR_SCK;
        pin_mosi = LEFT_CTR_MOSI;
        pin_miso = LEFT_CTR_MISO;
        break;
    case SPI_RIGHT_CTR:
        pin_sck = RIGHT_CTR_SCK;
        pin_mosi = RIGHT_CTR_MOSI;
        pin_miso = RIGHT_CTR_MISO;
        break;
    case SPI_RIGHT:
        pin_sck = RIGHT_SCK;
        pin_mosi = RIGHT_MOSI;
        pin_miso = RIGHT_MISO;
        break;
    default:
        return 0;
    }

    //Loop through each bit
    for(counter = 8; counter; counter--)
    {
        if (data & 0x80)
        {
            /*MOSI = 1;*/
            GPIO_DRV_SetPinOutput(pin_mosi);
        }
        else
        {
            /*MOSI = 0;*/
            GPIO_DRV_ClearPinOutput(pin_mosi);
        }
        data <<= 1;

        /*SCK = 1; a slave latches input data bit */
        GPIO_DRV_SetPinOutput(pin_sck);

        /* read MISO - assigned to a GPIO pin in processor expert */
        miso_value = GPIO_DRV_ReadPinInput( pin_miso);
        if (miso_value)
        {
            data |= 0x01;
        }

        /* SCK = 0; a slave shifts out next output data bit */
        GPIO_DRV_ClearPinOutput(pin_sck);
    }/* end of for */

    /* return the received data */
    return(data);
}

//Restore all GCC options
//#pragma GCC pop_options
#pragma GCC optimize ("O0")

egoodii · ‎04-22-2015

When you are trying to hit performance, it is important to look carefully at the assembly code for your loop. Your 'pin...' vars may help, may hinder, the access time -- 32-bit-constant-loading on Cortex takes several cycles, but if the compiler holds each of these in fixed registers that would be a plus. And I HOPE your GPIO_DRV_Set/Clear... items are simple macros that equate directly to GPIOx_PSOR and GPIOx_PCOR writes. You don't want CALL overhead here (OR the compiler's assumptions about register-killing)! You WILL want to make 'counter' a 32-bit item; this is a 32 bit processor after all! You might also get some benefit from making 'data' 32 bits and shifting left 24 to start, and the CPU can then use the Branch 'Plus'/'Minus' instructions directly, as your MSB test (now 0x80000000) for what to put-out.
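A minimal sketch of what "simple macros" could look like is shown here. The struct mirrors the Kinetis GPIO register order (PDOR, PSOR, PCOR, PTOR, PDIR, PDDR); the type name and the PIN_* macro names are hypothetical and are not part of the KSDK. On target you would point the struct at the port base address from the device header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register layout of one Kinetis GPIO port (hypothetical type name). */
typedef struct {
    volatile uint32_t PDOR; /* data output        */
    volatile uint32_t PSOR; /* set output bits    */
    volatile uint32_t PCOR; /* clear output bits  */
    volatile uint32_t PTOR; /* toggle output bits */
    volatile uint32_t PDIR; /* data input         */
    volatile uint32_t PDDR; /* data direction     */
} gpio_regs_t;

/* PDIR sits 0x10 bytes into the block, 0xC past PSOR. */
static_assert(offsetof(gpio_regs_t, PDIR) == 0x10, "unexpected GPIO layout");

/* Each macro expands to a single load or store: no call, no spilled registers. */
#define PIN_SET(port, mask)   ((port)->PSOR = (mask))
#define PIN_CLEAR(port, mask) ((port)->PCOR = (mask))
#define PIN_READ(port, mask)  ((port)->PDIR & (mask))
```

Because the macros take a pin mask rather than a pin number, the loop can keep the mask in a register and issue one store per edge.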

tharonhall · ‎04-22-2015

Thanks. Even if we could optimize the code, I am struggling to understand how each clock could take ~3 µs with a 120 MHz core clock. BTW, I also can't get the hardware SPI above ~2.1 MHz, which makes me wonder if that is related to the same issue.

egoodii · ‎04-22-2015

You can't ignore instruction efficiency. If these 'bit ops' are truly function-calls, I can see each costing upwards of 50 cycles between actual call overhead, and inability of the compiler to optimize registers thru the calls, so together there are 'several hundred' right there. You say you've confirmed a bus clock of 60MHz, so there is NO hardware reason the DSPI can't clock at 30MHz. I use a 48MHz bus, and run 24MHz at startup to external SPI memory, 8MHz all the time to peripherals. With proper FIFO usage, the SPI clock can be continuous. If you're worried about Flash-access penalties, certainly you might want to 'force' the flash-fetch optimizations 'on', or tell your linker to copy & run this routine in SRAM (as a test at least).

Do you mean you haven't seen the DSPI SPI_CLK output faster than 2.1MHz, OR that you haven't seen >262Kbytes/s total transfer rate?
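The numbers in this post can be checked with a short sketch. It uses the DSPI baud equation as I recall it from the Kinetis reference manual, SCK = (f_bus / PBR) * ((1 + DBR) / BR), so treat the formula as an assumption and confirm it against the K22F manual. Here pbr and br are the actual divider values, not the register field encodings, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SCK frequency from the bus clock and the DSPI dividers.
 * pbr: prescaler value (2, 3, 5 or 7); br: scaler value (2, 4, 6, 8, 16, ...);
 * dbr: double-baud-rate bit (0 or 1). */
static uint32_t dspi_sck_hz(uint32_t bus_hz, uint32_t pbr,
                            uint32_t br, uint32_t dbr)
{
    assert(pbr != 0u && br != 0u && dbr <= 1u);
    return (uint32_t)(((uint64_t)bus_hz * (1u + dbr)) / ((uint64_t)pbr * br));
}

/* Peak byte rate if SCK runs continuously with no gaps between bytes. */
static uint32_t spi_bytes_per_second(uint32_t sck_hz)
{
    return sck_hz / 8u;
}
```

With a 60 MHz bus, PBR = 2, BR = 2 and DBR = 1 the result is 30 MHz, and a 48 MHz bus gives 24 MHz with the same settings. A 2.1 MHz SCK tops out at 262500 bytes/s, which is where the 262 Kbytes/s figure comes from.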

egoodii · ‎04-22-2015

So if I make your routine more direct, and I have to make 'simple constants' for all your un-defined defines, and an assumption for GPIO 'B', I get this code:

uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
    uint32_t data32;
    uint32_t counter;
    uint32_t pin_sck;
    uint32_t pin_mosi;
    uint32_t pin_miso;
    uint32_t miso_value;

    //determine which gpio pins to use
    switch( head)
    {
    case 0:
        pin_sck = 1<<1;
        pin_mosi = 1<<2;
        pin_miso = 1<<3;
        break;
    case 1:
        pin_sck = 1<<4;
        pin_mosi = 1<<5;
        pin_miso = 1<<6;
        break;
    case 2:
        pin_sck = 1<<8;
        pin_mosi = 1<<9;
        pin_miso = 1<<10;
        break;
    case 3:
        pin_sck = 1<<11;
        pin_mosi = 1<<12;
        pin_miso = 1<<13;
        break;
    default:
        return 0;
    }

    data32 = data << 24; //Justify to the top!

    //Loop through each bit
    for(counter = 8; counter; counter--)
    {
        if (data32 & 0x80000000)
        {
            /*MOSI = 1;*/
            GPIOB_PSOR = pin_mosi;
        }
        else
        {
            /*MOSI = 0;*/
            GPIOB_PCOR = pin_mosi;
        }
        data32 <<= 1;

        /*SCK = 1; a slave latches input data bit */
        GPIOB_PSOR = pin_sck;

        /* read MISO - assigned to a GPIO pin in processor expert */
        miso_value = GPIOB_PDIR & pin_miso;
        if (miso_value)
        {
            data32 |= 0x01;
        }

        /* SCK = 0; a slave shifts out next output data bit */
        GPIOB_PCOR = pin_sck;
    }/* end of for */

    /* return the received data */
    return((uint8_t)(data32 & 0xFF));
}

That compiles, with IAR at 'full optimize', into this code with a 16-instruction main-loop (the ??SPI_send_byte_gpio_6 and ??SPI_send_byte_gpio_7 groups). Allowing 'some' extra cycles for the actual reads and writes, each pass through the loop should take <30 clock cycles, which at 120MHz works out to a 4MHz SPI clock.

0x646e: 0xbd30           POP     {R4, R5, PC}
uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
SPI_send_byte_gpio:
0x6470: 0xb570           PUSH    {R4-R6, LR}
switch( head)
0x6472: 0xb128           CBZ     R0, ??SPI_send_byte_gpio_0 ; 0x6480
0x6474: 0x2802           CMP     R0, #2
0x6476: 0xd013           BEQ.N   ??SPI_send_byte_gpio_1 ; 0x64a0
0x6478: 0xd30e           BCC.N   ??SPI_send_byte_gpio_2 ; 0x6498
0x647a: 0x2803           CMP     R0, #3
0x647c: 0xd017           BEQ.N   ??SPI_send_byte_gpio_3 ; 0x64ae
0x647e: 0xe01d           B.N     ??SPI_send_byte_gpio_4 ; 0x64bc
pin_sck = 1<<1;
??SPI_send_byte_gpio_0:
0x6480: 0x2202           MOVS    R2, #2
pin_mosi = 1<<2;
0x6482: 0x2304           MOVS    R3, #4
pin_miso = 1<<3;
0x6484: 0x2408           MOVS    R4, #8
data32 = data << 24; //Justify to the top!
??SPI_send_byte_gpio_5:
0x6486: 0x0608           LSLS    R0, R1, #24
for(counter = 8; counter; counter--)
0x6488: 0x2108           MOVS    R1, #8
0x648a: 0xf8df 0x55ec    LDR.W   R5, ??DataTable19_14 ; GPIOB_PSOR
if (data32 & 0x80000000)
??SPI_send_byte_gpio_6:
0x648e: 0x2800           CMP     R0, #0
0x6490: 0xbf4c           ITE     MI
0x6492: 0x602b           STRMI   R3, [R5]
0x6494: 0x606b           STRPL   R3, [R5, #0x4]
GPIOB_PSOR = pin_mosi;
0x6496: 0xe013           B.N     ??SPI_send_byte_gpio_7 ; 0x64c0
pin_sck = 1<<4;
??SPI_send_byte_gpio_2:
0x6498: 0x2210           MOVS    R2, #16 ; 0x10
pin_mosi = 1<<5;
0x649a: 0x2320           MOVS    R3, #32 ; 0x20
pin_miso = 1<<6;
0x649c: 0x2440           MOVS    R4, #64 ; 0x40
break;
0x649e: 0xe7f2           B.N     ??SPI_send_byte_gpio_5 ; 0x6486
pin_sck = 1<<8;
??SPI_send_byte_gpio_1:
0x64a0: 0xf44f 0x7280    MOV.W   R2, #256 ; 0x100
pin_mosi = 1<<9;
0x64a4: 0xf44f 0x7300    MOV.W   R3, #512 ; 0x200
pin_miso = 1<<10;
0x64a8: 0xf44f 0x6480    MOV.W   R4, #1024 ; 0x400
break;
0x64ac: 0xe7eb           B.N     ??SPI_send_byte_gpio_5 ; 0x6486
pin_sck = 1<<11;
??SPI_send_byte_gpio_3:
0x64ae: 0xf44f 0x6200    MOV.W   R2, #2048 ; 0x800
pin_mosi = 1<<12;
0x64b2: 0xf44f 0x5380    MOV.W   R3, #4096 ; 0x1000
pin_miso = 1<<13;
0x64b6: 0xf44f 0x5400    MOV.W   R4, #8192 ; 0x2000
break;
0x64ba: 0xe7e4           B.N     ??SPI_send_byte_gpio_5 ; 0x6486
return 0;
??SPI_send_byte_gpio_4:
0x64bc: 0x2000           MOVS    R0, #0
0x64be: 0xbd70           POP     {R4-R6, PC}
GPIOB_PSOR = pin_sck;
??SPI_send_byte_gpio_7:
0x64c0: 0x602a           STR     R2, [R5]
miso_value = GPIOB_PDIR & pin_miso;
0x64c2: 0x68ee           LDR     R6, [R5, #0xc]
0x64c4: 0x0040           LSLS    R0, R0, #1
if (miso_value)
0x64c6: 0xb2b6           UXTH    R6, R6
0x64c8: 0x4226           TST     R6, R4
0x64ca: 0xbf18           IT      NE
0x64cc: 0xf040 0x0001    ORRNE.W R0, R0, #1
GPIOB_PCOR = pin_sck;
0x64d0: 0x606a           STR     R2, [R5, #0x4]
for(counter = 8; counter; counter--)
0x64d2: 0x1e49           SUBS    R1, R1, #1
for(counter = 8; counter; counter--)
0x64d4: 0xd1db           BNE.N   ??SPI_send_byte_gpio_6 ; 0x648e
return((uint8_t)(data32 & 0xFF));
0x64d6: 0xb2c0           UXTB    R0, R0
0x64d8: 0xbd70           POP     {R4-R6, PC}
0x64da: 0x0000           MOVS    R0, R0

tharonhall · ‎04-23-2015

Thank you for the detailed feedback. I did some manual optimizations and was able to eliminate one call. It improved speed but not by much. I clearly have more work to do.

BTW, I answered in a different thread, but I was able to crank the SPI Flash up to 10 MHz and could go higher. The problem was that PEx didn't like some of the defined clock settings at the SPI frequency I selected even though I was only actually using one of the defined clock configurations.

egoodii · ‎04-24-2015

Can you afford (within the 'bigger picture') to add the keyword 'inline' to the GPIO_DRV functions (OR change to Macros!) so they can be optimized directly in the loop?

ee-quipment_com · ‎04-21-2015

I always bring CLKOUT to a test point for just this reason. With the K22, you can't bring out the core clock itself, but you can bring out the Flash Clock which you have configured with a known-certain divider, probably divide-by-5. The attachment shows how I have my K22 configured for 120 MHz operation driving the 24 MHz flash clk on CLKOUT.

Sorry for the attachment, but the forum seems to insist on html which ruins my formatting.
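The CLKOUT approach can be turned into a quick calculation. The sketch assumes the usual Kinetis K arrangement in which the core clock is MCGOUTCLK / (OUTDIV1 + 1) and the flash clock is MCGOUTCLK / (OUTDIV4 + 1), with OUTDIV1 in bits 31:28 and OUTDIV4 in bits 19:16 of SIM_CLKDIV1. The function name is hypothetical, and the register value in the usage note is only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Infer the core clock from the flash clock measured on CLKOUT and the
 * raw SIM_CLKDIV1 value read back from the device. */
static uint32_t core_hz_from_flash_clkout(uint32_t clkout_hz,
                                          uint32_t sim_clkdiv1)
{
    uint32_t outdiv1 = (sim_clkdiv1 >> 28) & 0xFu; /* core divider minus 1  */
    uint32_t outdiv4 = (sim_clkdiv1 >> 16) & 0xFu; /* flash divider minus 1 */
    assert(clkout_hz != 0u);
    return (uint32_t)((uint64_t)clkout_hz * (outdiv4 + 1u) / (outdiv1 + 1u));
}
```

With OUTDIV1 = 0 and OUTDIV4 = 4 (divide-by-5), a 24 MHz reading on CLKOUT implies a 120 MHz core clock; a 12 MHz reading with the same dividers would mean the core is only at 60 MHz.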

tharonhall · ‎04-22-2015

Now, this may warrant a new thread, but one thing I can't make any sense of is the Flash settings. It may be funny because the debugger has done something? I haven't been able to identify anything in PEx, or the code, that is changing the values from the default. However, according to EmbSys, the registers have changed from the reset values and they are all suspiciously the same number. Flash settings have to be changed from a RAM routine, so it is non-trivial to change them. See below:

Any clue what the different "masters" are? Obviously the most critical would be the Cortex-M4 core from a performance standpoint. It is less clear to me what other masters would be accessing the Flash.

tharonhall · ‎04-22-2015

PS - I noticed after I posted this that another time when I was debugging the values were different and were at least the same or closer to the defaults, putting doubt on the meaningfulness of the previous snapshot.

tharonhall · ‎04-22-2015

So dumb question: The Cortex-M runs in Thumb mode so it wants 16-bit opcodes, yes? Is it set up so that the Flash reads multiple opcodes over a wide bus so the core doesn't stall, or is 120 MHz more theoretical than practical? What is the sustained clock rate of the K22F core under nominal conditions, assuming no branching, context switches, etc.?
