Does anyone know a surefire way to confirm that a K22F is, in fact, clocking at 120 MHz? We are seeing some curious timing, and one possible explanation is that the system is running much slower than it should, so we want to be certain we are at the correct clock speed.
Thanks!
Tharon
To verify the system clock you can configure UART0 (or UART1) and check that its baud rate is as expected.
This works because these two UARTs are clocked directly by the system clock, so the UART speed gives an indirect confirmation of it.
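As a rough sketch of that check, the helper below turns the programmed divisor and the baud rate measured on the TX pin back into the clock that must be feeding the UART. It assumes the usual Kinetis relation baud = clock / (16 * (SBR + BRFA/32)); the function name is made up for illustration, and the SBR/BRFA values should be read back from your own UART registers.

```c
#include <stdint.h>

/* Infer the UART module clock (Hz) from the programmed divisor fields
 * and the baud rate actually measured on the wire.
 * Assumption: baud = clock / (16 * (SBR + BRFA/32)). */
uint32_t uart_implied_clock_hz(uint32_t sbr, uint32_t brfa,
                               uint32_t measured_baud)
{
    /* 16 * (SBR + BRFA/32) == (32*SBR + BRFA) / 2, kept in integers. */
    uint64_t divisor_x2 = 32u * (uint64_t)sbr + brfa;
    return (uint32_t)((uint64_t)measured_baud * divisor_x2 / 2u);
}
```

For example, with SBR = 195 and BRFA = 10 (the divisor for 38400 baud from 120 MHz), measuring 38400 baud implies 120 MHz, while measuring 19200 baud with the same settings would imply the clock is really 60 MHz.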
Regards
Mark
Kinetis: µTasker Kinetis support
K22: µTasker Kinetis FRDM-K22F support / µTasker Kinetis TWR-K22F120M support
For the complete "out-of-the-box" Kinetis experience and faster time to market
For what it's worth, I looked through the MCG and SIM registers with a fine-tooth comb and everything looks perfect, with the exception that the FlexBus clock frequency is too high. Since we don't have an external bus, I don't think that matters anyway.
Hi
Try toggling a pin with
GPIOx_PTOR = pin_bit; // eg. GPIOA_PTOR = 0x00000008 for PTA3
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
GPIOx_PTOR = pin_bit;
- you should get 50..60MHz generated.
If you do, the internals are running at the expected speed and you will need to study the code generated by the compiler as suggested by Earl (make sure that you have optimisation enabled of course).
Regards
Mark
Nothing is quite 'that straightforward' in a complex processor with pre-fetch, arbiters, pipelines, write-buffers, and the like. I put in a list of 5, and the 'whole following group' of 5 32-bit instruction-words runs 125ns (12 clocks at 96MHz, so 2 cycles each plus 2 'delay'), BUT not in a 'regular' way!
GPIOC_PTOR = 1<<16;
0x64dc: 0xf8df 0x0550 LDR.W R0, ??DataTable18_17 ; GPIOC_PTOR
0x64e0: 0xf8c0 0xb000 STR.W R11, [R0]
GPIOC_PTOR = 1<<16;
0x64e4: 0xf8c0 0xb000 STR.W R11, [R0]
GPIOC_PTOR = 1<<16;
0x64e8: 0xf8c0 0xb000 STR.W R11, [R0]
GPIOC_PTOR = 1<<16;
0x64ec: 0xf8c0 0xb000 STR.W R11, [R0]
GPIOC_PTOR = 1<<16;
0x64f0: 0xf8c0 0xb000 STR.W R11, [R0]
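To turn a scope measurement like the one above into cycle counts, a small conversion helper is enough. This is only a sketch with a hypothetical name; it rounds to the nearest whole cycle.

```c
#include <stdint.h>

/* Convert a measured interval (ns) to core clock cycles, rounded to
 * the nearest cycle. 64-bit intermediate avoids overflow. */
uint32_t ns_to_cycles(uint32_t interval_ns, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)interval_ns * core_hz + 500000000u)
                      / 1000000000u);
}
```

With the figures quoted here, 125 ns at 96 MHz comes out as 12 cycles; the same 125 ns would be 15 cycles on a 120 MHz core.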
Earl
See also the following: Re: Fast GPIO on Kinetis KF22
I tested once with 100 toggles and then it was very regular.
The idea was however to get another 'confirmation' of the general ball-park to ensure that there is no large unexpected deviation. With caching and such it is not necessarily the way to generate a 50/50 square wave.
Regards
Mark
Agreed -- when I get the compiler to create single-instruction writes, I get a 48MHz square wave.
So the outstanding question is whether this user can get the GCC tools to put out a software-SPI loop anything like the one I get from IAR, and if THAT runs at 'multi megahertz' speeds?
Earl, thanks for sharing your solution for bitbanged SPI. I'll be revisiting that subject once I get through some other tasks. I'll post up my results for the K64 and K22 then in my thread Abysmally-slow IO toggling
Well, that's encouraging, since I am generating both 9600 and 38400 baud. What is really weird to us is that we have some bit-bang code to do a soft SPI master, but the clock rate we are seeing is incredibly slow. At 120 MHz we might think we would have to slow it down, but it is only running at about 330 kHz. We would be happy at a few MHz really, and the slower clock results in a visible effect for the end user.
For what it's worth, here is the code:
//Keep old settings, then optimize our code
//#pragma GCC push_options
#pragma GCC optimize ("O3")
uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
uint8_t counter;
uint16_t pin_sck;
uint16_t pin_mosi;
uint16_t pin_miso;
uint8_t receiveddata = 0;
uint8_t miso_value;
/*determine which gpio pins to use */
switch( head)
{
case SPI_LEFT:
pin_sck = LEFT_SCK;
pin_mosi = LEFT_MOSI;
pin_miso = LEFT_MISO;
break;
case SPI_LEFT_CTR:
pin_sck = LEFT_CTR_SCK;
pin_mosi = LEFT_CTR_MOSI;
pin_miso = LEFT_CTR_MISO;
break;
case SPI_RIGHT_CTR:
pin_sck = RIGHT_CTR_SCK;
pin_mosi = RIGHT_CTR_MOSI;
pin_miso = RIGHT_CTR_MISO;
break;
case SPI_RIGHT:
pin_sck = RIGHT_SCK;
pin_mosi = RIGHT_MOSI;
pin_miso = RIGHT_MISO;
break;
default:
return 0;
}
//Loop through each bit
for(counter = 8; counter; counter--)
{
if (data & 0x80)
{
/*MOSI = 1;*/
GPIO_DRV_SetPinOutput(pin_mosi);
}
else
{
/*MOSI = 0;*/
GPIO_DRV_ClearPinOutput(pin_mosi);
}
data <<= 1;
/*SCK = 1; a slave latches input data bit */
GPIO_DRV_SetPinOutput(pin_sck);
/* read MISO - assigned to a GPIO pin in processor expert */
miso_value = GPIO_DRV_ReadPinInput( pin_miso);
if (miso_value)
{
data |= 0x01;
}
/* SCK = 0; a slave shifts out next output data bit */
GPIO_DRV_ClearPinOutput(pin_sck);
}/* end of for */
/* return the received data */
return(data);
}
//Restore all GCC options
//#pragma GCC pop_options
#pragma GCC optimize ("O0")
When you are trying to hit performance, it is important to look carefully at the assembly code for your loop. Your 'pin...' vars may help or may hinder the access time -- 32-bit-constant-loading on Cortex takes several cycles, but if the compiler holds each of these in fixed registers that would be a plus. And I HOPE your GPIO_DRV_Set/Clear... items are simple macros that equate directly to GPIOx_PSOR and GPIOx_PCOR writes. You don't want CALL overhead here! (OR the compiler assumptions about register-killing!) You WILL want to make 'counter' a 32-bit item; this is a 32-bit processor after all! You might also get some benefit from making 'data' 32 bits and shifting it left 24 to start; the CPU can then use the Branch 'Plus'/'Minus' instructions directly as your MSB test (now 0x80000000) for what to put out.
Thanks. Even if we could optimize the code, I am struggling to understand how each clock could take ~3 usec on a 120 MHz clock. BTW, I also can't get the hardware SPI above ~2.1 MHz, which makes me wonder if that is related to the same issue.
You can't ignore instruction efficiency. If these 'bit ops' are truly function-calls, I can see each costing upwards of 50 cycles between actual call overhead, and inability of the compiler to optimize registers thru the calls, so together there are 'several hundred' right there. You say you've confirmed a bus clock of 60MHz, so there is NO hardware reason the DSPI can't clock at 30MHz. I use a 48MHz bus, and run 24MHz at startup to external SPI memory, 8MHz all the time to peripherals. With proper FIFO usage, the SPI clock can be continuous. If you're worried about Flash-access penalties, certainly you might want to 'force' the flash-fetch optimizations 'on', or tell your linker to copy & run this routine in SRAM (as a test at least).
Do you mean you haven't seen the DSPI SPI_CLK output faster than 2.1MHz, OR that you haven't seen >262Kbytes/s total transfer rate?
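To put numbers on those two questions, here is a minimal sketch (helper names are made up for illustration) that converts an observed SPI clock into core cycles spent per bit, and into the best-case byte rate if the clock ran continuously.

```c
#include <stdint.h>

/* Core clock cycles consumed per SPI bit at the observed SCK rate. */
uint32_t cycles_per_bit(uint32_t core_hz, uint32_t sck_hz)
{
    return core_hz / sck_hz;
}

/* Best-case throughput in bytes/s if SCK runs without gaps. */
uint32_t bytes_per_second(uint32_t sck_hz)
{
    return sck_hz / 8u;
}
```

A 330 kHz bit-banged clock on a 120 MHz core works out to about 363 cycles per bit, which fits the 'several hundred' estimate for call overhead. A continuous 2.1 MHz SCK corresponds to 262500 bytes/s, which is where the 262 Kbytes/s figure comes from.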
So if I make your routine more direct, and I have to make 'simple constants' for all your un-defined defines, and an assumption for GPIO 'B', I get this code:
uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
uint32_t data32;
uint32_t counter;
uint32_t pin_sck;
uint32_t pin_mosi;
uint32_t pin_miso;
uint32_t miso_value;
//determine which gpio pins to use
switch( head)
{
case 0:
pin_sck = 1<<1;
pin_mosi = 1<<2;
pin_miso = 1<<3;
break;
case 1:
pin_sck = 1<<4;
pin_mosi = 1<<5;
pin_miso = 1<<6;
break;
case 2:
pin_sck = 1<<8;
pin_mosi = 1<<9;
pin_miso = 1<<10;
break;
case 3:
pin_sck = 1<<11;
pin_mosi = 1<<12;
pin_miso = 1<<13;
break;
default:
return 0;
}
data32 = data << 24; //Justify to the top!
//Loop through each bit
for(counter = 8; counter; counter--)
{
if (data32 & 0x80000000)
{
/*MOSI = 1;*/
GPIOB_PSOR = pin_mosi;
}
else
{
/*MOSI = 0;*/
GPIOB_PCOR = pin_mosi;
}
data32 <<= 1;
/*SCK = 1; a slave latches input data bit */
GPIOB_PSOR = pin_sck;
/* read MISO - assigned to a GPIO pin in processor expert */
miso_value = GPIOB_PDIR & pin_miso;
if (miso_value)
{
data32 |= 0x01;
}
/* SCK = 0; a slave shifts out next output data bit */
GPIOB_PCOR = pin_sck;
}/* end of for */
/* return the received data */
return((uint8_t)(data32 & 0xFF));
}
That compiles, with IAR at 'full optimize', into this code with a 16-instruction main-loop (the ??SPI_send_byte_gpio_6 and ??SPI_send_byte_gpio_7 groups). Allowing 'some' extra cycles for the actual reads and writes, each bit should take <30 core cycles, which is about 4 MHz SCK at a 120 MHz core clock.
0x646e: 0xbd30 POP {R4, R5, PC}
uint8_t SPI_send_byte_gpio(uint32_t head, uint8_t data)
{
SPI_send_byte_gpio:
0x6470: 0xb570 PUSH {R4-R6, LR}
switch( head)
0x6472: 0xb128 CBZ R0, ??SPI_send_byte_gpio_0 ; 0x6480
0x6474: 0x2802 CMP R0, #2
0x6476: 0xd013 BEQ.N ??SPI_send_byte_gpio_1 ; 0x64a0
0x6478: 0xd30e BCC.N ??SPI_send_byte_gpio_2 ; 0x6498
0x647a: 0x2803 CMP R0, #3
0x647c: 0xd017 BEQ.N ??SPI_send_byte_gpio_3 ; 0x64ae
0x647e: 0xe01d B.N ??SPI_send_byte_gpio_4 ; 0x64bc
pin_sck = 1<<1;
??SPI_send_byte_gpio_0:
0x6480: 0x2202 MOVS R2, #2
pin_mosi = 1<<2;
0x6482: 0x2304 MOVS R3, #4
pin_miso = 1<<3;
0x6484: 0x2408 MOVS R4, #8
data32 = data << 24; //Justify to the top!
??SPI_send_byte_gpio_5:
0x6486: 0x0608 LSLS R0, R1, #24
for(counter = 8; counter; counter--)
0x6488: 0x2108 MOVS R1, #8
0x648a: 0xf8df 0x55ec LDR.W R5, ??DataTable19_14 ; GPIOB_PSOR
if (data32 & 0x80000000)
??SPI_send_byte_gpio_6:
0x648e: 0x2800 CMP R0, #0
0x6490: 0xbf4c ITE MI
0x6492: 0x602b STRMI R3, [R5]
0x6494: 0x606b STRPL R3, [R5, #0x4]
GPIOB_PSOR = pin_mosi;
0x6496: 0xe013 B.N ??SPI_send_byte_gpio_7 ; 0x64c0
pin_sck = 1<<4;
??SPI_send_byte_gpio_2:
0x6498: 0x2210 MOVS R2, #16 ; 0x10
pin_mosi = 1<<5;
0x649a: 0x2320 MOVS R3, #32 ; 0x20
pin_miso = 1<<6;
0x649c: 0x2440 MOVS R4, #64 ; 0x40
break;
0x649e: 0xe7f2 B.N ??SPI_send_byte_gpio_5 ; 0x6486
pin_sck = 1<<8;
??SPI_send_byte_gpio_1:
0x64a0: 0xf44f 0x7280 MOV.W R2, #256 ; 0x100
pin_mosi = 1<<9;
0x64a4: 0xf44f 0x7300 MOV.W R3, #512 ; 0x200
pin_miso = 1<<10;
0x64a8: 0xf44f 0x6480 MOV.W R4, #1024 ; 0x400
break;
0x64ac: 0xe7eb B.N ??SPI_send_byte_gpio_5 ; 0x6486
pin_sck = 1<<11;
??SPI_send_byte_gpio_3:
0x64ae: 0xf44f 0x6200 MOV.W R2, #2048 ; 0x800
pin_mosi = 1<<12;
0x64b2: 0xf44f 0x5380 MOV.W R3, #4096 ; 0x1000
pin_miso = 1<<13;
0x64b6: 0xf44f 0x5400 MOV.W R4, #8192 ; 0x2000
break;
0x64ba: 0xe7e4 B.N ??SPI_send_byte_gpio_5 ; 0x6486
return 0;
??SPI_send_byte_gpio_4:
0x64bc: 0x2000 MOVS R0, #0
0x64be: 0xbd70 POP {R4-R6, PC}
GPIOB_PSOR = pin_sck;
??SPI_send_byte_gpio_7:
0x64c0: 0x602a STR R2, [R5]
miso_value = GPIOB_PDIR & pin_miso;
0x64c2: 0x68ee LDR R6, [R5, #0xc]
0x64c4: 0x0040 LSLS R0, R0, #1
if (miso_value)
0x64c6: 0xb2b6 UXTH R6, R6
0x64c8: 0x4226 TST R6, R4
0x64ca: 0xbf18 IT NE
0x64cc: 0xf040 0x0001 ORRNE.W R0, R0, #1
GPIOB_PCOR = pin_sck;
0x64d0: 0x606a STR R2, [R5, #0x4]
for(counter = 8; counter; counter--)
0x64d2: 0x1e49 SUBS R1, R1, #1
for(counter = 8; counter; counter--)
0x64d4: 0xd1db BNE.N ??SPI_send_byte_gpio_6 ; 0x648e
return((uint8_t)(data32 & 0xFF));
0x64d6: 0xb2c0 UXTB R0, R0
0x64d8: 0xbd70 POP {R4-R6, PC}
0x64da: 0x0000 MOVS R0, R0
Thank you for the detailed feedback. I did some manual optimizations and was able to eliminate one call. It improved speed but not by much. I clearly have more work to do.
BTW, I answered in a different thread, but I was able to crank the SPI Flash up to 10 MHz and could go higher. The problem was that PEx didn't like some of the defined clock settings at the SPI frequency I selected even though I was only actually using one of the defined clock configurations.
Can you afford (within the 'bigger picture') to add the keyword 'inline' to the GPIO_DRV functions (OR change to Macros!) so they can be optimized directly in the loop?
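One possible shape for that, as a sketch only: thin 'static inline' wrappers over a port register block, so the compiler can fold each call into a single store or load inside the loop. The type and function names below are invented for illustration and are not the GPIO_DRV API. The struct order follows the Kinetis GPIO register order (PDOR, PSOR, PCOR, PTOR, PDIR, PDDR), but take the real base address and definitions from your device header.

```c
#include <stdint.h>

/* Hypothetical view of one GPIO port's registers, in Kinetis order. */
typedef struct {
    volatile uint32_t PDOR, PSOR, PCOR, PTOR, PDIR, PDDR;
} gpio_port_t;

/* Drive the masked pins high: a single store to the set register. */
static inline void gpio_set(gpio_port_t *port, uint32_t mask)
{
    port->PSOR = mask;
}

/* Drive the masked pins low: a single store to the clear register. */
static inline void gpio_clear(gpio_port_t *port, uint32_t mask)
{
    port->PCOR = mask;
}

/* Read the masked input pins; nonzero if any of them is high. */
static inline uint32_t gpio_read(const gpio_port_t *port, uint32_t mask)
{
    return port->PDIR & mask;
}
```

With optimisation on, a call such as gpio_set(port, pin_sck) inside the bit loop should reduce to one STR, with no call overhead and no registers spilled around it. It is still worth checking the disassembly to confirm.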
I always bring CLKOUT to a test point for just this reason. With the K22, you can't bring out the core clock itself, but you can bring out the Flash Clock which you have configured with a known-certain divider, probably divide-by-5. The attachment shows how I have my K22 configured for 120 MHz operation driving the 24 MHz flash clk on CLKOUT.
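For anyone repeating this, the core clock can be worked back from the frequency measured on CLKOUT and the dividers in SIM_CLKDIV1. The sketch below assumes the usual Kinetis K-series layout (OUTDIV1 in bits 31-28 for core/system, OUTDIV4 in bits 19-16 for flash, each dividing by field value + 1); check this against the K22F reference manual. The function names are made up for illustration.

```c
#include <stdint.h>

/* Divider encoded in a 4-bit OUTDIVn field: field value + 1. */
uint32_t outdiv(uint32_t clkdiv1, unsigned shift)
{
    return ((clkdiv1 >> shift) & 0xFu) + 1u;
}

/* Core clock implied by the flash clock measured on CLKOUT.
 * Assumed layout: OUTDIV1 at bits 31-28, OUTDIV4 at bits 19-16. */
uint32_t core_hz_from_clkout(uint32_t clkdiv1, uint32_t measured_flash_hz)
{
    uint32_t core_div  = outdiv(clkdiv1, 28); /* OUTDIV1: core/system */
    uint32_t flash_div = outdiv(clkdiv1, 16); /* OUTDIV4: flash */
    return (uint32_t)((uint64_t)measured_flash_hz * flash_div / core_div);
}
```

With SIM_CLKDIV1 read back as 0x01240000 (core /1, bus /2, FlexBus /3, flash /5), a 24 MHz reading on CLKOUT implies a 120 MHz core; a 12 MHz reading with the same dividers would mean the core is only at 60 MHz.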
Sorry for the attachment, but the forum seems to insist on html which ruins my formatting.
Now, this may warrant a new thread, but one thing I can't make any sense of is the Flash settings. It may be funny because the debugger has done something? I haven't been able to identify anything in PEx, or the code, that is changing the values from the default. However, according to EmbSys, the registers have changed from the reset values and they are all suspiciously the same number. Flash settings have to be changed from a RAM routine, so it is non-trivial to change them. See below:
Any clue what the different "masters" are? Obviously the most critical would be the Cortex-M4 core from a performance standpoint. It is less clear to me what other masters would be accessing the Flash.
PS - I noticed after I posted this that, at another time when I was debugging, the values were different and were at least the same as or closer to the defaults, which puts the meaningfulness of the previous snapshot in doubt.
So dumb question: The Cortex-M runs in Thumb mode so it wants 16-bit opcodes, yes? Is it set up so that the Flash reads multiple opcodes over a wide bus so the core doesn't stall, or is 120 MHz more theoretical than practical? What is the sustained clock rate of the K22F core under nominal conditions, assuming no branching, context switches, etc.?