DSPI_MasterTransferEDMA has huge overhead

gerhardheinzel · ‎03-12-2021

On a K64F, I am using SPI0 with DMA (TX only) to send data to a display at 10 MHz. That happens many times with always the same parameters. I find that DSPI_MasterTransferEDMA is quite slow.

From calling the function to the actual start of data transfer it takes 11.2 µs (clock running at full speed, release mode with optimizations). Looking at the code I find a lot of overhead such as

uint32_t instance = DSPI_GetInstance(base);

which is necessary in the very general case but wasteful for me. Is there any way to speed up the process?

gerhardheinzel · ‎03-12-2021

Once the SPI gets going, the data rate is OK (10 MHz), but it takes a long time from calling DSPI_MasterTransferEDMA before the SPI starts to do anything.

myke_predko · ‎03-12-2021

@gerhardheinzel

Sorry, I don't see any kind of significant delay - how much are you seeing? How do you know that it is taking a long time from calling the DSPI_MasterTransferEDMA method?

In the code, I am calling DSPI_MasterTransferEDMA from within the callback every 510 bytes and I don't see any noticeable delay between the blocks.

I don't think this is something you can run in a debugger and see SPI data transferring every instruction cycle; you have to run it at full speed. When I was looking at the data, I used an oscilloscope.

myke

gerhardheinzel · ‎03-12-2021

I observe the delay by setting a GPIO pin immediately before calling DSPI_MasterTransferEDMA() and observing on an oscilloscope that GPIO and the SPI lines. SPI becomes active 11 µs after the GPIO trigger. At the moment the CPU is doing nothing else (but eventually it will have to do many other things...)

myke_predko · ‎03-12-2021

Hey @gerhardheinzel

Sorry, I reacted to the thread's title and it didn't quite click with me with the numbers you're showing.

Being honest with you, I would not consider 11 µs from an input signal to the start of the SPI data transfer to be "huge". If you're running at 120 MHz, that's approximately 1,320 clock cycles - which, using my rule of thumb that the average line of C takes 12 clock cycles to execute, is only about 110 lines of code, and that doesn't account for the time the hardware operations take.
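As a rough sanity check on that kind of estimate, the conversion from a measured latency to a cycle budget can be written down in a few lines. This is only a sketch: `latency_to_cycles` and `cycles_to_c_lines` are hypothetical helper names, the 120 MHz core clock is an assumption, and the 12 cycles per line of C is the rule of thumb from this post, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured latency (in nanoseconds) into core clock cycles,
 * rounded to the nearest cycle. core_hz is the core clock in Hz. */
static uint64_t latency_to_cycles(uint64_t latency_ns, uint64_t core_hz)
{
    assert(core_hz > 0);
    return (latency_ns * core_hz + 500000000ULL) / 1000000000ULL;
}

/* Rule-of-thumb estimate: how many "average" lines of C fit into a
 * cycle budget, given an assumed cost per line. */
static uint64_t cycles_to_c_lines(uint64_t cycles, uint64_t cycles_per_line)
{
    assert(cycles_per_line > 0);
    return cycles / cycles_per_line;
}
```

Calling `latency_to_cycles(11000, 120000000ULL)` gives the budget for an 11 µs delay at the assumed clock, and passing that result with 12 to `cycles_to_c_lines` gives the corresponding line estimate.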

I presume you're running code something like:

for (; LOGIC0 != GPIO_PinRead(SPISTART_GPIO, SPISTART_PIN);) {  }

DSPI_MasterTransferEDMA(OLED_DSPI_MASTER_BASEADDR
                      , &SPIEDMAMasterHandle
                      , &SPIXfer);

This probably looks like it would be just about instantaneous, but getting pin data is actually quite a lengthy process, as is starting the first SPI transfer. If you are using an interrupt handler to catch the pin state change, the response will be quite a bit longer than with the loop above (and that loop is nowhere close to exiting instantaneously).

There may be some places where the code could be tightened up for your specific application but I would be very surprised if you could make a significant reduction in the time from when the start signal comes in and the MCU responds.

Now having said that, if you go through and find things that aren't appropriate for your application and then discover that removing them reduces the response time significantly, I'd really like to hear about it.

Sorry I can't give you better news.

myke

gerhardheinzel · ‎03-12-2021

I toggle the GPIO pin with

GPIO_SetPinsOutput(GPIOB, (((unsigned long) 1) << 11));
DSPI_MasterTransferEDMA(LCD_SPI, &spi_dma_handle, &Xfer_DMA);
GPIO_ClearPinsOutput(GPIOB, (((unsigned long) 1) << 11));

which shouldn't have any overhead, and observe the GPIO and SPI pins with a real oscilloscope, not a software debugging tool.

I have stepped through the code and indeed it does plenty of stuff. My point is that I do the same transfer over and over again, with only the contents of the data changed but nothing else, and I was wondering if there is some way to "repeat the last transfer" without going through all the overhead again.
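What I have in mind is the usual split between a one-time setup and a minimal per-transfer re-arm that only patches the source address and the count. The following is a minimal sketch of that pattern and nothing more: `dma_desc_t`, `xfer_prepare` and `xfer_rearm` are hypothetical names, the struct is a plain stand-in and not the real K64F eDMA TCD layout, and none of this is an MCUXpresso SDK API. Whether re-arming like this is sufficient for DSPI would need checking against how the driver sets up its DMA channels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA channel descriptor.
 * This is NOT the real eDMA TCD register layout. */
typedef struct {
    const void      *src;       /* source buffer address (changes per transfer) */
    volatile void   *dst;       /* peripheral data register (fixed) */
    uint32_t         count;     /* number of elements to move (may change) */
    uint8_t          elem_size; /* element size in bytes (fixed) */
    uint8_t          src_inc;   /* source address increment per element (fixed) */
    uint8_t          dst_inc;   /* 0 = destination stays on the same register */
    volatile uint8_t start;     /* written to 1 to start the channel */
} dma_desc_t;

/* One-time setup: fill in everything that never changes between transfers. */
static void xfer_prepare(dma_desc_t *d, volatile void *periph_reg, uint8_t elem_size)
{
    assert(d != NULL && periph_reg != NULL);
    d->src = NULL;
    d->dst = periph_reg;
    d->count = 0;
    d->elem_size = elem_size;
    d->src_inc = elem_size;
    d->dst_inc = 0;
    d->start = 0;
}

/* Per-transfer fast path: patch only what changed, then start the channel. */
static inline void xfer_rearm(dma_desc_t *d, const void *data, uint32_t count)
{
    d->src = data;
    d->count = count;
    d->start = 1;
}
```

The point of the split is that the fast path touches three fields instead of recomputing the instance lookup and the whole configuration on every call.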

myke_predko · ‎03-12-2021

@gerhardheinzel

What kind of actual datarate are you seeing?

On a K22 running at 60 MHz, 8k of data (a full display image) takes 11 ms for me. This is an actual data rate of about 5.96 Mbit/s, which I consider reasonable because the processor is still running and bus accesses have to be arbitrated between the devices within the Kinetis.
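For comparing numbers between boards, the effective rate can be worked out from the byte count and the elapsed time. A small sketch, assuming "8k" means 8192 bytes and counting only payload bits; `effective_bps` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Effective payload bit rate, in bits per second, for `bytes` bytes
 * transferred in `elapsed_us` microseconds (integer division, rounds down). */
static uint64_t effective_bps(uint64_t bytes, uint64_t elapsed_us)
{
    assert(elapsed_us > 0);
    return (bytes * 8ULL * 1000000ULL) / elapsed_us;
}
```

With 8192 bytes and 11000 µs this comes out just under 5.96 million bits per second, which matches the figure quoted here.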

myke