I am developing my application on my custom board based on the MK66FN2M0VMD18 device.
The CPU works in HSRUN mode at 180MHz and the bus clock is set to 60MHz.
The SPI2 peripheral is used in master mode with a 20MHz SCK clock and SPI2_PCS0 as the slave chip select.
My application performs many SPI2 transfers, each carrying a large amount of data, so I decided to use SPI2 with eDMA to minimize the execution time of each transfer.
I am using MCUXpresso SDK 2.6.0, but the DSPI_MasterTransferEDMA driver function seems quite inefficient, because I see two issues.
I expected all the data bytes to be sent back to back with a minimal Inter-Byte Time, because eDMA handles the data transfer with no action from the CPU.
But this is true only for the first two bytes and the last two bytes of the transfer, as you can see in the following scope screenshot of a 5-byte transfer:
Color | Signal |
---|---|
Yellow | SPI_CLK |
Cyan | SPI_MISO |
Purple Red | SPI_MOSI |
Blue | Debug GPIO output |
The following is the start of a 256-byte transfer:
And the following is the end of the same 256-byte transfer:
Why is there a 3-4us Inter-Byte Time? And why is it absent at the start and at the end of the transfer?
From the call to the DSPI_MasterTransferEDMA driver function to the first byte on the bus, I measure a Transfer Setup Time of about 700us:
The core clock is set to 180MHz. Can it really take that long to set up and start the transfer?
Both those Inter-Byte and Transfer Setup times severely limit my data transfer throughput.
Is there a way to improve those two times?
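To put numbers on how much these two times cost, here is a minimal sketch in C. It is only a rough model, not a measurement: it assumes 8-bit frames, the same gap between every pair of bytes (the scope shows no gap around the first two and last two bytes, so it slightly overestimates), and the function names `spi_transfer_time_ns` and `ns_to_core_cycles` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated duration of one SPI transfer in nanoseconds.
 * Simplified model: 8 SCK cycles per byte, a fixed idle gap between
 * consecutive bytes, and a fixed setup delay before the first byte. */
uint64_t spi_transfer_time_ns(uint32_t sck_hz, uint32_t n_bytes,
                              uint32_t gap_ns, uint32_t setup_ns)
{
    assert(sck_hz > 0u);
    uint64_t byte_ns = (8ULL * 1000000000ULL) / sck_hz; /* 400 ns at 20 MHz */
    uint64_t gaps_ns = (n_bytes > 1u) ? (uint64_t)(n_bytes - 1u) * gap_ns : 0u;
    return (uint64_t)setup_ns + (uint64_t)n_bytes * byte_ns + gaps_ns;
}

/* Number of core clock cycles that elapse in a given time span. */
uint64_t ns_to_core_cycles(uint64_t ns, uint32_t core_hz)
{
    return (ns * core_hz) / 1000000000ULL;
}
```

With the figures above (20MHz SCK, 256 bytes), the ideal back-to-back transfer would take about 102us. Taking 3us per gap and 700us of setup, the same model gives about 1.57ms, and the 700us setup alone corresponds to roughly 126000 core cycles at 180MHz.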
I realized my SPI+eDMA throughput issue was not due to the DSPI_MasterTransferEDMA driver function.
My custom board has an external 2Mx8 SRAM connected to the Kinetis CPU through the 60MHz 8-bit FlexBus. My application places its Heap and Stack sections in this external SRAM, and the data buffer transferred over SPI is allocated in the Heap section. So the eDMA reads and writes the data bytes through the 60MHz 8-bit FlexBus, at the speed allowed by the external SRAM access time. This SRAM access time limits both the Inter-Byte Time and the Transfer Setup Time of the SPI data transfer.
Also, my FlexBus wait states configuration was not actually optimized for the external SRAM access time. After reducing the number of wait states to the minimum, I got a 40% improvement in my SPI data transfer time. The best option would be to work with data allocated in the internal SRAM.
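A minimal sketch of that last idea, under stated assumptions: the buffer is checked against the internal SRAM range and, if it lives elsewhere, its data is copied into a statically allocated staging buffer before being handed to the DMA. The address range below (0x1FFF0000 to 0x2002FFFF) is what I believe the default MK66FN2M0 memory map uses and should be checked against the reference manual and your linker script. The names `is_internal_sram`, `stage_for_dma` and `g_spi_bounce` are hypothetical, and the sketch relies on the linker placing static data in internal SRAM, which is not the case if `.bss` has also been moved to the external SRAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: internal SRAM spans SRAM_L + SRAM_U contiguously. */
#define INT_SRAM_START ((uintptr_t)0x1FFF0000u)
#define INT_SRAM_END   ((uintptr_t)0x20030000u) /* exclusive */

#define SPI_BOUNCE_SIZE 256u

/* Staging buffer; expected to be linked into internal SRAM (.bss). */
uint8_t g_spi_bounce[SPI_BOUNCE_SIZE];

/* True if [addr, addr + len) lies entirely inside internal SRAM. */
bool is_internal_sram(uintptr_t addr, size_t len)
{
    if (len == 0u || addr < INT_SRAM_START || addr >= INT_SRAM_END) {
        return false;
    }
    return len <= (size_t)(INT_SRAM_END - addr);
}

/* Copy up to SPI_BOUNCE_SIZE bytes into the staging buffer and
 * return how many bytes were staged; the caller loops over chunks. */
size_t stage_for_dma(const uint8_t *src, size_t len)
{
    assert(src != NULL);
    size_t n = (len < SPI_BOUNCE_SIZE) ? len : SPI_BOUNCE_SIZE;
    memcpy(g_spi_bounce, src, n);
    return n;
}
```

The copy itself still reads the external SRAM through the FlexBus, so this only pays off if the CPU copy is cheaper than the per-byte DMA accesses, or if the data can be produced directly in internal SRAM in the first place.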