I am writing this post to document and report a serious issue with the non-DMA SPI master driver in KSDK 2.3 and 2.4 (and possibly older releases in the 2.x series). I found the issue on a FRDM-KL43 running proprietary code (which I cannot share) that uses the KSDK (originally 2.3, since updated to 2.4), and isolated it to the non-DMA SPI master driver.
For non-blocking master transfers on SPI1 (which has an 8-byte FIFO) that are large enough to use the FIFO, a heavy load of system interrupts and critical sections causes the transfer to end without the user-supplied callback being called.
SPI Master FIFO Interrupts:
The 2.3/2.4 SPI driver, when using the FIFO, only interrupts on the receive FIFO near full flag. The flag is configured to auto-clear when the condition is no longer true (the SPI peripheral also offers a mode in which the flag is cleared manually). When the near full condition is hit, the receive FIFO is drained, and then the transmit FIFO is reloaded to the watermark level (or to full if there are 8 bytes remaining to transfer).
In a perfect world, the transmit FIFO will be loaded with 4 (or 6 depending on FIFO watermark settings) bytes. When transmission is complete, the receive FIFO near full interrupt will trigger and 4 (or 6) bytes will be drained from it. The transmit FIFO will then be reloaded, and this process occurs again.
In reality, if the last receive FIFO near full interrupt leaves only 2 bytes remaining to be transmitted, the interrupt will not trigger again, because 2 received bytes never bring the receive FIFO up to the near full level. Despite this, the driver seems to work under normal circumstances.
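To make the dangling-byte case concrete, here is a minimal host-side model of the sequence described above. It is only a sketch of my description, not the KSDK driver code: the names (sim_result_t, simulate_near_full_only) are made up for illustration, it assumes each burst is exactly one watermark's worth of bytes, and it assumes the near full interrupt fires only when the receive FIFO level reaches the watermark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record for the model; not a KSDK type. */
typedef struct {
    size_t interrupts;    /* receive near full interrupts that fired */
    size_t bytes_drained; /* bytes read back out of the receive FIFO */
    bool   completed;     /* true if every byte was drained by an interrupt */
} sim_result_t;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Model a transfer of `total` bytes where the only interrupt source is
 * "receive FIFO level >= watermark" and every reload is one watermark. */
sim_result_t simulate_near_full_only(size_t total, size_t watermark)
{
    assert(watermark > 0);

    sim_result_t r = {0, 0, false};
    size_t to_send = total;

    /* Initial load of the transmit FIFO. */
    size_t burst = min_size(to_send, watermark);
    to_send -= burst;
    size_t rx_level = burst; /* each byte shifted out clocks one byte in */

    while (rx_level >= watermark) {   /* near full flag set: interrupt */
        r.interrupts++;
        r.bytes_drained += rx_level;  /* drain the receive FIFO */
        burst = min_size(to_send, watermark);
        to_send -= burst;             /* reload the transmit FIFO */
        rx_level = burst;
    }

    /* Anything still sitting in the receive FIFO was never serviced. */
    r.completed = (r.bytes_drained == total);
    return r;
}
```

In this model, simulate_near_full_only(8, 4) completes after two interrupts, while simulate_near_full_only(10, 4) also sees two interrupts but leaves the last 2 bytes unserviced, so the point at which the callback would be invoked is never reached.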
I instrumented the SPI1 interrupt handler to toggle a GPIO so that I could observe when the interrupt is entered and exited. What I found is that under normal circumstances, the interrupt fires in pairs when using the FIFO. I believe this is because the receive FIFO near full flag is not under manual control, and so the SPI interrupt can become queued again while still executing (unintentionally). This results in the transfer complete callback sometimes being invoked twice on the same transfer.
When a heavy interrupt and critical section load is introduced, this paired interrupt behavior disappears and there is exactly one SPI interrupt per FIFO burst. In this case, the last interrupt (after the final dangling bytes have been transmitted) does not trigger, as expected given the circumstances.
The heavy load in my case is repeated USB transactions over a virtual COM port. I am using FreeRTOS, whose Cortex-M0+ port has a critical section around the context switch, so context switches are occurring alongside the USB traffic. If I raise the SPI interrupt priority above that of the USB interrupt, the same behavior is observed.
I have not been able to fully isolate the mechanism by which the failure occurs, in part because I have not been able to determine the reason for the paired interrupts. Forcing the driver to not use the FIFO results in error-free operation, but CPU utilization goes up and SPI throughput goes down.
Looking back at an older project that did not have this issue and used a much older KSDK version, the SPI operation is structured very differently.
Of note, the older driver clears the FIFO interrupt flags manually rather than automatically, and when the dangling bytes at the end number fewer than the size of the FIFO, the receive near full interrupt is disabled and the transmit FIFO empty interrupt is enabled. This causes the last interrupt to trigger on the empty transmit FIFO, which will always occur regardless of the number of bytes loaded.
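The tail handling described above can be sketched as a small host-side model. This is my own illustration, not code from either KSDK version: the names (irq_source_t, select_irq_source, simulate_with_tx_empty_tail) are hypothetical, the switch is modeled at the watermark level for simplicity, and each burst is assumed to have fully shifted out by the time the interrupt condition is evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interrupt sources for the model; not KSDK identifiers. */
typedef enum {
    IRQ_RX_NEAR_FULL, /* fires when receive FIFO level >= watermark */
    IRQ_TX_EMPTY      /* fires when the transmit FIFO has emptied */
} irq_source_t;

/* A full burst can rely on near full; a short tail must use TX empty. */
irq_source_t select_irq_source(size_t burst_len, size_t watermark)
{
    return (burst_len >= watermark) ? IRQ_RX_NEAR_FULL : IRQ_TX_EMPTY;
}

static bool irq_fires(irq_source_t src, size_t rx_level,
                      size_t tx_level, size_t watermark)
{
    return (src == IRQ_RX_NEAR_FULL) ? (rx_level >= watermark)
                                     : (tx_level == 0);
}

/* Returns true if every byte is serviced by an interrupt. The number of
 * interrupts taken is written to *interrupts. */
bool simulate_with_tx_empty_tail(size_t total, size_t watermark,
                                 size_t *interrupts)
{
    assert(watermark > 0);
    assert(interrupts != NULL);

    size_t to_send = total;
    size_t drained = 0;
    *interrupts = 0;

    while (to_send > 0) {
        size_t burst = (to_send < watermark) ? to_send : watermark;
        to_send -= burst;

        /* Enable exactly one source for this burst. */
        irq_source_t src = select_irq_source(burst, watermark);

        /* After the burst has shifted out: TX FIFO empty, RX holds it. */
        if (!irq_fires(src, burst, 0, watermark)) {
            break; /* stalled: no interrupt will arrive */
        }
        (*interrupts)++;
        drained += burst;
    }
    return drained == total;
}
```

With this selection, a 10-byte transfer at a watermark of 4 takes three interrupts (two near full, one transmit empty) and completes, whereas relying on near full alone would stall on the final 2 bytes.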
I plan to modify the KSDK 2.4 SPI driver to use the manually cleared flags and to change the interrupt enables when handling dangling bytes. Earlier, I attempted to use the manually cleared flags and every FIFO-based transfer failed to trigger the callback, but that was before I discovered the issue with the paired interrupts. I now believe the callback was missing because the last interrupt was, correctly, not triggering.
I am sharing this information in the hope that NXP can perform their own investigation and make corrections to the driver that resolve this issue. If someone from NXP would like to discuss this in more detail, you can reach me through the email address associated with my community account.
Thank you for taking the time to read this,