I know this topic is quite old, but we were experiencing the same issue, which ultimately manifested as a complete SPI transfer lockup after approximately 60 days of continuous operation. I was able to locate the root cause and resolve the issue.
SHORT version:
The driver does not check the state of the SPI module before writing to the CTAR, RSER and MCR registers. Specifically, the SPI module must be in the "Stopped" state before writing to them. This requirement is specifically called out in the NXP documentation.
I've attached an updated source file that fixes the issue.
FYI, I believe this issue is still present even in the latest KSDK.
LONG version:
Issue:
On a Kinetis K61 processor using the MQX OS SPI drivers, SPI receive ISR events were occurring despite all indicators in the SPI controller module (FIFO counters, interrupt enable states, etc.) showing that an ISR should NOT be occurring. These unexpected ISR events result in an erroneous posting of the lightweight semaphore (lwsem, "EVENT_IO_FINISHED") used to coordinate between SPI transmission (TX) and SPI receive (RX) events. The erroneous lwsem posting results in continuous fall-through in _dspi_tx_rx(), where the function waits for the lwsem after a TX. A posting of the lwsem should indicate that all RX events have completed.
Eventually the erroneous postings accumulate and cause the lwsem count to roll over. On a lwsem post, the MQX OS lwsem handler increments the count by one. The handler does not check for a rollover condition before incrementing the count.** The lwsem count value is held in a signed 32-bit container (-2,147,483,648 to 2,147,483,647).
A rollover into a negative value results in an indefinite lockup of the lwsem. Depending on the SPI transfer intervals, it could take days or months before the lockup occurs.
NOTE: For testing purposes, temporarily changing lwsem VALUE type to int16_t will significantly reduce this interval.
** Although there is an argument to be made that a semaphore count value should not be allowed to roll over (it depends on the design philosophy of the OS), that is not the root cause of the issue.
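To put rough numbers on the rollover interval and on the int16_t test trick, here is a minimal arithmetic sketch. It assumes the count climbs from zero at a constant net rate of surplus posts, which is a simplification; the function names are made up for illustration, and the 60-day figure is the approximate one from our field units.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400.0

/* Average net surplus posts per second needed for the count to climb
 * from 0 to max_count in the given number of days (constant rate assumed). */
double surplus_posts_per_second(int64_t max_count, double days)
{
    assert(days > 0.0);
    return (double)max_count / (days * SECONDS_PER_DAY);
}

/* Seconds for the count to climb from 0 to max_count at the given rate. */
double seconds_until_rollover(int64_t max_count, double posts_per_second)
{
    assert(posts_per_second > 0.0);
    return (double)max_count / posts_per_second;
}
```

Calling surplus_posts_per_second(INT32_MAX, 60.0) gives roughly 414 surplus posts per second. Feeding that rate into seconds_until_rollover(INT16_MAX, rate) gives roughly 79 seconds, which is why narrowing the count type makes the lockup practical to reproduce on the bench.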
Root Cause:
A review of the low-level driver against the NXP documentation indicates that portions of the driver code were not verifying that the SPI module was in the "Stopped" state before writing to the CTAR, RSER and MCR registers.
According to the NXP K61 documentation (1):
Section 53.3.1 (SPIx_MCR):
"Contains bits to configure various attributes associated with the module operations. The
HALT and MDIS bits can be changed at any time, but the effect takes place only on the
next frame boundary. Only the HALT and MDIS bits in the MCR can be changed, while
the module is in the Running state."
Section 53.3.3 (SPIx_CTARn):
"CTAR registers are used to define different transfer attributes. Do not write to the CTAR
registers while the module is in the Running state."
Section 53.3.6 (SPIx_RSER):
"RSER controls DMA and interrupt requests. Do not write to the RSER while the module
is in the Running state."
Discussion:
Section 53.4.1 "Start and Stop of module transfers" details the "Stopped" and "Running" states:
"The TXRXS bit in the SR indicates the state of module. The bit is set if the module is in
Running state.
The module starts or transitions to Running when all of the following conditions are true:
* SR[EOQF] bit is clear
* MCU is not in the Debug mode or the MCR[FRZ] bit is clear
* MCR[HALT] bit is clear
The module stops or transitions from Running to Stopped after the current frame when
any one of the following conditions exist:
* SR[EOQF] bit is set
* MCU in the Debug mode and the MCR[FRZ] bit is set
* MCR[HALT] bit is set
State transitions from Running to Stopped occur on the next frame boundary if a transfer
is in progress, or immediately if no transfers are in progress."
This SPI driver does not use or control the SR[EOQF] bit or the associated RSER EOQF control,
and we will not be employing the MCR[FRZ] bit under normal operation.
That leaves the MCR[HALT] bit, which we can use to control the Stopped / Running states.
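The start/stop conditions quoted from section 53.4.1 reduce to a single boolean expression. The sketch below only restates the manual's wording as code; the function and parameter names are hypothetical and do not come from the driver.

```c
#include <stdbool.h>

/* True when the manual's three "start" conditions all hold:
 *   - SR[EOQF] is clear
 *   - the MCU is not in Debug mode, or MCR[FRZ] is clear
 *   - MCR[HALT] is clear
 * If any one fails, the module stops at the next frame boundary. */
bool dspi_should_run(bool eoqf_set, bool in_debug, bool frz_set, bool halt_set)
{
    return !eoqf_set && (!in_debug || !frz_set) && !halt_set;
}
```

With EOQF never set and FRZ left clear, as in this driver, the result depends on HALT alone: dspi_should_run(false, in_debug, false, halt) is simply !halt.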
Resolution:
Create code to halt the SPI module (as needed), putting the module into the "Stopped" state before any write to CTAR, RSER or MCR.
Create code to return the module to the "Running" state afterwards, but only if we were the ones who stopped it.
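The two steps above can be sketched roughly as follows. This is NOT the code from the attached file. It is a minimal illustration that assumes a simplified, hypothetical register struct (spi_regs_t), made-up function names, an arbitrary spin limit, and bit masks (HALT = bit 0 of MCR, TXRXS = bit 30 of SR) that should be checked against the reference manual and the BSP headers before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified register block for illustration only.
 * Real code should use the register definitions from the BSP headers. */
typedef struct {
    volatile uint32_t MCR;   /* module configuration register */
    volatile uint32_t SR;    /* status register */
    volatile uint32_t RSER;  /* DMA/interrupt request select and enable */
} spi_regs_t;

/* Assumed bit positions; verify against the reference manual. */
#define SPI_MCR_HALT_MASK   0x00000001u
#define SPI_SR_TXRXS_MASK   0x40000000u
#define SPI_STOP_SPIN_LIMIT 100000u      /* arbitrary bound on the wait */

typedef enum {
    SPI_ALREADY_HALTED,  /* HALT was set before we were called */
    SPI_HALTED_BY_US,    /* we set HALT and must clear it afterwards */
    SPI_STOP_TIMEOUT     /* module never reported the Stopped state */
} spi_stop_result_t;

/* Request a stop via MCR[HALT], then wait for SR[TXRXS] to clear. */
spi_stop_result_t spi_enter_stopped(spi_regs_t *spi)
{
    assert(spi != NULL);
    bool was_halted = (spi->MCR & SPI_MCR_HALT_MASK) != 0u;
    if (!was_halted) {
        spi->MCR |= SPI_MCR_HALT_MASK;   /* HALT may be changed while Running */
    }
    for (uint32_t i = 0u; i < SPI_STOP_SPIN_LIMIT; i++) {
        if ((spi->SR & SPI_SR_TXRXS_MASK) == 0u) {
            return was_halted ? SPI_ALREADY_HALTED : SPI_HALTED_BY_US;
        }
    }
    if (!was_halted) {
        spi->MCR &= ~SPI_MCR_HALT_MASK;  /* undo our change on timeout */
    }
    return SPI_STOP_TIMEOUT;
}

/* Return to Running only if the matching enter call set HALT itself. */
void spi_leave_stopped(spi_regs_t *spi, spi_stop_result_t how)
{
    assert(spi != NULL);
    if (how == SPI_HALTED_BY_US) {
        spi->MCR &= ~SPI_MCR_HALT_MASK;
    }
}

/* Example of a guarded write: RSER is touched only while Stopped. */
bool spi_write_rser(spi_regs_t *spi, uint32_t value)
{
    spi_stop_result_t how = spi_enter_stopped(spi);
    if (how == SPI_STOP_TIMEOUT) {
        return false;
    }
    spi->RSER = value;
    spi_leave_stopped(spi, how);
    return true;
}
```

Tracking whether HALT was already set matters because other parts of the driver may have halted the module on purpose; restarting it unconditionally would change their behavior. Writes to CTAR and to the other MCR bits would be wrapped the same way.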
NOTE: Changes are based on MQX OS v4.1 BSP driver
References:
1. NXP, K61 Sub-Family Reference Manual (K61P144M150SF3RM.pdf), Rev. 3, November 2014