LPUART driver bug

cancel
Showing results for 
Search instead for 
Did you mean: 

LPUART driver bug

159 Views
Contributor II

There is a UART driver issue that is present in SDK_2.7.0. While I have not tested it, I believe it is still present in SDK_2.8. I am using a MKE14F512VLL16. In my case I am also using FreeRTOS and have one task handling sending data out the UART and another task reading data from the UART, concurrently. 

Things usually appear to work fine. However, we have witnessed a small number of mysterious device lockups and so I was tasked with looking into it. What I found was that in this bad state, the device would repeatedly receive a Tx interrupt. Inside of LPUART_TransferHandleIRQ() , the kLPUART_TxDataRegEmptyFlag would be set in STAT, and the kLPUART_TxDataRegEmptyInterruptEnable flag would be set in CTRL. However, the handle->txDataSize field would unexpectedly be zero. This caused an endless interrupt loop.

I believe the root cause of this problem is a small window of opportunity where a new Rx request would be made by the application (via LPUART_TransferReceiveNonBlocking()). Among other things, this request calls LPUART_DisableInterrupts() and LPUART_EnableInterrupts() to clear/set various interrupt flags. Normally this is fine. However, there is a problem in the following scenario:

1. IF within LPUART_EnableInterrupts (for example), the following line starts to be executed:
base->CTRL |= mask;

2. A Tx interrupt immediately occurs causing the last byte of a transfer to be written out. Note that in this case, the LPUART_TransferHandleIRQ() will do the following:
base->CTRL = (base->CTRL & ~LPUART_CTRL_TIE_MASK);
which (correctly) clears the Tx interrupt flag. 

But, because the operation in #1 is not atomic (*), if the interrupt occurred after the read instruction of the read/modify/write instruction sequence in #1, but before the write instruction, when the ISR returns and the application call continues within LPUART_EnableInterrupts(), it will write stale data back to CTRL, with the Tx interrupt flag still set. This causes the lockup. 

There might be a more elegant way to fix this, but one fix for this case is to disable the UART IRQs at the beginning of LPUART_DisableInterrupts() and LPUART_EnableInterrupts() and then reenable them before the function returns. For example:

void LPUART_EnableInterrupts(LPUART_Type *base, uint32_t mask)

{

   // First NEW CODE block start

    uint32_t instance = LPUART_GetInstance(base);

#if defined(FSL_FEATURE_LPUART_HAS_SEPARATE_RX_TX_IRQ) && FSL_FEATURE_LPUART_HAS_SEPARATE_RX_TX_IRQ

    DisableIRQ(s_lpuartTxIRQ[instance]);

    DisableIRQ(s_lpuartRxIRQ[instance]);

#else

    DisableIRQ(s_lpuartIRQ[instance]);

#endif

   // First NEW CODE block end

    base->BAUD |= ((mask << 8U) & (LPUART_BAUD_LBKDIE_MASK | LPUART_BAUD_RXEDGIE_MASK));

#if defined(FSL_FEATURE_LPUART_HAS_FIFO) && FSL_FEATURE_LPUART_HAS_FIFO

    base->FIFO = (base->FIFO & ~(LPUART_FIFO_TXOF_MASK | LPUART_FIFO_RXUF_MASK)) |

                 ((mask << 8U) & (LPUART_FIFO_TXOFE_MASK | LPUART_FIFO_RXUFE_MASK));

#endif

    mask &= 0xFFFFFF00U;

    base->CTRL |= mask;


// Second NEW CODE block start

#if defined(FSL_FEATURE_LPUART_HAS_SEPARATE_RX_TX_IRQ) && FSL_FEATURE_LPUART_HAS_SEPARATE_RX_TX_IRQ

    EnableIRQ(s_lpuartRxIRQ[instance]);

    EnableIRQ(s_lpuartTxIRQ[instance]);

#else

    EnableIRQ(s_lpuartIRQ[instance]);

#endif

   // Second NEW CODE block end

}

-------------------------------------------------------------------------------------

(*) The disassembly of base->CTRL |= mask shows the following (non-atomic):

00066a58:   ldr     r3, [r7, #4]

00066a5a:   ldr     r2, [r3, #24]                                                 // copies base->CTRL into temporary register

00066a5c:   ldr     r3, [r7, #0]                                                   //              WINDOW OF OPPORTUNITY exists here if interrupt

00066a5e:   orrs    r2, r3                                                          //              occurs and ISR modifies CTRL flag. If it does,

00066a60:   ldr     r3, [r7, #4]                                                   //              our CTRL copy will now be stale!

00066a62:   str     r2, [r3, #24]                                                 // writes potentially stale result of OR back to base->CTRL,

 666      }                                                                                  // potentially losing ISR change

 

Labels (1)
0 Kudos
4 Replies

55 Views
NXP TechSupport
NXP TechSupport

could  you please send me your simple project for me to reproduce?  

 

Regards

Daniel

0 Kudos

39 Views
Contributor II

My current project is too complex to split up easily, but I'll try to put something together for you that exhibits the issue and post it.

I could be wrong, but I think as long as you have two freeRTOS tasks, one reading a byte at a time from a UART (that has data being written to it) and the other writing to the same UART, repeatedly, the problem will eventually show up.

In my case, one of my tasks is repeatedly calling LPUART_RTOS_Receive() and the other is repeatedly calling LPUART_RTOS_Send().

Note that LPUART_RTOS_Receive() eventually calls LPUART_TransferReceiveNonBlocking() and waits for the transfer to finish. The latter calls both LPUART_DisableInterrupts and LPUART_EnableInterrupts while it is setting up the transfer. (I am using a 32 byte ring buffer).

LPUART_RTOS_Send() eventually calls LPUART_TransferSendNonBlocking() and waits for the transfer to finish.

The trick is that the start of a Rx transfer needs to be initiated immediately before a previously started Tx transfer finishes. The latter has to interrupt the Rx task and finish the transfer (i.e. modify base->CTRL) just as the former is in the middle of setting base->CTRL inside of LPUART_DisableInterrupts() or LPUART_EnableInterrupts(). This window is small, but it exists.

0 Kudos

83 Views
NXP TechSupport
NXP TechSupport

Hi Cavin:

Thank you very much for your sharing.  I will try to reproduce your issue on my side.

 

Regards

Daniel

0 Kudos

67 Views
Contributor II

Thank you, Daniel. For what it is worth, I was able to able to reproduce the issue somewhat reliably by simply increasing the frequency of the reads and writes to multiple transfers a second. Even so, this would produce a lockup only once every 3 to 4 hours or so on average.

In my case, the transmit transfers were generally short (generally less than 100 bytes each), and the read transfers were only a single byte at a time. (Many more read transfers than write transfers).

0 Kudos