Hi everyone,
I am encountering an intermittent LIN communication issue during prolonged operation and would appreciate your expertise. The system is configured with NXP S32K3xx (EB Tresos 28.2, SW32K3_STD_4.4_2.0.2RD2211) as a LIN master, sending PIDs 0x11 (slave response), 0x14 (slave response), and 0x10 (host request) every 10ms to a motor control board slave.
Initial communication operates normally for the first 2 minutes, but after approximately 10 minutes, intermittent timeout errors ("LIN_ID/frame error") occur, accompanied by incomplete frame transmission (only Break Field and Sync Byte observed on the oscilloscope, no PID/data).
Debugging revealed that the Lpuart_Lin_ip_FrameIrqHandler interrupt stops triggering during failures, leaving the global state structure Lpuart_Lin_ip_apxStateStructureArray oscillating between LIN_TX_BUSY and LIN_OPERATIONAL, which points to a possible lock-up of the TX state machine.
Notably, the issue arises only after extended runtime, suggesting possible temperature sensitivity, resource conflicts, or peripheral configuration drift. Could this be caused by IRQ flag misconfiguration (e.g., accidental TX Complete IRQ disable), clock instability, or a known silicon errata? Guidance on diagnosing LPUART status registers (LPUART_STAT), IRQ enable bits, or clock integrity checks would be invaluable. Please advise on further steps or required data for analysis.
Std_ReturnType LinMstr_DataChk(uint8 current_frame_index)
{
    Std_ReturnType ret_val = E_NOT_OK;
    static uint8 linSdu[8] = {0};
    static uint8 *linSduPtr = linSdu;
    Lin_PduType *current_frame = &Lin_Schedule_Frames[current_frame_index];
    lin_data.rx_status = Lin_GetStatus(LIN_CHANNEL_0, &linSduPtr);
    do
    {
        if (LIN_OPERATIONAL == lin_data.rx_status)
        {
            Lin_SendFrame(LIN_CHANNEL_0, current_frame);
            lin_state = LIN_STATE_TX_READY;
            break;
        }
        else
        {
            /* After a wakeup signal is sent on the LIN bus, the initial state switches to LIN_OPERATIONAL */
        }
        if (current_frame->Drc == LIN_FRAMERESPONSE_TX)
        {
            /**
             * State Machine Transitions:
             * 1. On entering `LIN_STATE_TX_READY`, the Master initiates frame transmission.
             * 2. If a Slave response is validated (`LIN_RX_OK`), transition to `LIN_STATE_RX_COMPLETED`.
             * 3. Automatically advance to the next frame in the schedule table.
             */
            switch (lin_state)
            {
                case LIN_STATE_TX_READY:
                {
                    if (LIN_TX_OK == lin_data.rx_status)
                    {
                        MotMgr_SetMasterE2ECounter();
                        lin_state = LIN_STATE_TX_COMPLETED;
                        return E_OK;
                    }
                    else
                    {
                        lin_state = LIN_STATE_TIMEOUT_ERROR;
                    }
                    break;
                }
                case LIN_STATE_RX_COMPLETED:
                {
                    /* After the last frame is sent successfully, the next frame shall be sent immediately */
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_TX_READY;
                    break;
                }
                /* This state depends on the order of the schedule. Currently this branch is not entered */
                case LIN_STATE_TX_COMPLETED:
                {
                    /* Only used when the DRC of the last frame is TX */
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_TX_READY;
                    break;
                }
                case LIN_STATE_TIMEOUT_ERROR:
                {
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    break;
                }
                default:
                {
                    /* On LIN_FRAME_ERROR, lin_state is LIN_IDLE; enter this branch to send the frame again. */
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_TX_READY;
                    break;
                }
            }
        }
        else if (current_frame->Drc == LIN_FRAMERESPONSE_RX)
        {
            switch (lin_state)
            {
                case LIN_STATE_TX_COMPLETED:
                {
                    /**
                     * [Action] Send slave frame header and transition to waiting state (waiting for the Slave response).
                     * - Transmits the header of the RX frame to initiate Slave response.
                     * - State updated to LIN_STATE_RX_WAITING_RESP to monitor response.
                     */
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_RX_WAITING_RESP;
                    break;
                }
                case LIN_STATE_RX_WAITING_RESP:
                {
                    /**
                     * [Polling] Check Slave response status.
                     * - If LIN_RX_OK: Valid response received, transition to completed state.
                     * - Else: Handle timeout or errors (BUSY/NO_RESPONSE).
                     */
                    if (LIN_RX_OK == lin_data.rx_status)
                    {
                        /* Receive data from buffer */
                        if (current_frame->Pid == 0x14U)
                        {
                            linmstr_debounce.timeout_cnt_14 = LINMSTR_TIME_BASE;
                            if (linmstr_rte_out.tmout_flag_14 == TRUE)
                            {
                                linmstr_debounce.recovery_cnt_14 += LINMSTR_TIME_BASE;
                                if (linmstr_debounce.recovery_cnt_14 >= LINMSTR_RECOVERY_DURATION)
                                {
                                    linmstr_rte_out.tmout_flag_14 = FALSE;
                                }
                            }
                            for (uint8 index = 0U; index < 8U; index++)
                            {
                                lin_data.response_buffer[LINMSTR_PID_14][index] = linSduPtr[index];
                            }
                        }
                        /* Put the signals of the same message into the same buffer */
                        else if (current_frame->Pid == 0x11U)
                        {
                            if (linmstr_rte_out.tmout_flag_11 == TRUE)
                            {
                                linmstr_debounce.recovery_cnt_11 += LINMSTR_TIME_BASE;
                                if (linmstr_debounce.recovery_cnt_11 >= LINMSTR_RECOVERY_DURATION)
                                {
                                    linmstr_rte_out.tmout_flag_11 = FALSE;
                                }
                            }
                            linmstr_debounce.timeout_cnt_11 = LINMSTR_TIME_BASE;
                            MotMgr_SetSlaveE2ECounter();
                            for (uint8 index = 0U; index < 8U; index++)
                            {
                                lin_data.response_buffer[LINMSTR_PID_11][index] = linSduPtr[index];
                            }
                        }
                        else
                        {
                            /* No action for other PIDs */
                        }
                        /* Reception complete */
                        lin_state = LIN_STATE_RX_COMPLETED;
                        ret_val = E_OK;
                    }
                    else
                    {
                        /**
                         * [Error Handling] Possible states:
                         * - LIN_TX_BUSY: Ongoing transmission blocking new operations
                         * - LIN_RX_NO_RESPONSE: Slave did not respond within timeout
                         * - LIN_RX_BUSY: Receiving data in progress.
                         * - In this state, send a frame to poll the status of the Slave.
                         */
                        /* The logic is implemented in LinIf.c */
                        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                        lin_state = LIN_STATE_TIMEOUT_ERROR;
                    }
                    /* When the DIAG schedule table is entered, the state may stay at LIN_STATE_RX_WAITING_RESP, so we must handle the case where
                       the last frame was Rx and lin_state is LIN_STATE_RX_WAITING_RESP, to avoid interrupting continuous sending of the schedule table */
                    break;
                }
                case LIN_STATE_RX_COMPLETED:
                {
                    /**
                     * [Re-Initiate] Start next RX frame transaction.
                     * - Previous state validation: Requires LIN_RX_OK as precondition.
                     * - Sends header and transitions to LIN_STATE_RX_WAITING_RESP.
                     * - Timeout period defined by LIN specification or application config.
                     */
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_RX_WAITING_RESP;
                    break;
                }
                case LIN_STATE_TIMEOUT_ERROR:
                {
                    if (LIN_RX_OK == lin_data.rx_status)
                    {
                        lin_state = LIN_STATE_RX_COMPLETED;
                    }
                    else
                    {
                        /* Detect the timeout error */
                        if (0x11U == current_frame->Pid)
                        {
                            linmstr_debounce.recovery_cnt_11 = LINMSTR_TIME_BASE;
                            if (linmstr_rte_out.tmout_flag_11 == FALSE)
                            {
                                linmstr_debounce.timeout_cnt_11 += LINMSTR_TIME_BASE;
                                if (linmstr_debounce.timeout_cnt_11 >= LINMSTR_DURATION_11)
                                {
                                    linmstr_rte_out.tmout_flag_11 = TRUE;
                                }
                            }
                        }
                        if (0x14U == current_frame->Pid)
                        {
                            linmstr_debounce.recovery_cnt_14 = LINMSTR_TIME_BASE;
                            if (linmstr_rte_out.tmout_flag_14 == FALSE)
                            {
                                linmstr_debounce.timeout_cnt_14 += LINMSTR_TIME_BASE;
                                if (linmstr_debounce.timeout_cnt_14 >= LINMSTR_DURATION_14)
                                {
                                    linmstr_rte_out.tmout_flag_14 = TRUE;
                                }
                            }
                        }
                        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    }
                    break;
                }
                default:
                    Lin_SendFrame(LIN_CHANNEL_0, current_frame);
                    lin_state = LIN_STATE_TIMEOUT_ERROR;
                    break;
            }
        }
    } while (0U);
    return ret_val;
}
Best regards,
Dongxun
Hi @dongxun,
Thanks for the detailed analysis.
Based on the behavior observed, I don't believe this stems from a software driver bug or a hardware fault in the UART module. Instead, it aligns with typical embedded system behavior under interrupt-heavy conditions.
The root cause appears to be ISR preemption, where the LIN RX/TX interrupt was delayed due to higher-priority interrupts, leading to a buffer overrun (OR bit set in the STAT register).
The issue was effectively resolved by raising the priority of the LPUARTLIN-RXTx_IRQ, which prevented further preemption and restored stable communication.
This kind of mitigation is a good example of how interrupt prioritization can impact real-time communication reliability.
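To make the preemption argument concrete, here is a rough timing sketch in C. It assumes a standard 10-bit UART character (start, 8 data, stop) and treats the baud rate, the worst-case ISR latency, and the receive depth as inputs you supply; the function names are hypothetical and not part of the RTD.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Time on the wire for one UART character, in microseconds.
 * Assumes 10 bits per character (start + 8 data + stop).
 * baud must be non-zero.
 */
uint32_t lin_char_time_us(uint32_t baud)
{
    return (10U * 1000000U) / baud;
}

/*
 * Returns true when the given worst-case ISR latency is long enough for a
 * receive overrun to occur. rx_depth is the number of characters the
 * receiver can hold before data is lost (assumed to be 1 when no receive
 * FIFO is in use).
 */
bool lin_overrun_possible(uint32_t baud, uint32_t isr_latency_us, uint32_t rx_depth)
{
    return isr_latency_us >= (rx_depth * lin_char_time_us(baud));
}
```

For example, at an assumed 19200 baud one character takes about 520 us, so with a single-character receive buffer any higher-priority work that holds off the LIN ISR for longer than that can set OR. Raising the LIN interrupt priority shrinks the worst-case latency rather than the character time.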
Best regards,
Daniel
Hi, Dan
Thank you for your detailed recommendations regarding the LPUART interrupt prioritization and debugging methodology. Moving forward, we will actively monitor the fix overviews in subsequent RTD releases to align with ongoing optimizations.
Hi @dongxun,
You're currently using an outdated version of the RTD. Please refer to the release notes for each version, which include detailed lists of Known Issues and Changes.
Consider the following:
Regards,
Daniel
Dear Team,
I am writing to report the root cause and resolution of a recurring LIN communication failure observed during data transmission. After thorough investigation, the issue was traced to the LPUART peripheral's status register (STAT). Specifically, the Overrun Error (OR) bit was consistently set in cases where the Protected Identifier (PID) failed to transmit. This flag indicates that newly received data arrived before the previous data could be read by the interrupt service routine (ISR), resulting in data loss and a communication halt.
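For anyone who wants to check a captured STAT value by hand, the following C sketch decodes the receive error flags. The bit positions (OR = 19, NF = 18, FE = 17, PF = 16) are what I understand the LPUART STAT layout to be and should be verified against the reference manual for your device; the macro and function names are my own, not RTD identifiers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed LPUART STAT bit positions; verify against the reference manual. */
#define LPUART_STAT_OR (UINT32_C(1) << 19) /* receiver overrun */
#define LPUART_STAT_NF (UINT32_C(1) << 18) /* noise flag       */
#define LPUART_STAT_FE (UINT32_C(1) << 17) /* framing error    */
#define LPUART_STAT_PF (UINT32_C(1) << 16) /* parity error     */

#define LPUART_STAT_ERROR_MASK \
    (LPUART_STAT_OR | LPUART_STAT_NF | LPUART_STAT_FE | LPUART_STAT_PF)

/* Keep only the receive error flags of a raw STAT value. */
uint32_t lpuart_stat_errors(uint32_t stat)
{
    return stat & LPUART_STAT_ERROR_MASK;
}

/*
 * Write the names of the error flags that are set into out
 * (space-terminated, in the order OR NF FE PF) and return how many are set.
 */
int lpuart_stat_describe(uint32_t stat, char *out, size_t out_len)
{
    int count = 0;

    count += ((stat & LPUART_STAT_OR) != 0U) ? 1 : 0;
    count += ((stat & LPUART_STAT_NF) != 0U) ? 1 : 0;
    count += ((stat & LPUART_STAT_FE) != 0U) ? 1 : 0;
    count += ((stat & LPUART_STAT_PF) != 0U) ? 1 : 0;

    (void)snprintf(out, out_len, "%s%s%s%s",
                   ((stat & LPUART_STAT_OR) != 0U) ? "OR " : "",
                   ((stat & LPUART_STAT_NF) != 0U) ? "NF " : "",
                   ((stat & LPUART_STAT_FE) != 0U) ? "FE " : "",
                   ((stat & LPUART_STAT_PF) != 0U) ? "PF " : "");
    return count;
}
```

Copying the STAT value out of the debugger at the moment the PID goes missing and passing it through `lpuart_stat_describe` makes it easy to log which error flags were set at that point.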
To resolve this:
While the current solution is effective, I welcome suggestions for further optimizations, such as:
Please share your insights on enhancing this approach.
Best regards,
dongxun
Hello, as shown in the picture, I think I have also encountered this problem. Could you please tell me how you solved it?
Hello, I have encountered a LIN communication problem when using the MCU as the slave device. After flashing the program, LIN communication is normal: the MCU, acting as slave, responds to the frame headers from the master. However, after running for some time, the slave stops responding, and the master keeps sending frame headers that receive no response. After seeing this post, I tried changing the LIN interrupt priority, but the problem persisted. On the oscilloscope, when a header goes unanswered, the break field, sync field and PID of the header are all present, but the response data is missing. Could you please tell me where I should look for the cause? I'm looking forward to your reply. Thank you.
Hi, Aoyng,
I noticed that the unresponsiveness only occurs after a certain period of operation. This suggests that the LIN state machine was initially functioning correctly when the MCU acted as a slave. Have you observed exactly where the internal state machine gets stuck when the MCU stops responding?
Furthermore, have you consulted the reference manual and checked the slave-related error registers to determine if a specific error has triggered? For the S32K312 MCU, the Lpuart_Lin_Ip_StatusFlagType structure contains descriptions for various fault conditions. I’ve personally encountered an issue (while using the MCU as a master) where the LPUART_LIN_IP_RX_OVERRUN flag was set, preventing the MCU from transmitting the Sync segment and subsequent data. You should be able to identify where the slave is failing by inspecting these registers and the state machine status. I believe this should be a straightforward troubleshooting step.
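One simple way to catch the moment the state machine stalls is to count consecutive polls without a completed frame and trigger a breakpoint or a register dump when a limit is reached. The C sketch below is only an illustration under that assumption; `LinStuckDetector` and `lin_stuck_update` are hypothetical names and not part of the RTD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counts consecutive polls in which no frame completed. */
typedef struct
{
    uint16_t busy_polls; /* consecutive polls without a completed frame */
    uint16_t limit;      /* number of such polls treated as "stuck"     */
} LinStuckDetector;

/*
 * Call once per scheduler cycle. Pass frame_completed = true when the
 * driver reported a finished frame (for example a TX OK or RX OK status).
 * Returns true once the channel has gone 'limit' polls in a row without
 * completing a frame.
 */
bool lin_stuck_update(LinStuckDetector *d, bool frame_completed)
{
    if (frame_completed)
    {
        d->busy_polls = 0U;
        return false;
    }
    if (d->busy_polls < d->limit)
    {
        d->busy_polls++;
    }
    return d->busy_polls >= d->limit;
}
```

When the function returns true, that is a convenient place to capture the status flags and the driver state so they reflect the failure rather than a later recovery attempt.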
Best regards,
dongxun
Hi Aoyng,
According to your screenshot, an Overrun Error clearly occurred during MCU data reception, leading to the communication deadlock. You can use the following method to manually clear the error flag and restore communication. As for the specific details of this error, please refer to the datasheet and other reference materials.
if (TRUE == Lpuart_Lin_Ip_HwGetStatusFlag(Base, LPUART_LIN_IP_RX_OVERRUN))
{
    /* Clear RxOverrun status */
    (void)Lpuart_Lin_Ip_HwClearStatusFlag(Base, LPUART_LIN_IP_RX_OVERRUN);
}
Furthermore, while looking for a way to clear the overrun, I found that the driver already has a function dedicated to this clearing operation, and it is executed inside the interrupt when an overrun is detected. Why was the flag not cleared in my case? Looking forward to your response. Thank you.
OK, I see that this status is checked inside the interrupt. May I perform the clearing operation described above in my own callback function, as shown in the figure below? All communication (both sending and receiving) between my slave program and the master is triggered from within the callback function.
I currently plan to add this clearing operation to the interrupt service routine. Is this feasible? Alternatively, could you please tell me which scheduled function you are referring to? It would be even better if there were some code to review. I would be very grateful if you could provide it.
Hi, Aoyng,
I recommend performing this clearing operation in another periodically scheduled function. Alternatively, if you can ensure that your callback function is still invoked while RX overrun is set, you can place it there.
OK, thank you for your suggestion. I am using the S32K314 chip. According to the official documentation, the LPUART receive buffer on this chip is 4 bytes. Am I right in understanding that an overrun interrupt is triggered when more than 4 bytes are received, and that the flag must be cleared at that point? Can I change the size of the receive buffer? From the picture above, the default receive buffer size currently appears to be 1 byte, whereas the documentation mentions 4 bytes. I am a bit confused about this. Do you have any insight?
Sorry dongxun, perhaps what I observed yesterday was incorrect. While debugging, when the problem occurs, the program enters the interrupt; after the frame header is received (the MCU is currently the slave), it enters the interrupt again and reaches this function (as shown in the following figure). It does not stop at the LPUART_LIN_IP_RX_OVERRUN state, although at this point it should be receiving the data. On the next entry into the interrupt, it stops at LPUART_LIN_IP_FRAME_ERR and then clears the error. However, I am not sure why the OR bit in the register reads 1. Do you have any ideas?
Hi, Aoyng,
To the best of my knowledge, this is a bug. I found the following description in the release notes "SW32K3_S32M27x_RDD_R21-11_5.0.0_D2410_Release Notes.pdf". Since I am using an older RTD version, this bug is evidently present in my 2.00 version.
Hi,
I placed it in a periodic OS function (handwritten code), specifically in the function that calls Lin_SendFrame, so it is called every time before LIN data is sent. You can try it and test whether any issues remain.
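The pattern looks roughly like the C sketch below. To keep it self-contained, the hardware is replaced by a hypothetical `FakeLpuart` structure that mimics a write-1-to-clear OR flag (bit 19 assumed), and incrementing `frames_sent` stands in for the call to Lin_SendFrame. In real code the check and clear would use Lpuart_Lin_Ip_HwGetStatusFlag and Lpuart_Lin_Ip_HwClearStatusFlag as shown earlier in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define STAT_OR (UINT32_C(1) << 19) /* assumed position of the overrun flag */

/* Hypothetical stand-in for the LPUART instance; not an RTD type. */
typedef struct
{
    uint32_t stat;             /* simulated STAT register            */
    uint32_t frames_sent;      /* counts simulated Lin_SendFrame calls */
    uint32_t overruns_cleared; /* counts recoveries, for diagnostics  */
} FakeLpuart;

/* Simulated write-1-to-clear access: only the OR flag is modelled. */
void fake_stat_write(FakeLpuart *hw, uint32_t value)
{
    hw->stat &= ~(value & STAT_OR);
}

/* Clear the overrun flag if it is set; returns true when a clear happened. */
bool clear_overrun_if_set(FakeLpuart *hw)
{
    if ((hw->stat & STAT_OR) != 0U)
    {
        fake_stat_write(hw, STAT_OR);
        hw->overruns_cleared++;
        return true;
    }
    return false;
}

/* Periodic task body: recover from a pending overrun, then send the frame. */
void send_frame_with_recovery(FakeLpuart *hw)
{
    (void)clear_overrun_if_set(hw);
    hw->frames_sent++; /* stands in for Lin_SendFrame(LIN_CHANNEL_0, frame) */
}
```

Keeping a counter such as `overruns_cleared` is optional, but it shows how often the workaround actually fires, which helps judge whether the interrupt latency is still too high.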