LIN: S32K312 MCU as a Master, LIN Timeout Error while waiting for LinID

dongxun · ‎07-25-2025

Hi everyone,

I am encountering an intermittent LIN communication issue during prolonged operation and would appreciate your expertise. The system is configured with NXP S32K3xx (EB Tresos 28.2, SW32K3_STD_4.4_2.0.2RD2211) as a LIN master, sending PIDs 0x11 (slave response), 0x14 (slave response), and 0x10 (host request) every 10ms to a motor control board slave.

Initial communication operates normally for the first 2 minutes, but after approximately 10 minutes, intermittent timeout errors ("LIN_ID/frame error") occur, accompanied by incomplete frame transmission (only Break Field and Sync Byte observed on the oscilloscope, no PID/data).

Debugging revealed that the Lpuart_Lin_ip_FrameIrqHandler interrupt stops triggering during failures, leaving the global state structure Lpuart_Lin_ip_apxStateStructureArray oscillating between LIN_TX_BUSY and LIN_OPERATIONAL, which suggests the TX state machine is locked up.

Notably, the issue arises only after extended runtime, suggesting possible temperature sensitivity, resource conflicts, or peripheral configuration drift. Could this be caused by IRQ flag misconfiguration (e.g., the TX Complete IRQ being disabled accidentally), clock instability, or a known silicon erratum? Guidance on diagnosing the LPUART status register (LPUART_STAT), IRQ enable bits, or clock integrity checks would be invaluable. Please advise on further steps or the data required for analysis.

Std_ReturnType LinMstr_DataChk(uint8 current_frame_index)
{
  Std_ReturnType ret_val = E_NOT_OK;
  static uint8 linSdu[8] = {0};
  static uint8 *linSduPtr = linSdu;

  Lin_PduType *current_frame = &Lin_Schedule_Frames[current_frame_index];
  lin_data.rx_status = Lin_GetStatus(LIN_CHANNEL_0, &linSduPtr);
  do
  {
    if (LIN_OPERATIONAL == lin_data.rx_status)
    {
      Lin_SendFrame(LIN_CHANNEL_0, current_frame);
      lin_state = LIN_STATE_TX_READY;
      break;
    }
    else
    {
      /* After a wake-up signal is sent on the LIN bus, the initial state switches to LIN_OPERATIONAL */
    }

    if (current_frame->Drc == LIN_FRAMERESPONSE_TX)
    {
      /**
       * State Machine Transitions:
       * 1. On entering `LIN_STATE_TX_READY`, the Master initiates frame transmission.
       * 2. If a Slave response is validated (`LIN_RX_OK`), transition to `LIN_STATE_RX_COMPLETED`.
       * 3. Automatically advance to the next frame in the schedule table.
       */
      switch (lin_state)
      {
      case LIN_STATE_TX_READY:
      {
        if (LIN_TX_OK == lin_data.rx_status)
        {
          MotMgr_SetMasterE2ECounter();
          lin_state = LIN_STATE_TX_COMPLETED;
          return E_OK;
        }
        else
        {
          lin_state = LIN_STATE_TIMEOUT_ERROR;
        }
        break;
      }
      case LIN_STATE_RX_COMPLETED:
      {
        /* After the last frame has been sent successfully, the next frame shall be sent immediately */
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_TX_READY;
        break;
      }
      /* This state depends on the order of the schedule. Currently, this branch is not entered. */
      case LIN_STATE_TX_COMPLETED:
      {
        /* Only used when the previous frame's DRC is TX */
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_TX_READY;
        break;
      }
      case LIN_STATE_TIMEOUT_ERROR:
      {
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        break;
      }
      default:
      {
        /* On LIN_FRAME_ERROR, lin_state is LIN_IDLE; this branch sends the frame again. */
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_TX_READY;
        break;
      }
      }
    }
    else if (current_frame->Drc == LIN_FRAMERESPONSE_RX)
    {
      switch (lin_state)
      {
      case LIN_STATE_TX_COMPLETED:
      {
        /**
         * [Action] Send slave frame header and transition to waiting state(Waiting response from Slave).
         * - Transmits the header of the RX frame to initiate Slave response.
         * - State updated to LIN_STATE_RX_WAITING_RESP to monitor response.
         */
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_RX_WAITING_RESP;
        break;
      }
      case LIN_STATE_RX_WAITING_RESP:
      {
        /**
         * [Polling] Check Slave response status.
         * - If LIN_RX_OK: Valid response received, transition to completed state.
         * - Else: Handle timeout or errors (BUSY/NO_RESPONSE).
         */
        if (LIN_RX_OK == lin_data.rx_status)
        {
          /* Receive data from the buffer */
          if (current_frame->Pid == 0x14U)
          {
            linmstr_debounce.timeout_cnt_14 = LINMSTR_TIME_BASE;
            if (linmstr_rte_out.tmout_flag_14 == TRUE)
            {
              linmstr_debounce.recovery_cnt_14 += LINMSTR_TIME_BASE;

              if (linmstr_debounce.recovery_cnt_14 >= LINMSTR_RECOVERY_DURATION)
              {
                linmstr_rte_out.tmout_flag_14 = FALSE;
              }
            }

            for (uint8 index = 0U; index < 8U; index++)
            {
              lin_data.response_buffer[LINMSTR_PID_14][index] = linSduPtr[index];
            }
          }
          /* Put the signals of the same message into the same buffer */
          else if (current_frame->Pid == 0x11U)
          {
            if (linmstr_rte_out.tmout_flag_11 == TRUE)
            {
              linmstr_debounce.recovery_cnt_11 += LINMSTR_TIME_BASE;
              if (linmstr_debounce.recovery_cnt_11 >= LINMSTR_RECOVERY_DURATION)
              {
                linmstr_rte_out.tmout_flag_11 = FALSE;
              }
            }

            linmstr_debounce.timeout_cnt_11 = LINMSTR_TIME_BASE;
            MotMgr_SetSlaveE2ECounter();
            for (uint8 index = 0U; index < 8U; index++)
            {
              lin_data.response_buffer[LINMSTR_PID_11][index] = linSduPtr[index];
            }
          }
          else
          {
            /* Other PIDs: nothing to copy */
          }
          /* Reception complete */
          lin_state = LIN_STATE_RX_COMPLETED;
          ret_val = E_OK;
        }
        else
        {
          /**
           * [Error Handling] Possible states:
           * - LIN_TX_BUSY: Ongoing transmission blocking new operations
           * - LIN_RX_NO_RESPONSE: Slave did not respond within timeout
           * - LIN_RX_BUSY: Receiving data in progress.
           * - In this state, send a frame to poll the status of the Slave.
           */
          /* The logic is implemented in LinIf.c */
          Lin_SendFrame(LIN_CHANNEL_0, current_frame);
          lin_state = LIN_STATE_TIMEOUT_ERROR;
        }
        /* If the DIAG schedule table is entered, the state may remain LIN_STATE_RX_WAITING_RESP, so we must handle the case where
           the last frame was RX and the state is LIN_STATE_RX_WAITING_RESP, to avoid interrupting continuous sending of the schedule table */
        break;
      }
      case LIN_STATE_RX_COMPLETED:
      {
        /**
         * [Re-Initiate] Start next RX frame transaction.
         * - Previous state validation: Requires LIN_RX_OK as precondition.
         * - Sends header and transitions to LIN_STATE_RX_WAITING_RESP.
         * - Timeout period defined by LIN specification or application config.
         */
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_RX_WAITING_RESP;
        break;
      }
      case LIN_STATE_TIMEOUT_ERROR:
      {
        if (LIN_RX_OK == lin_data.rx_status)
        {
          lin_state = LIN_STATE_RX_COMPLETED;
        }
        else
        {
          /* Detect the Time out error */
          if (0x11U == current_frame->Pid)
          {
            linmstr_debounce.recovery_cnt_11 = LINMSTR_TIME_BASE;
            if (linmstr_rte_out.tmout_flag_11 == FALSE)
            {
              linmstr_debounce.timeout_cnt_11 += LINMSTR_TIME_BASE;
              if (linmstr_debounce.timeout_cnt_11 >= LINMSTR_DURATION_11)
              {
                linmstr_rte_out.tmout_flag_11 = TRUE;
              }
            }
          }
          if (0x14U == current_frame->Pid)
          {
            linmstr_debounce.recovery_cnt_14 = LINMSTR_TIME_BASE;
            if (linmstr_rte_out.tmout_flag_14 == FALSE)
            {
              linmstr_debounce.timeout_cnt_14 += LINMSTR_TIME_BASE;
              if (linmstr_debounce.timeout_cnt_14 >= LINMSTR_DURATION_14)
              {
                linmstr_rte_out.tmout_flag_14 = TRUE;
              }
            }
          }
          Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        }
        break;
      }
      default:
        Lin_SendFrame(LIN_CHANNEL_0, current_frame);
        lin_state = LIN_STATE_TIMEOUT_ERROR;
        break;
      }
    }
  } while (0U);

  return ret_val;
}

Best regards,
Dongxun

danielmartynek · ‎07-28-2025

Hi @dongxun,

You're currently using an outdated version of the RTD. Please refer to the release notes for each version, which include detailed lists of Known Issues and Changes.

Consider the following:

Ensure the LPUART interrupt is not being masked or delayed by higher-priority ISRs. If possible, assign the highest priority to the LPUART interrupt to guarantee timely handling.
Implement debug logging around the LIN ISR and state transitions. Compare logs from successful and failed transmissions to identify anomalies or timing inconsistencies.
Capture the values of key LPUART registers (e.g., STAT, CTRL, BAUD) during failure conditions to detect stuck flags or misconfigurations.
Test with a reduced LIN schedule (e.g., only PID 0x10) to isolate timing-related issues and simplify debugging.
Stack integrity check: investigate potential stack overflows. Monitor the stack pointer during runtime, initialize SRAM with a known pattern at startup to detect overflows, and consider increasing the stack size.
Read-after-write serialization: apply read-after-write techniques to ensure register writes are properly completed and synchronized, especially in critical peripheral configurations.
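
As an illustration of the stack-pattern and read-after-write suggestions, here is a minimal host-testable sketch. The function names, the fill value 0xA5A5A5A5, and the assumption of a descending stack are illustrative choices, not part of the RTD; on the target, the region would come from the linker script's stack symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker value (assumption); any pattern unlikely to occur on the stack works. */
#define STACK_FILL_PATTERN 0xA5A5A5A5u

/* Fill a stack region with the marker. Call at startup before the region is used
 * (for the currently active stack, paint only the part below the stack pointer). */
void stack_paint(volatile uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL_PATTERN;
    }
}

/* Count the words that still hold the marker, scanning up from the lowest address.
 * Assumes a descending stack, so the words nearest 'base' are used last.
 * A result of 0 means the stack reached its limit at some point. */
size_t stack_unused_words(const volatile uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}

/* Write a register and read it back, so the write has reached the
 * peripheral before the caller continues. Returns the value read. */
uint32_t reg_write_readback(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    return *reg;
}
```

Calling stack_unused_words() periodically (or from a debugger after a failure) gives a high-water mark; a value that keeps shrinking toward zero over long runs would point to a stack problem rather than a LIN problem.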

Regards,

Daniel

dongxun · ‎07-29-2025

Hi, Dan

Thank you for your detailed recommendations regarding the LPUART interrupt prioritization and debugging methodology. Moving forward, we will actively monitor the fix overviews in subsequent RTD releases to align with ongoing optimizations.

dongxun · ‎07-26-2025

Dear Team,

I am writing to report the root cause and resolution of a recurring LIN communication failure observed during data transmission. After thorough investigation, the issue was traced to the LPUART peripheral's status register (STAT). Specifically, the Overrun Error (OR) bit was consistently set in cases where the Protected Identifier (PID) failed to transmit. This flag indicates that newly received data arrived before the previous data could be read by the interrupt service routine (ISR), resulting in data loss and a communication halt.

To resolve this:

Immediate Mitigation: I cleared the OR bit in the STAT register, which restored normal LIN communication immediately.
Root Cause Analysis: I suspect that the LIN data processing ISR (LPUARTLIN-RXTx_IRQ) was delayed by higher-priority interrupts and by critical sections. These delays in ISR execution led to receive overrun conditions.
Corrective Action: The priority of LPUARTLIN-RXTx_IRQ was elevated to minimize preemption risks. Subsequent stress testing (continuous master-slave communication for >10 hours) confirmed stability under extended operation.
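
The OR check and clear described above can be sketched as follows. The bit positions (OR = bit 19, NF = 18, FE = 17, PF = 16) and the write-1-to-clear mask 0xC01FC000 are my assumptions about the LPUART STAT layout and should be verified against the S32K3 reference manual and device header; the function names are hypothetical and not RTD APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed LPUART STAT bit positions; verify against the device header. */
#define STAT_OR_MASK   (1u << 19)   /* receiver overrun */
#define STAT_NF_MASK   (1u << 18)   /* noise flag */
#define STAT_FE_MASK   (1u << 17)   /* framing error */
#define STAT_PF_MASK   (1u << 16)   /* parity error */
/* Assumed set of all write-1-to-clear flags in STAT (bits 31, 30, 20..14). */
#define STAT_W1C_MASK  0xC01FC000u

/* True if the snapshot of STAT shows a receiver overrun. */
bool lpuart_stat_has_overrun(uint32_t stat)
{
    return (stat & STAT_OR_MASK) != 0u;
}

/* Value to write back so that only OR is cleared: configuration bits are
 * preserved, and other w1c flags are written as 0 so they stay pending. */
uint32_t lpuart_stat_clear_or_value(uint32_t stat)
{
    return (stat & ~STAT_W1C_MASK) | STAT_OR_MASK;
}

/* Clear OR on a real register (pass the address of the channel's STAT). */
void lpuart_clear_overrun(volatile uint32_t *stat_reg)
{
    *stat_reg = lpuart_stat_clear_or_value(*stat_reg);
}
```

Writing the raw STAT value back unchanged would also clear any other pending error flags, which can hide a framing or noise error that occurred at the same time; masking first avoids that. Logging the STAT snapshot before clearing keeps the evidence for later analysis.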

Request for Feedback

While the current solution is effective, I welcome suggestions for further optimizations, such as:

Implementing hardware flow control (if supported by the LPUART peripheral) to prevent overruns.
Adding buffer occupancy checks in the ISR to proactively clear data before overflow occurs.
Exploring DMA-based data transfer to reduce CPU intervention and interrupt latency.

Please share your insights on enhancing this approach.

Best regards,
dongxun

danielmartynek · ‎08-01-2025

Hi @dongxun,

Thanks for the detailed analysis.

Based on the behavior observed, I don't believe this stems from a software driver bug or a hardware fault in the UART module. Instead, it aligns with typical embedded system behavior under interrupt-heavy conditions.

The root cause appears to be ISR preemption, where the LIN RX/TX interrupt was delayed due to higher-priority interrupts, leading to a buffer overrun (OR bit set in the STAT register).
The issue was effectively resolved by raising the priority of the LPUARTLIN-RXTx_IRQ, which prevented further preemption and restored stable communication.
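
To put a rough number on how much ISR delay is tolerable, here is a small arithmetic sketch. It assumes 10 bits per character (start, 8 data, stop), no RX FIFO, and that an overrun occurs if the data register is not read within about one character time; the 19200 baud figure in the usage note is only an example, since the thread does not state the baud rate. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Duration of one UART character in microseconds, rounded down.
 * Precondition: baud must be non-zero. */
uint32_t lin_char_time_us(uint32_t baud, uint32_t bits_per_char)
{
    return (bits_per_char * 1000000u) / baud;
}

/* True if the worst-case delay before the RX ISR reads the data register
 * is shorter than one 10-bit character time at the given baud rate. */
bool rx_latency_ok(uint32_t worst_latency_us, uint32_t baud)
{
    return worst_latency_us < lin_char_time_us(baud, 10u);
}
```

For example, at 19200 baud one character lasts about 520 us, so any combination of higher-priority ISRs and critical sections that holds off the LPUART interrupt for longer than that could produce the overrun described above.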

This kind of mitigation is a good example of how interrupt prioritization can impact real-time communication reliability.

Best regards,

Daniel