timing_pal S32SDK_for_S32K1xx_RTM_3.0.3 deadlock bug

Solved

jeremie_chirat
Contributor II

Hi,

After a painstaking analysis of random crashes (only 2-3 every 12 hours) on our S32K148-based board (we have an external watchdog, so a deadlock causes the board to reboot), we followed some advice from the NXP forums: we disabled the watchdog and, after a crash, attached the debugger to the running target ("attach running").

We confirmed the culprit: the board was stuck inside the FTMX_ChX_ChX_IrqHandler:

Example: FTM0_Ch0_Ch1_IrqHandler line 287:

if (chan0IntFlag && g_ftmChannelRunning[0][0])
{
    TIMING_Ftm_IrqHandler(0U, 0U);
}

The check on the channel's running flag never passed, so TIMING_Ftm_IrqHandler() was never called. As a result, the interrupt was never disabled and fired continuously.
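The effect of that gate can be reproduced with a small host-side model. This is only a sketch, not SDK code: every `model_*` name and both global flags are hypothetical stand-ins, and it assumes that the servicing routine is the only place where the pending event gets acknowledged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for g_ftmChannelRunning[0][0] and the channel event flag. */
static volatile bool g_modelRunning = false;
static volatile bool g_modelChanIntFlag = false;
static uint32_t g_modelServiced = 0U;

/* Stands in for TIMING_Ftm_IrqHandler(): acknowledges the event (assumption). */
static void model_service_channel(void)
{
    g_modelChanIntFlag = false;
    g_modelServiced++;
}

/* Stands in for FTM0_Ch0_Ch1_IrqHandler(): same gate as in the snippet above. */
static void model_irq_handler(void)
{
    if (g_modelChanIntFlag && g_modelRunning)
    {
        model_service_channel();
    }
}

/* Models the interrupt controller: the handler is re-entered for as long as
 * the event stays pending. max_entries bounds the loop so the model ends. */
static uint32_t model_dispatch(uint32_t max_entries)
{
    uint32_t entries = 0U;

    assert(max_entries > 0U);
    while (g_modelChanIntFlag && (entries < max_entries))
    {
        model_irq_handler();
        entries++;
    }
    return entries;
}
```

With the running flag still false, `model_dispatch()` uses up its whole entry budget without servicing anything, which is the model's equivalent of the board spinning in the handler. Once the flag is true, a single entry services the event.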

Our system uses an RTOS (SafeRTOS, based on FreeRTOS). After analysis, we saw that in timing_pal.c, TIMING_StartChannel() (line 769) enables the interrupt before the channel running status is updated:

/* Enable the channel by enable interrupt generation */
retVal = FTM_DRV_EnableInterrupts(ftmInstance, (1UL << channel));
(void)retVal;
/* Update channel running status */
g_ftmChannelRunning[ftmInstance][channel] = true;

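The requested period is 500 us (we needed a hardware timer because this is less than one RTOS tick of 1 ms). If, by chance, the task is preempted between the interrupt enable and the channel status update, and the system takes too long to return to it, the timer interrupt fires while the status is still false. The handler then skips the channel and keeps being re-entered, so the task never resumes to update the status. The result is a deadlock, which is what happened to us.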

I think the fix is to invert the order of the two actions: update the channel status first, then enable the interrupt as the last action.
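To make the difference between the two orderings concrete, here is a minimal host-side sketch. It is not the SDK source: `model_enable_interrupts()` is a hypothetical stub standing in for FTM_DRV_EnableInterrupts(), and it assumes the worst case in which the interrupt fires as soon as it is enabled, so it simply records what the handler would see at that instant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_INSTANCES 4U   /* arbitrary sizes for the model */
#define MODEL_CHANNELS  8U

/* Hypothetical stand-in for g_ftmChannelRunning. */
static volatile bool g_modelRunning[MODEL_INSTANCES][MODEL_CHANNELS];
/* What the handler observed at the moment the interrupt went live. */
static bool g_isrSawRunning = false;

/* Stub for FTM_DRV_EnableInterrupts(); worst case: the ISR runs right away. */
static void model_enable_interrupts(uint32_t inst, uint32_t ch)
{
    g_isrSawRunning = g_modelRunning[inst][ch];
}

/* Order reported in the post: enable first, update the status afterwards. */
static void start_channel_sdk_order(uint32_t inst, uint32_t ch)
{
    assert((inst < MODEL_INSTANCES) && (ch < MODEL_CHANNELS));
    model_enable_interrupts(inst, ch);
    g_modelRunning[inst][ch] = true;
}

/* Proposed order: update the status first, enable as the last action. */
static void start_channel_proposed_order(uint32_t inst, uint32_t ch)
{
    assert((inst < MODEL_INSTANCES) && (ch < MODEL_CHANNELS));
    g_modelRunning[inst][ch] = true;
    model_enable_interrupts(inst, ch);
}

static bool isr_saw_running(void)
{
    return g_isrSawRunning;
}
```

In this model the proposed order leaves no window in which the handler can observe a false status, whereas the reported order does.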

Since we did not want to modify the SDK, our workaround is to enter a critical section just before the call to TIMING_StartChannel() and leave it just after. Long-run tests confirmed that this fixed the crashes.
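The shape of that workaround can be sketched as a host-side model. All names below are hypothetical: `model_enter_critical()` and `model_exit_critical()` stand in for whatever critical-section primitives the RTOS or SDK provides on the target, and `model_start_channel()` is a stub that mimics the enable-then-update order with the timer event arriving inside the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t g_critNesting = 0U;   /* > 0 means interrupts are masked */
static bool g_irqPending = false;     /* timer event waiting to be taken */
static bool g_running = false;        /* stand-in for the channel running status */
static bool g_isrSawRunning = false;  /* status observed by the last ISR run */
static uint32_t g_isrCount = 0U;

/* Model ISR: records the status it sees and acknowledges the event. */
static void model_isr(void)
{
    g_isrSawRunning = g_running;
    g_irqPending = false;
    g_isrCount++;
}

/* A pending interrupt is only taken while no critical section is active. */
static void model_deliver_pending(void)
{
    if (g_irqPending && (g_critNesting == 0U))
    {
        model_isr();
    }
}

static void model_enter_critical(void)
{
    g_critNesting++;
}

static void model_exit_critical(void)
{
    assert(g_critNesting > 0U);
    g_critNesting--;
    model_deliver_pending();   /* deferred interrupt is taken on exit */
}

/* Stub for TIMING_StartChannel(): enable first, update the status afterwards. */
static void model_start_channel(void)
{
    g_irqPending = true;       /* worst case: event arrives immediately */
    model_deliver_pending();   /* taken here unless masked */
    g_running = true;
}

/* The workaround: wrap the unmodified call in a critical section. */
static void start_channel_guarded(void)
{
    model_enter_critical();
    model_start_channel();
    model_exit_critical();
}
```

Called unguarded, the model ISR runs inside the window and sees a false status. Called through `start_channel_guarded()`, the same event is deferred until the critical section ends, by which time the status is already true.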

 

Can you confirm the bug?

 

Best regards,

 

Jérémie Chirat

1 Solution
cuongnguyenphu
NXP Employee

Hi @jeremie_chirat,
Thanks for your hard work. I confirm this is a potential bug in the SDK; it is still present in our latest version, S32_SDK_S32K1xx_RTM_4.0.3.
I have raised this issue, together with your proposed solution, with our development team and asked for a workaround. Updating the channel status before enabling the interrupt looks like a great idea.

jeremie_chirat
Contributor II

Thank you for your answer. I'm glad that what I did can be useful.
