S32K144 run abnormally

frank747 · ‎02-22-2019

I have a project using the S32K144 and the SDK. Occasionally the program runs abnormally. The failure is infrequent: sometimes it occurs once in a few hours, sometimes once in dozens of hours. The symptom is that the FTM timer interrupt (or possibly some other, unknown interrupt) keeps firing, and the interval between FTM timer interrupts is normal, but all the code in the main function stops executing. It is not clear where the program is running or whether it has entered some unknown interrupt (my program does not enable any other interrupts).

I would like to ask the following two questions:
1. Could enabling the clock of an unused peripheral cause this failure? (There is a spare UART interface in my program. I configured its pins and enabled its peripheral clock, but I did not initialize the peripheral.)
2. Is there any other reason that may lead to this anomaly?

In addition, why can't the peripheral clock enable settings of the clock component in RTM 2.0.0 be modified? Each time S32DS is opened, it automatically resets the component's peripheral clock enables to a specific state.

For now, I have used the old version of the SDK to change the peripheral clock enable settings, disabled the clocks of the unused peripherals, and am testing.

frank747 · ‎02-25-2019

Okay, I'll try. Thank you.

I've modified the program and I'm testing it. So far, 10 chips have been running for 24 hours without any failure. If the failure occurs again, I'll try it.

danielmartynek · ‎02-22-2019

Hi,

It is hard to say.
However, I would first check for fault exceptions.
https://community.nxp.com/docs/DOC-334902
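
When checking for fault exceptions on a Cortex-M4 such as the S32K144, the Configurable Fault Status Register (CFSR, address 0xE000ED28, `SCB->CFSR` in CMSIS) records which fault was raised. The following is a minimal host-side sketch that decodes a CFSR value copied out of the debugger into bit names. The helper name `cfsr_decode` and its output format are made up for illustration; only the bit positions come from the ARMv7-M architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named status bit of the Cortex-M CFSR. */
typedef struct {
    uint32_t mask;
    const char *name;
} cfsr_bit_t;

static const cfsr_bit_t cfsr_bits[] = {
    /* MemManage fault status, bits 0..7 */
    { 1u << 0,  "IACCVIOL" },    { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },   { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    /* BusFault status, bits 8..15 */
    { 1u << 8,  "IBUSERR" },     { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault status, bits 16..31 */
    { 1u << 16, "UNDEFINSTR" },  { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },       { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },   { 1u << 25, "DIVBYZERO" },
};

/* Hypothetical helper: writes the names of all set bits, separated by
 * spaces, into out. Returns the number of names written. */
static int cfsr_decode(uint32_t cfsr, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0u) {
            continue;
        }
        int n = snprintf(out + used, out_size - used, "%s%s",
                         count ? " " : "", cfsr_bits[i].name);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0'; /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

If the decoded value is zero while the MCU is stuck, no fault has been latched and the cause is more likely an ordinary interrupt than a fault handler.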

If the LPUART module is disabled, it will not trigger interrupts.
In fact, the LPUART interrupts need to be enabled separately.

Could you elaborate on the SDK clock problem?
You can’t save the component inspector configuration?

Thanks,
Daniel

frank747 · ‎02-22-2019

Hello, Daniel.
The clock configuration in the old version of the SDK is not a problem for the time being; I will look into this issue later.

Now I use the old version of the SDK, with only the necessary peripheral clocks enabled. Ten chips ran for about 12 hours, and then the same failure occurred once.
I can now confirm that after the fault occurs, the for loop in the main function is no longer executed, while the FTM timer interrupt service routine is still handled normally (I can tell because the timer interrupt service routine drives an LED indicator). Because the for loop in the main function is not executed, none of the tasks in it (including CAN transmission and GPIO control) are carried out.

Because I can't know in advance which MCU will fail, I can't start a debug session beforehand to find out where it is stuck after the failure.

I have also confirmed through some experiments that the point at which the for loop stops at the time of failure is random, so the cause is not an endless loop inside my for loop.

I think it must be one of the following two reasons.
1. An unknown interrupt is being entered and exited repeatedly. The FTM timer interrupt can still execute normally, but because the unknown interrupt is always pending and keeps being serviced, the for loop cannot execute.
2. There is an endless loop inside an unknown interrupt whose priority is lower than that of the FTM timer interrupt, so the timer interrupt can still preempt it. After the timer interrupt exits, execution returns to the endless loop, so the for loop cannot execute.
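
Because the FTM interrupt keeps running in both cases, the timer ISR itself can notice that the main loop has stalled, even when it is not known in advance which MCU will fail. The following is a minimal sketch of such a heartbeat check. All names and the tick limit are hypothetical, and the action taken on a detected stall (latching an LED pattern, saving diagnostic registers) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: timer ticks the main loop may go without progress. */
#define STALL_LIMIT_TICKS 100u

/* Incremented once per pass of the main for loop. */
static volatile uint32_t main_heartbeat;

/* State used only by the timer ISR. */
static uint32_t last_seen_heartbeat;
static uint32_t ticks_without_progress;

/* Call from the main for loop on every iteration. */
static void main_loop_kick(void)
{
    main_heartbeat++;
}

/* Call from the periodic timer ISR. Returns true once the main loop has
 * made no progress for STALL_LIMIT_TICKS consecutive ticks. */
static bool stall_monitor_tick(void)
{
    uint32_t now = main_heartbeat;

    if (now != last_seen_heartbeat) {
        last_seen_heartbeat = now;
        ticks_without_progress = 0u;
        return false;
    }
    if (ticks_without_progress < STALL_LIMIT_TICKS) {
        ticks_without_progress++;
    }
    return ticks_without_progress >= STALL_LIMIT_TICKS;
}
```

With a 1 ms timer tick, a limit of 100 would flag a stall after roughly 100 ms, far longer than a main loop that normally completes every few milliseconds.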

danielmartynek · ‎02-25-2019

Hello,

Can you just attach the debugger to the stuck MCU without a reset, halt execution, examine the registers, and step through the code?
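
One register worth examining after halting is the Interrupt Control and State Register (ICSR, address 0xE000ED04, `SCB->ICSR` in CMSIS), which shows the exception currently being serviced and the one pending next. The following is a minimal host-side sketch that decodes a value read in the debugger. The type and function names are hypothetical; the field positions follow the CMSIS Cortex-M4 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of the Cortex-M ICSR. */
typedef struct {
    uint32_t active;      /* VECTACTIVE [8:0]: exception in service, 0 = thread mode */
    uint32_t pending;     /* VECTPENDING [20:12]: highest-priority pending exception */
    bool     isr_pending; /* ISRPENDING [22]: a device interrupt is pending */
} icsr_info_t;

/* Hypothetical helper: split a raw ICSR value into its fields. */
static icsr_info_t icsr_decode(uint32_t icsr)
{
    icsr_info_t info;

    info.active = icsr & 0x1FFu;
    info.pending = (icsr >> 12) & 0x1FFu;
    info.isr_pending = ((icsr >> 22) & 1u) != 0u;
    return info;
}

/* Exception numbers 16 and above are device interrupts (IRQ 0 upwards);
 * returns the IRQ number, or -1 for a system exception or thread mode. */
static int32_t exception_to_irq(uint32_t exception)
{
    return (exception >= 16u) ? (int32_t)(exception - 16u) : -1;
}
```

The resulting IRQ number can be looked up in the IRQn_Type enumeration of the device header to name the interrupt that keeps running.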

BR, Daniel

frank747 · ‎02-27-2019

After I changed the code to check in the callback() function whether the mailbox has received a CAN message and to process the message there, 10 MCUs ran for 10 hours without failures.

I checked my previous code (which checked in the for loop whether the mailbox had received a message and handled it there), and I couldn't find the reason why the receiving mailbox's status changed to FLEXCAN_MB_IDLE, which left the SDK unable to clear the interrupt flag.

Is there a bug in the SDK? Or is my code handling something incorrectly?

I sincerely hope for your reply.

Tomorrow I will continue testing the modified code.

Thank you.

frank747 · ‎02-26-2019

Now I've changed the code so that the callback() function, which is executed from the CAN communication interrupt, checks which mailbox has completed receiving, processes the data, and calls FLEXCAN_DRV_Receive() to arm the next receive interrupt. I've started testing with 10 MCUs.

I wonder whether this mechanism can solve the problem of the condition if (FLEXCAN_DRV_GetTransferStatus(INST_CANCOM1, 8) != STATUS_BUSY) not being satisfied.
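
The callback-driven flow described in this post can be sketched as a small host-side model. This is not SDK code: every name prefixed with `sketch_` is hypothetical, `sketch_rearm_receive` only stands in for the FLEXCAN_DRV_Receive() call, and the real SDK callback has a different signature.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RX_MAILBOX 8u /* receive mailbox index used in this thread */

typedef enum { SKETCH_EVT_RX_COMPLETE, SKETCH_EVT_TX_COMPLETE } sketch_event_t;

static uint8_t rx_hw_buf[8];   /* buffer the driver fills on reception */
static uint8_t rx_copy[8];     /* private copy handed to the main loop */
static volatile bool rx_ready; /* set by the callback, cleared by the main loop */
static uint32_t rearm_count;   /* how many times reception was re-armed */

/* Stand-in for re-arming the mailbox (FLEXCAN_DRV_Receive() on the target). */
static void sketch_rearm_receive(uint32_t mailbox)
{
    (void)mailbox;
    rearm_count++;
}

/* Runs in interrupt context when a mailbox transfer completes. */
static void sketch_can_callback(sketch_event_t event, uint32_t mailbox)
{
    if (event == SKETCH_EVT_RX_COMPLETE && mailbox == RX_MAILBOX) {
        memcpy(rx_copy, rx_hw_buf, sizeof rx_copy); /* save data before re-arming */
        rx_ready = true;
        sketch_rearm_receive(mailbox);
    }
}

/* Main loop side: returns true if a new frame was available and consumes it. */
static bool sketch_main_poll(void)
{
    if (!rx_ready) {
        return false;
    }
    rx_ready = false;
    /* rx_copy would be processed here, followed by the transmissions. */
    return true;
}
```

In this arrangement the mailbox is re-armed only from the interrupt that reported its completion, so the main loop no longer changes the mailbox state on its own. On the target, the main loop would still need to protect its read of `rx_copy` against a new callback, for example by briefly masking the CAN interrupt.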

frank747 · ‎02-26-2019

Hello, Daniel.
I followed your suggestion and attached the debugger to an MCU that was running abnormally. I found that the MCU was repeatedly entering and exiting the CAN communication interrupt, and the mailbox number of the interrupt was 8. However, the condition if (state->mbs[mb_idx].state == FLEXCAN_MB_RX_BUSY) in the interrupt handler was not satisfied, so the interrupt flag of mailbox 8 was never cleared; the actual value of state->mbs[mb_idx].state was FLEXCAN_MB_IDLE. I am trying to find the cause of this abnormality. Let me briefly describe the structure of my program; please see whether there is any problem. Thank you.

The data processing of mailbox 8 is performed in the for loop of main() function, and the processing flow is roughly as follows:

The following code is executed about once every 2.5 ms:

    if (FLEXCAN_DRV_GetTransferStatus(INST_CANCOM1, 8) != STATUS_BUSY)
    {
        /* Process the data received in mailbox 8 */
        /* Send data using mailbox 1 */
        /* Send data using mailbox 2 */
        /* Arm mailbox 8 for the next reception */
        FLEXCAN_DRV_Receive(INST_CANCOM1, RDI_RXMB8_INDEX, &recvBuff[RDI_RXMB8_BUFINDEX]);
    }
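
The symptom seen in the debugger (the handler clears the mailbox flag only when the software state is RX_BUSY, yet the state reads IDLE) can be reproduced in a small host-side model. This is a simplified sketch, not the SDK source: all `model_` names are hypothetical, and the defensive variant is an assumption about one possible mitigation rather than something the SDK does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODEL_MB_IDLE, MODEL_MB_RX_BUSY } model_mb_state_t;

typedef struct {
    model_mb_state_t state; /* software bookkeeping for the mailbox */
    bool iflag;             /* hardware interrupt flag of the mailbox */
} model_mb_t;

/* Handler as described in the thread: the flag is cleared only when the
 * software state says a reception was in progress. */
static void model_irq_strict(model_mb_t *mb)
{
    if (mb->state == MODEL_MB_RX_BUSY) {
        mb->iflag = false;
        mb->state = MODEL_MB_IDLE;
    }
}

/* Defensive variant (assumption): always acknowledge the flag, so that an
 * unexpected event cannot retrigger the interrupt forever. */
static void model_irq_defensive(model_mb_t *mb)
{
    if (mb->state == MODEL_MB_RX_BUSY) {
        mb->state = MODEL_MB_IDLE;
    }
    mb->iflag = false;
}

/* Re-enter the handler for as long as the flag is set, as the interrupt
 * controller would. Returns the number of entries, capped at limit. */
static uint32_t model_run_irq(model_mb_t *mb,
                              void (*handler)(model_mb_t *),
                              uint32_t limit)
{
    uint32_t entries = 0u;

    assert(mb != NULL && handler != NULL);
    while (mb->iflag && entries < limit) {
        handler(mb);
        entries++;
    }
    return entries;
}
```

Starting the model with the state at IDLE and the flag set, the strict handler runs until the cap is reached, which matches the observed behaviour: the interrupt is re-entered endlessly and the main loop never gets CPU time. The model does not explain how the state became IDLE while the flag was set; it only shows why that combination is fatal.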
