Hardfault for i.MX RT1050 running aws_shadow_enet demo

andrewmccartney · ‎08-03-2018

I've been trying out the RT1050 EVK with AWS FreeRTOS, running the aws_shadow_enet demo application from the SDK. It connects to the Amazon cloud with no issues, but it hard faults at random intervals, anywhere from 30 minutes to over 24 hours. The hard fault is the same every time: CLOCK_ControlGate() at fsl_clock.h:979 0x0, while servicing an Ethernet IRQ. I have seen it happen during different tasks (IDLE, MQTT, Logging). Here is a snapshot of the backtrace (same every time).

Using the HEAP Usage plugin, I don't see any evidence that any of the tasks have overrun their task stack.

Based on the backtrace, I suspect it is jumping back into main() and getting confused, because the stack for main() is reused by the interrupt handlers in FreeRTOS.

Has anyone else seen anything like this on the RT1050?

leloiviet95 · ‎11-03-2019

Hi,

I am also looking for an SDK with FreeRTOS that uses MQTT. I think it is the same thing you are doing.

I searched for the SDK at https://mcuxpresso.nxp.com/en/builder -> KITS -> i.MX and downloaded everything for the IMXRT1050, but the SDK does not include lwip_mqtt support.

Can you help me with the SDK or sample program you are using on the RT1050?

andrewmccartney · ‎08-09-2018

Finally figured this one out. It's kind of a long explanation.

The cause of the hard fault is a NULL function pointer, s_enetErrIsr, in fsl_enet.c. The root of it is that the RT1050 has a single IRQ for all Ethernet interrupts, whereas the K64, for example, has three separate Ethernet interrupts for Rx, Tx and errors.

The driver uses the value of enet_config_t->interrupt to decide which handlers to set up: for each requested interrupt type it points a function pointer at the corresponding handler and then enables that IRQ. On the K64, if the error interrupts are not requested, the error handler pointer is never set, but the error IRQ is never enabled either. That works as expected when there are separate IRQs.

On the RT1050, the same IRQ is used by all three handlers (Rx, Tx and error), so enabling it for one of them enables it for all three. In the RT1050 configuration, enet_config_t->interrupt does not request the Ethernet error interrupts, so the error handler is skipped and its pointer is left NULL. Because the IRQ is also used for Rx and Tx, it is enabled anyway. As soon as any type of Ethernet error occurs, the driver calls through the NULL function pointer and you get a hard fault (which is consistent with the 0x0 address in the backtrace).
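
To make the mechanism concrete, here is a small host-side C model of one shared interrupt vector dispatching to per-event handler pointers. It is a sketch, not the fsl_enet.c code: every name in it (EVT_TX, s_err_isr, setup_handlers, shared_irq, and so on) is hypothetical, and instead of actually jumping to address 0x0 the model counts the NULL calls so it can run on a PC. The assumption that the shared ISR acts on a pending error bit even when only Rx/Tx were requested comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event bits standing in for the ENET interrupt flags. */
#define EVT_TX  (1u << 0)
#define EVT_RX  (1u << 1)
#define EVT_ERR (1u << 2)

typedef void (*isr_fn)(void);

static int tx_count, rx_count, err_count;
static void tx_handler(void)  { tx_count++; }
static void rx_handler(void)  { rx_count++; }
static void err_handler(void) { err_count++; }

/* Per-event handler pointers; they stay NULL unless installed
 * (the model's stand-in for s_enetErrIsr and friends). */
static isr_fn s_tx_isr, s_rx_isr, s_err_isr;

/* One shared vector for all events, as described for the RT1050. */
static int s_shared_irq_enabled;

/* Install handlers only for events requested in the mask. Requesting
 * any event enables the single shared IRQ for all of them. */
static void setup_handlers(uint32_t interrupt_mask)
{
    assert((interrupt_mask & ~(EVT_TX | EVT_RX | EVT_ERR)) == 0u);
    s_tx_isr = s_rx_isr = s_err_isr = NULL;
    s_shared_irq_enabled = 0;
    if (interrupt_mask & EVT_TX)  { s_tx_isr = tx_handler;   s_shared_irq_enabled = 1; }
    if (interrupt_mask & EVT_RX)  { s_rx_isr = rx_handler;   s_shared_irq_enabled = 1; }
    if (interrupt_mask & EVT_ERR) { s_err_isr = err_handler; s_shared_irq_enabled = 1; }
}

/* Call a handler, or report the NULL call that would hard fault on target. */
static int call_or_fault(isr_fn fn)
{
    if (fn == NULL) {
        return 1; /* on the MCU this would be a branch to address 0x0 */
    }
    fn();
    return 0;
}

/* Shared ISR: dispatches on every pending event bit, whether or not that
 * event's handler was installed. Returns the number of NULL calls. */
static int shared_irq(uint32_t pending)
{
    int faults = 0;
    if (!s_shared_irq_enabled) {
        return 0;
    }
    if (pending & EVT_TX)  faults += call_or_fault(s_tx_isr);
    if (pending & EVT_RX)  faults += call_or_fault(s_rx_isr);
    if (pending & EVT_ERR) faults += call_or_fault(s_err_isr);
    return faults;
}
```

For example, after setup_handlers(EVT_TX | EVT_RX), a call to shared_irq(EVT_RX | EVT_ERR) runs the Rx handler but then hits the NULL error pointer once; after setup_handlers(EVT_TX | EVT_RX | EVT_ERR) the same pending bits are handled cleanly.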

The fix is to make sure that all of the Ethernet error masks are set in enet_config_t->interrupt. The error handler is then set up with a valid function pointer, so when an Ethernet error interrupt occurs it is handled by valid code.
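
The same idea applied to the fix, as a hedged host-side sketch: force every error bit into the interrupt mask before handlers are installed, and check the result. ERR_BABR, ERR_BABT, ERR_EBERR, ERR_ALL, EVT_RX, EVT_TX, add_error_masks() and error_masks_complete() are hypothetical names for illustration only. In the real SDK the equivalent step would be OR-ing the ENET error interrupt sources into the interrupt field of enet_config_t before the driver is initialized; newer fsl_enet.h headers appear to group these in an ENET_ERR_INTERRUPT mask, but check the header that ships with your SDK version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error bits; in the real driver these would be the ENET
 * error interrupt sources (babbling Rx/Tx, bus error, ...). */
#define ERR_BABR  (1u << 0)
#define ERR_BABT  (1u << 1)
#define ERR_EBERR (1u << 2)
#define ERR_ALL   (ERR_BABR | ERR_BABT | ERR_EBERR)

/* Hypothetical Rx/Tx bits the application already requests. */
#define EVT_RX    (1u << 3)
#define EVT_TX    (1u << 4)

/* Nonzero if every error source is requested in the mask. */
static int error_masks_complete(uint32_t interrupt_mask)
{
    return (interrupt_mask & ERR_ALL) == ERR_ALL;
}

/* The fix: OR in all error masks before the driver installs its handlers,
 * so the error handler pointer is set whenever the shared IRQ is enabled. */
static uint32_t add_error_masks(uint32_t interrupt_mask)
{
    uint32_t fixed = interrupt_mask | ERR_ALL;
    assert(error_masks_complete(fixed));
    return fixed;
}
```

Applied to a mask that requests only Rx and Tx, add_error_masks() keeps those bits and adds all of ERR_ALL; applying it again changes nothing.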

I have to blame NXP for this one. They must not have tested what happens when there is an Ethernet error, so they never saw the hardfault.
