FreeRTOS Task Overflow Detection

myke_predko
Senior Contributor III

A few weeks ago, a discussion on stack size came up (FRDM-K22F monitoring maximal stack/heap usage of program) with some pointers on how to see how much stack is being used at a given time, but there weren't really good instructions on how to calculate the maximum memory a FreeRTOS task stack needs to cover all contingencies.  

The Task Aware Debugging (TAD) "Task List" is useful, as are the other tools that @ErichStyger has pointed out in his excellent MCU On Eclipse series, but while they give an idea of what's happening with a task's stack, they do not give you a hard and fast indication that you've blown the stack.  

This has been an issue for me because in my current application, I could see some strangeness but I couldn't quantify it.  Sometimes tasks would stop responding and/or the aptly named task "ied" would appear in the Task List.  Unfortunately, this would only appear after 9+ hours of very rigorous testing.  

I suspected that the problem was a task with a blown stack, as there could be a case where multiple interrupt requests pile up during a Flash erase/write cycle, when interrupts are disabled. Once interrupts are re-enabled, these pending requests are all serviced at once, with the context information for each one being placed on a task's stack.  

After doing some research, I found that FreeRTOS can check whether task stacks have gone beyond their defined limits by setting the "configCHECK_FOR_STACK_OVERFLOW" macro in "FreeRTOSConfig.h" to "2". "1" can also be used as a value, but it only checks, when a task is switched out, that its saved stack pointer is still within the task's stack area. "2" provides a more complete test: each task's stack area is filled with 0xa5a5a5a5 when the task is created, and at every context switch the kernel checks that the last four words (16 bytes) at the far end of the stack still hold that value. If there is a miscompare, it is assumed that the stack has gone beyond its defined limits, and the kernel calls the "vApplicationStackOverflowHook" function.  
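To make the "2" check concrete, here is a host-side C simulation (not FreeRTOS code; `stack_init`, `stack_use` and `stack_guard_intact` are hypothetical names). An array stands in for a task stack that grows downward, is pre-filled with 0xa5a5a5a5, and has its four lowest words compared against that pattern, roughly what the kernel does when a task is switched out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FreeRTOS fills new task stacks with the byte 0xa5, i.e. this word pattern. */
#define STACK_FILL_PATTERN 0xa5a5a5a5UL
/* Method 2 compares the last four words (16 bytes) of the stack area. */
#define STACK_CHECK_WORDS 4u

/* Simulated stack: stack[0] is the lowest address (the far end), and the
   stack grows downward from stack[depth_words - 1], as on Cortex-M. */
void stack_init(uint32_t *stack, size_t depth_words)
{
    for (size_t i = 0; i < depth_words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Simulate the task pushing 'used_words' words of (non-pattern) data. */
void stack_use(uint32_t *stack, size_t depth_words, size_t used_words)
{
    for (size_t i = 0; i < used_words && i < depth_words; i++) {
        stack[depth_words - 1u - i] = 0x12345678UL;
    }
}

/* Method-2-style check: true while the far-end words still hold the pattern. */
bool stack_guard_intact(const uint32_t *stack, size_t depth_words)
{
    assert(depth_words >= STACK_CHECK_WORDS);
    for (size_t i = 0; i < STACK_CHECK_WORDS; i++) {
        if (stack[i] != STACK_FILL_PATTERN) {
            return false;   /* here the kernel would call the overflow hook */
        }
    }
    return true;
}
```

With a 32-word stack, using 20 words leaves the check words untouched, while using 30 words overwrites two of them and the check fails. Data that happens to equal 0xa5a5a5a5, or an overflow that skips over those four words, would not be noticed.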

"vApplicationStackOverflowHook" is a user provided method which handles the overflow event.  I've created a simple one in the "overflowCheck.c" file in which interrupts are disabled, indicator LED(s) are set to a specific pattern and the code enters a hard loop.  

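/*
 * overflowCheck.c
 *
 * Created on: Apr. 13, 2021
 * Author: myke
 */
#include <stdio.h>
#include "board.h"
#include "peripherals.h"
#include "pin_mux.h"
#include "clock_config.h"
#include "MK22F51212.h"
#include "fsl_gpio.h"   /* GPIO_PinWrite() */
#if (2 != SDK_DEBUGCONSOLE)
#include "fsl_debug_console.h"
#endif
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

/* Called by the kernel when configCHECK_FOR_STACK_OVERFLOW is 1 or 2 and a
   task's stack has overflowed. TaskHandle_t is already a pointer type, so the
   first parameter is TaskHandle_t (not TaskHandle_t *) to match the prototype. */
void vApplicationStackOverflowHook(TaskHandle_t pxTask, char *pcTaskName)
{
    (void)pxTask;      /* unused */
    (void)pcTaskName;  /* unused */

    __disable_irq();   /* stop further interrupts and context switches */

    /* Turn on the overflow indicator LED (1 = drive the pin high);
       BOARD_OVERFLOWLED_* come from the project's board/pin files. */
    GPIO_PinWrite(BOARD_OVERFLOWLED_GPIO, BOARD_OVERFLOWLED_PIN, 1U);

    for (;;) { }       /* never return: system state can no longer be trusted */
}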

Note that the LED operation uses the "GPIO_PinWrite" inline function, which should not need the stack (the stack is unreliable at this point); be aware, though, that with optimization turned off the compiler may emit an inline function as a normal call. I'm making this point because the "vApplicationStackOverflowHook" function is only called when the stack is blown: execution must not continue past this point, as you are dealing with unknown application and system variable values. 

ErichStyger
Senior Contributor V

Hi @myke_predko ,

I agree with what you say about buffers and structs, but that was the case with two different communication stacks, which allocated rather large structures on the stack (to avoid using global memory) but did not use all of the memory for the objects (more precisely: they were using unions and variable buffer lengths).

It is not about relying on watchdogs: it is just another piece of making applications more robust, and watchdogs (internal, or better, external) can do this together with other quality measures. And yes: I use them in safety-critical systems, but also in everything that 'needs to work': so in every real application, especially remote ones (e.g. satellite systems).

Erich

ErichStyger
Senior Contributor V

Hi @myke_predko ,

here is the stack overflow hook I'm using with FreeRTOS most of the time:

void McuRTOS_vApplicationStackOverflowHook(TaskHandle_t pxTask, char *pcTaskName)
{
  /* This will get called if a stack overflow is detected during the context
     switch.  Set configCHECK_FOR_STACK_OVERFLOW to 2 to also check for stack
     problems within nested interrupts, but only do this for debug purposes as
     it will increase the context switch time. */
  (void)pxTask;
  (void)pcTaskName;
  taskDISABLE_INTERRUPTS();
  /* Write your code here ... */
#if McuLib_CONFIG_CPU_IS_ARM_CORTEX_M
    __asm volatile("bkpt #0");
#elif McuLib_CONFIG_CPU_IS_RISC_V
    __asm volatile( "ebreak" );
#endif
  for(;;) {}
}

Interrupts get disabled to prevent further task context switches. With the breakpoint instruction, execution stops immediately in the debugger if I'm in a debug session, so I see the problem right away. If no debugger is attached, the breakpoint instruction raises a fault (a HardFault on Cortex-M) which leads to a reset of the system, something the watchdog would do anyway.

In other cases I blink an SOS code, then halt the system and wait for the watchdog to kick in.
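One way to drive an SOS blink is a table of on/off times in Morse units. The sketch below (hypothetical names `blink_step_t` and `build_sos`) builds one repetition with standard Morse timing. On the target, the hook would step through the table with a busy-wait delay and direct GPIO writes, since the scheduler is no longer running; that target-specific part is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t on_units;   /* LED on time, in Morse units */
    uint8_t off_units;  /* LED off time that follows */
} blink_step_t;

/* Build one repetition of SOS ("... --- ...") with standard Morse timing:
   dot = 1 unit, dash = 3, gap inside a letter = 1, gap between letters = 3,
   and a 7-unit word gap before the pattern repeats.
   Returns the number of steps written, or 0 if max_steps is too small. */
size_t build_sos(blink_step_t *steps, size_t max_steps)
{
    static const char sos[] = "...|---|...";  /* '|' marks a letter boundary */
    size_t n = 0;

    assert(steps != NULL);
    for (size_t i = 0; sos[i] != '\0'; i++) {
        if (sos[i] == '|') {
            steps[n - 1u].off_units = 3u;  /* widen the gap after the last element */
            continue;
        }
        if (n == max_steps) {
            return 0;
        }
        steps[n].on_units = (uint8_t)((sos[i] == '-') ? 3u : 1u);
        steps[n].off_units = 1u;
        n++;
    }
    steps[n - 1u].off_units = 7u;  /* word gap before repeating */
    return n;
}
```

One repetition has 9 steps and lasts 34 units, so with a hypothetical 100 ms unit the pattern repeats about every 3.4 s.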

I hope this helps,

Erich

myke_predko
Senior Contributor III

@ErichStyger 

Questions about your sample overflow hook code.  

What version of FreeRTOS are you running and how have you set up your build?  

I'm asking because:

  • the function name has "McuRTOS_" preceding it
  • you don't seem to include any headers for GCC or FreeRTOS (including the one with the function prototype, which is defined in the RTOS files)
  • you are testing the macro "McuLib_CONFIG_CPU_IS_ARM_CORTEX_M" rather than "__CORTEX_M"

I don't believe your code will build in a standard FreeRTOS/MCUXpresso system.  

I've added the breakpoint to my code and the result looks like:

/*
 * overflowCheck.c
 *
 * Created on: Apr. 13, 2021
 * Author: myke
 */
#include <stdio.h>
#include "board.h"
#include "peripherals.h"
#include "pin_mux.h"
#include "clock_config.h"
#include "MK22F51212.h"
#if (2 != SDK_DEBUGCONSOLE)
#include "fsl_debug_console.h"
#endif
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"
#include "Task_priorities.h"
#include "usefulValues.h"

/* Signature must match the FreeRTOS prototype: TaskHandle_t is already a pointer type. */
void vApplicationStackOverflowHook(TaskHandle_t pxTask, char *pcTaskName)
{
    (void)pxTask;      /* unused */
    (void)pcTaskName;  /* unused */

    __disable_irq();   /* no more interrupts or context switches */

    //todo: Indicate Overflow with LEDs

#if defined __CORTEX_M && (__CORTEX_M == 4U)
    __asm volatile("bkpt #0");   /* stop here if a debugger is attached */
#endif

    for (;;) { }       /* never return */
}

To enable the overflow check, you must set the "configCHECK_FOR_STACK_OVERFLOW" macro in "FreeRTOSConfig.h" to 2:

#define configCHECK_FOR_STACK_OVERFLOW 2

[EDIT AFTER @ErichStyger REPLY]

I've marked this as the "Solution" because it is appropriate for the version of FreeRTOS that NXP provides for Kinetis.  

ErichStyger
Senior Contributor V

Yes, that piece is based on the McuLib FreeRTOS port in https://github.com/ErichStyger/McuOnEclipseLibrary/tree/master/lib/FreeRTOS

The difference is that it can use CMSIS-Core but does not have to (e.g. it runs with non-ARM cores too). Internally it uses the __CORTEX_M macro too, so you can use that macro directly, as you do in your code.

 

Erich

myke_predko
Senior Contributor III

Hi @ErichStyger 

A couple of comments in response to your replies about overflow detection.  

I'm trying to think of a case where I would push an uninitialized structure/array onto the stack and I'm coming up blank. I think a big part of this is that, over the decades I've been programming, I've developed coding guidelines that ideally keep programming errors from slipping through undetected and causing a problem later (i.e. never using code that builds with warnings, as we discussed before, using Yoda-style conditional statements, etc.). 

A big one is to never have uninitialized objects and variables. The reason is that different systems/OSes/compilers may or may not initialize them to some value that doesn't cause a problem in the initial case but explodes down the line, and that is often very difficult to debug.  

Another one is passing objects to methods - pointers should always be passed if the data object is larger than a word.  I know there's no hard programming philosophical reason for this, but I like to have only one version of data so that if there is an incorrect value it's only in one place and available to all the methods accessing it which makes it easier to catch the problem and identify where the error takes place.  Along with that, I don't like the code and space overhead needed for handling multiple copies of the same data - especially true in MCUs which have fairly limited (Flash and SRAM) memory and often every instruction cycle counts.  
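The two guidelines above (initialize everything, pass anything larger than a word by pointer) can be sketched as follows; `message_t`, `message_init` and `message_checksum` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte message object: much larger than a machine word. */
typedef struct {
    uint8_t  payload[60];
    uint32_t length;
} message_t;

/* Initialize every byte so no field is ever read uninitialized. */
void message_init(message_t *msg)
{
    assert(msg != NULL);
    memset(msg, 0, sizeof *msg);
}

/* Taking a const pointer: the callee reads the caller's single copy instead
   of the caller building a 64-byte copy for every call. */
uint32_t message_checksum(const message_t *msg)
{
    assert(msg != NULL);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < msg->length && i < sizeof msg->payload; i++) {
        sum += msg->payload[i];
    }
    return sum;
}
```

The `const` qualifier keeps the single-copy benefit without giving the callee permission to modify the caller's data.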

I know a lot of current generation programmers are reluctant to use pointers as they're perceived as being difficult to understand/work with and many modern languages (and teaching programs) eschew their use but I don't think you're a competent programmer unless you are comfortable with pointers.  </OldGuyRant>

I was surprised by your emphasis on the watchdog.  I've always been reluctant to rely on them as they tend to mask problems once the product is out in the field.  I had a case where I was thrown in to debug a system that never should have been released, but relied on the watchdog timer to reboot when the software crashed in order to give the appearance that the system was much more reliable than it was.  I know that watchdog timers are needed in critical applications, but I get nervous when they're put into everyday systems.  

ErichStyger
Senior Contributor V

Hi @myke_predko ,

thanks for the flowers :-).

Yes, the FreeRTOS stack overflow check is simple yet powerful and I always have it turned on. And it is a subject I teach my students too.

Just to add: it is not perfect (and that's fine), and I have run into situations where it will not trigger the hook for the overflow.

It is not perfect for cases where an overflow happens because of a buffer that is not (or not fully) initialized. This can happen with some communication stacks. Below is a screenshot of an example showing this:

[Screenshot: task stack in memory, with the allocated stack area marked in yellow]

Marked in yellow is the allocated stack; the application overflows it and writes 0xdeadbeef onto the stack outside of the allowed area, and this is likely not detected.

This is because detection method 2 only runs at task context switch time: if the SP (or PSP) goes beyond the end of the stack and comes back again before the context switch happens, without overwriting the check pattern at the end of the stack, it is not detected.
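This blind spot can be reproduced with a host-side C simulation (not FreeRTOS code; all names are hypothetical). A function's local buffer extends 8 words past the end of a 32-word stack, but because only its lowest word is written, the four check words keep their 0xa5a5a5a5 pattern and the overflow goes unnoticed, even though neighbouring data has been corrupted. Writing the whole buffer, by contrast, runs over the check words and would be caught at the next context switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_PATTERN  0xa5a5a5a5UL
#define CHECK_WORDS   4u    /* words compared by method 2 */
#define BELOW_WORDS   16u   /* other data located just below the task stack */
#define STACK_WORDS   32u   /* the task's stack */
#define TOTAL_WORDS   (BELOW_WORDS + STACK_WORDS)

/* ram[0 .. BELOW_WORDS-1] is neighbouring data; ram[BELOW_WORDS ..] is the
   task stack, which grows downward from ram[TOTAL_WORDS - 1]. */
typedef struct {
    uint32_t ram[TOTAL_WORDS];
} sim_memory_t;

void sim_reset(sim_memory_t *m)
{
    for (size_t i = 0; i < TOTAL_WORDS; i++) {
        m->ram[i] = (i < BELOW_WORDS) ? 0u : FILL_PATTERN;
    }
}

/* Method-2-style check of the four lowest stack words. */
bool sim_check_intact(const sim_memory_t *m)
{
    for (size_t i = 0; i < CHECK_WORDS; i++) {
        if (m->ram[BELOW_WORDS + i] != FILL_PATTERN) {
            return false;
        }
    }
    return true;
}

/* A function allocates a local buffer of 'frame_words' words from the top of
   the stack, writes into it and returns before the next context switch.
   With init_all false, only the buffer's lowest word is written (like the
   0xdeadbeef in the screenshot); with init_all true, every word is written. */
void sim_call(sim_memory_t *m, size_t frame_words, bool init_all)
{
    assert(frame_words >= 1u && frame_words <= TOTAL_WORDS);
    size_t lowest = TOTAL_WORDS - frame_words;
    if (init_all) {
        for (size_t i = lowest; i < TOTAL_WORDS; i++) {
            m->ram[i] = 0u;
        }
    } else {
        m->ram[lowest] = 0xdeadbeefUL;
    }
    /* On return the stack pointer is back at the top of the stack, so a
       stack pointer check at the next context switch sees nothing either. */
}
```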

I have to say that chances for this are not high, but if you know Mr. Murphy then you know it will happen.

That's why, in addition to the runtime checking, I do static checks as outlined in the links mentioned, and this is why I like the stack size count in the MCUXpresso IDE so much: it gives me something else to look at:
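A static estimate of this kind boils down to finding the deepest call chain from a task's entry function and adding a margin for interrupts. The sketch below does that over a small, hypothetical call graph; the frame sizes are made-up stand-ins for what GCC's `-fstack-usage` option writes to `.su` files, and the margin uses the Cortex-M hardware exception frame size. It ignores recursion, calls through function pointers and the RTOS port's own saved context, so treat the result as a lower bound to check against the runtime tools, not a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_CALLEE    (-1)
#define MAX_CALLEES  3

/* Cortex-M exception entry stacks 8 words, or 26 words when the FPU context
   is saved (extended frame); possible alignment padding is not counted. */
#define CM4_BASIC_FRAME_BYTES     (8u * 4u)
#define CM4_EXTENDED_FRAME_BYTES  (26u * 4u)

typedef struct {
    const char *name;
    size_t frame_bytes;          /* e.g. taken from a -fstack-usage .su file */
    int callees[MAX_CALLEES];    /* indices into the table, NO_CALLEE = unused */
} func_info_t;

/* Hypothetical task call graph; names and sizes are illustrative only. */
const func_info_t example_app[] = {
    /* 0 */ { "task_main",     48u,  { 1, 3, NO_CALLEE } },
    /* 1 */ { "parse_message", 96u,  { 2, 3, NO_CALLEE } },
    /* 2 */ { "crc16",         16u,  { NO_CALLEE, NO_CALLEE, NO_CALLEE } },
    /* 3 */ { "log_printf",    120u, { NO_CALLEE, NO_CALLEE, NO_CALLEE } },
};

/* Deepest stack use starting at function 'idx': its own frame plus the
   deepest callee chain. Assumes no recursion and no indirect calls. */
size_t worst_case_stack(const func_info_t *table, int idx)
{
    assert(idx >= 0);
    size_t deepest = 0;
    for (int i = 0; i < MAX_CALLEES; i++) {
        int callee = table[idx].callees[i];
        if (callee != NO_CALLEE) {
            size_t depth = worst_case_stack(table, callee);
            if (depth > deepest) {
                deepest = depth;
            }
        }
    }
    return table[idx].frame_bytes + deepest;
}

/* Add room for one hardware exception frame pushed onto the task stack. */
size_t task_stack_budget(const func_info_t *table, int entry, bool fpu_used)
{
    return worst_case_stack(table, entry)
         + (fpu_used ? CM4_EXTENDED_FRAME_BYTES : CM4_BASIC_FRAME_BYTES);
}
```

Here the deepest chain is task_main, parse_message, log_printf: 48 + 96 + 120 = 264 bytes, or 368 bytes once a 104-byte FPU exception frame is added.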

[Screenshot: stack size count in the MCUXpresso IDE]

 

I hope this helps,

Erich