LPC17 FreeRTOS glitches

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FlySnake on Sun Feb 19 14:19:27 MST 2012
Hi everyone!
My LPC1768 (and an LPC1769 on an LPCXpresso board) running FreeRTOS occasionally freezes, anywhere from 10 minutes to 10 hours after start, or reboots if the WDT is enabled. Under the debugger I see strange HardFaults (Access Violation). Call stack:

Quote:
Thread [1] (Suspended: Signal 'SIGSTOP' received. Description: Stopped (signal).)
6 <symbol is not available> 0xfffffffe
5 <signal handler called>() 0xfffffff1
4 <symbol is not available> 0xfffffffe
3 <signal handler called>() 0xfffffffd
2 vPortStartFirstTask() port.c:153 0x000170e2
1 xPortStartScheduler() port.c:182 0x00017132

But all tasks have already started, and the program runs well before the fault. Why vPortStartFirstTask(), and where does the 0xfffffffe in PC come from?
If I disable some "heavy" tasks the problem goes away, but it does not depend on a specific task (i.e. a bug in my code). For instance: I have a piece of code that works in another project without an OS. Here this code runs as a task, and its functionality is fine. When I disable this task, the glitches disappear. When I enable this task but disable uIP, it also works without glitches. The same goes for other combinations, especially of "heavy" tasks. It is not a stack overflow (I have a stack overflow hook which prints a message before dying). It is not a "usual" HardFault either, because I have a hook on all faults too and no message is printed before it dies, even though the debugger says it is a HardFault.
I'm stuck. Maybe I need to use the FreeRTOS tracing features, but I don't know where to start, i.e. which piece of code could possibly do such an awful thing.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Rob65 on Fri Feb 24 14:28:02 MST 2012

Quote: FlySnake

What is the difference between your calculation and RunTimeStats? It reports idle ticks and the % of CPU time for each task.

Thx - had not seen this before.
I need to look at this in more detail and do some testing before being able to give a detailed answer.
The difference is that RunTimeStats collects even more detailed information and that it uses a timer on the lpc17xx.

Quote:

This was one of my first questions about RTOS :) What if you have task(s) waiting for a semaphore indefinitely? Such a task cannot continuously report its status.

I am still not sure how to use the watchdog. Most of my tasks are waiting on a message queue, with messages coming from an external source or from other tasks. If the external messages stop arriving, I currently reset the external source. I do this by placing a timeout on the xQueueReceive(). A few tasks receive messages from an external source like user input - it is hard to place a timeout on this since it may take days before the user hits a key.
So maybe I should conclude that the watchdog should only be triggered by one (or a few) selected tasks or by the scheduler itself.

Quote:

What about WDT interrupt instead of reset? I thought about it, but it turned out to solve the problem before :D

The idea behind the watchdog is to reset the system when something has gone very wrong. Interrupts being disabled is a nice thing to catch - and what if something went wrong and the SP points to non-existent memory? A watchdog interrupt handler could not run in either case.
So the only reliable way to handle a watchdog is by performing a reset.

I am convinced that my application is stable and that the watchdog will never bark - and yet I know that someday this is bound to happen ...

Regards,
Rob

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FlySnake on Fri Feb 24 10:46:54 MST 2012
Wow, thank you Rob!

Quote: Rob65
Been there ...
When I started with FreeRTOS I had similar strange hard faults and I had to keep reminding myself that there are just too many FreeRTOS users around for this to be a FreeRTOS bug ...

Yes, I agree with you. 99.9% of problems are caused by my own hands, not others :)

Quote: Rob65

After prototyping my application I started to create some standard platform code that is the basis for a number of new projects I am starting. Unfortunately I have a new job :D meaning I have less time to spend on my own projects.
Still, it might be worthwhile to look at hg.bikealive.nl/Platform1754. There you'll find a first start of the code for my standard hardware platform under development - in Drivers/Platform/lcd_ST7565R.c you'll see how I implemented thread-safe access using a mutex; the StartCritical() and endCritical() macros defined there have two versions, to test both with and without thread-safe use.
Also nice to look at might be the Platformtest/src/FreeRTOSHooks.c file.
In vApplicationIdleHook() I use asm("wfi") to put the MCU into sleep mode when there is nothing to do (the MCU will be woken up by either the systick or another interrupt). Depending on the application this preserves my batteries when there is nothing to do.

That's interesting, thx

Quote: Rob65

In the vApplicationTickHook() I keep track of the idle time. Just before going to sleep in the idle hook I keep track of the current value of the systick timer (this contains the number of clock ticks to go until the next 1 ms timeslice starts). I calculate an idle percentage that is averaged over a number of msecs.

What is the difference between your calculation and RunTimeStats? It reports idle ticks and the % of CPU time for each task.

Quote: Rob65

One thing I have not looked at yet is the watchdog.
I might want to do different things when a watchdog reset occurred, so I am thinking of creating a global variable to keep track of the reset reason. This allows me to e.g. keep track of the number of watchdog resets compared to the power-on resets.
But before this, I need to figure out how to handle the watchdog. A very crude way might be to use a task that triggers the watchdog periodically: as soon as the system crashes there is a big chance that all tasks are halted and the watchdog fires. A more subtle way of using the wdog is to figure out which task(s) are critical and then only trigger the wdog when all of these tasks have reported they are still working (this can be as simple as clearing a bit in a bit-mask, then triggering the watchdog and restoring the mask as soon as all bits are cleared).

This was one of my first questions about RTOS :) What if you have task(s) waiting for a semaphore indefinitely? Such a task cannot continuously report its status. The only way to do this, as I see it, is to wait for the semaphore with a timeout and continue the loop if it has not been given. Wouldn't that be performance overkill?

Quote: Rob65

Looking at a more advanced recovery, I might want to save (part of) the system state to a log file on my SD card. This means that some of the tasks' variables must not be zeroed; these variables must then be placed in a separate .bss section that is not blanked on reset so I can save them to SD. This could be useful to see if there is a common reset reason: a bug in the system that needs to be fixed.
There might even be tasks that want to know the state when the reset occurred and act differently depending on this.

What about WDT interrupt instead of reset? I thought about it, but it turned out to solve the problem before :D

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Rob65 on Thu Feb 23 12:00:24 MST 2012

Quote: FlySnake
Problem solved. An unprotected IAP call was the cause.

Been there ...
When I started with FreeRTOS I had similar strange hard faults and I had to keep reminding myself that there are just too many FreeRTOS users around for this to be a FreeRTOS bug ... Libraries that are not thread safe are one of the main causes of strange errors that I sometimes still get stuck with, so what I do now is set up a relatively simple test program to isolate every new feature or library that I start to use.

After prototyping my application I started to create some standard platform code that is the basis for a number of new projects I am starting. Unfortunately I have a new job :D meaning I have less time to spend on my own projects.
Still, it might be worthwhile to look at hg.bikealive.nl/Platform1754. There you'll find a first start of the code for my standard hardware platform under development - in Drivers/Platform/lcd_ST7565R.c you'll see how I implemented thread-safe access using a mutex; the StartCritical() and endCritical() macros defined there have two versions, to test both with and without thread-safe use.

Also nice to look at might be the Platformtest/src/FreeRTOSHooks.c file.
In vApplicationIdleHook() I use asm("wfi") to put the MCU into sleep mode when there is nothing to do (the MCU will be woken up by either the systick or another interrupt). Depending on the application this preserves my batteries when there is nothing to do.
In the vApplicationTickHook() I keep track of the idle time. Just before going to sleep in the idle hook I keep track of the current value of the systick timer (this contains the number of clock ticks to go until the next 1 ms timeslice starts). I calculate an idle percentage that is averaged over a number of msecs.

This code is not complete or 100% correct (other interrupts than the systick interrupt are not taken into account - yet they result in system load) but it gives me an idea of the load of my system. This at least gives me a more realistic figure than the example where a counter is incremented each time the vApplicationIdleHook() is called.
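A rough sketch of the idle bookkeeping described above, written so that it can also be compiled and tried on a PC. The names idle_meter_t, idle_meter_on_sleep() and idle_meter_on_tick() and the 100-tick averaging window are assumptions made for illustration; they are not taken from the Platform1754 repository. On the target, the first function would presumably be called from vApplicationIdleHook() with the current SysTick value just before the wfi, and the second from vApplicationTickHook().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical averaging window: 100 ticks of 1 ms each. */
#define IDLE_WINDOW_TICKS 100u

typedef struct {
    uint32_t reload;        /* SysTick clocks per tick (e.g. 100000 at 100 MHz, 1 ms) */
    uint32_t pending_idle;  /* clocks left in the current tick when we last went to sleep */
    uint64_t idle_clocks;   /* idle clocks accumulated in the current window */
    uint32_t ticks;         /* ticks seen in the current window */
    uint32_t idle_percent;  /* result of the last completed window */
} idle_meter_t;

/* Idle hook side: remember how many clocks remain until the next tick. */
void idle_meter_on_sleep(idle_meter_t *m, uint32_t systick_remaining)
{
    assert(m != NULL);
    m->pending_idle = systick_remaining;
}

/* Tick hook side: book the remembered idle clocks and, once per window,
 * turn the sum into a percentage. Interrupts other than the tick are
 * ignored, so this is an estimate, as noted in the post. */
void idle_meter_on_tick(idle_meter_t *m)
{
    assert(m != NULL && m->reload != 0u);
    m->idle_clocks += m->pending_idle;
    m->pending_idle = 0u;
    if (++m->ticks == IDLE_WINDOW_TICKS) {
        uint64_t window_clocks = (uint64_t)m->reload * IDLE_WINDOW_TICKS;
        m->idle_percent = (uint32_t)((100u * m->idle_clocks) / window_clocks);
        m->idle_clocks = 0u;
        m->ticks = 0u;
    }
}
```

If the idle hook goes to sleep with a quarter of each tick still remaining, the meter settles at 25 percent after one window.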

One thing I have not looked at yet is the watchdog.
I might want to do different things when a watchdog reset occurred, so I am thinking of creating a global variable to keep track of the reset reason. This allows me to e.g. keep track of the number of watchdog resets compared to the power-on resets.
But before this, I need to figure out how to handle the watchdog. A very crude way might be to use a task that triggers the watchdog periodically: as soon as the system crashes there is a big chance that all tasks are halted and the watchdog fires. A more subtle way of using the wdog is to figure out which task(s) are critical and then only trigger the wdog when all of these tasks have reported they are still working (this can be as simple as clearing a bit in a bit-mask, then triggering the watchdog and restoring the mask as soon as all bits are cleared).
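The bit-mask idea can be sketched in a few lines of plain C. The task bits and the function name wdt_checkin() are hypothetical; the sketch only does the bookkeeping and leaves the actual feed to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical set of critical tasks, one bit each. */
#define WDT_TASK_NET   (1u << 0)
#define WDT_TASK_IAP   (1u << 1)
#define WDT_TASK_UI    (1u << 2)
#define WDT_ALL_TASKS  (WDT_TASK_NET | WDT_TASK_IAP | WDT_TASK_UI)

/* Bits still set belong to tasks that have not reported in yet. */
static volatile uint32_t wdt_pending = WDT_ALL_TASKS;

/* Called by each critical task once per loop iteration.
 * Returns true when this was the last outstanding check-in; the caller
 * should then feed the watchdog. The mask is restored at that point.
 * In a preemptive system the read-modify-write below would need to be
 * protected, e.g. by a critical section. */
bool wdt_checkin(uint32_t task_bit)
{
    assert((task_bit & ~WDT_ALL_TASKS) == 0u);
    wdt_pending &= ~task_bit;
    if (wdt_pending == 0u) {
        wdt_pending = WDT_ALL_TASKS;
        return true;
    }
    return false;
}
```

A task that reports twice in a row does not help: the watchdog is only fed once every critical task has reported at least once since the previous feed.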

Looking at a more advanced recovery, I might want to save (part of) the system state to a log file on my SD card. This means that some of the tasks' variables must not be zeroed; these variables must then be placed in a separate .bss section that is not blanked on reset so I can save them to SD. This could be useful to see if there is a common reset reason: a bug in the system that needs to be fixed.
There might even be tasks that want to know the state when the reset occurred and act differently depending on this.
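One possible shape for the reset-reason bookkeeping, as a sketch only. It assumes the structure lives in a RAM section that the startup code does not clear (how to set that up depends on the linker script and is not shown), and it assumes the RSID bit positions below, which are how I remember them from the LPC17xx user manual and should be checked there. All names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reset source flags; assumed bit layout of the LPC17xx RSID register. */
#define RSID_POR   (1u << 0)   /* power-on reset */
#define RSID_EXTR  (1u << 1)   /* external reset pin */
#define RSID_WDTR  (1u << 2)   /* watchdog reset */
#define RSID_BODR  (1u << 3)   /* brown-out reset */

/* Arbitrary marker used to tell valid contents from power-up garbage. */
#define RESET_LOG_MAGIC 0x52535431u

typedef struct {
    uint32_t magic;
    uint32_t watchdog_resets;  /* watchdog resets since the last power-on */
    uint32_t other_resets;     /* all other warm resets since the last power-on */
} reset_log_t;

/* Call once, early after reset, with the value read from RSID.
 * Counting across power cycles would need non-volatile storage,
 * such as the SD card mentioned in the post. */
void reset_log_update(reset_log_t *log, uint32_t rsid)
{
    assert(log != NULL);
    if ((rsid & RSID_POR) != 0u || log->magic != RESET_LOG_MAGIC) {
        /* RAM contents cannot be trusted: start a fresh log. */
        log->magic = RESET_LOG_MAGIC;
        log->watchdog_resets = 0u;
        log->other_resets = 0u;
    }
    if ((rsid & RSID_POR) != 0u) {
        return;
    }
    if ((rsid & RSID_WDTR) != 0u) {
        log->watchdog_resets++;
    } else {
        log->other_resets++;
    }
}
```

A task that wants to act differently after a watchdog reset could then look at watchdog_resets once the scheduler is running.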

I welcome any suggestions and remarks on those ideas.

Regards,
Rob

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FlySnake on Wed Feb 22 01:11:58 MST 2012
Thanks for your replies. Problem solved. An unprotected IAP call was the cause.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Rob65 on Tue Feb 21 10:44:28 MST 2012

Quote: FreeRTOS.org

If the watchdog is the thing that causes it to reboot (which is I think the purpose of the watchdog) and it runs fine without the watchdog, then it would seem the kicking of the watchdog is your problem.

We think alike! Of course the whole idea of the watchdog is to keep kicking the system to stay alive. I hope the OP does realize that this is of course only useful when the software checks for a barking dog and acts upon it. If a watchdog trigger just means you are trying the exact same thing again then you might end up in an endless loop.

Quote:

That is good - although IAP is presumably going to be a long operation so will impact your system responsiveness.
...
That should be ok provided it is not called from a FreeRTOS critical section.

The better solution might be to create a semaphore and guard all accesses to IAP calls using this semaphore. It might be a good idea to collect all IAP stuff in a 'library' and use IAP through these functions.

Stopping the IRQ will also stop the systick and since IAP calls can take a long time this means that you cannot rely on vTaskDelay (and other timing inside FreeRTOS) to be accurate.
And of course it is perfectly OK for other parts of the application to run while the task waiting on the IAP call is waiting for that call to finish.
(of course this only works if preemption is enabled - but I think that is the default in the LPCXpresso FreeRTOS project)
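A sketch of the "collect all IAP stuff in one place" idea. The wrapper takes its lock, unlock and IAP entry point as function pointers so that it can be tried on a PC; on the target, lock and unlock would presumably wrap xSemaphoreTake() and xSemaphoreGive() on a mutex, and the entry pointer would be the ROM address given in the LPC17xx user manual. The type and function names are invented for this example, and the fake_ functions exist only as host stand-ins. Note that a mutex only serializes the callers; it does not mask interrupts the way the critical section discussed earlier in the thread does.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the IAP entry point: command array in, result array out. */
typedef void (*iap_entry_t)(unsigned int cmd[], unsigned int result[]);

/* Hypothetical guard object: everything IAP-related goes through it. */
typedef struct {
    void (*lock)(void);     /* e.g. take a FreeRTOS mutex, blocking */
    void (*unlock)(void);   /* e.g. give the mutex back */
    iap_entry_t entry;      /* the IAP ROM entry point on the target */
} iap_guard_t;

/* The single place in the application that is allowed to call IAP.
 * Returns the IAP status code from result[0]. */
unsigned int iap_call_guarded(const iap_guard_t *g,
                              unsigned int cmd[], unsigned int result[])
{
    assert(g != NULL && g->lock != NULL && g->unlock != NULL && g->entry != NULL);
    g->lock();
    g->entry(cmd, result);
    g->unlock();
    return result[0];
}

/* Host stand-ins, only here so the sketch can be exercised on a PC. */
static int lock_depth = 0;
static int depth_seen_by_iap = -1;

void fake_lock(void)   { lock_depth++; }
void fake_unlock(void) { lock_depth--; }

void fake_iap(unsigned int cmd[], unsigned int result[])
{
    (void)cmd;
    depth_seen_by_iap = lock_depth;  /* record whether the lock was held */
    result[0] = 0u;                  /* pretend the command succeeded */
}
```

With this in place, the other tasks never see the IAP entry point directly, so a forgotten lock around one call site cannot happen.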

Regards,

Rob

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FreeRTOS.org on Tue Feb 21 02:31:01 MST 2012

Quote:
My LPC1768 (also 1769 on lpcxpresso board) running FreeRTOS sometimes (10 minutes to 10 hours) freezes or reboots if WDT enabled.

That does not give enough information to allow a full reply. For example, how, where and how frequently is the watchdog being kicked?

If the watchdog is the thing that causes it to reboot (which is I think the purpose of the watchdog) and it runs fine without the watchdog, then it would seem the kicking of the watchdog is your problem.

Quote:
Thread [1] (Suspended: Signal 'SIGSTOP' received. Description: Stopped (signal).)
6 <symbol is not available> 0xfffffffe
5 <signal handler called>() 0xfffffff1
4 <symbol is not available> 0xfffffffe
3 <signal handler called>() 0xfffffffd
2 vPortStartFirstTask() port.c:153 0x000170e2
1 xPortStartScheduler() port.c:182 0x00017132

I would just ignore that stack frame. If you are in an exception, you need to unwind the stack in code to see what it really is.
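As a sketch of what "unwinding in code" can look like on a Cortex-M3: on exception entry the core pushes eight words onto the active stack, and the EXC_RETURN value left in LR says which stack that was. The 0xfffffffd and 0xfffffff1 entries in the quoted trace have the form of such EXC_RETURN values. The helper names below are made up for the example; a real fault handler would pass in the MSP or PSP value it read on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words a Cortex-M3 pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Bit 2 of EXC_RETURN: 1 = frame is on the process stack (PSP),
 * 0 = frame is on the main stack (MSP). */
bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Bit 3 of EXC_RETURN: 1 = exception was taken from Thread mode,
 * 0 = it was taken from Handler mode (a nested exception). */
bool exc_return_to_thread(uint32_t exc_return)
{
    return (exc_return & (1u << 3)) != 0u;
}

/* Copy the stacked registers out of the stack the core used.
 * frame.pc is the address of the code that was interrupted or faulted. */
fault_frame_t fault_frame_read(const uint32_t *sp)
{
    assert(sp != NULL);
    fault_frame_t frame = {
        sp[0], sp[1], sp[2], sp[3],   /* r0-r3 */
        sp[4], sp[5], sp[6], sp[7]    /* r12, lr, pc, xpsr */
    };
    return frame;
}
```

Printing or logging frame.pc and frame.lr from the fault hook then gives addresses that can be looked up in the map file, independent of what the debugger's call stack view shows.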

Quote:
cmp r3, #0 is last instruction I see via debugger, it is just

That instruction is benign. I would look at the previous instruction, which is a load and has more potential for exceptions (what is the address it is loading from?).

Quote:
Most likely you are blowing the stack for one of your processes. There is no stack checking, so once you exceed the allocated stack size, you will be overwriting application data, or other stacks.

I think the OP said he had stack checking on, although that does not check the interrupt stack.

Quote:
One tiny task is used for IAP calls and other tasks use it a lot (through a queue). I forgot taskENTER_CRITICAL() / taskEXIT_CRITICAL() around one of the IAP calls. "Heavy" tasks use it more often than others. 12 hours without glitches now.

That is good - although IAP is presumably going to be a long operation so will impact your system responsiveness.

Quote:
Is it OK to use __disable_irq() under FreeRTOS? Or should I use taskENTER_CRITICAL() instead, which also disables interrupts, but by setting the priority mask equal to configMAX_SYSCALL_INTERRUPT_PRIORITY?

That should be ok provided it is not called from a FreeRTOS critical section.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FlySnake on Tue Feb 21 02:07:23 MST 2012
I didn't know about the task viewer. Thank you, Rob, this is a good tool.
I have probably found the problem. One tiny task is used for IAP calls and other tasks use it a lot (through a queue). I forgot taskENTER_CRITICAL() / taskEXIT_CRITICAL() around one of the IAP calls. "Heavy" tasks use it more often than others. 12 hours without glitches now.

And 2 questions about WDT.
1. Feeding (or kicking?) is not an atomic operation, and if an interrupt occurs between writing 0xAA and 0x55, the stupid dog will cause a reset. The standard driver library has a WDT_Feed() function:

__disable_irq();
LPC_WDT->WDFEED = 0xAA;
LPC_WDT->WDFEED = 0x55;
__enable_irq();

Is it OK to use __disable_irq() under FreeRTOS? Or should I use taskENTER_CRITICAL() instead, which also disables interrupts, but by setting the priority mask equal to configMAX_SYSCALL_INTERRUPT_PRIORITY?
2. Are there any recommendations on how to use the watchdog properly under an RTOS?
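One variation on the feed sequence above, as a sketch: instead of unconditionally enabling interrupts afterwards, save the interrupt mask first and restore it, so the function behaves the same whether or not the caller had interrupts masked already. To keep the example runnable on a PC, PRIMASK is simulated by a variable and the feed register is passed in as a pointer; the comments name the CMSIS calls that would take their place on the target. irq_save(), irq_restore() and wdt_feed() are names chosen for this example, not driver library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host stand-in for the Cortex-M3 PRIMASK bit: 1 = interrupts masked. */
static uint32_t sim_primask = 0u;

/* On the target: saved = __get_PRIMASK(); __disable_irq(); */
uint32_t irq_save(void)
{
    uint32_t saved = sim_primask;
    sim_primask = 1u;
    return saved;
}

/* On the target: __set_PRIMASK(saved); */
void irq_restore(uint32_t saved)
{
    sim_primask = saved;
}

/* Write the 0xAA, 0x55 feed sequence with interrupts masked.
 * feed_reg would be &LPC_WDT->WDFEED on the target. */
void wdt_feed(volatile uint32_t *feed_reg)
{
    assert(feed_reg != NULL);
    uint32_t saved = irq_save();
    *feed_reg = 0xAAu;
    *feed_reg = 0x55u;
    irq_restore(saved);   /* restore the previous state, do not force-enable */
}
```

Whether this or taskENTER_CRITICAL() is the better fit depends on which interrupts are allowed to delay the feed; the sketch only shows the save-and-restore pattern.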

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Rob65 on Mon Feb 20 11:25:22 MST 2012

Quote: FlySnake
and xTaskResumeAll() is called from almost everywhere (delays, queues etc)

I am not exactly sure what you mean by this?
Are you calling xTaskResumeAll() from your code or is this what you see happening inside the FreeRTOS code?

You probably mean inside FreeRTOS. vTaskSuspendAll() and xTaskResumeAll() are used in the FreeRTOS code a few times. If you look in the FreeRTOS user manual (and you will need some kind of a manual to use any RTOS) you will see that this pair of functions suspends all tasks temporarily.

But to get back to your original problem, there must be an error in your code somewhere. I have been using FreeRTOS a lot and found that every time that something like this happens there is an error in my own code and not in FreeRTOS. I have tested FreeRTOS with extra tasks added to load my system up to the point where I was unable to handle all interrupts and messages generated - just to see what my application and FreeRTOS do.

I suggest you use the FreeRTOS viewer that is included in the LPCXpresso IDE to aid in debugging your tasks. Start the FreeRTOS viewer from the menu: Windows -> Show View -> Other -> OpenRTOS Viewer / Task Table

Regards,
Rob

lpcware · ‎06-15-2016

Content originally posted in LPCWare by CodeRedSupport on Mon Feb 20 10:58:34 MST 2012
Most likely you are blowing the stack for one of your processes. There is no stack checking, so once you exceed the allocated stack size, you will be overwriting application data, or other stacks.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by FlySnake on Mon Feb 20 01:56:22 MST 2012
Disassembly:

0001599c: xTaskResumeAll+56      movw r3, #11936 ; 0x2ea0
000159a0: xTaskResumeAll+60      movt r3, #4096  ; 0x1000
000159a4: xTaskResumeAll+64      ldr r3, [r3, #0]
000159a6: xTaskResumeAll+66      cmp r3, #0  <<-- last instruction executed normally
000159a8: xTaskResumeAll+68      beq.n 0x15aa8 <xTaskResumeAll+324>

cmp r3, #0 is the last instruction I see in the debugger; it is just

if( uxCurrentNumberOfTasks > ( unsigned portBASE_TYPE ) 0 )

and xTaskResumeAll() is called from almost everywhere (delays, queues etc)