Hi
The stack size that you set in the linker script is just telling the linker to keep this space free. It doesn't protect anything outside of this space.
If you are using a multitasking operating system there will also be a stack size for each task.
All of these are just sizes reserved for each stack's use. If the processor (code, task, etc.) uses more than this, a serious error is likely.
Stack overflow is not protected against (although it may be monitored) and generally can't be protected against without severe restrictions on the processor's operation. Scripting languages may protect against it, but I doubt that even the most sophisticated embedded operating system will try to do so, since the overhead involved would greatly slow all operation.
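As an illustration of what "monitored" can mean, here is a minimal sketch of the common stack-painting technique: fill the reserved stack with a known pattern before it is used and later count how much of the pattern is still intact. The names stack_paint(), stack_headroom() and the fill value are my own for this example, and it assumes a stack that grows downwards; on a real target the region would come from your linker script symbols or from the task's stack buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u  /* arbitrary, easily recognised fill pattern */

/* Fill a stack region with the pattern. Call this before the stack is in
   use (for example at start-up, or when a task is created). */
static void stack_paint(uint32_t *base, size_t words)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Count the words at the low end of the region that still hold the pattern.
   Assumption: the stack grows downwards from base[words - 1] towards
   base[0], so the untouched words are the headroom never used so far. */
static size_t stack_headroom(const uint32_t *base, size_t words)
{
    assert(base != NULL);
    size_t unused = 0;
    while (unused < words && base[unused] == STACK_FILL) {
        unused++;
    }
    return unused;
}
```

Calling stack_headroom() periodically (or from a debug command) shows how close the stack has ever come to its limit. It only reports after the fact and will not stop an overflow from corrupting whatever lies below the stack.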
You need to check and know the greatest depth of subroutine and interrupt nesting on the main stack, and the worst-case paths that may temporarily allocate more local variables. You also need to be sure that you know any heap allocation limits. It is a good idea to avoid malloc()/free() environments where possible (especially in small embedded systems) due to fragmentation (holes developing in the heap) and also small memory leaks (forgetting to return memory to the heap after use, for example):
- A few years ago I worked on resolving problems with the train monitoring system at one of the busiest stations in Paris, which would crash about once a month and potentially cause major service disruption when it happened at rush hour. By recording the train logging information over a period and simulating the system based on this information (but much faster) it was possible to reproduce the crash in 20 minutes. By building a heap monitoring system into the code it was then possible to identify the heap usage and the modules that the heap belonged to. It could quickly be shown that a single module was slowly becoming the main user of heap memory, and later that a simple routine that was monitoring trains entering and leaving the region was taking just a few bytes of memory and then never returning them. Typically after about a month the heap was depleted and the system dead.
The fix then took just a few minutes but the software error had already cost a huge amount of money to the operator and the original manufacturer (as well as damage to their reputations).
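To give an idea of what such heap monitoring can look like, here is a minimal sketch (not the code from that project) that wraps malloc()/free() and keeps a running byte count per module. The module IDs and the names mon_malloc(), mon_free() and mon_usage() are invented for this example, and max_align_t assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical module IDs, one per part of the application */
enum { MOD_COMMS, MOD_LOGGER, MOD_TRACKER, MOD_COUNT };

static size_t heap_in_use[MOD_COUNT];  /* bytes currently held per module */

/* Header stored in front of every block so that mon_free() knows the
   owner and the size. The union keeps the user data suitably aligned. */
typedef union {
    struct { size_t size; int module; } info;
    max_align_t align;
} block_header;

void *mon_malloc(int module, size_t size)
{
    assert(module >= 0 && module < MOD_COUNT);
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->info.size = size;
    h->info.module = module;
    heap_in_use[module] += size;
    return h + 1;                      /* user data starts after the header */
}

void mon_free(void *p)
{
    if (p == NULL) {
        return;
    }
    block_header *h = (block_header *)p - 1;
    heap_in_use[h->info.module] -= h->info.size;
    free(h);
}

size_t mon_usage(int module)
{
    assert(module >= 0 && module < MOD_COUNT);
    return heap_in_use[module];
}
```

Logging mon_usage() for each module at regular intervals makes a slow leak visible as a count that only ever climbs, long before the heap is actually exhausted.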
Basically if you use malloc() you are responsible for the memory you request and also for cleaning it up. Software errors can be difficult to find but can be serious in nature and sometimes life-threatening. In small embedded systems there is usually little advantage in designing with a heap, and it is better (more reliable and safer) to know the worst case and ensure that static memory is available for it (heap use will tend to have the same physical limits but makes monitoring much more complicated).
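A minimal sketch of the static alternative, assuming the worst case is known: a fixed pool that the linker reserves up front, so that running out is an immediate and defined failure instead of a slow depletion. MAX_EVENTS, event_record and the two functions are hypothetical names, and the limit of 32 is only a placeholder for the figure from your own worst-case analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 32   /* placeholder: use your own worst-case figure */

typedef struct {
    uint16_t id;
    uint8_t  in_use;
} event_record;         /* hypothetical record type */

/* Reserved at link time, so its cost shows up in the map file */
static event_record event_pool[MAX_EVENTS];

/* Hand out the first free slot, or NULL if the worst case was exceeded */
event_record *event_alloc(uint16_t id)
{
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (!event_pool[i].in_use) {
            event_pool[i].in_use = 1;
            event_pool[i].id = id;
            return &event_pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool; the record must have come from event_alloc() */
void event_release(event_record *r)
{
    assert(r >= event_pool && r < event_pool + MAX_EVENTS);
    r->in_use = 0;
}
```

There is no fragmentation because every slot has the same size, and a leak shows up as event_alloc() returning NULL, which can be checked and reported at the point where it happens.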
From your description there is no specific indication that you have a memory problem, but it also can't be excluded. Problems are often also due to not correctly protecting variables that can be modified by both code and interrupts (or by other tasks). Everything appears to work for a while, but since corruption only occurs when the accesses happen to collide, the chance of a failure having occurred increases the longer the system runs. A DMA operation that runs away, writing past the end of an array (by one location) and many other typical coding errors of this kind are also possible.
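A minimal sketch of protecting a variable shared with an interrupt follows. The irq_lock()/irq_unlock() pair is a hypothetical stand-in for whatever your processor or operating system provides to mask and restore interrupts; here they only count nesting so that the example builds on a PC. The counter and function names are also invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port layer: on a real target these would mask and restore
   interrupts. In this sketch they only track the nesting depth. */
static volatile unsigned irq_lock_depth;

static void irq_lock(void)
{
    irq_lock_depth++;
}

static void irq_unlock(void)
{
    assert(irq_lock_depth > 0);
    irq_lock_depth--;
}

static volatile uint32_t tick_count;   /* written by the interrupt */

/* Interrupt handler: increments the shared counter */
void timer_isr(void)
{
    tick_count++;
}

/* Main code: read and clear the counter as one protected operation.
   Without the lock, an interrupt arriving between the read and the
   clear would have its tick silently thrown away. */
uint32_t ticks_take(void)
{
    irq_lock();
    uint32_t n = tick_count;
    tick_count = 0;
    irq_unlock();
    return n;
}
```

The unprotected version would pass almost every test, since the interrupt has to arrive inside a window of a few instructions for a tick to be lost, which is exactly why such errors only show up after the system has been running for a long time.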
At the end of the day it will certainly be a software error and so you will need to use the available debugging tools and techniques to home in on the reason it happens each time it takes place until the reason is known and understood, and then learn from the mistake(s) to avoid repetition in the future.
Regards
Mark