"PRINTF" SRAM Overhead/Stack Overflow

myke_predko · ‎04-20-2021

I'm running my FreeRTOS (with USB CDC) development code on a FRDM-K22F and I've just started seeing something strange. To monitor operations, I periodically "PRINTF" (using the SDK library code) to put out a message. All works well when I run with MCUXpresso debug active.

If I stop the debugger (and optionally remove the USB cable from the debug port of the Freedom board), the code on the Freedom board continues executing until it encounters a "PRINTF" statement, at which point it stops and indicates a Stack Overflow issue.

The "PRINTF" statements are typically only built into the code when I specify the pre-processor symbol SDK_DEBUGCONSOLE=0 and all "PRINTF" instances are surrounded by "#if" statements like:

#if (2 != SDK_DEBUGCONSOLE)
  PRINTF("Done");
#endif

When I remove the "PRINTF" statements by setting SDK_DEBUGCONSOLE=2 the application runs fine, no stack overflows detected regardless of whether or not it is connected to the development PC and whether or not debug is active.

So, I believe the problem is with the PRINTF statements. Now, I've increased the "configTOTAL_HEAP_SIZE" as well as the stack size for the first task that executes a "PRINTF" statement but no joy. 25k is the total stack size used by all the tasks and the total system heap is 36k.

I haven't checked running the code without MCUXpresso Debug active for a week or so, during which I've added a number of tasks and queues and a mutex along with increasing the total number of queues in FreeRTOSConfig.h, but, as indicated, when MCUXpresso Debug is active, no issues or overflow are detected/indicated. Along with that, no task's stack is close to its threshold.

Rather than pounding out different ideas, I thought I'd ask if anybody has any thoughts as to where I should look to understand this issue.

Thanx!

myke_predko · ‎05-12-2021

This thread got to be quite long before the solution became understood. I am marking it as "Solved" so that anybody in the future looking to understand this issue will see that it hasn't been left hanging with more than 50 replies.

The problem was that I modified "semihost_hardfault.c" with code to turn on an LED to indicate that a "hard fault" occurred. When I put in the extra code, I was under the impression that the method was for out of bound conditions (this assumption was made because when I encounter an out of bounds write, execution stops at the start of "semihost_hardfault.c") and not a tool to handle semihost error conditions.

I should have a) read the comments in the "semihost_hardfault.c" source file and b) not touched the file.

When I reverted back to the original code, the issue of the application going into an invalid state when "PRINTF" is encountered with no debugger active went away.

Don't change "semihost_hardfault.c"

I appreciate the help by @ErichStyger @jingpan & @bobpaddock in helping me understand what the issue was.

ErichStyger · ‎05-13-2021

Hi @myke_predko ,

I think that recommendation Don't change "semihost_hardfault.c" is not correct.

Or maybe add "unless you know what you do" :-).

That semihost hard fault handler serves two purposes:

1) catch hardfaults and continue for cases where the application does semihosting without a debug session going on

2) catch all other hard faults

I do modify that handler (most of the time I use the one in the McuLib), one just needs to be careful with the assembly code to do it at the right spot, not impacting the semihost handling code.
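
As a rough illustration of those two purposes, here is a host-side C sketch of the decision such a handler makes. It is not the NXP source (the real handler is written in assembly and never returns from a genuine fault); `exception_frame_t`, `classify_hard_fault`, `fault_led_lit` and `SEMIHOST_PLACEHOLDER_RESULT` are hypothetical names for this example, and the placeholder value is an assumption. The point is that any added code, such as turning on a fault LED, belongs only on the non-semihosting branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb encoding of "BKPT 0xAB", the instruction semihosting calls use on Cortex-M. */
#define BKPT_SEMIHOST_OPCODE 0xBEABu

/* Hypothetical dummy value handed back to the caller of a skipped semihosting call. */
#define SEMIHOST_PLACEHOLDER_RESULT 32u

/* Basic frame the core stacks on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

typedef enum { FAULT_SEMIHOST_SKIPPED, FAULT_REAL } fault_kind_t;

/* Stand-in for a fault LED; on a board this would be a GPIO write. */
bool fault_led_lit = false;

/*
 * code/code_base model the program memory that the stacked PC points into,
 * so the sketch can run on a PC instead of the target.
 */
fault_kind_t classify_hard_fault(exception_frame_t *frame,
                                 const uint16_t *code, uint32_t code_base)
{
    uint16_t opcode = code[(frame->pc - code_base) / 2u];

    if (opcode == BKPT_SEMIHOST_OPCODE) {
        /* Purpose 1: no debugger answered the semihosting call, so step
         * over the 2-byte BKPT and let the application carry on. */
        frame->pc += 2u;
        frame->r0 = SEMIHOST_PLACEHOLDER_RESULT;
        return FAULT_SEMIHOST_SKIPPED;
    }

    /* Purpose 2: a genuine hard fault. Custom code goes here only.
     * A real handler would now halt (for example, loop forever). */
    fault_led_lit = true;
    return FAULT_REAL;
}
```

Code placed before the opcode test would also run on every semihosted PRINTF issued without a debugger, which is the kind of side effect to avoid.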

I think the real recommendation should be

Do not use Semihosting or printf() unless you *really* know what you are doing

There is a place for semihosting (e.g. for File I/O on the host) or printf() for the very first 'hello world'. But otherwise it should be banned and never be used. As noted in the NXP semihost_hardfault.c:

// is meant as a development aid, and it is not recommended to leave
// semihosted code in a production build of your application!

I hope this helps,

Erich

myke_predko · ‎05-13-2021

Hey @ErichStyger

Thank you for the clarification - however, I should point out that it's really difficult to find any meaningful information on how to *really* know what semihosting is, especially when it comes to MCUXpresso and Kinetis. You've written some good articles on semihosting with KDS but they're 5+ years old and other references are device and debugger specific.

Honestly, the only clarification that I would add to my last post would be, if you are going to use "PRINTF" statements in your development code, make sure you can disable them using something like the "SDK_DEBUGCONSOLE" flag like I noted in the original post. That's always a good idea as you can save space and cycles in your actual application/product.
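
One way to avoid wrapping every call in its own `#if` is a single wrapper macro. The following is a minimal sketch, not SDK code: `DBG_PRINTF`, `dbg_disabled` and `report_done` are hypothetical names, standard `printf` stands in for the SDK's `PRINTF` so the example builds on a PC, and the default of 2 is assumed only to make the example self-contained.

```c
#include <stdio.h>

/* Normally set on the compiler command line (e.g. -DSDK_DEBUGCONSOLE=0).
 * Defaulting to 2 (console disabled) here is an assumption for the sketch. */
#ifndef SDK_DEBUGCONSOLE
#define SDK_DEBUGCONSOLE 2
#endif

#if (2 != SDK_DEBUGCONSOLE)
/* Console enabled: forward to the real print routine. */
#define DBG_PRINTF(...) printf(__VA_ARGS__)
#else
/* Console disabled: the arguments are dropped by the preprocessor and the
 * call collapses to a function that reports 0 characters written. */
static inline int dbg_disabled(void) { return 0; }
#define DBG_PRINTF(...) dbg_disabled()
#endif

/* Example call site: no #if needed around the statement. */
int report_done(void)
{
    return DBG_PRINTF("Done\r\n");
}
```

Because the arguments disappear entirely in the disabled build, they should not contain expressions with side effects that the application relies on.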

I've written and deleted a number of comments explaining what happened here, but when it comes right down to it, I don't feel bad about what happened here. While I didn't follow my normal validation process after making the change to semihost_hardfault.c, I did detect the issue before things went too far and I was able to characterize/describe it in such a way that @jingpan was able to recognize the problem and point me toward understanding what the problem was.

ErichStyger · ‎05-13-2021

Hi @myke_predko ,

I agree that there is not much information about semihosting available, probably because it has been defined by ARM and the documentation about it is rather obscure, but there are a few pieces, for example

https://www.keil.com/support/man/docs/armcc/armcc_pge1358787046598.htm

or

https://developer.arm.com/documentation/100863/latest/

ARM did not change much (or anything) for the last years, so I did not make any updates to the articles you mention, maybe I should have. Maybe with the hope that developers won't use it and run into these problems. Unfortunately many vendor SDKs tend to use it as a default in their examples.

I hope this helps,

Erich

myke_predko · ‎04-22-2021

@ErichStyger

Did you delete a posting? I got notification with:

Yes, maybe back to field 0.

My understanding is this:

- there is a PRINTF in your code

- if you run that code with debugger attached it works fine

- if that code runs without debugger it fails with a stack overflow at that printf statement

- you are using semihosting for the printf

Now there are two things:

- printf and its other family members are known to use a lot of stack space and cause all kind of (e.g. reentrancy) problems. Thus I don't use them

- if using semihosting, this throws a debug exception which is handled by the debugger (if attached). If there is no debug session going on it throws a hard fault. So the standard behaviour is 'printf does not work without debugger attached'. To overcome this there shall be a special hard fault handler in your project (there is an option in the SDK project generation for it): look for that semihost_hardfault.c

So it would be useful to know how your stack overflow is detected/triggered. FreeRTOS has two different methods for this: Method 1 just compares the PSP at task context switch time while method 2 checks the pattern on the stack. If it is not a true stack overflow, something else (dangling pointer) might have written something to your stack which is not an overflow but more of a stack corruption. I would check if this is the case. And if so, I would set a watchpoint on that address to find out who is writing to that address.
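
To make the difference between an overflow and a corrupted guard area concrete, here is a host-side sketch of a pattern-based check in the spirit of method 2. It is a simplified model rather than FreeRTOS source: the function names, the 64-word stack and the 4-word guard are assumptions for the illustration, and a stack that grows downward is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS  64u          /* assumed task stack size, in words */
#define GUARD_WORDS  4u           /* assumed number of words checked   */
#define FILL_WORD    0xA5A5A5A5u  /* known pattern written at creation */

/* Fill the whole stack with the pattern, as done when a task is created. */
void stack_fill(uint32_t *stack)
{
    for (size_t i = 0; i < STACK_WORDS; i++) {
        stack[i] = FILL_WORD;
    }
}

/* Pattern check: the lowest GUARD_WORDS words (the far end of a
 * downward-growing stack) must still hold the fill pattern. */
bool stack_guard_intact(const uint32_t *stack)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (stack[i] != FILL_WORD) {
            return false;
        }
    }
    return true;
}

/* Count never-touched words from the far end: a high-water mark. */
size_t stack_unused_words(const uint32_t *stack)
{
    size_t n = 0;
    while (n < STACK_WORDS && stack[n] == FILL_WORD) {
        n++;
    }
    return n;
}
```

In this model, a guard word that changes while the words just above it still hold the fill pattern points to a stray write rather than a genuine overflow, which would have consumed those words on the way down.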

I hope this helps,

Erich

I wouldn't characterize the situation the way you have:

- I have been using PRINTFs in MANY of my projects without issues
  - I went back and checked the previous versions and the problem behaviour is not present
- I am running a basic FreeRTOS and NXP SDK implementation
  - I am running semi-hosting on a FRDM-K22F board for application development
  - I have NOT changed any libraries, driver, board code that has been provided by the MCUXpresso new project wizards
- I added:
  - The overflow handler we discussed last week
  - ADC Interrupt Handler
  - Two new tasks were added with two queues each
    - configQUEUE_REGISTRY_SIZE is more than large enough for all queue objects
  - A mutex was added
    - configUSE_MUTEXES is set to 1
- When I run with the MCUXpresso debugger active, no issues during operation
  - When I stop the MCUXpresso, the next time a USB packet is received the overflow handler is invoked and the system crashes
  - As part of this, some IO values are changed
    - I'm pointing this out because maybe it's coincidental that the overflow indicator LEDs light
- If I take the PRINTFs out of the project (which I can do with the SDK_DEBUGCONSOLE build variable) no issues running the application with or without MCUXpresso operational

In terms of your comments back:

"there shall be a special hard fault handler in your project (there is an option in the SDK project generation for it): look for that semihost_hardfault.c" I just looked at the SDK project builder and can't find it. The only reference I can see in the semihost_hardfault.c code to changing its operation is "// Allow handler to be removed by setting a define (via command line)". Can you be a bit more specific?
"how your stack overflow is detected/triggered" I'm using method 2.
1. Note the comments above, I'm now not 100% sure that the overflow check
2. Regardless, if the problem only occurs when PRINTF is active and debug is disabled, any ideas on how to catch the event?

Again, thank you for your time and comments,

myke

ErichStyger · ‎04-24-2021

Regardless, if the problem only occurs when PRINTF is active and debug is disabled, any ideas on how to catch the event?

Make sure that the code halts at the place the overflow is detected.

Let the target run without a debug session (I hope you can still keep the debug cable attached): download and let the target run, then terminate the debugger. When it stops with the overflow, connect to the 'running' target:

Then you can pause the target and should see where it is and potentially the environment creating the issue.

Erich

ErichStyger · ‎04-24-2021

Just in case, with more description:

https://mcuoneclipse.com/2021/04/25/attach-with-the-debugger-to-a-running-target/

Erich

myke_predko · ‎04-25-2021

@ErichStyger

Thanx for Attach with the Debugger to a Running Target

I'm going through it.

myke

myke_predko · ‎04-25-2021

Hey @ErichStyger

Not going as planned or expected.

My process is:

Start debug of the application
Start Execution of the application
Connect application's USB CDC Device port to Windows PC using Tera Term
Test connection with a ping command
Halt/Break application's execution
Export SRAM contents
Resume execution
Stop Debugger
Try another ping command
1. This is expected to crash the system and it does
Attempt to attach
1. application resets and restarts

I've attached one of my export files. I'm dumping the entire contents of the SRAM - my plan was to get a dump before the failure and after the failure and do a kDiff to see where the problem lies by looking at the differences in the .map file. Good plan, but things aren't working out that way.
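
The before/after comparison described here can be sketched in a few lines of C. This is an illustrative helper, not a tool from the thread: `next_diff_range` and `diff_range_t` are hypothetical names, and the two dumps are assumed to be raw byte images of equal length already loaded into memory. Adding the SRAM base address to each reported offset gives an address to look up in the .map file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A run of consecutive bytes that differ between two dumps. */
typedef struct {
    size_t offset;  /* byte offset of the first differing byte */
    size_t length;  /* number of consecutive differing bytes   */
} diff_range_t;

/*
 * Find the next differing run at or after 'start'.
 * Returns false when the remainder of the dumps is identical.
 */
bool next_diff_range(const uint8_t *before, const uint8_t *after,
                     size_t size, size_t start, diff_range_t *out)
{
    size_t i = start;

    /* Skip the bytes that match. */
    while (i < size && before[i] == after[i]) {
        i++;
    }
    if (i >= size) {
        return false;
    }

    /* Measure how long the mismatch lasts. */
    out->offset = i;
    while (i < size && before[i] != after[i]) {
        i++;
    }
    out->length = i - out->offset;
    return true;
}
```

Calling it repeatedly with `start` set to the previous `offset + length` walks through every changed region in turn.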

I tried playing around with the program/debug settings but all that did was make the project unusable (and I had to recreate it which is not a huge deal, but an annoyance).

Thinking about it, it makes sense that the application restarts when the attach takes place - I'm very sure that hardware is being incorrectly written to which means that when the attach takes place, the application hardware is in an unknown state and it is reset and executes from there, destroying any possible evidence.

Any idea what to do next? It's an interesting problem, but one I wish I didn't have.

myke

ErichStyger · ‎04-25-2021

Hi @myke_predko ,

Thinking about it, it makes sense that the application restarts when the attach takes place

No, the magic of attach is that it does not reset (unless you are in a hard fault where the chip resets).

I have seen that with weak reset pullups attaching the probe/cable can cause a pull on the reset line. Keep the probe/cable attached in that case.

Attach might be depending on the probe too (I just used that recently with a J-Link).

Erich

myke_predko · ‎04-26-2021

Hi @ErichStyger

No, the magic of attach is that it does not reset (unless you are in a hard fault where the chip resets).
I have seen that with weak reset pullups attaching the probe/cable can cause a pull on the reset line. Keep the probe/cable attached in that case.

Maybe I'm in a hard fault condition - I'll have to figure out how to display that with the hardware I have.

I am keeping the USB cable attached and since it's a Freedom board, the probe is being kept attached. I have everything running with the debugger, click on the "Terminate (Ctrl+F2)" control on MCUXpresso and then run a command which produces a PRINTF (i.e. the "ping" I mention above) and I get the crash.

As I said, interesting problem.

Any other ideas on how to identify the source?

myke

myke_predko · ‎04-24-2021

Hi @ErichStyger

I don't seem to have the ability to (re)attach the MCUXpresso debugger to the target after I stop it. I'm running Linux so I'll have to find a Win10 laptop with enough hard drive space for MCUXpresso and see if I can get that running.

I do think this approach has merit as I will comment on your last post.

myke

ErichStyger · ‎04-24-2021

Hi @myke_predko ,

this is not Windows related, it should work the exact same way on any MCUXpresso supported host.

Erich

myke_predko · ‎04-25-2021

@ErichStyger

Replying to "this is not Windows related, it should work the exact same way on any MCUXpresso supported host."

Well this is a bit embarrassing, when I looked originally, the way I have MCUXpresso set up, my quickstart menu looks like:

But, when I drag the right edge to the right:

I'll take a look at doing the attach.

Thanx.

ErichStyger · ‎04-25-2021

On Windows I have a horizontal scroll bar like this:

myke_predko · ‎04-25-2021

@ErichStyger

Scroll bars:

In Ubuntu, you have to look for them somewhat. They kind of appear if you roll over them but to see them you have to be over them with the left mouse down.

I don't feel so terribly bad about not seeing it.

ErichStyger · ‎04-24-2021

"there shall be a special hard fault handler in your project (there is an option in the SDK project generation for it): look for that semihost_hardfault.c" I just looked at the SDK project builder and can't find it. The only reference I can see in the semihost_hardfault.c code to changing its operation is "// Allow handler to be removed by setting a define (via command line)". Can you be a bit more specific?

I'm referring to this:

There is an option in the project creation wizard to add it:

Erich

myke_predko · ‎04-24-2021

Hi @ErichStyger

I'm replying to your comment about enabling the "special hard fault handler" by using the Advanced Project Options.

This is the default, so it's always available.

myke

ErichStyger · ‎04-22-2021

Hi @myke_predko ,

Did you delete a posting? I got notification with:

No, I did not delete anything, maybe the NXP system was considering my replies as spam? @#*!@#

Erich

myke_predko · ‎04-22-2021

Hey @ErichStyger

You gotta watch the forum. I've had a few of my posts deleted or labeled as having invalid html (which I think @bobpaddock has also experienced in this thread) over time.