MCUX 11.1.0 - C++ exceptions not being caught (w/ solution)

drodgers
Senior Contributor I

At long last, I have arrived at a solution for this issue which I first reported over a year ago.  Here is the full write-up.

Summary

Certain C++ projects are susceptible to a condition where thrown exceptions are never caught.  The condition is fixed at build time: a given build either has it or does not.  When it is present, any thrown exception causes __terminate() to be invoked, regardless of any surrounding try/catch block.  The condition is ultimately caused by a badly structured linker script.

Test Setup

I have a custom Kinetis K24 board programmed and running a C++ application which I've been developing over the last couple of years.  The C++ application is using Newlib, which was compiled with exception support in place (Newlib-nano as distributed by NXP does not support exceptions).  I first observed this issue with MCUX 10.2.1; for my test, I used MCUX 10.3.0.  I am confident this issue would manifest if I were using MCUX 11.1.0, as the relevant linker files are identical between MCUX revisions (see below).

Failure and Analysis

When this issue manifests, any thrown exception will cause __terminate() to be invoked, regardless of any try/catch statement that is placed around the exception condition.  Here is code to test the abnormal condition:

  #include <iostream>
  #include <stdexcept>

  static void VerifyExceptions(void) {
    try {
      throw std::runtime_error("Exceptions are being handled normally.");
    } catch (const std::runtime_error &e) {
      std::cout << e.what() << "\n";
    }
    /* If exceptions are not working correctly, then the throw above will
     * cause __cxa_throw() to call terminate() immediately.  Testing this at
     * startup assures that any issues with exceptions will be immediately
     * diagnosed during development. */
  }

I make a point of calling this function very early at startup so that if there is any issue with exceptions, I find out the moment I first run the software.

What I found when developing this project is that for a given build of code for my Kinetis K24, there is a roughly 50/50 chance that the build will exhibit the issue.  Rebuilding the same project with the same source files will always yield the same result; either exceptions work normally, or they don't work at all.  Adding, removing, or altering code will "re-roll the dice", usually in an unpredictable manner.

Workaround

When I first encountered this issue, I struggled to find ways to add or change code so as to create a working build.  In my testing, I managed to find a way to do it for my specific project.  Note that this only works for my specific project; I also found that changing my optimization settings could invalidate this workaround.

 protected:
  PowerMgr()
      : task_(),
        mutex_(),
        signal_queue_() {
    /* Comment or uncomment this line as necessary to get exceptions to work. */
    DoNothing();
  }
  /* Do nothing useful. */
  void DoNothing(void) {
    wibble_ = 123;
  }

If I build my code and find that exceptions are not being caught, I simply toggle whether the DoNothing() call (line 7 of the snippet above) is commented out, recompile, and exceptions are handled correctly.  If later changes to the project break exceptions again, I toggle the same line back, and the project works once more.  In effect, for a given state of the project code, the call to DoNothing() either MUST be present or MUST be commented out: the build works in one state and fails in the other.  The only way to know which state is required is to build the code and test on the target; if it runs, leave it alone, and if it fails, flip the commenting and recompile, and it will work.

Root Cause

Quite simply, I lacked the means or expertise to fully investigate this issue (apparently, so did NXP).  As I had a workaround, I posted what I was observing to the NXP forums and to the ARM forums in the hope that someone might find a solution.  And thankfully, eventually, someone did.

The issue is with the .ARM.exidx section, and specifically the linker file template that generates the linker script entry for that section.  I won't try to explain it fully here (see the accepted answer to my ARM forum post for a more detailed explanation), but basically, the linker file template for the .ARM.exidx section (named "exdata.ldt") is badly structured, and a small modification to its structure resolves the exceptions issue.

There are two locations for the offending linker script template ("exdata.ldt").  They are:

  • [MCUX IDE location]\ide\configuration\org.eclipse.osgi\4\0\.cp\Data\linkscripts
  • [MCUX IDE location]\ide\plugins\com.nxp.mcuxpresso.tools.wizards_11.1.0.201909161352\Wizards\linker

This is how the file reads as installed:

    /*
     * for exception handling/unwind - some Newlib functions (in common
     * with C++ and STDC++) use this. 
     */
    .ARM.extab : ALIGN(${text_align}) 
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
<#if (PLAIN_LOAD_IMAGE) >
    } > ${CODEX} AT> ${CODE}
<#else>
    } > ${CODE}
</#if>

    __exidx_start = .;

    .ARM.exidx : ALIGN(${text_align})
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
<#if (PLAIN_LOAD_IMAGE) >
    } > ${CODEX} AT> ${CODE}
<#else>
    } > ${CODE}
</#if>
    __exidx_end = .;

Solution

The solution is simply to move the __exidx_start and __exidx_end symbol assignments inside the .ARM.exidx output section, as follows:

    .ARM.exidx : ALIGN(${text_align})
    {
    __exidx_start = .;
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    __exidx_end = .;
<#if (PLAIN_LOAD_IMAGE) >
    } > ${CODEX} AT> ${CODE}
<#else>
    } > ${CODE}
</#if>

The first half of the file remains unchanged.  I have attached both the original (broken) and the fixed versions of this linker script.  If you are experiencing this issue and you copy the fixed script to the "org.eclipse.osgi" location, you should see your issue resolve.  You can verify whether this fix is applied to your code by examining the linker file generated for your project build ("projectname_Debug.ld" in the Debug directory).  Here is how it looked before (search for "exidx" in your linker script):

    .ARM.extab : ALIGN(8) 
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > PROGRAM_FLASH

    __exidx_start = .;

    .ARM.exidx : ALIGN(8)
    {
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > PROGRAM_FLASH
    __exidx_end = .;

And here's how it looked after the fix:

    .ARM.extab : ALIGN(8) 
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > PROGRAM_FLASH

    .ARM.exidx : ALIGN(8)
    {
    __exidx_start = .;
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    __exidx_end = .;
    } > PROGRAM_FLASH
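
Besides reading the generated linker script, the symbol values themselves can be sanity-checked. The following is only a sketch: the helper name ExidxBoundsLookSane is hypothetical, and it assumes that each index table entry is 8 bytes (two 32-bit words) and that the section alignment passed in matches the ALIGN() value in your generated script.

```cpp
#include <cstdint>

// Hypothetical helper, not part of any NXP or GCC API.
// Returns true when the table bounds are plausible for an index table made of
// 8-byte entries that starts on a `section_align` boundary.
constexpr bool ExidxBoundsLookSane(std::uintptr_t start, std::uintptr_t end,
                                   std::uintptr_t section_align) {
  return start <= end &&
         (start % section_align) == 0 &&  // symbol sits on the section start
         ((end - start) % 8) == 0;        // whole number of 8-byte entries
}

// Possible use on the target (assumption: the linker script defines both
// symbols, as the scripts shown above do):
//
//   extern "C" const char __exidx_start[];
//   extern "C" const char __exidx_end[];
//   bool ok = ExidxBoundsLookSane(
//       reinterpret_cast<std::uintptr_t>(__exidx_start),
//       reinterpret_cast<std::uintptr_t>(__exidx_end), 8);
```

A check like this cannot use exceptions to report failure, so on the target it would have to signal through some other means (an LED, a breakpoint, a debug print).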

Requested Action

NXP, I would ask that you please review this report, review my ARM forums post and its replies, examine the modified linker script, verify that the modified linker script generates correct and functional code for each of your test cases, and then incorporate this corrected script in future MCUXpresso releases.  Please let me know if you have any additional questions, thank you.

David R.

6 Replies
lpcxpresso_supp
NXP Employee

IIRC, there are a few older Kinetis parts where there is some form of 8-byte alignment requirement, but this is not commonly required (generally the standard 4-byte alignment is sufficient). This would explain why you see the issue only on the K24, and only when padding is required to bring the exidx section up to 8-byte alignment, but not on the RT1050.
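
To illustrate the alignment point, here is a rough sketch (not taken from any NXP tooling) that counts how many candidate start addresses would need padding under a given section alignment. It assumes the preceding content always ends on a 4-byte boundary; the function name and the sample base address are made up for the illustration.

```cpp
#include <cstdint>

// Hypothetical illustration: walk `count` word-aligned addresses starting at
// `first` and count how many are NOT already on an `align` boundary, i.e. how
// many would force the linker to insert padding before the section.
constexpr int CountPaddedStarts(std::uint32_t first, std::uint32_t count,
                                std::uint32_t align) {
  int padded = 0;
  for (std::uint32_t i = 0; i < count; ++i) {
    const std::uint32_t addr = first + 4u * i;  // assumed 4-byte granularity
    if (addr % align != 0) {
      ++padded;
    }
  }
  return padded;
}
```

Under that assumption, an 8-byte alignment needs padding for half of the candidate addresses, while a 4-byte alignment never does, which is consistent with the roughly 50/50 failure rate reported on the K24 and with the absence of failures on the RT1050.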

Regards,

MCUXpresso IDE Support

lpcxpresso_supp
NXP Employee

Thank you for your detailed report. We'll look into this.

If you are able to provide a test case project, that might be helpful. But if not, it would be helpful if you could also provide your map files from both the original test case and the build with the modified linker script (or at least the portions of the map files where differences show up).

Regards,

MCUXpresso IDE Support

drodgers
Senior Contributor I

I've got your smoking gun.

So I pulled a fresh copy of my project out of the repo, and reset the linker script template to its original ("Old") version.  I then performed two builds, a functional ("Working") build, and a non-functional ("Broken") build.  The only difference between the two is that DoNothing() is called in the Broken build, and is commented out in the Working build.  (In previous code revisions, the situation would be reversed, where the functional build calls DoNothing(), while a non-functional build does not.)

Then I changed the linker script template to the fixed ("New") version, and repeated the functional ("Working") and was-previously-non-functional ("Not-Broken") builds.  Then I compared the MAP files, and the defect becomes obvious.  Let's look at the "Working" builds first, with the old and new linker scripts.

[Image: mcux_exceptions_working_case.png (MAP file comparison of the "Working" builds, old vs. new linker script)]

What you see here is the net effect of moving the __exidx_start/end symbols inside the .ARM.exidx section declaration.  Previously, the __exidx_start symbol landed outside the .ARM.exidx section, just before the .eh_frame declaration.  With the new linker file, the __exidx_start symbol is placed inside the section, immediately after the start, where it should be.  The reason the old configuration works in this build is that __exidx_start and .ARM.exidx happen to have the same value (0x000d9b48), so __exidx_start does indeed point to the start of the section.

Now let's look at the "Broken" builds, with the old and new linker scripts.

[Image: mcux_exceptions_broken_case.png (MAP file comparison of the "Broken" builds, old vs. new linker script)]

The defect should now become apparent.  With the old script, __exidx_start is assigned the location counter immediately after the preceding content, which ends at 0x000d9b24.  However, the linker script declares .ARM.exidx with ALIGN(8), so the section itself is moved up to 0x000d9b28.  Thus, __exidx_start points 4 bytes before the actual start of .ARM.exidx.  The unwinder reads its index table starting at __exidx_start, so every entry is read at the wrong offset, and any exception thrown becomes uncatchable.  The new linker script fixes this by assigning __exidx_start inside the section, after the alignment has been applied.  With the new linker script, __exidx_start has a value of 0x000d9b28, the same as .ARM.exidx, and exceptions work normally.
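
The arithmetic behind those addresses can be sketched in a few lines. This is only an illustration using the two addresses from my MAP files; the helper names AlignUp and PaddingBefore are my own and do not come from the linker.

```cpp
#include <cstdint>

// Round `addr` up to the next multiple of `align`.
// Assumption: `align` is a power of two, as linker ALIGN() values normally are.
constexpr std::uint32_t AlignUp(std::uint32_t addr, std::uint32_t align) {
  return (addr + align - 1u) & ~(align - 1u);
}

// Number of padding bytes between the current location counter and the start
// of a section that requests `align`.  With the old script, __exidx_start
// takes the location counter value, while the section begins after the padding.
constexpr std::uint32_t PaddingBefore(std::uint32_t location_counter,
                                      std::uint32_t align) {
  return AlignUp(location_counter, align) - location_counter;
}
```

For the Broken build, PaddingBefore(0x000d9b24, 8) is 4, which is exactly the gap between the old __exidx_start and the real section start at 0x000d9b28.  For the Working build, PaddingBefore(0x000d9b48, 8) is 0, so the old script happened to produce a correct symbol.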

I've attached the snippets of the MAP files should you want to examine them.  Seems pretty clear-cut, this fix needs to be applied.  Might want to examine the rest of your linker scripts as well to make sure that similar issues don't exist elsewhere.

David R.

lpcxpresso_supp
NXP Employee

Thanks for posting the additional information. 

As an aside, we have been generating the exidx symbols in the current way for many many years. And the way we have done this was originally based on how the default ld linker script generates them (which are also outside of the .ARM.exidx section).

But all that aside, we'll make the change you suggest for our next release (due late Feb).

Regards,

MCUXpresso IDE Support

drodgers
Senior Contributor I

Anecdotally, I can tell you that while this bug was endemic to my K24 project, it has not appeared in my RT1050 project, despite considerable wrenching over the past several months.  No idea why stuff is always aligned on the RT1050 but prone to misalignment on the Kinetis.  Anyway, hopefully we never see this bug again.

David R.

drodgers
Senior Contributor I

Then it means that the default ld linker script is not correct for all cases, because in this instance, it's demonstrably incorrect.  Getting it right is all I can ask for.  Thank you for including it in the next update.

David R.
