Enabling LTO in compiler options breaks MWS-based projects

FedericoWegher · ‎09-10-2020

Hi.

I am developing a BLE project that uses MWS, composed of central and peripheral devices. I hit a problem on the central device when it is built in release mode with the LTO compiler option enabled. I am using MCUXpresso 11.2.0 and SDK 2.6.6 on KW38.

In the debug build configuration, everything works fine: the central device is able to scan for, detect, and connect to the peripheral device.

In the release configuration, the central is no longer able to connect: the related BLE connection events never arrive. In the release configuration, I enabled LTO at both compile and link time and disabled debug info.

If I disable LTO at compile time, everything works fine, but the final binary is too large.

I managed to reproduce the problem with the SDK examples: TM central and sensor. In the central, I made the following changes:

#define gMWS_Enabled_d 1 // In app_preinclude.h

void App_Init(void) // In temperature_collector.c
{

}

In the debug configuration, the TM demo works fine.

In the release configuration, after enabling LTO at compile and link time and disabling debug info, the central no longer connects to the peripheral, exactly as in my application.

If MWS is not enabled, i.e. with the original TMC sources, the problem does not occur. So I think the problem is in mws.c.

This guess is confirmed by the workaround I managed to apply in my own project: I set file-specific compiler options for mws.c and disabled LTO at compile time. However, this does not work on TMC.

I am afraid there might be other problems like this due to LTO, so I need a proper solution and not a workaround. Thank you.

bobpaddock · ‎09-11-2020

I've tried LTO many times in the embedded space and have yet to get working code from it.
In some versions of the compiler, LTO itself is simply broken; always check the compiler release notes/changelog.

Predominantly, LTO has issues with volatile, and it removes "unused code" that is actually used, such as interrupt service routines that are not marked as 'used'.

Any function that is decorated with attribute IRQ must also be decorated with 'used'.
IRQs should also be decorated with no_instrument_function which, for really obscure reasons, needs to be done on both the declaration and the definition, unlike the other attributes.

Example of a UART ISR:

Declaration:

__attribute__( ( no_instrument_function ) ) void LPUART1_IRQHandler( void ) __attribute__( ( interrupt( "IRQ" ), used, no_instrument_function ) );

Definition:

__attribute__( ( no_instrument_function ) ) void LPUART1_IRQHandler( void )
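A minimal sketch of the pattern described above, for illustration only. The names Demo_UART_IRQHandler, demo_uart_data_reg, demo_inject_byte and demo_poll_rx are hypothetical and do not come from the SDK. The interrupt( "IRQ" ) attribute is left out on the assumption that the sketch should also compile with a host GCC or Clang; on the target it would be added back as in the declaration above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UART data register (assumption: lets the
 * sketch run on a host; real code would read the peripheral register). */
static volatile uint8_t demo_uart_data_reg;

/* Shared between the handler and the main loop. 'volatile' forces the
 * compiler to re-read them on every access instead of caching a value. */
static volatile bool demo_rx_ready;
static volatile uint8_t demo_rx_byte;

/* Declaration: 'used' asks the compiler to emit the function even when it
 * sees no caller. no_instrument_function appears here and on the definition. */
__attribute__((no_instrument_function)) void Demo_UART_IRQHandler(void)
    __attribute__((used, no_instrument_function));

/* Definition: copy the received byte and flag it for the main loop. */
__attribute__((no_instrument_function)) void Demo_UART_IRQHandler(void)
{
    demo_rx_byte = demo_uart_data_reg;
    demo_rx_ready = true;
}

/* Hypothetical host-side helper: pretend the hardware received a byte
 * and raised the interrupt. */
void demo_inject_byte(uint8_t value)
{
    demo_uart_data_reg = value;
    Demo_UART_IRQHandler();
}

/* Main-loop side: returns true and stores the byte if one is pending.
 * Real firmware would also guard against the handler firing between the
 * read of demo_rx_byte and the clearing of demo_rx_ready. */
bool demo_poll_rx(uint8_t *out)
{
    assert(out != NULL);
    if (!demo_rx_ready) {
        return false;
    }
    *out = demo_rx_byte;
    demo_rx_ready = false;
    return true;
}
```

On a host, calling demo_inject_byte( 0x5A ) and then demo_poll_rx() exercises the same handler-to-main-loop path that LTO can break when the flag is not volatile or the handler is discarded as unreferenced.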

The no_instrument_function attribute has to do with GCC's built-in function instrumentation (-finstrument-functions), which can be used for profiling and tracing.

See Poor Man’s Trace: Free-of-Charge Function Entry/Exit Trace with GNU Tools | MCU on Eclipse
http://mcuoneclipse.com/2015/04/04/poor-mans-trace-free-of-charge-function-entryexit-trace-with-gnu-...

I can post my trace.c if anyone cares about it.

FedericoWegher · ‎09-11-2020

Thank you for your reply.

Does it mean MCUXpresso SDK and examples cannot be compiled with LTO options?

Please share your trace.c and explain how to use it. It is not clear to me why you mentioned instrumentation and tracing.

Best regards.

bobpaddock · ‎09-11-2020

"Does it mean MCUXpresso SDK and examples cannot be compiled with LTO options?"

I would assume so, as I expect they would not have put in the proper attributes such as 'used'.

Reasons like this are why I only look at the SDK source code to see how it does something and never actually use the code in my products.

Here are the LTO notes from my Makefile from trying to get it to work over the years.
At least give -flto-report a shot and see if it tells you what it removed.

Also check whether MCUXpresso added these flags, which I believe are required:

#LDFLAGS += -flto -ffunction-sections -fdata-sections

# Link-time optimization does not work well with generation of
# debugging information. Combining -flto with -g is currently
# experimental and expected to produce unexpected results.

# -fwhole-program

# Assume that the current compilation unit represents the whole
# program being compiled. All public functions and variables with the
# exception of main and those merged by attribute externally_visible
# become static functions and in effect are optimized more
# aggressively by interprocedural optimizers.

# This option should not be used in combination with -flto. Instead
# relying on a linker plugin should provide safer and more precise
# information.

# Link Time Optimization (LTO) gives GCC the capability of dumping its
# internal representation (GIMPLE) to disk, so that all the different
# compilation units that make up a single executable can be optimized
# as a single module. This expands the scope of inter-procedural
# optimizations to encompass the whole program (or, rather, everything
# that is visible at link time).

# To see it in action, compile with -v -Wl,-v -save-temps and look for
# /cc1, /as, /collect2, /lto-plugin, /lto1 and /ld.

# LTO is broken until ARM version GCC 8.3.

# Note that the same optimization flags (-flto -Os) should be passed
# both at compile time and at link time.

# LTO requires the same optimization setting be passed to the linker
# that the compile phase is using. Keep this even when LTO is
# disabled for consistency in the options between compile and linking.
LDFLAGS += -O$(OPT)

#CFLAGS += -flto
#CPPFLAGS += -flto

# Enables the use of a linker plugin during link-time optimization.
#LDFLAGS += -fuse-linker-plugin

# Specify -flto=jobserver to use GNU make's job server mode to
# determine the number of parallel jobs. This is useful when the
# Makefile calling GCC is already executing in parallel. You must
# prepend a ‘+’ to the command recipe in the parent Makefile for this
# to work. This option likely only works if MAKE is GNU make.
#CFLAGS += -flto=jobserver
#CPPFLAGS += -flto=jobserver
#LDFLAGS += -flto=jobserver

#LDFLAGS += -flto -ffunction-sections -fdata-sections

# Prints a report with internal details on the workings of the
# link-time optimizer. The contents of this report vary from version
# to version. It is meant to be useful to GCC developers when
# processing object files in LTO mode (via -flto).
#LDFLAGS += -flto-report

# -why_live symbol_name
# Logs a chain of references to symbol_name. Only applicable with
# -dead_strip. It can help debug why something that you think should
# be dead strip removed is not removed.

https://gcc.gnu.org/wiki/LinkTimeOptimization

If the code is too big without LTO, the code may simply be too big.

In the past I would have suggested using Gimpel Lint for static analysis to find out how to make the code smaller (in effect doing what LTO was doing).
Alas, they don't sell Lint any more, only Lint+ with a bizarre, unusable license model.

Trace is unrelated to the LTO issue; it is related to the attributes. I'll make a separate post about that sometime.

FedericoWegher · ‎09-16-2020

Thank you for your replies.

Unfortunately, all flags are already set properly for the linker and compiler: -flto, -ffunction-sections, -fdata-sections, --gc-sections, -Os, -s. Also, -fwhole-program cannot be applied to the SDK case, since it consists of several translation units.

Finally, I checked the differences in MWS.o built with and without -flto at compile time. I see nothing relevant. So I cannot explain why removing -flto as a compiler option for the MWS directory in the SDK fixes the issue.

I really need NXP feedback about this point.