M0/M0+ optimization bug in GCC

stevenjohnson · ‎04-30-2017

This has been reported upstream:

69460 – ARM Cortex M0 produces suboptimal code vs Cortex M3

and

Bug #1502611 “Poorly optimised code generation for cortex M0/M0+...” : Bugs : GNU ARM Embedded Toolc...

I have tested the GCC shipped with MCUXpresso and it exhibits the optimizer fault.

In short, the GCC optimizer produces far less optimized code for the Cortex-M0/M0+ than it does for the Cortex-M3, and no command-line switch works around it. The specific sequences in the test cases (6 are shown upstream) do not use any M3-specific instructions, so the code compiled for the M3 would run unchanged on a Cortex-M0/M0+. This is confirmed upstream.

A typical trigger for the bug is accessing registers at known addresses, such as peripheral registers. For the M0/M0+, GCC creates a literal table entry for every unique address and loads each one separately. For the M3, it creates one literal table entry and uses address offsets to reach nearby addresses. This means any M0/M0+ code that accesses a bank of registers in a peripheral will run a lot slower and consume a lot more flash than it needs to.
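To make the difference concrete, here is a minimal sketch of the "one base address plus offsets" access pattern, written as plain C so it can be tried on a host machine. The helper name `write_bank` is hypothetical and is not part of the upstream test cases. It only models the strategy; the reports above do not say whether GCC keeps this shape when targeting the M0/M0+, so check the disassembly before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper (not from the upstream reports): writes four
 * consecutive 32-bit registers starting at `base`. It mirrors the
 * base-plus-offset strategy: one base pointer, small offsets, and a
 * value that is incremented instead of being reloaded each time. */
void write_bank(volatile uint32_t *base, uint32_t first_value)
{
    uint32_t v = first_value;

    base[0] = v;   /* byte offset 0  */
    v += 1u;
    base[1] = v;   /* byte offset 4  */
    v += 1u;
    base[2] = v;   /* byte offset 8  */
    v += 1u;
    base[3] = v;   /* byte offset 12 */
}
```

On a target, the call matching the register block used in this post would be `write_bank((volatile uint32_t *)0x40002800U, 0x80000001U);`. On a host, pass an ordinary four-element array instead of a fixed address.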

The tests reveal that the functions in the test cases are up to 114% larger than they need to be, and code space is up to 40% larger than it needs to be. Because the extra instructions read literals from flash, they are also slow. Everyone using GCC to program the M0/M0+ should be aware of this fault. If your code seems larger or slower than it should be, the only current workaround is to recode in assembler.

Example:

#include <stdint.h>

const uint32_t v1 = 0x80000001; // First Value
const uint32_t v2 = 0x80000002; // Second Value
const uint32_t v3 = 0x80000003; // Third Value
const uint32_t v4 = 0x80000004; // Fourth Value

/* TEST1 : Write 32 bit values to known register locations */
void test1(void)
{
    volatile uint32_t* const r1 = (uint32_t*)(0x40002800U); // First Register
    volatile uint32_t* const r2 = (uint32_t*)(0x40002804U); // Second Register
    volatile uint32_t* const r3 = (uint32_t*)(0x40002808U); // Third Register
    volatile uint32_t* const r4 = (uint32_t*)(0x4000280CU); // Fourth Register

    *r1 = v1;
    *r2 = v2;
    *r3 = v3;
    *r4 = v4;
}

The M3 optimiser in GCC generates this (which is 100% valid M0 code):

00000000 <test1>:
   0:     4b04           ldr     r3, [pc, #16]     ; (14 <v1+0x8>)
   2:     4a05           ldr     r2, [pc, #20]     ; (18 <v1+0xc>)
   4:     601a           str     r2, [r3, #0]
   6:     3201           adds     r2, #1
   8:     605a           str     r2, [r3, #4]
   a:     3201           adds     r2, #1
   c:     609a           str     r2, [r3, #8]
   e:     3201           adds     r2, #1
  10:     60da           str     r2, [r3, #12]
  12:     4770           bx     lr
  14:     40002800      .word     0x40002800
  18:     80000001      .word     0x80000001

Yet if you compile for the Cortex-M0, GCC emits this:

00000000 <test1>:
   0:     4a06           ldr     r2, [pc, #24]     ; (1c <v1+0x10>)
   2:     4b07           ldr     r3, [pc, #28]     ; (20 <v1+0x14>)
   4:     601a           str     r2, [r3, #0]
   6:     4a07           ldr     r2, [pc, #28]     ; (24 <v1+0x18>)
   8:     4b07           ldr     r3, [pc, #28]     ; (28 <v1+0x1c>)
   a:     601a           str     r2, [r3, #0]
   c:     4a07           ldr     r2, [pc, #28]     ; (2c <v1+0x20>)
   e:     4b08           ldr     r3, [pc, #32]     ; (30 <v1+0x24>)
  10:     601a           str     r2, [r3, #0]
  12:     4a08           ldr     r2, [pc, #32]     ; (34 <v1+0x28>)
  14:     4b08           ldr     r3, [pc, #32]     ; (38 <v1+0x2c>)
  16:     601a           str     r2, [r3, #0]
  18:     4770           bx     lr
  1a:     46c0           nop               ; (mov r8, r8)
  1c:     80000001      .word     0x80000001
  20:     40002800      .word     0x40002800
  24:     80000002      .word     0x80000002
  28:     40002804      .word     0x40002804
  2c:     80000003      .word     0x80000003
  30:     40002808      .word     0x40002808
  34:     80000004      .word     0x80000004
  38:     4000280c      .word     0x4000280c
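
As a quick check of the size figures, the byte counts can be read straight off the two test1 listings: the M3 build ends at 0x1c (last literal at 0x18, 4 bytes long) and the M0 build ends at 0x3c (last literal at 0x38, 4 bytes long). The sketch below does that arithmetic. The names `M3_TOTAL_BYTES`, `M0_TOTAL_BYTES` and `percent_larger` are made up for this illustration.

```c
#include <stdint.h>

/* Total function size in bytes (code plus literal pool), taken from the
 * end addresses of the two test1 listings in this post. */
enum {
    M3_TOTAL_BYTES = 0x1c,  /* 28 bytes: 20 of code, 8 of literals        */
    M0_TOTAL_BYTES = 0x3c   /* 60 bytes: 28 of code + nop, 32 of literals */
};

/* Percentage by which `actual` exceeds `baseline`, rounded down. */
uint32_t percent_larger(uint32_t baseline, uint32_t actual)
{
    return ((actual - baseline) * 100u) / baseline;
}
```

For these two listings, `percent_larger(M3_TOTAL_BYTES, M0_TOTAL_BYTES)` comes out at 114, which is consistent with the "up to 114% larger" figure quoted earlier in the post.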

Carlos_Mendoza · ‎05-03-2017

Hi Steven,

Thanks for letting us know about this issue with the GCC Optimizer.

Best Regards!
Carlos Mendoza
Technical Support Engineer

stevenjohnson · ‎05-04-2017

Hi Carlos,

No problem.

Just a brief update on the issue. This is not yet directly applicable to MCUXpresso, but I have tested GCC 7.1 and the problem is still present. It now also affects the Cortex-M23.
