This is reported upstream:
69460 – ARM Cortex M0 produces suboptimal code vs Cortex M3
I have tested the GCC shipped with MCUXpresso, and it exhibits the same optimizer fault.
Basically, the GCC optimizer for Cortex-M0/M0+ produces far less optimized code than it generates for a Cortex-M3, and there are no command-line switch workarounds. The specific sequences in the test cases (6 are shown upstream) do not use any M3-specific instructions, so the code compiled for the M3 would run on a Cortex-M0/M0+. This is confirmed upstream.
A typical trigger for the bug is accessing registers at known addresses, such as peripheral registers. For the M0/M0+, GCC creates an entry in the literal table for every unique address and loads each one separately. For the M3, it creates one entry in the literal table and uses address offsets to reach nearby addresses. This means any M0/M0+ code accessing a bank of registers in a peripheral will run a lot slower and consume a lot more flash than it needs to.
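To make the base-plus-offset idea concrete, here is a minimal sketch of the check that decides whether a register can share a base address with its neighbours. It assumes the 16-bit Thumb word store `str Rt, [Rn, #imm]`, whose immediate is a 5-bit value scaled by 4 (byte offsets 0 to 124). The helper name `reachable_from_base` is hypothetical and only illustrates the addressing rule; it is not how GCC itself makes the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest byte offset encodable in the 16-bit Thumb word store
   "str Rt, [Rn, #imm]": a 5-bit immediate scaled by 4, so 0..124. */
#define THUMB1_WORD_OFFSET_MAX 124u

/* Hypothetical helper (not part of GCC): returns true if `addr` can be
   written with a single immediate-offset word store relative to `base`,
   so that no second literal-table entry is needed for it. */
static bool reachable_from_base(uint32_t base, uint32_t addr)
{
    assert(base % 4u == 0u);            /* word-aligned base expected */

    if (addr < base) {
        return false;                   /* the offset field is unsigned */
    }

    uint32_t offset = addr - base;
    return (offset % 4u == 0u) && (offset <= THUMB1_WORD_OFFSET_MAX);
}
```

For example, with the base 0x40002800 the registers at 0x40002804, 0x40002808 and 0x4000280C are all within range, so one literal-table entry for the base would be enough to reach all four.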
The tests reveal that the functions in the test cases are up to 114% larger than they need to be, and code space is up to 40% larger than it needs to be. And because the instructions read excess literals from flash, they are slow instructions. Everyone using GCC to program the M0/M0+ should be aware of these faults. If your code seems larger or slower than it should be, the only current workaround is to recode in assembler.
Example:
#include <stdint.h>

const uint32_t v1 = 0x80000001; // First Value
const uint32_t v2 = 0x80000002; // Second Value
const uint32_t v3 = 0x80000003; // Third Value
const uint32_t v4 = 0x80000004; // Fourth Value
/* TEST1 : Write 32 bit values to known register locations */
void test1(void)
{
    volatile uint32_t* const r1 = (uint32_t*)(0x40002800U); // First Register
    volatile uint32_t* const r2 = (uint32_t*)(0x40002804U); // Second Register
    volatile uint32_t* const r3 = (uint32_t*)(0x40002808U); // Third Register
    volatile uint32_t* const r4 = (uint32_t*)(0x4000280CU); // Fourth Register
    *r1 = v1;
    *r2 = v2;
    *r3 = v3;
    *r4 = v4;
}
The M3 optimizer in GCC generates this (which is 100% valid M0 code):
00000000 <test1>:
0: 4b04 ldr r3, [pc, #16] ; (14 <v1+0x8>)
2: 4a05 ldr r2, [pc, #20] ; (18 <v1+0xc>)
4: 601a str r2, [r3, #0]
6: 3201 adds r2, #1
8: 605a str r2, [r3, #4]
a: 3201 adds r2, #1
c: 609a str r2, [r3, #8]
e: 3201 adds r2, #1
10: 60da str r2, [r3, #12]
12: 4770 bx lr
14: 40002800 .word 0x40002800
18: 80000001 .word 0x80000001
Yet if you compile for Cortex-M0, GCC emits this:
00000000 <test1>:
0: 4a06 ldr r2, [pc, #24] ; (1c <v1+0x10>)
2: 4b07 ldr r3, [pc, #28] ; (20 <v1+0x14>)
4: 601a str r2, [r3, #0]
6: 4a07 ldr r2, [pc, #28] ; (24 <v1+0x18>)
8: 4b07 ldr r3, [pc, #28] ; (28 <v1+0x1c>)
a: 601a str r2, [r3, #0]
c: 4a07 ldr r2, [pc, #28] ; (2c <v1+0x20>)
e: 4b08 ldr r3, [pc, #32] ; (30 <v1+0x24>)
10: 601a str r2, [r3, #0]
12: 4a08 ldr r2, [pc, #32] ; (34 <v1+0x28>)
14: 4b08 ldr r3, [pc, #32] ; (38 <v1+0x2c>)
16: 601a str r2, [r3, #0]
18: 4770 bx lr
1a: 46c0 nop ; (mov r8, r8)
1c: 80000001 .word 0x80000001
20: 40002800 .word 0x40002800
24: 80000002 .word 0x80000002
28: 40002804 .word 0x40002804
2c: 80000003 .word 0x80000003
30: 40002808 .word 0x40002808
34: 80000004 .word 0x80000004
38: 4000280c .word 0x4000280c
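As a rough cross-check of the size figures, the two listings for test1 can be counted by hand: the M3 output has 10 halfword instructions and 2 literal words, while the M0 output has 14 halfwords (13 instructions plus the padding nop) and 8 literal words. The sketch below does the arithmetic; the function names are hypothetical, and the counts are read off the listings above rather than measured with a tool.

```c
#include <assert.h>

/* Total bytes of a Thumb listing made of 16-bit instructions
   followed by 32-bit literal-table words. */
static unsigned listing_bytes(unsigned halfword_instructions,
                              unsigned literal_words)
{
    return halfword_instructions * 2u + literal_words * 4u;
}

/* How much larger `actual` is than `baseline`, in whole percent
   (fractional part truncated). */
static unsigned percent_larger(unsigned actual, unsigned baseline)
{
    assert(baseline > 0u && actual >= baseline);
    return (actual - baseline) * 100u / baseline;
}
```

With these counts, `listing_bytes(10, 2)` gives 28 bytes for the M3 output and `listing_bytes(14, 8)` gives 60 bytes for the M0 output, so `percent_larger(60, 28)` comes to 114, which appears to be where the 114% figure for this test case comes from.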
Hi Steven,
Thanks for letting us know about this issue with the GCC Optimizer.
Best Regards!
Carlos Mendoza
Technical Support Engineer
Hi Carlos,
No problem.
Just a brief update on the issue. This is not yet directly applicable to MCUXpresso, but I have tested GCC 7.1 and the problem is still present. It now also affects the Cortex-M23.