Lorenzo Micheletto

(DSC) (Codewarrior 11) 32bit variable register optimization bug in DSC Compiler

Discussion created by Lorenzo Micheletto on Jan 24, 2019
Latest reply on Jan 30, 2019 by ZhangJennie

I'm using Codewarrior 11 for MCU with the DSC toolchain.


The DSC compiler and/or inline assembly optimizer for the MC56F84789VLL (DSP 56800EX core) does not correctly handle access to the LSP (lower 16bit word) of a 32bit variable when that variable is mapped to an accumulator register.


When using maximum optimization options ( -DOPTION_CORE_V3=1  -opt level=4 -opt speed -inline level=8 -inline auto -sprog -v3 -requireprotos -v3 ) the following byte-swapping code:


inline UINT32 swapGetUINT32(register const UINT8 *R2Reg)
{
    register UINT32 AReg;
    // N.B. by declaring "register UINT32 AReg;"
    // inside assembly instructions you can use
    // AReg   --> 32bit register (either dst A,B,C,D or src A10,B10,C10,D10)
    // AReg.0 --> lower 16bit word (either A0,B0,C0,D0)
    // AReg.1 --> upper 16bit word (either A1,B1,C1,D1)

    asm {
    .optimize_iasm on
    moveu.bp X:(R2Reg)+,Y1   // 1 1
    moveu.bp X:(R2Reg)+,AReg // 1 1 // *(p+1) into AReg.1, clear AReg.2, AReg.0
    moveu.bp X:(R2Reg)+,Y0   // 1 1
    moveu.bp X:(R2Reg)+,X0   // 1 1
    asll.l #8,Y              // 2 1
    move.w X0,AReg.0         // 1 1
    or.l Y,AReg              // 1 1
    .optimize_iasm off
    }
    return AReg;
}

INSTEAD of generating the following code (in the example below AReg is mapped to A, but the compiler can generate the same inline sequence using A, B, C or D):


moveu.bp X:(R1)+,Y1
moveu.bp X:(R1)+,A  // writes to A1 and clears A0
moveu.bp X:(R1)+,Y0
moveu.bp X:(R1)+,X0 // DSC does not have "moveu.bp X:(R1)+,A0", so we copy to X0
asll.l #0x000008,Y  // shift lower bytes in Y1,Y0 to upper bytes
move.w X0,A0 // and then we copy X0 to A0
or.l Y,A     // now we merge the byte-swapped upper and lower bytes

SOMETIMES the function swapGetUINT32 gets compiled as:

moveu.bp X:(R1)+,Y1
moveu.bp X:(R1)+,A   // WRITES (R1) to (A1), CLEAR A2, A0
adda #-3,SP,R4
move.l A10,X:(R4)    // WRITES A0 to (SP-3), A1 to (SP-2)
moveu.bp X:(R1)+,Y0
moveu.bp X:(R1)+,X0
asll.l #0x000008,Y // shift lower bytes in Y1,Y0 to upper bytes
move.w X0,X:(SP-2) // WRITES X0 to (SP-2) (where A1 was stored, overwriting it)
move.l X:(R4),A    // reload A (it contains A1 = X0, A0 = 0 )
or.l Y,A    // now we merge, but the result is not a correct 32bit byte-swap

The resulting code is less efficient and the returned value is WRONG!
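To make the difference concrete, the two instruction sequences above can be modelled in portable C on a host PC. This is only a sketch, not target code: the names swap_expected, swap_miscompiled and check_models, and the stack array, are hypothetical, and the model relies only on the register behaviour described in the comments of the listings above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the expected sequence:
   Y = (p[0]:p[2]) << 8, A = p[1]:p[3], then A |= Y. */
static uint32_t swap_expected(const uint8_t *p)
{
    uint32_t y = (((uint32_t)p[0] << 16) | p[2]) << 8; /* asll.l #8,Y               */
    uint32_t a = (uint32_t)p[1] << 16;                 /* moveu.bp: A1 = p[1], A0 = 0 */
    a = (a & 0xFFFF0000u) | p[3];                      /* move.w X0,A0              */
    return a | y;                                      /* or.l Y,A                  */
}

/* Model of the miscompiled sequence: A is spilled to the stack and the
   write meant for A0 lands on the word that holds A1. */
static uint32_t swap_miscompiled(const uint8_t *p)
{
    uint16_t stack[2];                                 /* [0] = (SP-3), [1] = (SP-2) */
    uint32_t y = (((uint32_t)p[0] << 16) | p[2]) << 8; /* asll.l #8,Y                */
    uint32_t a = (uint32_t)p[1] << 16;                 /* moveu.bp: A1 = p[1], A0 = 0 */
    stack[0] = (uint16_t)(a & 0xFFFFu);                /* move.l A10,X:(R4): A0 -> (SP-3) */
    stack[1] = (uint16_t)(a >> 16);                    /*                    A1 -> (SP-2) */
    stack[1] = p[3];                                   /* move.w X0,X:(SP-2): A1 copy lost */
    a = ((uint32_t)stack[1] << 16) | stack[0];         /* move.l X:(R4),A            */
    return a | y;                                      /* or.l Y,A                   */
}

/* Small self-check of the two models on one test pattern. */
static void check_models(void)
{
    const uint8_t bytes[4] = { 0x11, 0x22, 0x33, 0x44 };
    assert(swap_expected(bytes)    == 0x11223344UL);
    assert(swap_miscompiled(bytes) == 0x11443300UL);
}
```

In this model the input bytes 11 22 33 44 give 0x11223344 for the expected sequence and 0x11443300 for the miscompiled one: the second byte is lost and the fourth byte ends up in the second byte's position.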


It seems that the compiler or the code optimizer assumes "moveu.bp X:(R1)+,A" writes (R1) to A0, whereas on 56800EX cores a 16bit write to an accumulator register (A, B, C or D) writes to the MSP (A1, B1, C1 or D1) and CLEARS the associated LSP and EXT registers.
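The write behaviour described here can be sketched as a small C model of one accumulator. The Acc type and the acc_* helpers are hypothetical names used only for illustration. The model assumes, as the expected sequence above does, that a move to the whole accumulator loads the MSP and clears LSP and EXT, while a move to A0 changes only the LSP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one accumulator (A, B, C or D). */
typedef struct {
    uint8_t  ext;  /* A2: extension bits   */
    uint16_t msp;  /* A1: upper 16bit word */
    uint16_t lsp;  /* A0: lower 16bit word */
} Acc;

/* Unsigned move to the whole accumulator, as described in the post:
   the value goes to the MSP, LSP and EXT are cleared. */
static void acc_move_whole(Acc *a, uint16_t v)
{
    a->ext = 0;
    a->msp = v;
    a->lsp = 0;
}

/* Move to the LSP only (e.g. "move.w X0,A0"): MSP and EXT are unchanged. */
static void acc_move_lsp(Acc *a, uint16_t v)
{
    a->lsp = v;
}

/* 32bit view of the accumulator (A10). */
static uint32_t acc_read32(const Acc *a)
{
    return ((uint32_t)a->msp << 16) | a->lsp;
}

/* Order used by the hand-written code: whole-register move first, LSP second. */
static void check_acc_model(void)
{
    Acc a = { 0xF, 0xFFFF, 0xFFFF };
    acc_move_whole(&a, 0x0022);          /* A1 = 0x0022, A0 = 0, A2 = 0 */
    acc_move_lsp(&a, 0x0044);            /* A0 = 0x0044, A1 unchanged   */
    assert(acc_read32(&a) == 0x00220044UL);
}
```

Under this model, any transformation that treats the whole-register move as a write to A0, or that redirects the later A0 write to the location holding A1, cannot preserve both words.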


Also, the optimizer chooses to write A to external memory and then read it back when it is absolutely not necessary.


The attached files contain the source code to reproduce the bug and a disassembly of the output.


In the attached file xxxx.c I also included two solutions to this problem (both are based on forcing the use of the A register instead of letting the compiler choose the best optimization).


I solved the issue in my program by modifying the assembly code, but the root cause of this bug needs to be identified and fixed, because it is likely to have more widespread manifestations than just this one (it wrongly handles register mapping and stack access of 32bit values).