Code optimization


Code optimization

3,697 Views
michele_darold
Contributor I
Hi, I have a problem with the CodeWarrior compiler. I'm using CW 7.0.

My settings are the defaults of a project generated by the auto-configurator for the MCF52223.

So I have optimization level 1,
Parameter passing: COMPACT
Code Model: FAR
Data Model: FAR
and I have linked the libraries
fp_coldfire.a
C_4i_CF_SZ_MSL.a
C_4i_CF_Runtime.a

When I increase the optimization level to 2, 3 or 4 I see a real reduction in code size, but my application won't run!

I have also tried changing Parameter passing to REGISTER and linking the library C_4i_CF_RegABI_SZ_MSL.a, but the code still doesn't run.

I have also tried changing only Parameter passing to REGISTER while keeping level 1 optimization. The code does not run, but I noticed that the code size then stays the same at every optimization level: with levels 2, 3 or 4 I always get the same size as at level 1.

Leaving Parameter passing at COMPACT and changing the optimization level, I do see a change in code size.

My question is about the optimization: what can I do to use this feature? Which parameters do I need? Do I need a particular version of CW?

Best regards

Michele Da Rold
14 Replies

929 Views
RichTestardi
Senior Contributor II
I will add a pointer to a bug I found with optimization levels 2-4, in case you might be having the same problem...
 
 
If you have a large switch statement, and one of the cases begins immediately with a "do" loop, the optimized code generation skips part of that case.  The simple solution is to increment a global variable before the do loop.
 
One other thing...  Turning on or off "register coloring" in the Project Settings -> Code Generation -> ColdFire Processor pane seems to have much more effect on code size and performance than any of the optimization levels.  Unfortunately, with it on, you can't view (most) local variables in the debugger.
 
-- Rich

929 Views
mjbcswitzerland
Specialist V
Rich

I generally still use CW6.3, but I just ran a couple of tests with CW7.0 as well.

Base = standard uTasker project V1.3 + SP7 (no libraries)

The values are those displayed for code and RAM sizes in CW (assuming they are accurate):
CW6.3 maximum optimisation for size: 72k FLASH 7k RAM
CW6.3 no optimisation: 117k FLASH 7k RAM

CW7.0 - note that register coloring was automatically activated after converting the CW6.3 project to CW7.0. CW6.3 has no settings for register coloring, instruction scheduling or peephole optimization.
CW7.0 maximum optimisation for size: 78k FLASH 8k RAM
CW7.0 maximum optimisation for size - no register coloring: 108k FLASH 8k RAM
CW7.0 maximum optimisation for size - no register coloring, no instruction scheduling, no peephole: 115k FLASH 8k RAM
CW7.0 no optimisation: 80k FLASH 8k RAM
CW7.0 no optimisation - no register coloring: 112k FLASH 8k RAM
CW7.0 no optimisation - no register coloring, no instruction scheduling, no peephole: 120k FLASH 8k RAM

This shows that these new settings (register coloring, instruction scheduling and peephole) do indeed have a major effect on program size; the optimiser level setting has much less. It also shows that the program size CW6.3 achieves is about 8% smaller than CW7.0's.

I have started using CW7.0 to debug 5222X and 5221X boards since CW6.3 doesn't seem to work with the demo boards - with the EVBs and an external P&E BDM I can still debug with CW6.3. However, we have never had any problems with CW6.3-based project code, so the smaller build size suggests staying with it for the moment. Note that we avoid any library code - malloc() in CW6.3/CW6.4 has major problems, and in CW7.0 there seem to be various difficulties (http://forums.freescale.com/freescale/board/message?board.id=CFCOMM&thread.id=4528).

Despite the few known problems, I do find the CW solution great - especially since most people use it for free as the Special Edition. It makes the choice of a ColdFire processor much easier for a lot of customers who would otherwise worry about new investments in development tools. The debugger is not bad at all, and of course the ColdFires themselves are a dream to work with...

Regards

Mark



929 Views
JimDon
Senior Contributor III
Ya, but - did the code still work in 7.0 with maximum optimisation for size?
I assume you have some code in RAM?


929 Views
mjbcswitzerland
Specialist V
Hi Jim

I just loaded the CW7.0 80k build to the board and connected it to the Internet http://demo.uTasker.com

It seems to work OK. We also have a number of users with projects who do use CW7.0 and I don't know of any problems (the uTasker project is delivered with maximum optimisation for size and generally is left like that - unless there are some debugging difficulties, where the optimisation may be taken back to make it a bit easier). We know of some problems with library use but don't have any known cases of optimisation related problems.

Perhaps you have a specific one which doesn't show up in most cases(?).

Regards

Mark



929 Views
JimDon
Senior Contributor III
If you look on this thread and follow the link in BugMan's post, there does seem to be an issue, but I can't confirm or deny it myself.

I didn't know how much you had used 7.0. I could move to 6.4, but if the issue is only a temporary minor decrease in optimization, I can live with that. However, if the generated code is wrong, then I will have to.

It sounds OK from what you have said.

929 Views
mjbcswitzerland
Specialist V
Jim

Looking at BugMan's thread, there is a problem in there somewhere, but I believe it will only be noticed in certain cases (e.g. if DDR and PORT ordering is changed it will probably never cause a problem, but in other cases it can be disastrous). I did have a similar problem once with IAR, where the reordering of code caused a region protected from interrupts to be changed around, so that the register accesses to be protected were outside the protected region:

eg.
disable_interrupts();
change critical register;
enable_interrupts();


Only by looking at the assembler code was it visible that it was actually doing the following:

change critical register;
disable_interrupts();
enable_interrupts();


The result was of course disastrous, since there was no protection, and it resulted in occasional errors (and crashes).

In that case we went back to a previous compiler version which hadn't caused problems.

The fact is that compilers are highly complicated programs, and the pressure is on to make them more and more efficient. Every compiler change brings new risks that some code which previously operated correctly could break. I have never used a compiler which doesn't have a bug in it somewhere. The trick is to find a way to live with known bugs rather than assume that the next release will finally be perfect - "better the devil you know...". This is something that embedded programmers especially must always keep in the back of their minds (unless they are among those who never program in anything but assembler - but they have their own problems to contend with, like porting 100'000 lines of working code from one processor to another; good C code is usually a minor porting task, whereas moving to another processor can mean many months of work if the code exists only in assembler). If code doesn't work as expected, a bit of assembler stepping usually shows whether the problem is programmer- or compiler-related - and the compiler still tends to do the better job here: how many programmer errors are found for each compiler error?

Regards

Mark


929 Views
JimDon
Senior Contributor III
Mark,

I agree 100% with all that you said. It is very complicated, and I do like CW - no compiler is ever perfect, and at least Freescale has support and will listen to you. However, the case you pointed out is exactly one of the cases I was thinking of.

To requote the Wiki (my emphasis):

"In computer programming, a variable or object declared with the volatile keyword may be modified externally from the declaring object. For example, a variable that might be concurrently modified by multiple threads may be declared volatile. Variables declared to be volatile will not be optimized by the compiler because the compiler must assume that their values can change at any time. Note that operations on a volatile variable are still not guaranteed to be atomic."

I take this to mean order of execution as well.

This is not really a "hard" bug, as there is no clear definition of what optimization is allowed to do, and technically the code is correct; however, it is highly undesirable behavior. In the case of disabling interrupts, this can easily be fixed by calling a subroutine, which will not be optimized this way (the case you pointed out with the IAR compiler sounds like it was a bug).

My real concern is that sometimes when setting module registers the order of execution is important (for example, you want to kill the interrupt enable bit and then set other registers).


929 Views
ChrisJohns
Contributor I
Hi Jim,

As compilers optimise better and better, user code will come under more and more pressure to be correct with respect to the standard. An example in the gcc world has been "aliased structs": this part of the C standard has been around a while and allows a range of optimisations to occur. The issue is difficult to find and fix, and is disabled in some open-source kernels that use gcc. Volatile has proved to be only part of the solution: the access will always occur, but where it occurs can change if the compiler sees an optimisation. A simple interrupt mask (declared volatile) followed by a variable access, even to a volatile variable, can be moved around by the compiler in such a way as to fail. In the gcc world you need to use a memory barrier to tell the compiler not to move code across the barrier. If you are not aware of these compiler changes and improvements, you can think the code generated by the compiler is broken, when in fact it is not the compiler but the code itself.

In RTEMS we added API support for memory barriers to handle the issue. If you look in the Linux, FreeBSD and other kernels you will see barriers being used in critical sections of code.

Regards
Chris

929 Views
JimDon
Senior Contributor III
Quoting Rich:

"One other thing...  Turning on or off "register coloring" in the Project Settings -> Code Generation -> ColdFire Processor pane seems to have much more effect on code size and performance than any of the optimization levels.  Unfortunately, with it on, you can't view (most) local variables in the debugger.
-- Rich"

You have to look at the registers to see the locals. Not pretty if you are not used to it, but many other debuggers I have used have the same issue. Also, once you stop using a local, the compiler may reuse that register for something else.

When I really need to watch a local, I might declare it outside the function as a volatile (so it will not be optimized away) just while I am debugging.

How many of you have installed the service pack for 7.0?




929 Views
RichTestardi
Senior Contributor II
Hi Jim,
 
> How many of you have installed the service pack for 7.0?
 
Which service pack are you referring to?  Do you mean "CodeWarrior for ColdFire, v7.0 - MCF5227X Service Pack"?  Does that change/fix anything if we're not developing for the 5227X series?  My CodeWarrior Updater says I am up to date, otherwise?
 
Thanks!
 
-- Rich

929 Views
JimDon
Senior Contributor III
I can't say whether it does anything other than add support for that chip.
That's kind of what I was hoping to find out.
Maybe an FSL'er can comment.


929 Views
JimDon
Senior Contributor III
There are issues with optimization in 7.0. I submitted a service request and this is what they said:

"Yes, we have an issue about optimized size code. We'll fix this in a future update"

However, there is no date for the fix. If you ask, they will send you a link to 6.4, but I can't say what the difference is. I think that if you apply the 7.0 service pack, it kills optimization altogether, because my issue was that optimization did nothing as far as code size. I don't know this for a fact, but it seems to be what the service pack does. I can live with 7.0 and no optimization for now.

As for the other thread, that is a load (as in wrong). Changing the order of execution is a bug, plain and simple.

It wouldn't hurt to submit another one, if you can nail down the exact wrong code generated.



Message Edited by JimDon on 2008-05-28 11:25 AM

929 Views
Nouchi
Senior Contributor II
Hello,

Have a look at this thread; it seems to be a similar problem.
If your program uses I/O registers and your application needs to keep the order of volatile data accesses, you need to disable the peephole optimizer.

Emmanuel.


Message Edited by BugMan on 2008-05-28 04:27 PM

929 Views
mjbcswitzerland
Specialist V
Hi

We always run with full optimisation, so the reason for failure when optimisation is enabled is probably not the compiler making errors, but rather code which cannot tolerate the optimisation itself.
This is usually code reading registers which are volatile, while the optimiser handles them as non-volatile and sometimes removes related code. This can, for example, result in interrupt routines getting stuck in endless loops because flags are not checked correctly (volatile declarations usually solve this).

If your code doesn't run you can try to find out whether it is getting stuck due to something like this.

The register-passing option is useful for saving size - it will usually improve it by 5..10%. The libraries must match (as you have already pointed out), but the startup assembler code must match too. We only had to change it in one location, but this routine seems to be called in most BSPs, so possibly it is what you need too:

mcf5xxx_wr_vbr:                            /* sub-routine for access to VBR */
_mcf5xxx_wr_vbr:
/*  move.l  4(SP),D0 */                    /* {2} remove this when working with parameters passed in registers */
   .long    0x4e7b0801                     /* assembler code for movec d0,VBR */
    nop
    rts   

Note the single change (marked with {2}): the stack fetch is removed because the parameter now arrives directly in D0.

Regards

Mark


www.uTasker.com

