Strange problems with SAI EDMA and optimization level

roger5 · ‎06-20-2019

After adding some code to my project, I found that the I2S audio no longer worked.

But, the code that was added was to a completely different part of the application.

On further investigation, I found that for some reason if I call the SAI initialisation at start of my main(), which is fw_main_task() since I'm using RTOS, the audio gets corrupted.

I also found that the transfer using

SAI_TransferSendEDMA(I2S0, &g_SAI_TX_Handle, &xfer);

Was not receiving the ISR callback to indicate that the transfer was complete.

Debugging into SAI_TransferSendEDMA, as far as I can tell, the parameters passed in &xfer seem to have a valid length, which I presume is the only thing that matters.

A workaround to get the ISR to be called, seems to be to set the optimization to anything except -Os (optimize for size).

-O2 (which is almost the same as -Os) works OK, and even -O3 works.

So I'm kinda stumped about why...

1. I can't setup the SAI interface at the very beginning of main()

2. Optimization levels -O2 and -O3 work, and I get a callback (ISR), but -Os fails to do so.

Note, the optimization level is only changed in the function calling SAI_TransferSendEDMA. I initially tried turning down the optimization on fsl_sai_edma.c but it didn't make any difference.
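
For reference, one way to change the optimization level of a single function with GCC is the optimize function attribute. The post does not say which mechanism was used, so this is only a sketch; audio_start_transfer and AUDIO_OPT_O2 are hypothetical names, and GCC's own documentation describes the attribute as a debugging aid rather than something for production code.

```c
#include <assert.h>
#include <stdint.h>

/* The optimize attribute is GCC specific; other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define AUDIO_OPT_O2 __attribute__((optimize("O2")))
#else
#define AUDIO_OPT_O2
#endif

/* Hypothetical stand-in for the function that starts the audio transfer.
 * Only this function is built at -O2; the rest of the file keeps the
 * project-wide setting (for example -Os). */
AUDIO_OPT_O2 static uint32_t audio_start_transfer(uint32_t byte_count)
{
    assert(byte_count > 0U);
    /* A real implementation would fill in the transfer descriptor and
     * call SAI_TransferSendEDMA() here. */
    return byte_count;
}
```

If a per-function change like this makes a fault appear or disappear, that points at timing or memory layout at least as much as at the compiler itself.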

I also note that by default the SDK example projects are set to no optimization, which I find a bit odd, as normally I use -Os because -O0 results in very large binary sizes and code that runs unnecessarily slowly.

mjbcswitzerland · ‎06-20-2019

Hi Roger

I have never understood why examples, and also some developers - for the first 98% of development work -, use -O0. When switching to production code things then fall down, need fixing and the product that has been possibly well tested for non-production code is then delivered with only marginal testing in its final state (!!)

Although I know the main reason for this to be the more difficult debugging of optimised code (especially GCC compiled), it would nevertheless be advisable for developers to invest a little time in understanding how to debug optimised code from the outset (after a couple of days' effort there is no real need for non-optimised code debugging any more) to get more reliable products finished correctly. Personally I have never worked with non -Os code in the last >15 years and rarely consider even temporarily setting a file to another level to make debugging a little more obvious, and so haven't suffered from any such surprises (apart from occasional new compiler versions being more aggressive).

In your case look for missing volatile declarations and also in-lined routines called one after the other that may internally access the same register (or even variable) - the second can cause even more unexpected optimisation of accesses that the compiler believes to be redundant. In almost every case correct use of volatile attribute will fix it.
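
To illustrate the kind of missing volatile that a higher optimisation level can expose, here is a minimal sketch. It is not the SDK's code: g_tx_done, tx_done_callback and wait_for_tx_done are names made up for the example, and in a real project the flag would be set from the transfer-complete callback running in interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag shared between an interrupt callback and task code.
 * Without 'volatile' the compiler may read it once and keep the value in
 * a register, so the polling loop below might never see the update. */
static volatile bool g_tx_done = false;

/* Stand-in for a transfer-complete callback (interrupt context on target). */
static void tx_done_callback(void)
{
    g_tx_done = true;
}

/* Poll the flag a bounded number of times; returns true once it is set. */
static bool wait_for_tx_done(uint32_t max_polls)
{
    assert(max_polls > 0U);
    while (max_polls-- > 0U) {
        if (g_tx_done) {
            return true;
        }
    }
    return false;
}
```

Removing the volatile qualifier from g_tx_done is the quickest way to see the effect in the disassembly: the load of the flag can be hoisted out of the loop.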

Regards

Mark

Complete Kinetis solutions for professional needs, training and support: http://www.utasker.com/kinetis.html
uTasker: supporting >1'000 registered Kinetis users get products faster and cheaper to market
Request Free emergency remote desk-top consulting at http://www.utasker.com/services.html

errorek123 · ‎06-21-2019

>>Although I know the main reason for this to be the more difficult debugging of optimised code (especially GCC compiled), it would nevertheless be advisable for developers to invest a little time in understanding how to debug optimised code from the outset (after a couple of days' effort there is no real need for non-optimised code debugging any more) to get more reliable products finished correctly.

Is there any source, book or whatever where I can find information on how to debug optimized code? It's way harder to understand what's going on when single-stepping jumps seemingly at random through my code, instead of doing everything in order.

mjbcswitzerland · ‎06-21-2019

Brian

I mainly use the disassembly view to follow what the code is really doing, then interpolate back to the source. You only really need to learn a handful of assembler instructions to follow it, and it gives a good insight into what the processor is really doing [I never actually studied ARM assembler and probably only consulted the instruction set a handful of times since it is usually pretty clear what it is doing] and not what the high level source code is saying should be done.

What I missed noting is that I don't actually debug that much on the board itself because I use simulation (also for interrupt and DMA behavior) which means that it is only occasionally needed and then the debugging is low level where peripherals/registers are of main concern and not high level code (eg. there is little point debugging a standard C algorithm on the target that is not HW dependent and best to do it in a desk top environment instead).

Note that I work as an embedded system consultant with many customers around the world. What always surprises me is that programmers tend to use only a very small part of their debugger's capabilities, and debugging is something that seems not to be taught in higher education at all. How many times have I seen blank expressions when I ask what the call stack is showing, or ask someone to simply change the program counter to repeat a few instructions that don't look to be doing what the programmer intended. I have lost count of the times I heard gasps and the expression "oh, I didn't know you could do that..." when we quite simply identify an issue that has cost a company days or weeks already....

Regards

Mark

errorek123 · ‎06-21-2019

Thanks for the reply Mark,

I will try to learn some basic assembler instructions to understand the disassembly better. Is there any way to test bare-metal and RTOS code in simulation without using uTasker? I prefer to write my own stuff (or at least try) and learn from my own mistakes, so that I understand the hardware and the software better.

I'm pretty familiar with the call stack, but I had never heard of moving the program counter using the debugger and I can't really find any information about it, at least not for MCUXpresso.

roger5 · ‎06-21-2019

I doubt if there is a book on how to do this.

Depending on what the problem is, you can switch to -O0 to disable optimization, or to -Og for optimization that is meant to keep debugging usable.

However, if the problem is caused by optimization, it's a bit harder to do.

I generally don't have a problem with optimization, as I tend to look at the code and try to think how it would look in assembler.

It's surprising what the compilers do and don't do. GCC can do some strange things, and writing what looks like very concise code doesn't always end up as concise and fast assembler code.

I often look at the disassembly view, because ARM assembler is normally quite easy to read, especially the thumb instruction set, which most MCU development often uses, and the compilers (well GCC), tend to be quite formulaic in what gets produced.

It's interesting to see what the assembler looks like. For instance, this

uint32_t i = 0x123457;

uint32_t j = i & 0xff;

is normally achieved by something like

j = ((i << 24) >> 24);
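
As a quick check that the mask and the shift pair above really are interchangeable for an unsigned 32-bit value, here is a small sketch in plain C. The function names are made up for the example, and which form a given compiler actually emits depends on the target and options.

```c
#include <assert.h>
#include <stdint.h>

/* The source-level form: keep only the low 8 bits. */
static uint32_t low_byte_mask(uint32_t i)
{
    return i & 0xffU;
}

/* The shift form: push the upper 24 bits out of the top of the register,
 * then shift back down. For an unsigned type the right shift fills with
 * zeros, so only the original low byte remains. */
static uint32_t low_byte_shifts(uint32_t i)
{
    return (i << 24) >> 24;
}

/* Both forms must agree for any input. */
static void check_low_byte(uint32_t i)
{
    assert(low_byte_mask(i) == low_byte_shifts(i));
}
```

Note that this only holds for unsigned types; with a signed int the right shift is typically arithmetic and would sign-extend instead.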

roger5 · ‎06-20-2019

Thanks

I'm curious what level of optimization you use. I generally use -Os because of the small amount of flash on some of the low end microcontrollers, and it seems to also give good performance, since it's a variant of -O2.

The project I'm working on was set entirely to -O0, because it was based on one of the SDK examples and they all default to -O0.

It took me some time to change it over to -Os and get it working again.

I have a feeling that my problem(s) may be something more pervasive, like the global handles to the SAI setup, getting overwritten by some other part of the application, and changing optimization level may be having an effect on where exactly in memory the globals are getting stored with reference to each other, rather than a problem with the compiler optimizing something out.

I have tried putting some "guard variables" around the SAI vars, but that didn't seem to make any difference, but the compiler could have chosen to put the guard variables wherever it wanted.

I don't think it's practical to watch all the SAI related globals using the debugger, as the number of hardware watchpoints seems to be limited.

So I'm probably going to write some code to checksum the globals and keep an eye on it.
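
A minimal sketch of what such a checksum could look like, assuming the data being watched is not supposed to change between checks (a handle that the driver updates during a transfer would need its changing fields left out). region_checksum is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate-and-XOR checksum over an arbitrary block of memory.
 * The pointer is volatile-qualified so every byte is really re-read
 * from memory each time the function is called. */
static uint32_t region_checksum(const volatile void *start, size_t len)
{
    const volatile uint8_t *p = (const volatile uint8_t *)start;
    uint32_t sum = 0U;

    assert(start != NULL);
    for (size_t n = 0U; n < len; n++) {
        sum = ((sum << 1) | (sum >> 31)) ^ p[n];
    }
    return sum;
}
```

The idea would be to record the checksum once after initialisation and then compare it at a few points in the application (or from a periodic task), so that the first mismatch narrows down when the corruption happens.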

I guess the other thing to do is potentially compile in some sort of stress testing, and see if that can give a clue about what's going on.

mjbcswitzerland · ‎06-21-2019

Roger

I always use -Os (or the equivalent for IAR, Keil and so on). It gives smallest code and, logically, faster code at the same time.

>>Was not receiving the ISR callback to indicate that the transfer was complete.

If there is no interrupt I assume DMA didn't work/complete. Therefore I would look there - check registers when the transfer should start and compare with the same register set without optimisation. Also check the DMA error status after the transfer starts - if it errors rather than working it may be that there is a memory alignment problem caused by a different layout which gets lucky without optimisation.

Note also that DMA trashing memory is a common problem that is not obvious using normal break-point based debugging techniques. Check carefully that the buffers used for DMA transfers are not too short, even by as little as 1 byte.
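
The buffer length and alignment points can be turned into a cheap sanity check before a transfer is started. This is only a sketch under assumptions: dma_buffer_ok is a hypothetical helper, and the real alignment requirement depends on how the DMA transfer width is configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if a transfer of xfer_len bytes fits into the buffer,
 * is a whole number of elements, and starts on an element boundary.
 * 'width' is the assumed DMA element size in bytes (1, 2 or 4). */
static bool dma_buffer_ok(const void *buf, size_t buf_len,
                          size_t xfer_len, size_t width)
{
    assert(width == 1U || width == 2U || width == 4U);

    if (buf == NULL || xfer_len == 0U) {
        return false;
    }
    if (xfer_len > buf_len) {
        return false; /* transfer would run past the end of the buffer */
    }
    if ((xfer_len % width) != 0U) {
        return false; /* partial element at the end */
    }
    if (((uintptr_t)buf % width) != 0U) {
        return false; /* start address not aligned to the element size */
    }
    return true;
}
```

Calling something like this just before the transfer is queued, in both the optimised and non-optimised builds, would show quickly whether a different memory layout has moved the buffer onto an address the DMA does not like.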

If you don't solve it after a day or two just schedule a remote debugging session with me. Such problems are typical ones that I get called to fix and I can usually identify and solve it within a short time (40 minutes are free so no one needs to worry about not being able to afford external developers).

Regards

Mark