Hello,
I have a problem with a program that randomly gets stuck.
MCU: Kinetis K10DX32
bare metal project
CodeWarrior Development Studio 10.6
ARM Ltd. Windows GCC C compiler
ARM Ltd. GCC Build Tools Version: 4.7.3
Processor Expert
optimization level: optimize size (-Os)
memory usage:
section              size        addr
.interrupts           248           0
.cfmprotect            16        1024
.text               25088        1040
.ARM                    8       26128
.bss                 2364   536866816
.romp                  24   536869180
._user_heap_stack    1536   536869204
.ARM.attributes        51           0
.debug_info         43710           0
.debug_abbrev       10665           0
.debug_loc          10922           0
.debug_aranges       2104           0
.debug_line         14453           0
.debug_str           7221           0
.comment              121           0
.debug_frame         4284           0
Total              122815
problem description:
I have a nearly finished project with a fairly complex program, and all the functions work well.
Recently the program has started to hang at a certain point at random, but it has not really crashed, because interrupts are still working.
After it has hung for a few seconds, the watchdog resets the program.
How often this happens differs from device to device, maybe depending on the operating conditions. Even worse, it varies with the code size.
A device that got stuck every five minutes then worked for hours after I added a few lines of code in an unrelated place.
So changing the code size can improve or worsen the situation.
I was able to find the line where execution gets stuck: it's a comparison of a global variable with a function parameter.
C code:
uint8_t myglobalvar;
void myFunc(uint8_t myparam)
{
    if (myparam==myglobalvar) { // here it gets stuck
        return;
    }
    // some other action ...
}
assembler listing:
1216 .section .text.myFunc,"ax",%progbits
1217 .align 1
1218 .global myFunc
1219 .thumb
1220 .thumb_func
1222 myFunc:
1223 .LFB15:
1455:../Sources/xyz.c ****
1456:../Sources/xyz.c **** void myFunc(uint8_t myparam)
1457:../Sources/xyz.c **** {
1224 .loc 1 1457 0
1225 .cfi_startproc
1226 @ args = 0, pretend = 0, frame = 0
1227 @ frame_needed = 0, uses_anonymous_args = 0
1228 @ link register save eliminated.
1229 .LVL77:
1458:../Sources/xyz.c **** if (myparam==myglobalvar) {
1230 .loc 1 1458 0
1231 0000 094B ldr r3, .L117
1232 0002 1978 ldrb r1, [r3, #0] @ zero_extendqisi2
1233 0004 8142 cmp r1, r0
1234 0006 0DD0 beq .L104
1235 .L112:
1459:../Sources/xyz.c **** return;
1460:../Sources/xyz.c **** }
// some other action ...
1257 .L117:
1258 0028 00000000 .word myglobalvar
1259 002c 00B00340 .word 1073983488
1260 .cfi_endproc
I'm not very familiar with assembler code; maybe someone can spot the problem at first sight.
It's also possible that the line itself is OK, but some other code is corrupting memory, the SP, or something else.
Any help would be greatly appreciated!
Thanks
Jörg Volger wrote:
optimization level: optimize size (-Os)
In the last time the program randomly stucks at a certain point, but it has not really crashed, because interrupts are still working.
So changing the code size can improve or worsen the situation.
uint8_t myglobalvar;
void myFunc(uint8_t myparam)
{
if (myparam==myglobalvar) { // here it stucks
return;
}
// some other action ...
}
Try -O2 in place of -Os and see if that makes any difference.
The symptoms are those of a stack overflow. The address the code is 'returning' to is likely invalid, and that is what makes it 'get stuck'.
Can you get a backtrace?
If your current tools cannot do it, look into using USBDM with GDB.
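If a backtrace is hard to get, stack painting is another way to test the stack-overflow theory. The following is only a minimal sketch, assuming a stack that grows downward (as on Cortex-M); stack_paint, stack_unused and the fill value are hypothetical names, not part of Processor Expert. On the target, the base and size would come from the linker symbols for the stack region, and the painting would have to stop below the current SP.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary fill pattern (assumption) */

/* Fill the given region with the pattern. On the target this would run
 * once at startup, before the region is in use. */
void stack_paint(uint8_t *base, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        base[i] = STACK_FILL;
    }
}

/* Count the bytes at the low end that still hold the pattern. With a
 * downward-growing stack, this is the margin that was never used. */
size_t stack_unused(const uint8_t *base, size_t size)
{
    size_t n = 0;
    while (n < size && base[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If stack_unused() ever reports 0 (or close to it) after the device has run for a while, the stack has most likely grown into whatever lies below it.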
Interrupts can often run just fine when the foreground task is lost. This is why the watchdog should never be refreshed inside interrupts.
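As a rough illustration of that rule, the refresh can live only in the foreground loop, so a lost foreground task still leads to a reset even while interrupts keep firing. This is a sketch with stand-ins: timer_isr, main_loop_pass and watchdog_refresh are hypothetical names, and the counter replaces the real watchdog refresh sequence.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint32_t isr_ticks;  /* incremented by the (hypothetical) timer ISR */
static uint32_t refresh_count;       /* stands in for the real watchdog refresh */

/* Interrupt handler: does its own work, but never touches the watchdog. */
void timer_isr(void)
{
    isr_ticks++;
}

/* Stand-in for the hardware refresh sequence. */
static void watchdog_refresh(void)
{
    refresh_count++;
}

/* One pass of the foreground loop. The watchdog is refreshed only here,
 * and only if the foreground work of this pass completed. */
void main_loop_pass(bool foreground_ok)
{
    if (foreground_ok) {
        watchdog_refresh();
    }
}
```

With this split, any number of timer_isr() calls leaves the refresh count unchanged; only a completed foreground pass keeps the watchdog quiet.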
After some more debugging I found out that the line mentioned above is not the problem (... what led me there?).
The new suspect is a function that changes the settings of the PDB and ADC (the initial setup of the PDB and ADC is done in Processor Expert).
If I omit this function, the problem disappears. Of course, I need the function for a proper measurement.
The content of the function:
*************************************************************************************************
// disable the conversion-complete interrupts
ADC0_SC1A &= ~(ADC_SC1_AIEN_MASK);
ADC0_SC1B &= ~(ADC_SC1_AIEN_MASK);
// stop PDB
PDB0_SC &= ~(PDB_SC_PDBEN_MASK);
// clear COCO flags by reading result registers
result[0] = ADC0_RA;
result[1] = ADC0_RB;
// set PDB modulus
PDB0_MOD = PDB_MOD_MOD(sm);
// set PDB trigger delay
PDB0_CH0DLY1 = PDB_DLY_DLY(sm/2);
// set hardware average
switch (avgs) {
    case AVGSSTD: { // averaging with 32 samples
        ADC0_SC3 |= (ADC_SC3_AVGE_MASK|ADC_SC3_AVGS(0x03));
        break;
    }
    case AVGSLOW: { // averaging with 16 samples
        ADC0_SC3 &= (uint32_t)~(uint32_t)(ADC_SC3_AVGS(0x03));
        ADC0_SC3 |= (ADC_SC3_AVGE_MASK|ADC_SC3_AVGS(0x02));
        break;
    }
    case AVGSDIS: { // averaging off
        ADC0_SC3 &= ~(ADC_SC3_AVGE_MASK);
        break;
    }
}
// enable converter interrupt again
ADC0_SC1A |= (ADC_SC1_AIEN_MASK);
ADC0_SC1B |= (ADC_SC1_AIEN_MASK);
// enable PDB
PDB0_SC |= PDB_SC_PDBEN_MASK;
// load registers
PDB0_SC |= PDB_SC_LDOK_MASK;
// restart counter
PDB0_SC |= PDB_SC_SWTRIG_MASK;
*************************************************************************************************
What happens? After the function has been called, the next check of the COCO flag in the typical loop
while ((ADC0_SC1A&ADC_SC1_COCO_MASK)==0) {}
can hang if the conversion never completes.
Normally it works, but sometimes the flag stays at 0 forever after the function exits.
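Independent of the root cause, the polling loop above can be given an upper bound, so that a conversion that never completes is reported instead of hanging until the watchdog fires. A minimal sketch: wait_coco and the poll limit are hypothetical, and COCO_MASK is assumed to match ADC_SC1_COCO_MASK (bit 7 of SC1n).

```c
#include <stdbool.h>
#include <stdint.h>

#define COCO_MASK 0x80u  /* assumed equal to ADC_SC1_COCO_MASK (bit 7) */

/* Poll *sc1 until the COCO bit is set or max_polls reads have been made.
 * Returns true if the flag was seen, false on timeout. */
bool wait_coco(volatile const uint32_t *sc1, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if ((*sc1 & COCO_MASK) != 0u) {
            return true;
        }
    }
    return false;
}
```

On the target this might be called as wait_coco(&ADC0_SC1A, limit); on a false return the code could log the PDB and ADC register contents before re-triggering, which should help narrow down which state the peripherals are left in.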
Is the order of the commands wrong somewhere, or could it be a timing problem?
(What is quite confusing is that changing the code somewhere else can influence how often the error occurs.)
BTW: I ported the project to an MCU with twice the memory: still errors. Then I compiled it without any optimization: still errors.
If the function is simply moved so that it is at a different address (move it to the end of the file it is in, for example), does anything change?
Hi Bob,
Thanks for the idea; unfortunately it didn't help. I moved the function to other places within the file and also to another file.
My impression is that the way the PDB and ADC are stopped and started must be the problem. Maybe the order is wrong or a setting is missing.