I was hoping that someone would have a try at this one. There are some important consequences, and there may be a lot of devices out there with these problems.
This is what the inner loop compiles to:
80101eea: 280e movel %fp,%d4
80101ef8: 0684 ffff ff60 addil #-160,%d4
for (i = 0; i < 40; i++) {
counts[i] = nStuckCounter;
}
801024d4: 2044 moveal %d4,%a0
801024d6: 2039 8080 003c movel 8080003c <nStuckCounter>,%d0
801024dc: 20c0 movel %d0,%a0@+
801024de: bdc8 cmpal %a0,%fp
801024e0: 66f4 bnes 801024d6 <main+0xb34>
The inner loop consists of four machine code instructions. The output is thus:
Prior to Force, nStuckCounter = 0
Loop 0, nStuckCounter = 3
Loop 1, nStuckCounter = 7
Loop 2, nStuckCounter = 11
Loop 3, nStuckCounter = 15
Loop 4, nStuckCounter = 19
Loop 5, nStuckCounter = 23
... Loops 6 to 35 exactly as you'd expect ...
Loop 35, nStuckCounter = 143
Loop 36, nStuckCounter = 147
Loop 37, nStuckCounter = 151
Loop 38, nStuckCounter = 155
Loop 39, nStuckCounter = 159
After Unforce, nStuckCounter = 208841
Even though the interrupt routine is "solidly stuck", the mainline gets to execute ONE instruction between each interrupt.
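The arithmetic behind that output can be sketched as below. This is only a model of the numbers above, not code from the test program. It assumes the interrupt routine bumps nStuckCounter exactly once per interrupt and that exactly one mainline instruction runs between interrupts, and it takes the starting offset of 3 straight from the observed "Loop 0" line. The names INSTRUCTIONS_PER_ITERATION, FIRST_READ_OFFSET and expected_count() are made up for illustration.

```c
/* Model of the observed output: one interrupt per mainline instruction. */
#define INSTRUCTIONS_PER_ITERATION 4u  /* movel (load), movel (store), cmpal, bnes */
#define FIRST_READ_OFFSET          3u  /* assumption: taken from the observed "Loop 0" value */

/* Value the load of nStuckCounter would be expected to see on iteration i. */
static unsigned expected_count(unsigned i)
{
    return FIRST_READ_OFFSET + INSTRUCTIONS_PER_ITERATION * i;
}
```

With these assumptions, expected_count(39) comes to 3 + 4 * 39 = 159, which matches the last line of the loop output.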
So what harm could that do?
On pretty much any other CPU, if you have a stuck interrupt, you find out about it really quickly as the device locks up solid in the interrupt routine. It doesn't get out of the door with a bug like that.
With THESE CPUs, the code still runs! It runs between 10 and 100 times slower than usual, depending on how many instructions the interrupt service routine has in it, but it RUNS. It runs slower than when you forget to enable the Program Cache (on the higher end, faster chips that have one). So if you're wondering why your code is running slower than you think it should be, maybe you have a stuck interrupt. Or you should turn the cache on. Or both.
I found this while chasing a different problem. The code was a simple multi-threaded system using setjmp() and longjmp(), with preemption from a high priority timer interrupt that forced a lower priority interrupt to perform the context switch. The secondary thread looks like this (example, WAY simplified):
static void task_loop(sTask_t *a_psTask)
{
    while (true)
    {
        a_psTask->eState = TASK_STATE_RUN; /* Allow switching */
        (*(a_psTask->pTaskFunc))(a_psTask->pUserRef);
        a_psTask->eState = TASK_STATE_IDLE; /* Disallow switching */
        if (setjmp(a_psTask->sContextTask) == 0)
        {
            longjmp(a_psTask->sContextMain, 1);
        }
    }
}
When the timer goes off it checks "psTask->eState", and only forces a switch if it is "TASK_STATE_RUN". "TASK_STATE_IDLE" means "don't switch, I'm about to do it myself".
The higher priority interrupt checked for that and scheduled the lower priority one. It all went wrong if a bunch of medium priority interrupts (Ethernet, CAN, serial port) got in between those other two interrupts. They allowed the code above to step into the "setjmp()" at one instruction per intervening interrupt. When the second interrupt finally went off, it called "setjmp()" in the middle of the call above, which corrupted the context and later caused a stack corruption.
The simple fixes were to have the second interrupt check "psTask->eState" or to disable interrupts completely around the switch.
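A minimal sketch of the first fix, under some assumptions: the struct is cut down to the one field that matters, the enum type name is invented, and switch_is_still_allowed() is a hypothetical helper rather than the real handler. The context switch itself is left out.

```c
#include <stdbool.h>

/* Assumption: the two states used in task_loop(); the type name is made up. */
typedef enum { TASK_STATE_IDLE, TASK_STATE_RUN } eTaskState_t;

/* Cut-down stand-in for the real sTask_t; only the state field is modelled. */
typedef struct {
    volatile eTaskState_t eState;
} sTaskSketch_t;

/*
 * Meant to be called at the top of the lower priority (context switch)
 * interrupt. It re-checks the state instead of trusting the check made
 * earlier by the timer interrupt, because the mainline may have advanced
 * by several instructions in between.
 */
static bool switch_is_still_allowed(const sTaskSketch_t *psTask)
{
    return psTask->eState == TASK_STATE_RUN;
}
```

The second interrupt would return without switching whenever this reports false, since the task has already set TASK_STATE_IDLE and is about to switch on its own.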
So where's this in the Reference Manual? Nowhere that I can find. The CFPRM's documentation of "RTE" says that it restores the state, but doesn't say if a subsequent interrupt happens before or after that instruction is executed. The way it works may be a side-effect of this feature:
ColdFire Family Programmer’s Reference Manual, Rev. 3
Chapter 11
Exception Processing
11.1 Overview
ColdFire processors inhibit sampling for interrupts during the first instruction of all exception handlers.
This allows any handler to effectively disable interrupts, if necessary, by raising the interrupt mask level
in the SR.
It seems the same also applies to the first instruction executed after any exception return.
Don't trust the "Auto Save" on this forum. It just lost me an hour's work and the "Auto Save" had only saved the first minute or two. Fortunately I copy/save to the clipboard regularly, and that saved me from having to type it all in again.
Tom