Michael,
By providing service routines for all possible interrupts you've covered the most common causes. Have you also got routines under the "reserved" interrupts that "should never happen"?
When you manage to fix the problem, please let us know what the fix(es) are.
Here are some additional ideas:
Make sure that the startup code you are using is writing to ALL the write-once registers, even if the reset values are what you want. This makes sure that they don't get changed by misguided code in the running application.
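One way to make the "write them all" rule hard to forget is to drive it from a table. This is only a sketch: the `write_once_init_t` type and the `lock_write_once_registers` function are names I made up, and the actual list of write-once registers and their addresses has to come from the reference manual and device header for your part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per write-once register (hypothetical type, not from any
 * Freescale header): where the register is and what to write to it. */
typedef struct {
    volatile uint8_t *reg;   /* register address, taken from the device header */
    uint8_t value;           /* value to lock in, even if it equals the reset value */
} write_once_init_t;

/* Write every table entry exactly once, in table order.
 * Returns the number of registers written. */
static size_t lock_write_once_registers(const write_once_init_t *table,
                                        size_t count)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < count; i++) {
        *table[i].reg = table[i].value;
    }
    return i;
}
```

Calling this once, early in the startup code, means a new write-once register only needs a new table entry, and a code review can compare the table against the manual.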
Fill unused flash locations with something that stops execution faster than 0xFF. Freescale AN2400/D "HCS12 NVM Guidelines" discusses various possibilities. I think that they all will work for the S12X, but you should check.
The idea is that 0xFF, the erased state of flash, executes as an instruction that garbages the stack pointer and then carries on, which is not helpful, to say the least. The fill values discussed in the app note stop execution in one or two cycles and don't garbage the stack.
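If your linker can't do the fill for you, it can be done on the image before programming. The sketch below runs on a PC, not on the MCU. The function name `fill_unused` is made up, and the fill value 0x3F is my assumption (I believe it is the SWI opcode on the HC(S)12, so the SWI vector needs a handler); check AN2400/D and pick whichever value suits your trap handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fill value: SWI opcode on HC(S)12. Verify against AN2400/D. */
#define FILL_BYTE 0x3Fu

/* Overwrite every byte of the image outside [used_start, used_end)
 * with FILL_BYTE. Bytes inside the used range are left alone, even if
 * they happen to be 0xFF. Returns the number of bytes filled. */
static size_t fill_unused(uint8_t *image, size_t len,
                          size_t used_start, size_t used_end)
{
    size_t i;
    size_t filled = 0;

    assert(image != NULL);
    assert(used_start <= used_end && used_end <= len);

    for (i = 0; i < len; i++) {
        if (i < used_start || i >= used_end) {
            image[i] = FILL_BYTE;
            filled++;
        }
    }
    return filled;
}
```

For code spread over several flash pages you would call it once per gap, or extend it to take a list of used ranges.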
Scowl at the clock signals and the power supply. Glitches in the clock and power supply noise can cause completely unexpected operation of the MCU. For the power supply, it may work to put a digital scope on the power pins and set it to trigger on higher or lower voltage. You could try the same with the XTAL and EXTAL pins, but that may not work too well.
If you can cause a COP watchdog reset reliably when the problem occurs, you could trigger on RESET.
Enable the clock monitor, if it isn't already enabled, but I don't think that it will detect glitches perfectly.
If you are using the PLL, you might have some subtle stability problems. If the PLL filter doesn't have suitable values, the system clock frequency will wander around too much or too fast, and you could get unexpected CPU operation. I've seen this with no PLL filter, but the working range for the filter is much wider than the Freescale PLL calculator indicates, so if your component selection is recommended by the PLL calculator and the actual values are close to what they should be, there should be no problem.
The on-chip trace can be set to trigger on accesses inside or outside a range, but you can only set up two ranges, so if your code is in several flash pages, you will have to make many runs, triggering on the unused gaps between pages.
A bad solder joint or trace somewhere is a remote possibility, so tapping on the circuit board around the MCU might trigger the problem. If it does, some experimenting might identify the physical area of the problem.
Hey Steve,
I'll give those suggestions a try.
I am also using a semaphore to protect data saved into a shared area of RAM. The XGATE side is the only place that writes to that RAM, but I also take the semaphore on the CPU side, where the data is only read. Do I even need a semaphore at the CPU end if it only accesses, and does not modify, the data? Could this also be causing a problem, in that the CPU is holding the semaphore too long? It holds the semaphore for every byte, through about 50 or so, to send on an SCI bus, but at the same time there may be incoming CAN messages to save to the same array...
Thanks for the reply,
- Michael
Michael,
The semaphore is needed when reading more than one byte/word at a time to ensure coherency of the message (you don't want to read one half of a message and then the other half from another message).
You should minimise the time that each core holds the semaphore because you are blocking the other core. Can you use more than one semaphore?
It's not obvious how that would cause the problem, but it may be worth a look.
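To keep the hold time short on the CPU side, one option is to take the semaphore only long enough to copy the message into a private buffer, then release it and send from the copy. This is a rough sketch that runs on a PC: `sem_try_lock`, `sem_release`, `shared_msg` and `snapshot_message` are hypothetical stand-ins, and on the target the first two would wrap the XGATE hardware semaphore register as described in the device manual.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_LEN 50u   /* roughly the message size mentioned above */

/* Hypothetical stand-ins for the hardware semaphore. */
static volatile int sem_taken = 0;

static int sem_try_lock(void)
{
    if (sem_taken) {
        return 0;          /* other core holds it */
    }
    sem_taken = 1;
    return 1;
}

static void sem_release(void)
{
    sem_taken = 0;
}

/* Shared area: written by XGATE, only read by the CPU. */
static uint8_t shared_msg[MSG_LEN];

/* Copy the shared message into a private buffer, holding the semaphore
 * only for the duration of the copy.
 * Returns 1 on success, 0 if the semaphore was busy. */
static int snapshot_message(uint8_t *local)
{
    assert(local != NULL);

    if (!sem_try_lock()) {
        return 0;
    }
    memcpy(local, shared_msg, MSG_LEN);
    sem_release();
    return 1;
}
```

The SCI transmit routine then works from the private copy, so XGATE is blocked for one 50-byte copy rather than for the whole serial transmission.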
Michael, the common errors are indicative of code runaway on the CPU.
Some thoughts:
1/ Could be unrelated to XGATE - could your CPU algorithm fail if the data coming from XGATE has a particular value?
2/ I assume you use XGATE to copy data into the RAM. Are you copying into the wrong location? Are you overwriting the CPU stack or destroying some other CPU info? Have a look at your CPU stack and variable contents to see if they are in range. Set up the RAM protection scheme.
3/ You may be able to get something from the trace buffer (CW 4.5). Have it continually running and see if you can extract a failure flow. Try having the trace cause a hardware break if you execute code from somewhere you don't expect.
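For the stack check in point 2, a common trick is to paint the stack area with a known pattern at startup and later count how much of the pattern survives. The sketch below is illustrative only: `STACK_FILL`, `stack_paint` and `stack_unused_bytes` are made-up names, and it assumes the stack grows downward, so the lowest addresses are the last to be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern, unlikely to occur in long runs */

/* Fill the whole stack region with the pattern. Call at startup,
 * before the stack is in serious use. */
static void stack_paint(uint8_t *stack, size_t len)
{
    size_t i;

    assert(stack != NULL);
    for (i = 0; i < len; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count how many bytes at the low end of the region still hold the
 * pattern. Zero means the stack has overflowed, or something else
 * has written over the bottom of it. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t len)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < len && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If the count drops to zero, or the pattern is broken somewhere the CPU stack should never have reached, that points at either a stack overflow or a stray write such as XGATE copying into the wrong location.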
Message Edited by Steve on 05-11-2006 04:09 PM