I HATE these kinds of problems. I once worked on a laptop keyboard controller where the complaint was that it would reset itself after maybe 18 to 22 hours of heavy use. AARRGGHHH!!!! But I found it. It turned out to be hardware, and I actually got a workaround fix in!
Anyway, the first thing you need to do is find where the problem is occurring. Do you have access to the serial port? I'd start streaming status codes to a PC. In EVERY SINGLE SUBROUTINE, put an entry code and an exit code, and send them immediately. If nothing else, you can then look at the 'trap' (the captured log) on the PC and, with a simple program, see whether you have a stack growth issue or a multi-thread issue.
If you feed your status codes into a serial buffer, realize that the problem may be occurring long after a given status code was queued: other status codes may still be sitting in the TX queue when the freeze happens. If you suspect that, kick the baud rate as high as you can handle and send 'on demand', that is, with no background send queue. The other option is to grab 2 spare port bits and do something like an on-demand I2C-style send with a data line and a clock line, and set up a PC with a decoder to trap it. Just realize that if you do the send in 'realtime', it will impact your program timing.
If you see the problem happening after a specific sequence, now you know where to look, and you can even start reproducing the error.
Heck, if you do this 'on demand', you may find that the program is not 'locked up' at all, just stuck in a 'dedicated racetrack' where it keeps looping over the same set of instructions.
Hmm, you mentioned EEPROM... I had a similar problem with my EEPROM, where I was trying to read it right after a write. Turned out I was getting in WAY too quickly, and it was hosing things. I never really looked into all the ramifications of the error, because once I realized what I was doing in the low-level routine and fixed it, all my other lockups were solved too. But it appeared I could hang the CPU with a bad status. What was weird was that it didn't seem to hang in the routine where I did the bad stuff; it would hang when another routine touched the EEPROM. Doesn't matter: once I fixed it, the rest worked fine.
By the way, one tip that may help. If you emit codes for the entry and exit of every routine, do something 'simple' to flag them as entry or exit, like setting bit 7 on exit. That way code 01 is entry into your routine and 81 is its exit; 55 is entry and D5 is exit. You then have a really simple way to scan your trap of the data for balanced entries and exits, and to get a visual 'look see' at how things are going, without having to look each code up in a table. It means you hardly care what a code stands for until you find where it's hanging. And it's really simple to just count entries and exits so you know how deep you are in the stack.
Message Edited by mke_et on 2007-01-07 09:16 AM