SW Hang/ Illegal Opcode in HC/HCS ;)


3,341 Views
denny_george
Contributor I
Hello,
 
Can anyone tell me about software hangs/anomalies and their causes, i.e. what forces the watchdog to step in and reset the system? If possible, please analyse this from the MCU point of view (MCU internals, registers, memory, etc.) rather than from a high-level language point of view.
 
 
I have one more request/question, regarding illegal opcodes. How does an illegal opcode occur in an MCU? Is it only through program memory corruption, or are there other possibilities? I'm aware that most controllers (68HC/HCS) support illegal opcode resets. But how is an illegal opcode generated in the MCU after compiled code has been flashed into it?
 
Patiently waiting for your feedback.


Message Edited by Geo*** on 2008-10-23 03:46 PM
0 Kudos
15 Replies

1,163 Views
denny_george
Contributor I
Hello All,
 
@Peg: "These errors are most often generated by the code jumping to an incorrect location caused by an incorrectly calculated offset or similar." I agree. But I feel an "incorrectly calculated offset" happens at the compiling stage, right? So can I conclude that a hang can also be attributed to a wrong offset produced at the compiling stage?
 
 
 
@kef: I am sorry, I could not fully understand "IMO watchdog". Is it different from a normal timeout WD? You said: "I mean if software hangs, then watchdog can make it going again. It's better than really dead thing, right?"
Do you mean that the WD is not much use for recovering from a hang condition, so the system stays hung? Or does it recover from it?
 
"Illegal opcode happens the most often not due to self-erased bits of flash, but due to uninitialized pointers, due to stack overflow etc."
 
Uninitialized pointers ==> How? Could you kindly explain this? I've heard that uninitialized pointers lead to crashes, but never knew how.

Due to stack overflow ==> Yes, if the program code resides in RAM (relocatable code, e.g. flash routines) and gets overwritten, then there is a case for an illegal opcode. Is there any point I have missed?
 
 
 
@Lundin: "Another reason could be a corrupt binary file downloaded into the MCU from a PC, or that the download itself faced EMI. Hopefully, both binary files and download tools use checksums to check the data" .
The S19 format has a checksum field, as you said. Doesn't it protect against corrupt hex codes (in the S19) being flashed into the MCU, even in an EMI environment? My understanding is that, from the S19 point of view, no corrupt code can enter the MCU. Please correct me if I am wrong.
 
 
 
Kindly give me your valuable feedback.
0 Kudos

1,163 Views
peg
Senior Contributor IV


Geo*** wrote:
Hello All,
 
@Peg: "These errors are most often generated by the code jumping to an incorrect location caused by an incorrectly calculated offset or similar." I agree. But I feel an "incorrectly calculated offset" happens at the compiling stage, right? So can I conclude that a hang can also be attributed to a wrong offset produced at the compiling stage?
 
 
Kindly give me your valuable feedback.


Hi,

No, I was not really trying to blame the compiler or whatever, but the programmer. I was referring more to things like "unhandled cases", i.e. the programmer assuming that a value will stay within a certain range and not placing checks or allowances in case it somehow exceeds the "normal" range. These sorts of bugs can go undetected for "years", as the value does stay in the normal range except in exceptional circumstances. Although you now seem to be asking "how can this happen, assuming the program is 100% perfect?".
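
To make the "unhandled case" idea concrete, here is a minimal sketch of a range check in front of a jump table. All names (`dispatch`, `handle_idle`, `DISPATCH_OUT_OF_RANGE`, etc.) are made up for illustration and are not from any Freescale library. Without the check, an out-of-range `command` would index past the table and the CPU would jump to whatever bytes happen to lie there.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Hypothetical handlers; the return values only make the demo observable. */
static int handle_idle(void)  { return 10; }
static int handle_start(void) { return 20; }
static int handle_stop(void)  { return 30; }

static const handler_fn handlers[] = { handle_idle, handle_start, handle_stop };
#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

#define DISPATCH_OUT_OF_RANGE (-1)

/* Reject any command outside the table instead of assuming that
 * the value "always stays in the normal range". */
int dispatch(unsigned int command)
{
    if (command >= HANDLER_COUNT) {
        return DISPATCH_OUT_OF_RANGE;   /* the otherwise unhandled case */
    }
    return handlers[command]();
}
```

Calling `dispatch(1)` runs `handle_start`, while `dispatch(200)` returns `DISPATCH_OUT_OF_RANGE` rather than jumping through a garbage pointer.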


0 Kudos

1,163 Views
Lundin
Senior Contributor IV
: "Illegal opcode happens the most often not due to self-erased bits of flash, but due to uninitialized pointers, due to stack overflow etc."

: Uninitialized pointers ==> How? Could you kindly explain this? I've heard that uninitialized pointers lead to crashes, but never knew how.

If you try to access a part of the memory map where nothing resides, you will get an illegal opcode interrupt. If your pointers aren't initialized, they are pointing at a random location.

: Due to stack overflow ==> Yes, if the program code resides in RAM (relocatable code, e.g. flash routines) and gets overwritten, then there is a case for an illegal opcode. Is there any point I have missed?

When the stack overflows, you could get the same error as for uninitialized pointers. The SP could then point at a non-valid memory location. I.e. it has nothing to do with where the code resides, though code in RAM can cause further issues like the ones you mentioned.

If you are smart when you design your memory map, you put the stack on top of the RAM and make sure that the address above RAM isn't valid. If your program gets a stack overflow, you will then get the opcode interrupt + reset instead of the program starting to behave randomly. It will also be much easier to find stack error bugs using this method.
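
A related but different technique, not the memory-map trick described above, is "stack painting": fill the stack area with a known pattern at startup and later count how much of it is still untouched. The sketch below simulates it on an ordinary array. The names `stack_paint`, `stack_unused` and the fill value `0xA5` are assumptions for illustration, and it assumes a stack that grows downward from the highest address of the region.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern, assumed for this sketch */

/* Fill the whole stack region with the pattern (done once at startup,
 * before the region is used as a stack). */
void stack_paint(uint8_t *region, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        region[i] = STACK_FILL;
    }
}

/* Count pattern bytes from the lowest address upward. With a stack that
 * grows downward, this is the number of bytes never touched so far. */
size_t stack_unused(const uint8_t *region, size_t size)
{
    size_t n = 0;
    while (n < size && region[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If `stack_unused` ever returns 0, the stack has reached (or passed) the end of its region. The measurement is only a lower bound on usage, since a stacked byte can happen to equal the fill pattern.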


: @Lundin: "Another reason could be a corrupt binary file downloaded into the MCU from a PC, or that the download itself faced EMI. Hopefully, both binary files and download tools use checksums to check the data" .

The S19 format has a checksum field, as you said. Doesn't it protect against corrupt hex codes (in the S19) being flashed into the MCU, even in an EMI environment? My understanding is that, from the S19 point of view, no corrupt code can enter the MCU. Please correct me if I am wrong.

Then I will correct you :)
S19 is just an ASCII file which is the Motorola/Freescale standard way of storing programs on your PC. The BDM pod will read this file but translate it to raw binary data, which is then sent to the MCU.

First of all, the checksum in s-records is laughable. It just sums everything, there is a huge chance that such a checksum algorithm fails to detect file corruptions. The serious approach would have been to use CRC-8 or better. But no matter how good the checksum is, no checksum is 100% fail proof.

Second, what happens between the PC and the MCU is all in the hands of your BDM pod manufacturer. Hopefully they are using checksums/read-back, I have no idea.
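
To illustrate why a plain sum is weak, here is a small sketch of the S-record checksum: the ones' complement of the low byte of the sum of the byte-count, address and data bytes. The function name `srec_checksum` is made up for this example. Because addition does not care about order, two swapped bytes give exactly the same checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* S-record checksum over the count, address and data bytes of one record
 * (already converted from ASCII hex to binary): sum all bytes, keep the
 * low 8 bits, then invert them. */
uint8_t srec_checksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + bytes[i]);
    }
    return (uint8_t)~sum;
}
```

For a record with count `0x13`, address `0x7AF0` and data `0A 0A 0D` followed by zeros, this gives `0x61`. Reordering the data to `0D 0A 0A` also gives `0x61`, so that corruption would pass the check.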
0 Kudos

1,163 Views
MrBean
Contributor I
Ah. Message 5&8 you mean.
 
Unintentional execution is easy:
Return address taken from the stack (RTS) with a corrupt stack (under-/overflow, mismatch).
Messed-up jump addresses from a table or something like that.
Null pointers -> jump to address 0.  Etc.

On some MCUs you can trap illegal opcodes or illegal address accesses.
After a reset you also want to look at what caused the reset.
I'd look from the MCU point of view, since resets and/or illegal opcodes are clearly not within the flow of a higher-language program.
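
As a rough sketch of "looking at what caused the reset": many parts have a reset status register with one flag per reset source, which startup code can read and decode. The bit masks below are assumptions modelled on a typical HC08-style reset status register; check the register name and bit layout in the datasheet for your part. The names `reset_cause` and `RESET_*` are made up for this example.

```c
#include <stdint.h>

/* ASSUMED bit positions; verify against the datasheet of the actual MCU. */
#define SRSR_POR   0x80u   /* power-on reset          */
#define SRSR_PIN   0x40u   /* external reset pin      */
#define SRSR_COP   0x20u   /* watchdog (COP) timeout  */
#define SRSR_ILOP  0x10u   /* illegal opcode          */
#define SRSR_ILAD  0x08u   /* illegal address         */

typedef enum {
    RESET_UNKNOWN,
    RESET_POWER_ON,
    RESET_EXTERNAL_PIN,
    RESET_WATCHDOG,
    RESET_ILLEGAL_OPCODE,
    RESET_ILLEGAL_ADDRESS
} reset_cause_t;

/* Decode a value read from the reset status register. */
reset_cause_t reset_cause(uint8_t status)
{
    if (status & SRSR_POR)  return RESET_POWER_ON;
    if (status & SRSR_ILOP) return RESET_ILLEGAL_OPCODE;
    if (status & SRSR_ILAD) return RESET_ILLEGAL_ADDRESS;
    if (status & SRSR_COP)  return RESET_WATCHDOG;
    if (status & SRSR_PIN)  return RESET_EXTERNAL_PIN;
    return RESET_UNKNOWN;
}
```

Logging the decoded cause somewhere that survives the reset makes it possible to tell a watchdog timeout (a hang) from an illegal opcode or illegal address (a runaway program counter).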
0 Kudos

1,163 Views
denny_george
Contributor I
WAR is taking place :)
 
 
@Mr Bean,
Stack return address taken from stack (RTS), with corrupt stack (under-/overflow, mismatch).
Messed up jump addresses from a table or something like that.
 
RTS (surely the calling function puts the PC on the stack, but the maximum damage would be the stack being overwritten in case of overflow; underflow doesn't appear in this case) and corrupt stack (under-/overflow, mismatch): neither should ever happen. Hence unintentional execution is not possible, only an illegal opcode some time later. Some time later = only God knows.
 
@Lundin: "If you try to access a part of the memory map where nothing resides, you will get an illegal opcode interrupt." I didn't see any documentation regarding this. Can you point to it in the datasheet, please?

"If your pointers aren't initialized, they are pointing at a random location."
Nexus: Doesn't the linker allot memory locations for uninitialised pointers? Which compiler are you using?

: due to stack overflow ==>Yes, If the program code resides in RAM (relocatable code-Flash routines) and gets overwritten then there is a case of illegal opcode issue.Is there any point which i missed to see?

"When the stack overflows, you could get the same error as for uninitialized pointers."
Nexus: I really doubt it. It depends on where you initialised your SP. If your SP is initialised in the higher memory address area (as advised in the datasheet), how can an illegal opcode reset occur? The program will carry on working while creating havoc (it runs wild and finally falls into an illegal opcode reset).
 
 
 
"The SP could then point at a non-valid memory location. I.e. it has nothing to do with where the code resides, though code in RAM can cause further issues like the ones you mentioned."
Nexus: I think you have initialised the SP near the lower RAM addresses, with the non-valid memory location next to it. It's risky, isn't it? How do you predict the stack space required, in case your program needs it?


Message Edited by Geo*** on 2008-10-24 01:21 PM
0 Kudos

1,163 Views
MrBean
Contributor I
:D  Your question seems to be: why do I need a watchdog or illegal opcode catching...
 
Well, you are right that in a normal, correctly written & compiled program, no unintentional illegal opcodes, illegal addresses or watchdog resets will occur. None.
With the exception of extreme EMI or radiation strong enough to flip a bit (a very, very small chance under normal circumstances).
A good program doesn't hang.

But software writers do make mistakes. Compilers are written by software writers, too ...  ;)
The watchdog and reset-cause detection are there to be able to detect and catch these unfortunate mistakes.
 
 
A C compiler intentionally lets you shoot yourself in the foot easily,
e.g. for a 908JB16, useful for testing:

#define ILLEGAL_ADDR_FETCH_CRASH asm JMP 0x1000  // non-existent address
#define ILLEGAL_OPCODE asm DCB #0xAC             // illegal opcode
while(1);          // no watchdog kick -> watchdog reset
0 Kudos

1,163 Views
denny_george
Contributor I
Dear MrBean,
 
I just wanted to avoid illegal opcodes and other resets caused by bugs in a program (not the protective resets) as much as possible. Anyone would want to do that.

You are right that software writers make mistakes, and so do I. Avoiding them should be our goal, hence my question.

Please keep adding any further valuable information you come across regarding the causes of illegal opcodes and of SW hangs.


Message Edited by Geo*** on 2008-10-24 02:19 PM
0 Kudos

1,163 Views
MrBean
Contributor I
Illegal opcodes can be generated in many ways.
The common cause is by unintentionally executing from RAM or other data-areas.
 
Illegal address access is also possible, i.e. accessing a memory area that does not exist.
 
The 2 above generate a reset (see ILOP & ILAD) by themselves, so there will be no "hang" + watchdog reset.
Look at the reset cause!
 
An easy way to hang the software and generate a watchdog reset is a loop that checks for a condition that does not occur.
DO NOT kick the watchdog from an interrupt routine !!
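
The reason for that rule is that a timer interrupt can keep firing happily while the main loop is stuck, so a kick from the ISR hides exactly the hang the watchdog should catch. A minimal sketch of the usual alternative follows: the ISR only reports that it is alive, and the main loop does the actual kick. `watchdog_kick`, `timer_isr` and `kick_count` are hypothetical names; `kick_count` stands in for the real write to the watchdog register.

```c
#include <stdbool.h>
#include <stdint.h>

volatile bool timer_isr_alive = false;  /* set by the ISR, cleared by main */
uint32_t kick_count = 0;                /* stand-in for the hardware kick  */

/* Hypothetical: on real hardware this would write the COP/watchdog register. */
void watchdog_kick(void)
{
    kick_count++;
}

/* Periodic timer interrupt: it only reports that it ran, it never kicks. */
void timer_isr(void)
{
    timer_isr_alive = true;
}

/* One pass of the main loop: kick only if the ISR has run since the last
 * pass, so both the main loop and the interrupt must be alive. */
void main_loop_iteration(void)
{
    /* ... application work ... */
    if (timer_isr_alive) {
        timer_isr_alive = false;
        watchdog_kick();
    }
}
```

If the main loop hangs, `main_loop_iteration` is never reached and the watchdog times out. If the timer interrupt dies, the flag is never set and the watchdog times out as well.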
 
 
PS.: "IMO" is short for: "In My Opinion" ;)


Message Edited by MrBean on 2008-10-24 11:13 AM
0 Kudos

1,163 Views
denny_george
Contributor I
Hi MrBean ;)
 
Thanks for your feedback.
 
From my side, the questions about the occurrence of a hang and about the generation of illegal opcodes were separate, though the two are sometimes interrelated.
 
I knew this
"The 2 above generate a reset (see ILOP & ILAD) by themselves, so there will be no "hang" + watchdog reset.
Look at the reset cause!"
 
"DO NOT kick the watchdog from an interrupt routine !!" You are right to point this out. It surely is a GOLDEN rule for any WD design, unless you want to disable it due to some unavoidable circumstances.


About illegal opcodes: "The common cause is by unintentionally executing from RAM or other data-areas." Did you mean stack overflow into relocatable code like flash routines? Or anything else?


"An easy way to hang the software and generate a watchdog reset is a loop that checks for a condition that does not occur." Why would someone check for a condition that never occurs? That should be caught at review time.


Finally, I'm not trying to frustrate you guys. Please bear with me. I'm trying to get deep into the problem.
 
 
0 Kudos

1,163 Views
MrBean
Contributor I
Checking, in a loop, for a condition that does not occur before a watchdog timeout is better said.  :D

A stack over- or underflow could be the cause of running into a wrong memory area, yes, but there could be other causes as well.
 
If you could be more specific about the problem it would be easier to help.
0 Kudos

1,163 Views
denny_george
Contributor I
Hi MrBean,
 
"Checking, in a loop, for a condition that does not occur before a watchdog timeout is better said". OK. It's going to happen, but not before the timeout, right? BAD DESIGN!!! ;)


Sorry Bean,
My question was very generic, not specific to any controller; it applies to all controllers ;)

Can you answer 5 and 8?
0 Kudos

1,163 Views
denny_george
Contributor I
Correction on my last mail
 
About illegal opcodes: "The common cause is by unintentionally executing from RAM or other data-areas." Did you mean stack overflow into relocatable code like flash routines? Or anything else?

My question "Did you mean stack overflow into relocatable code like flash routines? Or anything else?" is invalid. Relocatable code like flash routines is intentional execution from RAM.
Now, what could be the cause of unintentional execution from RAM? I can only see corruption of the PC (Program Counter). Does that happen in real-life scenarios? Any other reasons?

Can Freescalers comment on the possibility/probability of corruption of the PC?
 
0 Kudos

1,163 Views
Lundin
Senior Contributor IV
Another reason could be a corrupt binary file downloaded into the MCU from a PC, or that the download itself faced EMI. Hopefully, both binary files and download tools use checksums to check the data, or in the latter case, a read-back of the data programmed.
0 Kudos

1,163 Views
kef
Specialist I
IMO watchdog just gives a little chance to really broken software. Watchdog can possibly make this broken software a bit less harmful than it is. I mean if software hangs, then watchdog can make it going again. It's better than really dead thing, right?
Why may software hang? Loops that expect something to always be done in time, or that expect some external (or even MCU-internal) condition/signal to become OK sooner or later: such loops are suspect. Things may go wrong, and the loop may then run forever (hang). Timeouts should be handled.
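
A minimal sketch of what "timeouts should be handled" can look like: instead of `while (!ready) {}`, poll a bounded number of times and report failure to the caller. The names `wait_until`, `never_ready` and `ready_on_third_poll` are made up for illustration; the two stub functions only simulate a peripheral status flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll is_ready() at most max_polls times.
 * Returns true if the condition became true, false on timeout. */
bool wait_until(bool (*is_ready)(void), uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (is_ready()) {
            return true;
        }
    }
    return false;   /* timed out: the caller must handle this case */
}

/* Demo stub: a signal that never arrives (e.g. a dead peripheral). */
bool never_ready(void)
{
    return false;
}

/* Demo stub: a signal that arrives on the third poll. */
static uint32_t polls_seen = 0;
bool ready_on_third_poll(void)
{
    return ++polls_seen >= 3;
}
```

With `never_ready` the call returns `false` after the poll budget is used up instead of looping forever, so the software can take a recovery path of its own rather than waiting for the watchdog.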
 
Illegal opcode happens the most often not due to self-erased bits of flash, but due to uninitialized pointers, due to stack overflow etc.
 
0 Kudos

1,163 Views
peg
Senior Contributor IV
Hi Geo,

These errors are most often generated by the code jumping to an incorrect location caused by an incorrectly calculated offset or similar. It may jump into data rather than code or it might jump into other code but not at the first byte of an opcode.

0 Kudos