Problem debugging hardfault LPC11C14

Content originally posted in LPCWare by DaveNadler on Mon Jul 11 07:04:14 MST 2011
Hi - I have a SIGSTOP hard fault that occurs after processing some number
of CAN interrupts happily. I tried to use the instructions here:
http://support.code-red-tech.com/CodeRedWiki/DebugHardFault

It reads:

Quote:
If you are getting a hard fault when executing your code under
the debugger, then you can tell where in your sources this is being
caused by looking at the VECTPC pseudo-register that is displayed in
the Core Register view when the hard fault is trapped.
If you hover your mouse over VECTPC, it will do a lookup on the
address and the tooltip will show you where in your code that is.

Unfortunately, VECTPC is not listed in the Core Register view.

I added the fault handler code suggested in the above link.
The debugger will not single-step through the assembly portion,
and trying to run the C handler faults again.
On arriving at the C handler and trying to single-step, it faults again
and pops up a dialog:
Quote:
Target reported errors:
15: Target error from Set break/watch
HW execution break may only be set below 0x20000000.


Update: The stack pointer in the screenshot below is clearly clobbered.
Setting a breakpoint in the fault handler, then changing the stack pointer to
something that actually points to RAM at the break, allows the fault
handler to execute. However:
- all SFRs read in the C handler are zero (_CFSR, _HFSR, _DFSR, _AFSR, _MMAR, _BFAR)
- I can't find any useful info on where the fault was triggered

Can anyone tell me how to find out *where* the fault occurred?
Thanks in advance!
Best Regards, Dave

PS: LPClink, LPCXpresso v3.6.3 [Build 317] [08/04/2011]

Screenshot: http://www.nadler.com/backups/SIGSTOP_screenshot.png
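
A note on the zeros: the Cortex-M0 (ARMv6-M) does not implement the CFSR, HFSR, MMFAR or BFAR fault status registers that the Cortex-M3 provides, so reading them as zero is expected on the LPC11C14. The most direct clue left is the eight-word exception frame the core pushes on entry to the handler, whose stacked PC is normally at or near the faulting instruction. The sketch below is a host-testable illustration, not the Code Red handler: `stacked_frame_t`, `decode_stacked_frame` and `pc_looks_valid` are hypothetical names, and the flash bound assumes a 32 KB part with flash mapped at address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers pushed by a Cortex-M core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Hypothetical helper: copy the eight stacked words into a named structure.
   'sp' is the stack pointer value that was active when the fault was taken. */
stacked_frame_t decode_stacked_frame(const uint32_t *sp)
{
    assert(sp != NULL);
    stacked_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Rough plausibility check for a stacked PC.
   Assumption: 32 KB of flash at 0x00000000; code running from boot ROM
   (for example ROM driver calls) would need an additional range. */
int pc_looks_valid(uint32_t pc)
{
    return pc < 0x00008000u;
}
```

On the target, the handler would pass in the stack pointer it captured on entry. If that stack pointer is itself clobbered, as in the screenshot, the decoded frame is garbage too, and `pc_looks_valid` is one cheap way to notice.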
11 Replies

Content originally posted in LPCWare by DaveNadler on Wed Jul 20 09:03:31 MST 2011
Hi All - I've posted the corrected and cleaned-up FreeRTOS v7.0.0 port for Cortex-M0 here:
http://www.freertos.org/
Look under: Supported Devices & Demos >> Contributed Ports >> NXP

This version works for me under stress (it doesn't crash like the 2010 NXP version on the NXP web site, which CodeRed also distributes as an example).

Just replace the file port.c and you're good to go.

NXP folks - It would be great if whoever did the M0 adaptation could review this, and if you could update the version on your web site.

CodeRed folks - It would be great if you could update your example in the next distribution.

Hope this is helpful!
Best Regards, Dave

Content originally posted in LPCWare by DaveNadler on Tue Jul 12 09:25:11 MST 2011
OK, now that I've processed a couple million CAN msgs without incident,
I'm reasonably confident I fixed the "portable" (i.e., the platform-specific,
non-portable) component of FreeRTOS to work properly on M0.
Details and repaired code here:
https://sourceforge.net/projects/freertos/forums/forum/382005/topic/4604276

The unofficial FreeRTOS version from NXP (which is redistributed in the
CodeRed examples) has some problems...

Thanks all for the suggestions and hope the above repairs are helpful,
Best Regards, Dave

Content originally posted in LPCWare by Rob65 on Tue Jul 12 08:18:16 MST 2011

Quote: DaveNadler

[Semihosting output] Should work OK as long as use is restricted to one task with adequate stack space (though it is slow)...


The problem is that interrupts may be missed or, as I have seen in one of my applications, that debugging takes so much time that no 'normal' code  is being executed since the next interrupt is already waiting before semi hosting output completes.


Quote:

These ports all use maximum priority (0 numerically). Thus it should not be possible to get this wrong ?

Richard states that you should not use a high priority for the interrupts.
At first I had completely forgotten about priorities.
Now I use the following:
    NVIC_SetPriority(UART1_IRQn, 10);
    NVIC_EnableIRQ(UART1_IRQn);
and now my code works perfectly well (for weeks without problems).
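
How a value like 10 ends up in the hardware depends on how many priority bits the part implements. The sketch below mirrors the shift-and-mask arithmetic that CMSIS `NVIC_SetPriority` applies to the priority argument; `nvic_priority_byte` and `effective_priority` are hypothetical helper names, and the bit counts mentioned afterwards (5 on a Cortex-M3 LPC17xx, 2 on a Cortex-M0 LPC11xx) are assumptions to check against `__NVIC_PRIO_BITS` in your device header.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored in the 8-bit priority field: only the top 'prio_bits'
   bits are implemented, so the requested priority is shifted up and
   anything that does not fit is masked off. */
uint8_t nvic_priority_byte(uint32_t priority, unsigned prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    return (uint8_t)((priority << (8u - prio_bits)) & 0xFFu);
}

/* Priority level the hardware actually uses after truncation. */
unsigned effective_priority(uint32_t priority, unsigned prio_bits)
{
    return (unsigned)nvic_priority_byte(priority, prio_bits) >> (8u - prio_bits);
}
```

With 5 implemented bits, a requested priority of 10 stays 10. With only 2 bits the same call silently becomes priority 2, so on an M0 it is safer to pass values in the range 0 to 3.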


Quote:

I'm using configUSE_PREEMPTION; unsure why you expect this could be a problem ? Can you provide more detail ?

Preemption adds a next level of difficulty to your code.
Reading your post, it looks like you have some idea of what you are doing, so there should be no problem. Whenever I have strange problems that I am unable to trace, I set configUSE_PREEMPTION to 0 - this is surely handy while debugging.

Quote:

The NXP port posted "FreeRTOS Example Project for LPC1114/301 V1.0 (Jul 14, 2010)" uses MSP rather than PSP and has at least one stack setup bug (port.c file comments indicate it is from FreeRTOS 5.3).

Using the MSP seems a bad idea. There was a reason why ARM has a PSP :eek:
This (Cortex M0) port does not look like something that Richard provides with FreeRTOS - at least it is not part of the official distribution from Real Time Engineers Ltd, and their code always contains the same version number throughout.
It is a good thing you are looking at this. I just glanced at it and there surely is some strange code in there. It might be a good idea to grab a new M3 port and work your way down from there.


Quote:

I have not located any distributed M0 version that uses PSP.

There is no distributed M0 port. At least not from Real Time Engineers Ltd, not from their website.
The person who created this version should make sure that at least his/her name (or company name) is in there, stating that this software is not provided by Real Time Engineers Ltd but created by him/her.

Regards,

Rob

P.s: in xPortPendSVHandler they are using ldm but no stm - is there no stm available in the M0 ???

Content originally posted in LPCWare by DaveNadler on Tue Jul 12 06:09:37 MST 2011
Rob, Thanks for the very helpful and detailed reply!
I think I have fixed the bug in the published port.c
which led to stack corruption, more later...

Comments on your excellent tips below...

Quote: Rob65
Dave,
which version of FreeRTOS are you using?
The FreeRTOS port is labeled as "Cortex M3 port"  and the lpc11c14 is a Cortex M0 processor. I have not yet looked into possible differences between the two or if there are any changes to be made to the Cortex M3 port.


Yup, the M3 port uses instructions not available on M0...


Quote: Rob65

As others suggest, the problems you see may be related to stack problems (are you sure you have enough stack space?) or other memory problems.
You should set configCHECK_FOR_STACK_OVERFLOW  and configUSE_MALLOC_FAILED_HOOK to check for stack overflow and memory alloc problems.


Already done...


Quote: Rob65

Also, do not use semi hosting printf since that has a major impact on system performance. I have had lots of problems with semi hosting printf in my FreeRTOS applications.


Should work OK as long as use is restricted to one task with adequate stack space (though it is slow)...


Quote: Rob65

Some more tips:

- Check the priority of interrupts in FreeRTOS.
  Just this week I had a problem with an interrupt from the RTC having the wrong priority - all seems to work for some time and then the system crashes or hangs with strange problems.


These ports all use maximum priority (0 numerically). Thus it should not be possible to get this wrong ?


Quote: Rob65


- Make sure that a task (function) never returns. All tasks you have should end with a " while(1) ; ".
  If a task returns then the system will halt with a hard fault.


Yup.


Quote: Rob65


- Set configUSE_PREEMPTION to 0 to prevent any I/O sequences being disrupted by a task switch.


I'm using configUSE_PREEMPTION; unsure why you expect this could be a problem ? Can you provide more detail ?


Quote: Rob65


- Remove --gc-sections from the miscellaneous linker options.
  I have seen that the linker produces wrong debug info in some cases (especially within tasks.c), resulting in a mismatch between the real address and where the debugger thinks your code is: stepping through the code showed me that different lines of code were being executed than were being shown by the debugger.


Not having this problem here, thankfully.


Quote: Rob65


- VECTPC is a pseudo register that will only appear in the register list when it has meaning. Not all faults result in a valid VECTPC content so you will not always see VECTPC in the list.


Thanks for clarifying. I need to study the ARM book exception explanation (the NXP extract is too sparse to be helpful here).

Quote: Rob65


- Read http://support.code-red-tech.com/CodeRedWiki/DebugHardFault. There is some more information on how to discover where the problem is.


Studied in detail and referenced below...

Quote: Rob65

I have seen similar problems as you have with FreeRTOS.
I also discovered that in all cases this was due to an error in my program (overwriting the stack, priority problems, interrupt overruns, calling functions when not allowed, ...).
FreeRTOS is working as expected, the Cortex M3 port is working great and I did not have to make any changes to the port to get things working.

The FreeRTOS port for the Cortex M3 is using the PSP so I have no clue as to where you got your FreeRTOS port. The FreeRTOS ports provided by NXP (through the support section of the ics.nxp.com website) and the port that comes with the new LPCXpresso tools from Code Red both do use the PSP.
So I suggest grabbing one of the ports that are known to work.


The NXP port posted "FreeRTOS Example Project for LPC1114/301 V1.0 (Jul 14, 2010)" uses MSP rather than PSP and has at least one stack setup bug (port.c file comments indicate it is from FreeRTOS 5.3). The FreeRTOS demo port.c file distributed with the latest LPCXpresso is a copy of the above.

I have not located any distributed M0 version that uses PSP.


Quote: Rob65

Go check whether there need to be any changes made to have it working on the Cortex M0 and start with the simple examples like the flashing LED before starting to create your own application. Pay special attention to interrupt handling.


Already done...


Quote: Rob65

Buy and read the books that Richard wrote about FreeRTOS ("Using the FreeRTOS kernel" and "FreeRTOS User Manual"). These books contain valuable information.


Done, and they are excellent, as is FreeRTOS and Richard's support !


Quote: Rob65

I have been at the exact same point where you are right now. I was ready to burn all my FreeRTOS work and select a different RTOS.
But then I decided to start reading and start all over again adding my own tasks a small section each time. That is how I discovered all my faults.

But first of all stop using semihosting with FreeRTOS.
First of all semihosting will drastically decrease your system's performance and secondly I am almost sure that the printf is not thread safe.
Also make sure that any I/O or driver functions that are being called from tasks have a mutex to guard them.


Again, the semihosting output should be OK if limited to one task.
Certainly debug_printf is not task-safe (verified when I was clumsy with that ;-)

Content originally posted in LPCWare by DaveNadler on Tue Jul 12 03:06:58 MST 2011

Quote: Zero
As I understand you are using CAN ROM drivers :)
Leads me to the simple question if you've reserved RAM for them :confused:


Yup.

Content originally posted in LPCWare by Ex-Zero on Tue Jul 12 02:35:02 MST 2011
As I understand you are using CAN ROM drivers :)

Leads me to the simple question if you've reserved RAM for them :confused:

Content originally posted in LPCWare by Rob65 on Tue Jul 12 01:41:01 MST 2011
Dave,

which version of FreeRTOS are you using?
The FreeRTOS port is labeled as "Cortex M3 port"  and the lpc11c14 is a Cortex M0 processor. I have not yet looked into possible differences between the two or if there are any changes to be made to the Cortex M3 port.

As others suggest, the problems you see may be related to stack problems (are you sure you have enough stack space?) or other memory problems.
You should set configCHECK_FOR_STACK_OVERFLOW and configUSE_MALLOC_FAILED_HOOK to check for stack overflow and memory allocation problems.
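
The more thorough of the FreeRTOS stack checks (configCHECK_FOR_STACK_OVERFLOW set to 2) relies on filling each task stack with a known value at creation and later looking for fill that has been overwritten. The host-side sketch below illustrates that idea only: `stack_fill`, `stack_unused_bytes` and `STACK_FILL_BYTE` are hypothetical names rather than FreeRTOS API, and a descending stack (growing toward index 0 of the buffer) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* recognisable pattern written before the stack is used */

/* Paint the whole stack area with the fill pattern. */
void stack_fill(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count bytes at the low end of the area that still hold the pattern.
   A result of 0 means the stack has reached (or passed) its limit. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A periodic check that the unused count never drops to zero catches an overflow before it turns into a corrupted return address. Data that happens to equal the fill byte can make the count slightly optimistic.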

Also, do not use semi hosting printf since that has a major impact on system performance. I have had lots of problems with semi hosting printf in my FreeRTOS applications.

Some more tips:

- Check the priority of interrupts in FreeRTOS.
  Just this week I had a problem with an interrupt from the RTC having the wrong priority - all seems to work for some time and then the system crashes or hangs with strange problems.
- Make sure that a task (function) never returns. All tasks you have should end with a " while(1) ; ".
  If a task returns then the system will halt with a hard fault.
- Set configUSE_PREEMPTION to 0 to prevent any I/O sequences being disrupted by a task switch.
- Remove --gc-sections from the miscellaneous linker options.
  I have seen that the linker produces wrong debug info in some cases (especially within tasks.c), resulting in a mismatch between the real address and where the debugger thinks your code is: stepping through the code showed me that different lines of code were being executed than were being shown by the debugger.
- VECTPC is a pseudo register that will only appear in the register list when it has meaning. Not all faults result in a valid VECTPC content so you will not always see VECTPC in the list.
- Read http://support.code-red-tech.com/CodeRedWiki/DebugHardFault. There is some more information on how to discover where the problem is.
I have seen similar problems as you have with FreeRTOS.
I also discovered that in all cases this was due to an error in my program (overwriting the stack, priority problems, interrupt overruns, calling functions when not allowed, ...).
FreeRTOS is working as expected, the Cortex M3 port is working great and I did not have to make any changes to the port to get things working.
The FreeRTOS port for the Cortex M3 is using the PSP so I have no clue as to where you got your FreeRTOS port. The FreeRTOS ports provided by NXP (through the support section of the ics.nxp.com website) and the port that comes with the new LPCXpresso tools from Code Red both do use the PSP.

So I suggest grabbing one of the ports that are known to work.
Go check whether there need to be any changes made to have it working on the Cortex M0 and start with the simple examples like the flashing LED before starting to create your own application.
Pay special attention to interrupt handling.
Buy and read the books that Richard wrote about FreeRTOS ("Using the FreeRTOS kernel" and "FreeRTOS User Manual"). These books contain valuable information.

I have been at the exact same point where you are right now. I was ready to burn all my FreeRTOS work and select a different RTOS.
But then I decided to start reading and start all over again adding my own tasks a small section each time. That is how I discovered all my faults.

But first of all stop using semihosting with FreeRTOS.
First of all semihosting will drastically decrease your system's performance and secondly I am almost sure that the printf is not thread safe.
Also make sure that any I/O or driver functions that are being called from tasks have a mutex to guard them.

Regards,

Rob

Content originally posted in LPCWare by TheFallGuy on Mon Jul 11 23:55:46 MST 2011
I suggest that you read the ARM ARM (Architecture Reference Manual - available from the ARM website) to learn how to decode faults. An alternative would be to read Joseph Yiu's excellent book "The Definitive Guide to the ARM Cortex-M3".

Also, it looks like you are using semihosting to printf() to the IDE console. Did you read the "Important notes on semihosting" on this page?
http://support.code-red-tech.com/CodeRedWiki/UsingPrintf
It is possible that this could be your problem?

Content originally posted in LPCWare by DaveNadler on Mon Jul 11 18:00:28 MST 2011
It is certainly likely that the stack is somehow getting clobbered,
causing the fault. So...

There is plenty of stack space for all tasks; an overrun is unlikely.

Regarding re-entrancy:
(1) I'm using essentially no CodeRed library functions.
(2) Calls to the CAN ROM routines are interrupt-guarded, as I assume
they are not re-entrant and they are used in the CAN ISR
(it sure would be nice if NXP documented this, along with required
stack space, required scope of argument structures, proper use
of const on function prototypes, etc.).
... I certainly hope CodeRed-generated code is not inherently
non-reentrant; that would be pretty unfortunate for an embedded
product!

I'm continuing to debug (yes, pruning and binary search)...

Meanwhile:
How come I cannot find out the PC of where the problem manifested,
i.e., where the bogus PC was loaded?
Why is VECTPC acting flaky in the IDE?

Thanks in advance for any and all help,
Best Regards, Dave

Content originally posted in LPCWare by NXP_USA on Mon Jul 11 16:18:13 MST 2011
Maybe the stack clobbering is the cause of the hard fault. If the stack is clobbered, then when the current function returns, a bad return address gets loaded from the stack into the PC. Stack problems can be notoriously difficult to troubleshoot, but they are usually caused by a buffer overrun of something on the stack, bad pointer math on something on the stack, or just plain running out of stack space so that the stack and the globals overwrite each other.

Probably the best bet is to comment out sections of code until you find out something that prevents the problem from happening, or to add a lot of logging so you can see what operations are happening before the crash. For example it looks like in your system that you send and receive messages. You might want to try writing a simulator thread that generates message data and replacing the actual communications thread with the simulator thread to see if that impacts the problem. Also try replacing the function that processes messages with a function that just accepts data without processing it. Try to narrow it down to specific types of messages.

Finally, it looks like you are using FreeRTOS and semihosting. I am not certain that the RedLib stdio + semihosting is re-entrant (Code Red?). If not, make sure you protect those functions and only call them from one thread which disables task switching.

Content originally posted in LPCWare by DaveNadler on Mon Jul 11 15:39:20 MST 2011
[FONT=Arial]An update....

First, after some digging, I found that the FreeRTOS port
I'm using did not use PSP (process stack pointer) for tasks,
so the exception handler (which always uses MSP) was not
guaranteed a safe stack, as both were using MSP (main stack).
I updated FreeRTOS to use PSP (actually, I made it a
configuration option). That was annoying.
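
With tasks on PSP and handlers on MSP, a fault handler first has to work out which stack holds the exception frame. On exception entry, LR contains an EXC_RETURN value, and bit 2 of that value says which stack pointer was in use when the exception was taken. A minimal sketch of that test follows, written as plain C so it can be checked off-target; `frame_stack_from_exc_return` and `stack_sel_t` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STACK_MSP = 0, STACK_PSP = 1 } stack_sel_t;

/* Decide which stack pointer holds the exception frame.
   On ARMv6-M the EXC_RETURN values are 0xFFFFFFF1 (return to handler, MSP),
   0xFFFFFFF9 (return to thread, MSP) and 0xFFFFFFFD (return to thread, PSP). */
stack_sel_t frame_stack_from_exc_return(uint32_t exc_return)
{
    /* Anything outside the 0xFFFFFFFx pattern is not an EXC_RETURN value. */
    assert((exc_return & 0xFFFFFFF0u) == 0xFFFFFFF0u);
    return (exc_return & 0x4u) ? STACK_PSP : STACK_MSP;
}
```

In a real handler the LR value has to be captured in a short assembly stub before any C code runs, because the compiler is free to overwrite LR in the function prologue.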

Now, if I deliberately trigger an exception, the handler operates
as expected, and the IDE shows VECTPC in the Core Register
view as expected. Yea!

Unfortunately, my application now crashes infrequently.
I've pushed the CAN msg rate up to 200/second and it
takes a few minutes to die (it used to die fairly promptly).
Unfortunately, in this case:
VECTPC is not shown in the Core Register pane.
How can I find *where* the exception was caused?
Trying this repeatedly, one time I got
VECTPC displaying 0x616c6974 - which is neither flash nor RAM.
Still hunting for the cause...

As always,
Thanks in advance for any tips,
Best Regards, Dave
[/FONT]