Debugging Hard Fault

lpcware · ‎06-15-2016

Content originally posted in LPCWare by martinmckee on Wed Dec 26 23:35:30 MST 2012
Okay, full disclosure: I'm quite new to 32-bit microcontrollers in general (I'm used to 8-bit AVR) and to the ARM Cortex-M0 in particular. One consequence is that I have not had to deal with exceptions (or faults) before.

I have some code that implements a cooperative task handling API. Everything was going fine until I added another, more complicated, version of a function. The two earlier versions I implemented worked just fine. Calling the third seems to trigger a Hard Fault. But here's the bit I don't understand: the Hard Fault is triggered inside a function that all three versions call to do the bulk of the work. That function is identical for all three implementations; it simply takes different parameters (created by the calling functions).

The struct being operated upon is declared as:

typedef struct COOP_TASK {
    //
    // Control Members
    //
    bool enabled;
    coop_task_priority_t priority;
    uint8_t id;

    //
    // Event Members
    //
    coop_event_function_t event;
    void * event_data;
    coop_size_t event_data_size;

    //
    // Task Implementation Members
    //
    coop_task_function_t func;
    void * task_data;
    coop_size_t task_data_size;

    //
    // Task List
    //
    struct COOP_TASK * next_task;
} coop_task_t;

And the code that triggers the Fault looks like this:

task->enabled = true;
task->priority = _priority;
task->id = tasker_system.next_id;
task->event_data = _event_data; // This line triggers it...
task->event = _event;
task->func = _func;
task->task_data = _task_data;
task->next_task = 0;

The task memory is allocated by a pool manager which seems to be working just fine. When I step through and trap the instruction that actually causes the fault, it ends up being:
str r2, [ r3, #8 ]
Here r3 correctly holds the base address of the structure (0x100002b), so the store targets 0x100002b + 8 = 0x1000033. Obviously there is an unaligned access happening, but the question is, why? The code that is acting up is all generated by the compiler, and the other two calls to the function are working just fine (alone or with this last version of the call). For what it's worth, this all happens the same way (with the same addresses) with or without optimization (O0, O2, or Os).
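The address arithmetic above can be checked with a few lines of C. This is only a sketch for illustration: is_aligned and store_address are hypothetical helper names, not part of the tasker code. It shows that the structure base itself is not a multiple of 4, so any word store into it lands on an unaligned address. The earlier stores to enabled, priority and id presumably did not fault because they are byte-sized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when addr is a multiple of align (align must be a power of two). */
bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Address that "str rX, [base, #offset]" writes to. */
uintptr_t store_address(uintptr_t base, uint32_t offset)
{
    return base + offset;
}
```

With the values from the debugger, store_address(0x100002b, 8) gives 0x1000033, and is_aligned() reports that neither the base nor the target is word-aligned. The next aligned base would be 0x100002c.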

I'm out of ideas.

Martin Jay McKee

lpcware · ‎06-15-2016

Content originally posted in LPCWare by martinmckee on Thu Dec 27 11:51:12 MST 2012
Spot on. I was actually planning to make sure that the manager always returned aligned structs (I had some of the framework in place, in fact) but everything had been working so well... Anyway, I did the last little bit of work to ensure aligned allocations and everything is working just as it should.

I got sidetracked by the fact that it had been working correctly before (stupid, really). The addition of the new version of the call caused more data to be allocated which, of course, shifted things to a non-aligned boundary. Glad it's working again -- and glad that I've managed to learn (or, rather, be reminded of) something useful... pay attention to alignment!

Martin Jay McKee

lpcware · ‎06-15-2016

Content originally posted in LPCWare by wmues on Thu Dec 27 09:18:23 MST 2012
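Your pool manager should only allocate task structs that are aligned on int (4-byte) boundaries. The Cortex-M0 does not support unaligned word accesses, so a word store to an address such as 0x1000033 raises a Hard Fault.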
If you post the code of the pool manager, we might be able to help you.
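The pool manager's code was never posted in this thread, so the following is only a minimal sketch of one way to guarantee word-aligned allocations, assuming a simple bump allocator over a static buffer. All names here (pool, pool_alloc, pool_round_up, POOL_SIZE, POOL_ALIGN) are hypothetical and not taken from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  256u   /* hypothetical pool size in bytes */
#define POOL_ALIGN 4u     /* word alignment required by the Cortex-M0 */

/* Backing storage; the union forces the buffer itself to start word-aligned. */
static union {
    uint32_t force_align;
    uint8_t  bytes[POOL_SIZE];
} pool;

static size_t pool_used = 0;

/* Round n up to the next multiple of POOL_ALIGN (a power of two). */
size_t pool_round_up(size_t n)
{
    return (n + (POOL_ALIGN - 1u)) & ~(size_t)(POOL_ALIGN - 1u);
}

/* Hand out size bytes, or NULL when the pool is exhausted. */
void *pool_alloc(size_t size)
{
    size_t rounded = pool_round_up(size);

    /* rounded < size catches wrap-around for absurdly large requests. */
    if (rounded < size || rounded > POOL_SIZE - pool_used) {
        return NULL;
    }

    void *block = &pool.bytes[pool_used];
    pool_used += rounded;   /* the next block starts on a word boundary too */
    return block;
}
```

Because every request is rounded up before the offset advances, an odd-sized allocation (say 3 bytes) can no longer push the next task struct onto an odd address, which is the kind of shift described in the follow-up post.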

LPC11xx