Hi. I have a weird issue with _lwsem_create(). Depending on how my system is configured, it can crash halfway through creating an lwsem. Tracing through the code, it dies in lwsem.c in _lwsem_create_internal(), around here:
{
    sem_chk_ptr = (LWSEM_STRUCT_PTR) ((pointer) kernel_data->LWSEM.NEXT);
    while (sem_chk_ptr != (LWSEM_STRUCT_PTR) ((pointer) &kernel_data->LWSEM))
    {
        if (sem_chk_ptr == sem_ptr)
        {
            _int_enable();
            _KLOGX2(KLOG_lwsem_create, MQX_EINVAL);
            return (MQX_EINVAL);
        }
        sem_chk_ptr = (LWSEM_STRUCT_PTR) ((pointer) sem_chk_ptr->NEXT);
    }
}
It looks to me like it's walking a linked list of some description. When it crashes, the NEXT pointers go out of range and MQX then sits in a tight loop firing off an unhandled interrupt, which I trap. What I've discovered so far is that when the create works, sem_ptr (the address of the LWSEM_STRUCT you're trying to create) is at an address ABOVE the first PREV/NEXT address, whereas when it fails it's BELOW, meaning that as it loops up it never gets a match. The number of calls to _mem_alloc made before the problem occurs seems to have a marked effect. All our code looks good and it all runs OK; it's just certain combinations that seem to cause the issue. The processor is an MPC568G, ported from the TWRPXN2020 target in MQX4.0. Has anybody had similar issues? It may actually be a CodeWarrior issue, but I'm stuck!
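For anyone trying to reproduce this kind of failure outside MQX, here is a minimal sketch of the same sort of walk over a circular list whose head points back at itself. The names (node_t, ring_init, ring_append, ring_find_bounded) are hypothetical stand-ins, not MQX types or functions. The only difference from the loop above is a step limit, so that a NEXT pointer that no longer leads back to the head is reported instead of being followed forever.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queue element; NOT the real LWSEM_STRUCT. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

/* An empty ring is a head element that points at itself. */
static void ring_init(node_t *head)
{
    head->next = head;
    head->prev = head;
}

/* Link n in just before the head, i.e. at the tail of the ring. */
static void ring_append(node_t *head, node_t *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Walk the ring looking for target.
 * Returns  1 if target is found,
 *          0 if the walk got back to the head without finding it,
 *         -1 if more than max_nodes steps were taken, which suggests
 *            that a next pointer has been corrupted.
 */
static int ring_find_bounded(const node_t *head, const node_t *target,
                             size_t max_nodes)
{
    const node_t *p = head->next;
    size_t steps = 0;

    while (p != head) {
        if (p == target) {
            return 1;
        }
        if (++steps > max_nodes) {
            return -1;
        }
        p = p->next;
    }
    return 0;
}
```

In this sketch a broken link still forms a cycle, so the walk just spins. On real hardware a corrupted NEXT can point anywhere, so the walk may read invalid memory and fault before any step limit is reached, which would match the unhandled interrupt described above.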
If anybody is interested, it turned out to be a 1-byte overrun in an array index. Always use a define! We'd changed an element size but left a temporary array with a hard-coded size, so when we changed the size it got missed. I'm struggling to understand how it corrupted kernel_data and/or the semaphore pointers like it did, but it did. I'd have expected a corrupt value or a bit of oddness, not the full-on runaway that we saw.
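To illustrate the "always use a define" point, here is a small sketch in which the temporary buffer takes its size from the same macro as the element, so the two cannot drift apart. The names (RECORD_SIZE, record_t, checksum_record) and the sizes are made up for illustration and are not taken from the original project.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical element size: imagine it was 8 and has just been changed to 9. */
#define RECORD_SIZE 9u

typedef struct {
    unsigned char bytes[RECORD_SIZE];
} record_t;

/* Compile-time check: the build fails if the struct and the macro disagree. */
typedef char record_size_check[(sizeof(record_t) == RECORD_SIZE) ? 1 : -1];

/* Copies a record into a temporary buffer and sums its bytes. */
static unsigned checksum_record(const record_t *src)
{
    /* Sized from the macro, not a literal such as tmp[8], which would
       now be one byte too small for the copy below. */
    unsigned char tmp[RECORD_SIZE];
    unsigned sum = 0;
    size_t i;

    /* sizeof tmp keeps the copy length tied to the buffer itself. */
    memcpy(tmp, src->bytes, sizeof tmp);

    for (i = 0; i < sizeof tmp; ++i) {
        sum += tmp[i];
    }
    return sum;
}
```

As for why one byte could do so much damage, one possible explanation (a guess, since the memory layout isn't shown) is that on a big-endian core a byte written just past the end of an array lands in the most significant byte of whatever follows. If that happens to be a pointer, the address changes completely rather than by a small amount.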