Occasionally _lwmsgq_receive Never Returns

leifzars
Contributor IV

I am testing my application and all is well for the first hour or so. But it seems that something fouls up the scheduler so that one of my tasks never executes again.

I am stuck and would really appreciate some help.

The line that never returns is: `if (_lwmsgq_receive(SL_Q_PTR, (void*) &msg, LWMSGQ_RECEIVE_BLOCK_ON_EMPTY, 100, 0) == MQX_OK) {`

It functions fine for a few hundred, if not a few thousand, calls.

I found some odd details in the MQX debugging tools. All of my other tasks that should be blocking on a similar _lwmsgq_receive call show a State of 'LW MSg RX Blocked, timeout', but the offending task shows a State of 'Time delay blocked'. I also see 16 messages in the lightweight message queue that the offending task blocks on, so it should be executing.

Another oddity: the kernel time is 59902 seconds and the functioning tasks show a 'Task Status' 'Timeout' of around 59595, while the problem task shows a timeout of 31341. I am not sure whether that is significant.

All MQX elements appear to be valid: I ran the MQX 'Check for Errors' utility and it found no problems.

I have the Kernel data structure but am not sure how to inspect it for the error. How do you think I should proceed debugging this?

Thanks

Leif

1 Solution
leifzars
Contributor IV

I believe I found the problem: I had mistakenly set an interrupt to a higher priority than the MQX recommended setting allows.
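To make the priority issue concrete, here is a minimal sketch of the kind of check that would have flagged it. This is not MQX code. It assumes a Cortex-M style scheme where a lower number means a more urgent interrupt and the kernel protects its critical sections with a BASEPRI-like threshold, so any interrupt numerically below the threshold can still fire while the kernel is updating its own queues. The names `isr_config_t`, `isr_is_masked_by_kernel` and `count_unsafe_isrs`, and the threshold value in the usage note, are hypothetical; the real limit comes from the MQX configuration for your port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one configured interrupt source. */
typedef struct {
    const char *name;
    uint8_t     priority;      /* lower number = more urgent (Cortex-M style) */
    bool        calls_kernel;  /* true if the ISR uses RTOS services */
} isr_config_t;

/*
 * With a BASEPRI-like threshold T (T != 0), an interrupt of priority p is
 * held off during kernel critical sections only when p >= T.
 */
bool isr_is_masked_by_kernel(uint8_t priority, uint8_t kernel_mask_level)
{
    return priority >= kernel_mask_level;
}

/*
 * Count the interrupts that call into the kernel but are NOT masked by it.
 * Each of these can preempt the kernel in the middle of a queue update.
 */
size_t count_unsafe_isrs(const isr_config_t *table, size_t n,
                         uint8_t kernel_mask_level)
{
    size_t unsafe = 0;

    assert(table != NULL);
    assert(kernel_mask_level != 0);  /* 0 would mean "no masking" on BASEPRI */

    for (size_t i = 0; i < n; i++) {
        if (table[i].calls_kernel &&
            !isr_is_masked_by_kernel(table[i].priority, kernel_mask_level)) {
            unsafe++;
        }
    }
    return unsafe;
}
```

With an assumed threshold of 2, an ISR at priority 1 that posts to a queue is counted as unsafe, while an ISR at priority 0 that never touches the kernel is not. A check like this only models the rule; the actual numbers have to be taken from the BSP and the MQX documentation.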

justanotheruser
Contributor I

Hello Leif!

This post has saved my week!

I was having the same problem and feared I would spend the whole week looking for the solution.

Thanks!

soledad
NXP Employee

Hello Leif Zars,

Thanks for sharing the solution to this problem.

Have a nice day!!

Sol

leifzars
Contributor IV

I added some QA code to inspect the TIMEOUT_QUEUE at a few key points in the MQX library. I was hoping that I would be able to determine the exact point at which the structure was being corrupted. I wasn't so lucky, but I thought I would share the info I got.

As you can see in the call stack for the last thread, my 'testQStruct' function found the TIMEOUT_QUEUE struct corrupted in the `_time_notify_kernel` function. I included a pic of the block of code (`_time_notify_kernel`) where I first noticed the issue.

I don't believe the error is occurring in `_time_notify_kernel` itself.

TQ_Failed_Stack.jpg

time_notify_kernel.jpg

But I don't get why the struct looks fine when I review the memory, i.e. my check reports that the queue's actual element count does not match its reported size, yet when I view it manually it is fine.

trace.jpg

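void testQStruct(QUEUE_STRUCT* q){
  QUEUE_ELEMENT_STRUCT_PTR elementPtr;
  int ctN = 0;  /* elements counted by following NEXT links */
  int ctP = 0;  /* elements counted by following PREV links */

  /* Walk forward until the ring comes back to the queue head. */
  elementPtr = q->NEXT;
  while((void*)elementPtr != (void*)q){
       ctN++;
       elementPtr = elementPtr->NEXT;
  }
  if(q->SIZE != ctN)
       __asm("BKPT #0\n");  /* halt in the debugger: forward count != SIZE */

  /* Walk backward the same way. */
  elementPtr = q->PREV;
  while((void*)elementPtr != (void*)q){
       ctP++;
       elementPtr = elementPtr->PREV;
  }
  if(q->SIZE != ctP)
       __asm("BKPT #0\n");  /* halt in the debugger: backward count != SIZE */
}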
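One weakness of a check like the one above is that it can spin forever if a corrupted ring never leads back to the head, and it only compares counts. Below is a minimal, self-contained sketch of a bounded variant that also verifies each NEXT/PREV pair. The types `q_elem_t` and `q_head_t` are stand-ins, not the real MQX `QUEUE_STRUCT`, and `queue_check`, `queue_init`, `queue_append` and the `max_walk` limit are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in ring element; the field names follow the code in this post. */
typedef struct q_elem {
    struct q_elem *NEXT;
    struct q_elem *PREV;
} q_elem_t;

/* Stand-in queue head: the head is itself a link in the ring. */
typedef struct {
    q_elem_t LINK;
    unsigned SIZE;
} q_head_t;

typedef enum {
    Q_OK = 0,
    Q_BAD_LINK,           /* NULL pointer or NEXT/PREV disagree */
    Q_SIZE_MISMATCH,      /* ring is intact but SIZE is wrong */
    Q_NO_RETURN_TO_HEAD   /* walked more than max_walk elements */
} q_check_t;

void queue_init(q_head_t *q)
{
    q->LINK.NEXT = &q->LINK;
    q->LINK.PREV = &q->LINK;
    q->SIZE = 0;
}

void queue_append(q_head_t *q, q_elem_t *e)
{
    q_elem_t *head = &q->LINK;
    e->NEXT = head;
    e->PREV = head->PREV;
    head->PREV->NEXT = e;
    head->PREV = e;
    q->SIZE++;
}

/* Walk the ring at most max_walk elements and validate every link. */
q_check_t queue_check(const q_head_t *q, unsigned max_walk)
{
    const q_elem_t *head = &q->LINK;
    const q_elem_t *e = head->NEXT;
    unsigned count = 0;

    while (e != head) {
        if (e == NULL || e->NEXT == NULL || e->PREV == NULL)
            return Q_BAD_LINK;
        if (e->NEXT->PREV != e)
            return Q_BAD_LINK;
        if (++count > max_walk)
            return Q_NO_RETURN_TO_HEAD;
        e = e->NEXT;
    }
    return (count == q->SIZE) ? Q_OK : Q_SIZE_MISMATCH;
}
```

In a real system the walk would also need to run with interrupts disabled (for example between `_int_disable()` and `_int_enable()`), otherwise the queue can change while it is being counted. That may be one reason a check reports a mismatch while the memory looks fine a moment later in the debugger, though I cannot confirm that is what happened here.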

leifzars
Contributor IV

I am looking in the _mqx_kernel_data struct, at the TIMEOUT_QUEUE member. It shows a size of 1, but the queue contains 4 unique elements. So it seems that this queue might be corrupted?

All tasks except the dead one are referenced in this queue, even though its size == 1.
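A size field that is smaller than the number of linked elements is what a lost update would look like: one insert is interrupted halfway by another insert, and the first one then writes back a stale size. The sketch below reproduces that effect deterministically on a toy list. It is only an illustration of the general mechanism, not a reconstruction of what the MQX kernel does; `list_t`, `push_atomic`, `push_preempted` and `count_nodes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

typedef struct {
    node_t  *first;
    unsigned size;   /* should always equal the number of linked nodes */
} list_t;

/* An insert that runs to completion without being interrupted. */
void push_atomic(list_t *l, node_t *n)
{
    n->next = l->first;
    l->first = n;
    l->size++;
}

/*
 * An insert that is "preempted" between reading and writing the size.
 * If isr_node is not NULL, a second insert (standing in for an unmasked
 * ISR) runs in the middle of the first one.
 */
void push_preempted(list_t *l, node_t *n, node_t *isr_node)
{
    unsigned old_size = l->size;   /* task reads the size */

    n->next = l->first;            /* task links its element */
    l->first = n;

    if (isr_node != NULL)
        push_atomic(l, isr_node);  /* interrupt fires here and inserts too */

    l->size = old_size + 1;        /* task writes back a stale size */
}

/* Count the nodes that are actually linked into the list. */
unsigned count_nodes(const list_t *l)
{
    unsigned n = 0;
    for (const node_t *p = l->first; p != NULL; p = p->next)
        n++;
    return n;
}
```

After one preempted insert on an empty list, two nodes are linked but the size field says 1. Every link is individually valid, which would also be consistent with the queue looking fine when inspected by hand.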

MQX_TimeOutQ Mem.jpg
