MQX 4.2 websockets WS_send hanging due to task priorities

lfschrickte
Contributor IV

Hi!

I'm not sure whether this is a real problem, but it cost me a whole afternoon, so I'm sharing the issue in case it saves somebody else some time.

The WS_send function (and likewise WS_close) has the following code right before returning:

    setsockopt(ws_context->sock, SOL_SOCKET, SO_EXCEPTION, &n_exept, sizeof(n_exept));

    /* Block calling task. It will be unblocked as soon as message is processed. */
    if ((_task_id) message.data != ws_context->tid)
    {
        _task_block();
    }

Both functions take advantage of a clever synchronization feature that, I think, RTCS introduced in its latest version: signaling through the socket's "exception" condition.

However, I found that this approach can fail in the current implementation when the HTTP server task has a higher priority than the task calling WS_send (in MQX, a lower number means a higher priority). In my test program, this is what happens:

  1. My HTTP server priority is 7 (the default).
  2. A user task with priority 10 calls WS_send.
  3. Right after the setsockopt call inside WS_send, the system function ws_process_api_calls (in httpsrv_ws.c) takes control of the processor, because the HTTP server has the higher priority.
  4. It executes normally and then tries to unblock the caller, but the caller is not blocked yet! This puts the task into an invalid state (0x011).
  5. After that, WS_send resumes and calls _task_block(), which blocks the task forever!
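The ordering above can be replayed with a tiny state model. This is not MQX code: task_state_t, model_ready() and model_block() are hypothetical stand-ins for _task_ready() and _task_block(), and the model assumes, as the list describes, that readying a task that is not blocked simply loses the wake-up.

```c
/* Hypothetical, simplified task states; these are NOT MQX definitions. */
typedef enum {
    TASK_READY,
    TASK_BLOCKED,
    TASK_INVALID   /* task was "readied" while it was not blocked */
} task_state_t;

/* Models _task_ready(): only meaningful if the target is blocked. */
static task_state_t model_ready(task_state_t s)
{
    return (s == TASK_BLOCKED) ? TASK_READY : TASK_INVALID;
}

/* Models _task_block(): the task stays blocked until someone readies it. */
static task_state_t model_block(task_state_t s)
{
    (void)s;
    return TASK_BLOCKED;
}

/* Ordering the code assumes: the caller blocks first, then the server
 * (running at lower priority) processes the message and readies it. */
task_state_t replay_expected_order(void)
{
    task_state_t caller = TASK_READY;
    caller = model_block(caller);   /* WS_send(): _task_block()          */
    caller = model_ready(caller);   /* server: _task_ready(caller)       */
    return caller;                  /* TASK_READY: WS_send() can return  */
}

/* Ordering observed with the server (prio 7) above the caller (prio 10). */
task_state_t replay_race(void)
{
    task_state_t caller = TASK_READY;
    /* setsockopt() wakes the server, which preempts the caller at once. */
    caller = model_ready(caller);   /* readies a running task: wake-up lost */
    caller = model_block(caller);   /* caller resumes and blocks            */
    return caller;                  /* TASK_BLOCKED, nobody will ready it   */
}
```

replay_expected_order() ends in TASK_READY, while replay_race() ends in TASK_BLOCKED with no further wake-up pending, which matches the hang described above.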

I worked around the issue by setting the HTTP server priority to a larger number (that is, a lower priority), but I don't think this is the most elegant solution.

Is there anything I'm missing, or is this a bug?

Regards

Carlos_Musich
NXP Employee

Hi Luiz,

We have not been able to reproduce this problem; the MQX development team is asking for an application that shows this behavior.

Could you share a project we can use to test it?

Thanks!

Carlos

lfschrickte
Contributor IV

Hi Martin,

It still only works if the HTTP server and websocket task priorities are the same. I don't have time to investigate further right now, but I'll do so as soon as I can. For now I'll keep the semaphore solution plus some task-priority tweaking!

Thanks for your help!

Carlos_Musich
NXP Employee

Hi Luiz,

Thank you for sharing your comments; we really appreciate it. This is not how it should work. I will report this to the MQX team.

Best regards,

Carlos

lfschrickte
Contributor IV

OK,

After this post I realized that changing my task priorities got me stuck somewhere else: this time HTTP_release blocks when it is called by a task with a higher priority than the HTTP server. If possible, please also ask them for a workaround that doesn't involve tuning task priorities. RTCS applications seem to be highly sensitive to task priority changes, so any change I make now may cause me trouble in the future.

Thank you! I appreciate your response!

Luiz

Martin_
NXP Employee

Luiz,

Instead of using _task_block() and _task_ready(_task_get_td((_task_id) message.data)), use a lightweight semaphore (lwsem) to synchronize the two tasks: _lwsem_wait() in WS_send() and _lwsem_post() in ws_process_api_calls().

If ws_process_api_calls() has already posted the semaphore, the _lwsem_wait() in WS_send() won't block.

If the semaphore has not been posted yet, the _lwsem_wait() in WS_send() blocks, and the later _lwsem_post() in ws_process_api_calls() makes the task ready.
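A minimal sketch of this handshake, using POSIX threads and semaphores as a stand-in for MQX: sem_post() and sem_wait()/sem_trywait() play the roles of _lwsem_post() and _lwsem_wait(), and ws_call_t, server_process() and the two test helpers are hypothetical names. What it shows is that a semaphore counts a post that happens before the wait, which is exactly the case that _task_block()/_task_ready() loses.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical per-call context; in MQX this would hold an lwsem. */
typedef struct {
    sem_t done;   /* posted by the server once the request is processed */
} ws_call_t;

/* Stand-in for ws_process_api_calls(): process the request, then post. */
static void *server_process(void *arg)
{
    ws_call_t *call = arg;
    /* ... handle the queued WS_send()/WS_close() request here ... */
    sem_post(&call->done);               /* _lwsem_post() in MQX */
    return NULL;
}

/* The problematic ordering: the server finishes before the caller waits. */
bool post_before_wait_does_not_block(void)
{
    ws_call_t call;
    pthread_t server;
    bool ok = false;

    if (sem_init(&call.done, 0, 0) != 0)  /* initial count 0 */
        return false;
    if (pthread_create(&server, NULL, server_process, &call) == 0) {
        pthread_join(server, NULL);       /* the post has definitely happened */
        /* The count is already 1, so a non-blocking wait succeeds at once. */
        ok = (sem_trywait(&call.done) == 0);
    }
    sem_destroy(&call.done);
    return ok;
}

/* Either ordering: the caller may start waiting before or after the post. */
bool wait_then_post_completes(void)
{
    ws_call_t call;
    pthread_t server;
    bool ok = false;

    if (sem_init(&call.done, 0, 0) != 0)
        return false;
    if (pthread_create(&server, NULL, server_process, &call) == 0) {
        ok = (sem_wait(&call.done) == 0); /* _lwsem_wait() in MQX */
        pthread_join(server, NULL);
    }
    sem_destroy(&call.done);
    return ok;
}
```

The design point is that the synchronization state lives in the semaphore's count rather than in the task's scheduling state, so the relative priorities of the HTTP server and the caller no longer decide whether the wake-up is seen.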

Martin

Martin_
NXP Employee

It sounds like there is an error in the getsockopt() call:

        /* "Exception" on socket indicates call of user API. */
        if (RTCS_FD_ISSET(context->sock, &except_fd))
        {
            uint32_t n_except;
            uint32_t s_except;

            /* Read exception number to zero it. */
            getsockopt(context->sock, SOL_SOCKET, SO_EXCEPTION, (void *) &n_except, &s_except);

The local variable "s_except" is never initialized. If it happens to be zero, getsockopt() returns an error code and does nothing, because a zero option length is not allowed for getsockopt().

"s_except" should be set to sizeof(n_except) before the getsockopt() call.

You can check the getsockopt() return value at runtime to see whether this is the case.
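The fix is to initialize the length argument before the call and to check the result. The sketch below shows that pattern with the standard POSIX getsockopt(); SO_ERROR is only a stand-in for the RTCS-specific SO_EXCEPTION option (like the RTCS code's "read to zero it", reading SO_ERROR also clears the pending value), and read_and_clear_socket_error() is a hypothetical helper name.

```c
#include <sys/socket.h>

/* Hypothetical helper: reads (and thereby clears) the pending socket error.
 * Returns 0 on success, -1 if getsockopt() fails. */
int read_and_clear_socket_error(int sock, int *value)
{
    /* In/out length argument: it must hold the buffer size BEFORE the call.
     * Leaving it uninitialized is the bug described above. */
    socklen_t len = sizeof(*value);

    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, value, &len) != 0)
        return -1;   /* report the failure instead of silently ignoring it */
    return 0;
}
```

In the RTCS snippet the equivalent change would be declaring `uint32_t s_except = sizeof(n_except);` and testing the getsockopt() return value, as described above.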

Carlos_Musich
NXP Employee

Hi Luiz,

I submitted report number MQX-5670. However, the MQX developers are focused on new releases and other activities, so it will take a few weeks for them to look at this issue.

I think a workaround would be to monitor the httpsrv task and, when it gets into the invalid state, use the function below to set it to the ready state:

_task_ready(_task_get_td(_task_get_id_from_name("task_name")));

where "task_name" is the httpsrv task name. I think it is simply "httpsrv".

Please let me know if this works.


Best regards,
Carlos
