MQX 4.2 websockets WS_send hanging due to task priorities

lfschrickte
Contributor IV

Hi!

I'm not sure whether this is really a problem, but it cost me a whole afternoon, so I'm sharing it in case it saves somebody else some time.

The WS_send function (and likewise WS_close) has the following code right before returning:

    setsockopt(ws_context->sock, SOL_SOCKET, SO_EXCEPTION, &n_exept, sizeof(n_exept));
    /* Block calling task. It will be unblocked as soon as message is processed. */
    if ((_task_id) message.data != ws_context->tid)
    {
        _task_block();
    }

Both rely on a clever synchronization feature that I think RTCS introduced in its latest version: the "exception" condition on the socket.

However, I found that this approach may have an issue in the current implementation: what if the HTTP server task has a higher priority than the task calling WS_send? (In MQX, a lower priority number means a higher priority.) Here is what happens in my test program:

  1. My HTTP server priority is 7 (the default).
  2. A user task with priority 10 calls WS_send.
  3. Right after the setsockopt() call inside WS_send, the system function ws_process_api_calls (in httpsrv_ws.c) takes control of the processor, because the HTTP server has the higher priority.
  4. It executes normally and then tries to unblock the caller, but the caller hasn't blocked yet! This puts the task in an invalid state (0x011).
  5. After that, WS_send resumes and calls _task_block(), which blocks the task forever.

I worked around it by setting the HTTP server priority to a larger number (i.e. a lower priority), but I don't think this is the most elegant solution.

Am I missing something, or is this a bug?

Regards

Carlos_Musich
NXP Employee

Hi Luiz,

We have not been able to reproduce this problem; the MQX development team is asking for an application that shows this behavior.

Could you share a project we can test with?

Thanks!

Carlos

lfschrickte
Contributor IV

Hi Martin,

It still only works if the HTTP server and WebSocket task priorities are the same. I don't have time to investigate further right now, but I will as soon as I can. For now I'll keep the semaphore solution plus some task priority tweaking!

Thanks for your help!

Carlos_Musich
NXP Employee

Hi Luiz,

Thank you for sharing your comments; we really appreciate it. This is not how it should work. I will report this to the MQX team.

Best regards,

Carlos

lfschrickte
Contributor IV

OK,

After this post I realized that changing my task priorities got me stuck somewhere else: this time HTTP_release blocks when it is called by a task with a higher priority than the HTTP server. If you can, please also ask them for a workaround that doesn't involve tuning task priorities. RTCS applications seem highly sensitive to task priority changes, so any change I make now may cause me trouble in the future.

Thank you! I appreciate your response!

Luiz

Martin_
NXP Employee

Luiz,

Instead of using _task_block() and _task_ready(_task_get_td((_task_id) message.data)), use a lightweight semaphore (lwsem) to synchronize the two tasks: _lwsem_wait() in WS_send() and _lwsem_post() in ws_process_api_calls().

If ws_process_api_calls() has already posted the semaphore, the _lwsem_wait() in WS_send() won't block.

If the semaphore has not been posted yet, the _lwsem_wait() in WS_send() will block, and the _lwsem_post() in ws_process_api_calls() will make the task ready.

Martin

Martin_
NXP Employee

It sounds like there is an error in the getsockopt() call:

        /* "Exception" on socket indicates call of user API. */
        if (RTCS_FD_ISSET(context->sock, &except_fd))
        {
            uint32_t n_except;
            uint32_t s_except;
            /* Read exception number to zero it. */
            getsockopt(context->sock, SOL_SOCKET, SO_EXCEPTION, (void *) &n_except, &s_except);

The "s_except" local variable is never initialized. If by chance it is zero, getsockopt() returns an error code and does nothing, because a zero option length is not allowed for getsockopt().

"s_except" should be set to sizeof(n_except) before the getsockopt() call.

You can test the getsockopt() return value at runtime to see whether this is the case.

Carlos_Musich
NXP Employee

Hi Luiz,

I submitted report number MQX-5670. However, the MQX developers are focused on new releases and other activities, so it will take some weeks for them to look at this issue.

I think a workaround would be to monitor the httpsrv task and, when it gets into the invalid state, use the function below to put it back in the ready state:

_task_ready(_task_get_td(_task_get_id_from_name("task_name")));

where "task_name" is the httpsrv task name; I think it is simply "httpsrv".

Please let me know if this works.


Best regards,
Carlos

