SOCK_STREAM_recv() bug drops data on receive timeout

1,329 Views
bowe
Contributor III

We are using MQX for KSDK 1.3. We have found that the socket recv() function drops some data if a receive timeout occurs. We found this while trying to transfer a relatively large file (~500 KB). I believe this is the relevant code in SOCK_STREAM_recv():

  error = RTCSCMD_issue(parms, TCP_Process_receive);
  if (error) {
    RTCS_setsockerror(sock, error);

    /* Start CR 2340 */
    /* If data was copied to the userbuf, but not all that
       the recv() asked for, and a timer was started that has
       now timed out, we need to return with the count, and not
       RTCS_ERROR */
    if (error == RTCSERR_TCP_TIMED_OUT) {
       int n;
       _task_stop_preemption();
       n = parms.TCB_PTR->rcvnxt - parms.TCB_PTR->rcvbufseq;
       _task_start_preemption();
       RTCS_EXIT2(RECV, RTCS_OK, n);
    }

When a receive timeout occurs (detected by the error == RTCSERR_TCP_TIMED_OUT check), the function is supposed to return how much data it partially filled the buffer with. However, we are seeing it return 0 even though there is data in the buffer (we added a memset to zero out the buffer before calling the socket recv function, so any non-zero bytes afterwards must have been written by the stack). I think the line that computes n as rcvnxt - rcvbufseq is the culprit (it seems to always set n to 0), but I am not sure what it should be changed to, because I do not completely understand the TCB (Transmission Control Block) and how it is used. I am guessing this line worked with a previous version of RTCS and was never updated?
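The memset check described above can be sketched in plain C. This is only an illustration, not RTCS code: the helper names written_extent() and recv_underreported() are made up for this post, and the check can only see payload bytes that are non-zero, so it gives a lower bound on what was written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of RTCS): for a buffer that was zeroed
   before the receive call, return the index one past the last non-zero
   byte. This is a lower bound on how many bytes were written into it. */
static size_t written_extent(const uint8_t *buf, size_t len)
{
    size_t i = len;
    while (i > 0 && buf[i - 1] == 0) {
        i--;
    }
    return i;
}

/* Hypothetical helper: non-zero if the buffer visibly holds more data
   than the receive call reported. */
static int recv_underreported(const uint8_t *buf, size_t len, size_t reported)
{
    return written_extent(buf, len) > reported;
}
```

In use, the buffer would be zeroed with memset() before each recv() call, and recv_underreported() would be called with the returned count whenever that count is smaller than the requested length.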

1 Solution
940 Views
bowe
Contributor III

We believe we found a fix for this (at least it seems to work in our preliminary testing). I think the issue is that rcvbufseq has already been updated by the time we calculate the number of bytes, so we need to save its value before calling RTCSCMD_issue(). Then, since rcvbufseq should have been advanced by the receive, and rcvnxt might be updated again before SOCK_STREAM_recv() gets to continue running, we want to compare against the new value of rcvbufseq rather than rcvnxt. So the code shown in the previous post changes to this (again, this code is in SOCK_STREAM_recv() in sock_stream.c):

  uint32_t prev_recvbufseq = parms.TCB_PTR->rcvbufseq;
  error = RTCSCMD_issue(parms, TCP_Process_receive);
  if (error) {
    RTCS_setsockerror(sock, error);

    /* Start CR 2340 */
    /* If data was copied to the userbuf, but not all that
       the recv() asked for, and a timer was started that has
       now timed out, we need to return with the count, and not
       RTCS_ERROR */
    if (error == RTCSERR_TCP_TIMED_OUT) {
       int n;
       n = parms.TCB_PTR->rcvbufseq - prev_recvbufseq;
       RTCS_EXIT2(RECV, RTCS_OK, n);
    }
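To illustrate why the snapshot helps, here is a small stand-alone model in C. It is a sketch under assumptions, not the real RTCS data structures: mock_tcb_t and mock_copy_to_user() are hypothetical, and the model assumes (as guessed above) that rcvbufseq advances together with rcvnxt once received data has been copied to the user buffer. Under that assumption the old expression collapses to 0, while the difference against the saved value gives the byte count. Because the fields are unsigned 32-bit values, the subtraction also stays correct when the sequence number wraps around.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two TCB fields named in the post;
   the real RTCS TCB has many more members. */
typedef struct {
    uint32_t rcvbufseq;
    uint32_t rcvnxt;
} mock_tcb_t;

/* Assumed model: nbytes arrive and are copied to the user buffer,
   so both counters advance by the same amount. */
static void mock_copy_to_user(mock_tcb_t *tcb, uint32_t nbytes)
{
    tcb->rcvnxt += nbytes;
    tcb->rcvbufseq += nbytes;
}

/* Original calculation: evaluated after the copy has already happened. */
static uint32_t old_count(const mock_tcb_t *tcb)
{
    return tcb->rcvnxt - tcb->rcvbufseq;
}

/* Snapshot calculation: unsigned subtraction is modulo 2^32, so the
   result is still the byte count if rcvbufseq wrapped in between. */
static uint32_t new_count(const mock_tcb_t *tcb, uint32_t seq_before)
{
    return tcb->rcvbufseq - seq_before;
}
```

With both fields starting at 0xFFFFFFF0 and 100 bytes copied, old_count() returns 0 in this model and new_count() returns 100, even though rcvbufseq has wrapped past zero.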

We also made another tweak so that httpsrv_read() in httpsrv_supp.c returns when there is no data to be read (instead of senselessly locking up in a loop): we force a return when the number of bytes received is 0 (the change is in the if condition):

/* If there is some space remaining in user buffer try to read from socket */
while (read < len)
{
    uint32_t received;

    received = httpsrv_recv(session, dst+read, len-read, 0);
    if ((received != 0) && ((uint32_t)RTCS_ERROR != received))
    {
        read += received;
    }
    else
    {
        break;
    }
}
return(read);
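The effect of the extra received != 0 test can be tried outside the target with a mock receive function. The sketch below is an assumption-laden illustration, not the httpsrv code: mock_source_t, mock_recv(), read_until_idle() and MOCK_ERROR are all hypothetical names, and MOCK_ERROR only stands in for (uint32_t)RTCS_ERROR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for (uint32_t)RTCS_ERROR; the value is an assumption. */
#define MOCK_ERROR ((uint32_t)-1)

/* Hypothetical data source that hands out at most 'chunk' bytes per call. */
typedef struct {
    const uint8_t *data;
    size_t remaining;
    uint32_t chunk;
} mock_source_t;

/* Hypothetical receive: returns the number of bytes copied, 0 when idle. */
static uint32_t mock_recv(mock_source_t *src, uint8_t *dst, uint32_t len)
{
    uint32_t n = (len < src->chunk) ? len : src->chunk;
    if (n > src->remaining) {
        n = (uint32_t)src->remaining;
    }
    memcpy(dst, src->data, n);
    src->data += n;
    src->remaining -= n;
    return n;
}

/* Same loop shape as above: stop on error or when nothing was received. */
static uint32_t read_until_idle(mock_source_t *src, uint8_t *dst, uint32_t len)
{
    uint32_t read = 0;
    while (read < len) {
        uint32_t received = mock_recv(src, dst + read, len - read);
        if (received == 0 || received == MOCK_ERROR) {
            break; /* without the == 0 test this loop would never exit */
        }
        read += received;
    }
    return read;
}
```

With a 5-byte source and a 16-byte destination, the loop returns 5 once the source runs dry instead of spinning until the buffer is full.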

940 Views
Carlos_Musich
NXP Employee
NXP Employee

Hi Bowe,

Your workaround seems fine. Thank you so much for sharing it.

I need to report this issue to the MQX development team. I will let you know when they have a final fix.

Regards,

Carlos

940 Views
Carlos_Musich
NXP Employee
NXP Employee

Just FYI,

the report number for this issue is MQX-5683.

Regards,

Carlos
