MQX 4.2 RTCS: TCP accept stops receiving connections

lfschrickte
Contributor IV

Hi,

I'm seeing a rare bug in my product: it suddenly stops accepting connections on a TCP port. It happens after one or two weeks of the device running, but not always!

I know it is really difficult to reach a conclusion, but I'm posting this here hoping that some comment may be useful.

My device is running MQX 4.2 and I'm using the backlog parameter on the listen call (= 4). After the listen call I keep accepting connections regularly, but then suddenly no more connections arrive, and the accept call returns no error (it blocks forever). I have no clue where to start looking. The code that hangs is below; it works 99.99% of the time. All other device functions keep working normally, including an HTTP server with lots of CGI calls!

Shortened code

#define SOCKRX_RBSIZE 512 /* tcp2ras socket rx buffer size */
#define SOCKRX_TBSIZE 128 /* tcp2ras socket tx buffer size */

#define SOCKTX_RBSIZE 32  /* ras2tcp socket rx buffer size */
#define SOCKTX_TBSIZE 512 /* ras2tcp socket tx buffer size */

#define WSSIAM_MAX_INCOME_CONN 4 /* maximum incoming connections we accept - do not work on MQX 4.1.* */

void wssiam_tcp2ras_task(uint32_t args)
{
    sockaddr_in local_addr;
    uint32_t sock, listensock;
    uint32_t error;
    uint16_t rlen;
    uint32_t sockrx_rbsize = SOCKRX_RBSIZE;
    uint32_t sockrx_tbsize = SOCKRX_TBSIZE;

    for (;;) {
        /* create socket */
        sock = socket(PF_INET, SOCK_STREAM, 0);

        /* reduce buffer sizes for socket, as WSSIAM packets are always < 512 bytes */
        setsockopt(sock, SOL_TCP, OPT_RBSIZE, &sockrx_rbsize, sizeof(sockrx_rbsize));
        setsockopt(sock, SOL_TCP, OPT_TBSIZE, &sockrx_tbsize, sizeof(sockrx_tbsize));

        /* set listening port parameters */
        local_addr.sin_family = AF_INET;
        local_addr.sin_port = dev_config.ws_port;
        local_addr.sin_addr.s_addr = INADDR_ANY;

        /* bind socket to local address */
        error = bind(sock, &local_addr, sizeof(local_addr));
        if (error != RTCS_OK) {
            shutdown(sock, FLAG_ABORT_CONNECTION);
            DBG_ERROR("Failed to bind the stream socket");
            continue;
        }

        /* Set up the stream socket to listen on the TCP port: */
        error = listen(sock, WSSIAM_MAX_INCOME_CONN);  /* backlog only works on MQX 4.2+ */
        if (error != RTCS_OK) {
            shutdown(sock, FLAG_ABORT_CONNECTION);
            DBG_ERROR("listen() failed");
            _time_delay(1000);
            continue;
        }

        listensock = sock;

        for (;;) {
            int r;
            sockaddr_in remote_addr;
            char recv_buf[SOCKRX_RBSIZE];
            uint32_t socktimeout;

            /* this is mandatory, otherwise accept will not copy data to remote_addr */
            rlen = sizeof(remote_addr);

            /* accept any income connection */
            sock = accept(listensock, &remote_addr, &rlen);
            if (sock == RTCS_SOCKET_ERROR) {
                shutdown(sock, FLAG_ABORT_CONNECTION);
                DBG_ERROR("accept() failed");
                continue;
            }

            socktimeout = SOCK_TIMEOUT;
            /* timeout for context keeping after shutdown */
            setsockopt(sock, SOL_TCP, OPT_TIMEWAIT_TIMEOUT, &socktimeout, sizeof(socktimeout));

            /* timeout for send - minimum value */
            socktimeout = SEND_TIMEOUT;
            setsockopt(sock, SOL_TCP, OPT_SEND_TIMEOUT, &socktimeout, sizeof(socktimeout));

            /* reduce buffer sizes for socket, as WSSIAM packets are always < 512 bytes */
            setsockopt(sock, SOL_TCP, OPT_RBSIZE, &sockrx_rbsize, sizeof(sockrx_rbsize));
            setsockopt(sock, SOL_TCP, OPT_TBSIZE, &sockrx_tbsize, sizeof(sockrx_tbsize));

            /* wait for data on socket */
            r = recv(sock, recv_buf, sizeof(recv_buf), 0);

            /* parse data */
            ...
        }
    }
}

I was able to attach the debugger to the running target once when the error occurred, and the task was MSG RX blocked (as it always is when waiting for connections). There were no stack overflows and the memory pool high-water marks were OK.

Does anyone have any clue what it could be? I think this problem didn't happen when I was using MQX 4.1 (but I can't be sure, as it happens quite randomly).

Any help or comment would be appreciated.

Thank you!

Martin_
NXP Employee

Do you also use RTCSCFG_TCP_MAX_CONNECTIONS and RTCSCFG_TCP_MAX_HALF_OPEN to enable RTCS to monitor and discard half-open connections? If not, I'd recommend trying them.
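
As a rough sketch of what that could look like: RTCSCFG_* options are normally overridden in the project's user_config.h, followed by a rebuild of the RTCS library. The numeric values below are placeholders chosen only for illustration, not values recommended anywhere in this thread, so size them for your own application.

```c
/* user_config.h (fragment) -- illustrative values only, assumed for this sketch */

/* Upper bound on simultaneous TCP connections handled by RTCS */
#define RTCSCFG_TCP_MAX_CONNECTIONS   8

/* Upper bound on half-open connections; lets RTCS discard excess ones */
#define RTCSCFG_TCP_MAX_HALF_OPEN     4
```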

lfschrickte
Contributor IV

Hi,

I think I figured out what my problem was! I'm not 100% sure yet, but I have 5 devices that have been running for 2 weeks without the issue happening again.

The problem wasn't in the accept() call, but in the recv() one! Some connection problem that happens between the accept() and recv() calls causes recv() to block forever, so the task never gets back to accept(). I just set OPT_RECEIVE_TIMEOUT to a value larger than 0 for the socket and the problem hasn't happened again!
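
To show the effect in isolation, here is a minimal sketch. It is not RTCS code: it uses POSIX sockets, where SO_RCVTIMEO plays roughly the role that OPT_RECEIVE_TIMEOUT plays in RTCS, and the helper names set_recv_timeout_ms and recv_or_timeout are made up for this example. On MQX the option would presumably be set with the same setsockopt(sock, SOL_TCP, ...) pattern used for the other options in the code above; check the RTCS documentation for the units and for what recv() returns when the timeout expires.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Set a receive timeout (milliseconds) on a POSIX socket.
 * A zero timeout means "block indefinitely" in POSIX, which is exactly
 * the failure mode being avoided, so require ms > 0.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;

    assert(ms > 0);
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Receive once from fd.
 * Returns the number of bytes read (> 0), 0 if the peer closed the
 * connection, -2 if the receive timeout expired, -1 on any other error. */
long recv_or_timeout(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return (long)n;
}
```

In an accept loop like the one above, a timeout result would be the cue to close the accepted socket and go back to accept(), instead of staying stuck on a dead connection.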

I've also limited half-open connections and max connections as you suggested. It is useful for my application, but it didn't solve the issue. Thank you very much for your prompt response!

Regards!
