
MQX TCP bug - Lockup on more than one simultaneous open.

Question asked by Chris Solomon on Dec 22, 2013
Latest reply on Jun 24, 2019 by Daniel Chen


I have been working on this problem for about a week, and thought I would share my progress.

(Using MQX 4.0.1, but have checked and MQX does not seem to have a fix.)



I am working on a product with a 3G modem and a K70-based processor. It connects to the internet using PPP over USB OTG, runs FTP and telnet servers, uses DNS, and sends data to a remote server (over TCP/IP).

We've noticed that if we cycle the connection to the cellular network (simulating a customer moving in and out of coverage), we get reliably repeatable lockups (watchdog resets).



I have tracked through RTCS and I have identified what I believe to be the problem:

To open a socket for an outgoing connection a call to socket_connect is made, which in turn calls the connect macro, which calls SOCK_STREAM_connect.

SOCK_STREAM_connect has a local variable 'parms', which is populated and passed by reference (using RTCSCMD_issue) to TCP_Process_open.

TCP_Process_open stores this pointer in a linked list of pending open requests.

Note - this is a pointer to a local variable, which will go out of scope.


Each time a socket is opened, the local variable has the same address, so when the next pointer is stored it creates a loop in the linked list of pending open requests, which leads to an infinite loop in TCP_Return_open.
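
To illustrate the failure mode, here is a minimal sketch in plain C. It is not RTCS code: struct open_parms, pending_opens, queue_open and list_has_cycle are hypothetical stand-ins, and I am assuming new requests are linked at the head of the list (the loop forms in much the same way if they are appended at the tail). To keep the example well defined, one variable that stays in scope is reused for both requests, which models the stack slot having the same address on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTCS open-request parameter block. */
struct open_parms {
    struct open_parms *next;
    int                sock_id;
};

/* Hypothetical stand-in for the list of pending open requests. */
static struct open_parms *pending_opens = NULL;

/* Models the list code keeping the caller's pointer: link it at the head. */
static void queue_open(struct open_parms *parms)
{
    assert(parms != NULL);
    parms->next = pending_opens;
    pending_opens = parms;
}

/* Walks at most max_steps nodes; returns 1 if the end was not reached. */
static int list_has_cycle(const struct open_parms *head, int max_steps)
{
    while (head != NULL && max_steps-- > 0)
        head = head->next;
    return head != NULL;
}

/* Queues two requests that share one address, as two opens would. */
static int demo_reused_address(void)
{
    struct open_parms parms;   /* same storage used for both requests */

    pending_opens = NULL;

    parms.sock_id = 1;
    queue_open(&parms);        /* list: parms -> NULL */

    parms.sock_id = 2;
    queue_open(&parms);        /* parms.next now points at parms itself */

    return list_has_cycle(pending_opens, 16);
}
```

Calling demo_reused_address() reports a cycle, because the second queue_open makes the node its own successor. Any unbounded walk of that list will never terminate, which matches the lockup described above.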


The same problem occurs when opening a socket to listen:

listen -> SOCK_STREAM_listen -> TCP_Process_open


And TCP_Process_accept also looks like it suffers from the same vulnerability.


Proposed Solution:

Allocate memory in TCP_Process_open and TCP_Process_accept, and copy the data from the passed pointer to the newly allocated variable, then store that in the linked list.

Memory can be freed in the TCP_Return_open function after the call to RTCSCMD_complete.
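
The idea can be sketched as follows. Again this is plain C and not the attached fix: malloc and free stand in for whatever RTCS allocator is appropriate, and struct open_parms, queue_open_copy and complete_open are hypothetical names. The sketch does not model how RTCSCMD_complete reports results back to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the RTCS open-request parameter block. */
struct open_parms {
    struct open_parms *next;
    int                sock_id;
};

/* Hypothetical stand-in for the list of pending open requests. */
static struct open_parms *pending_opens = NULL;

/* Copies the caller's parameters to the heap and links the copy.
 * Returns 0 on success, -1 if the allocation fails. */
static int queue_open_copy(const struct open_parms *caller_parms)
{
    struct open_parms *copy;

    assert(caller_parms != NULL);
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;

    memcpy(copy, caller_parms, sizeof *copy);
    copy->next = pending_opens;    /* each request is now a distinct node */
    pending_opens = copy;
    return 0;
}

/* Models completion: unlink the head request and release its copy.
 * Returns 1 if a request was completed, 0 if the list was empty. */
static int complete_open(void)
{
    struct open_parms *done = pending_opens;

    if (done == NULL)
        return 0;

    pending_opens = done->next;
    /* In the proposed fix, RTCSCMD_complete would run before this free. */
    free(done);
    return 1;
}

/* Queues two requests from one reused variable and counts list nodes. */
static int demo_copy_fix(void)
{
    struct open_parms parms;
    const struct open_parms *p;
    int count = 0;
    int failed = 0;

    pending_opens = NULL;
    parms.next = NULL;

    parms.sock_id = 1;
    failed |= (queue_open_copy(&parms) != 0);
    parms.sock_id = 2;
    failed |= (queue_open_copy(&parms) != 0);

    /* Bounded walk, so a cycle would show up as count == 16. */
    for (p = pending_opens; p != NULL && count < 16; p = p->next)
        count++;

    while (complete_open())
        ;                          /* drain and free every queued copy */

    return failed ? -1 : count;
}
```

With copies on the list, reusing the caller's variable no longer matters: demo_copy_fix() finds two separate nodes and the walk ends at NULL. Every path that removes a request from the list has to free its copy, otherwise this trades the lockup for a memory leak.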

I've attached a file with my fix. I've done a few hours of bench testing, and it seems to solve the problem.



Are there any problems with the proposed solution?

Alternative solutions?
