I think the kids today call this a "self-own". I did some digging, and now I'll lay out the case for why the SDK should have its own implementation of LWIP_ASSERT_CORE_LOCKED().
My issue was that I was not calling LOCK_TCPIP_CORE() before calling the lwIP MQTT API functions. I didn't immediately realize that core locking was required for this API; the lwIP sockets API, for instance, doesn't require explicit locking, as it's all handled inside the API. Because my target was lightly loaded on the network, it would run fine for some period of time, exchanging messages with the MQTT server, but would eventually get itself out of sync. (Edit: after 24 hours of testing, this was definitely the issue.)
What's useful is that lwIP has protection against exactly this scenario: the macro LWIP_ASSERT_CORE_LOCKED(). This macro is called in just about every location where the lwIP code needs exclusive access to the TCP/IP core functions, including the MQTT API. Example:
void
mqtt_disconnect(mqtt_client_t *client)
{
  LWIP_ASSERT_CORE_LOCKED();
  LWIP_ASSERT("mqtt_disconnect: client != NULL", client);
  if (client->conn_state != TCP_DISCONNECTED) {
    client->conn_state = TCP_DISCONNECTED;
    mqtt_close(client, (mqtt_connection_status_t)0);
  }
}
The macro is documented and defined by default in lwip/src/include/lwip/opt.h:
#if !defined LWIP_ASSERT_CORE_LOCKED || defined __DOXYGEN__
#define LWIP_ASSERT_CORE_LOCKED()
#endif
As you can see, if it's not otherwise defined, the macro does nothing. If it had been defined, I'm sure I would have noticed my issue a lot sooner. Thankfully, it's not very difficult to implement it. First, we need a function that will assert if the core is not locked. I put this code in lwip/port/sys_arch.c near the top:
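#if LWIP_TCPIP_CORE_LOCKING
extern sys_mutex_t lock_tcpip_core;

void lwip_assert_core_locked(void)
{
    /* Nothing to check if the core mutex has not been created yet. */
    if (NULL == lock_tcpip_core)
        return;
    /* Skip the check when called from interrupt context. */
    if (xPortIsInsideInterrupt())
        return;
    /* A FreeRTOS mutex reports a count of 0 while it is taken. */
    LWIP_ASSERT("TCPIP core is locked", (0 == uxSemaphoreGetCount(lock_tcpip_core)));
}
#endif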
Then I added this block of definitions to lwipopts.h:
#define LWIP_MPU_COMPATIBLE 0
#define LWIP_TCPIP_CORE_LOCKING 1
#define LWIP_TCPIP_CORE_LOCKING_INPUT 0
#if LWIP_TCPIP_CORE_LOCKING
#ifdef __cplusplus
extern "C" {
#endif
void lwip_assert_core_locked(void);
#define LWIP_ASSERT_CORE_LOCKED() do { lwip_assert_core_locked(); } while (0)
#ifdef __cplusplus
}
#endif
#endif
Then I added the appropriate core lock/unlock calls to my code. Example:
err_t MqttClient::Subscribe(const char * topic, MqttQos qos) {
    LOCK_TCPIP_CORE();
    err_t err = mqtt_sub_unsub(client_, topic, ToIntegral(qos), CallbackRequest, this, 1);
    UNLOCK_TCPIP_CORE();
    [...]
}
And that's pretty much it. If you're not sure whether you need to lock the core manually before an API call, just take a peek inside the body of the function. If you see LWIP_ASSERT_CORE_LOCKED(), then you are required to lock the TCP/IP core before calling it. Otherwise, you'll probably encounter issues like I did.
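For C++ callers like the Subscribe() example above, one optional way to keep every LOCK_TCPIP_CORE() paired with its UNLOCK_TCPIP_CORE(), including on early returns, is a small scope guard. This is only a sketch: TcpipCoreLock, call_raw_api() and guarded_call() are hypothetical names that are not part of lwIP or the SDK, and the two macros are replaced by counting stand-ins so the snippet compiles on a host without lwIP.

```cpp
#include <cassert>

// ASSUMPTION: host-side stand-ins for the real macros. In an lwIP project,
// delete these three lines and include "lwip/tcpip.h" instead.
static int g_core_lock_depth = 0;  // hypothetical counter, illustration only
#define LOCK_TCPIP_CORE()   (++g_core_lock_depth)
#define UNLOCK_TCPIP_CORE() (--g_core_lock_depth)

// Hypothetical helper: takes the core lock on construction and releases it
// when the object goes out of scope.
class TcpipCoreLock {
public:
    TcpipCoreLock() { LOCK_TCPIP_CORE(); }
    ~TcpipCoreLock() { UNLOCK_TCPIP_CORE(); }
    TcpipCoreLock(const TcpipCoreLock &) = delete;
    TcpipCoreLock &operator=(const TcpipCoreLock &) = delete;
};

// Stand-in for a raw lwIP call such as mqtt_sub_unsub(). It returns the
// stand-in lock depth so the caller can see whether the lock was held.
int call_raw_api() { return g_core_lock_depth; }

// Example caller: the guard releases the lock on both return paths.
int guarded_call(bool fail_early) {
    TcpipCoreLock lock;
    if (fail_early) {
        return -1;  // lock is still released by the destructor
    }
    return call_raw_api();
}
```

With a guard like this, the explicit LOCK/UNLOCK pair in a method body collapses to a single declaration at the top of the scope that needs the core.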
I'd like to suggest to the NXP SDK team that the SDK, by default, provide an implementation of LWIP_ASSERT_CORE_LOCKED() for its example and SDK projects. My implementation is written strictly for (NO_SYS == 0) and a FreeRTOS environment, so writing a more general implementation will be a bit more complex. But it would probably help keep a number of projects on the rails.
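One thing a more general version might also tighten: a check that the mutex count is 0 shows that some task holds the core lock, not necessarily the calling one. A possible approach is to record the owner when the lock is taken and compare it against the caller. The sketch below models that idea on a host with standard C++ threads; CoreLockModel and other_thread_sees_itself_as_holder() are hypothetical names, and nothing here is lwIP, FreeRTOS, or SDK code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical host-side model of a core lock that remembers its owner.
class CoreLockModel {
public:
    void lock() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
    }
    void unlock() {
        owner_.store(std::thread::id());  // default id means "no owner"
        mutex_.unlock();
    }
    // The condition an owner-aware LWIP_ASSERT_CORE_LOCKED() would assert.
    bool held_by_caller() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Runs the check from a second thread and reports what that thread saw.
bool other_thread_sees_itself_as_holder(CoreLockModel &core) {
    bool result = true;
    std::thread t([&] { result = core.held_by_caller(); });
    t.join();
    return result;
}
```

In this model, a thread that never took the lock fails the check even while another thread holds it, which is the case a count-only check cannot detect. On FreeRTOS, xSemaphoreGetMutexHolder() compared against xTaskGetCurrentTaskHandle() may serve the same purpose, assuming it is enabled in the FreeRTOS configuration.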
Hopefully this is helpful to some of you out there developing your own lwIP projects.
David R.