lwip with the LPC4350

lpcware · ‎06-15-2016

Content originally posted in LPCWare by gregd on Wed Oct 17 11:42:05 MST 2012
I have been having some lock-up problems when stress testing the LPC lwip port for the LPC4350. I am using it in stand-alone mode without FreeRTOS. When testing 16 simultaneous TCP connections, I am having a problem with the following code in lpc18xx_43xx_emac.c line 689:

                /* Wait until enough descriptors are available for the transfer. */
                /* THIS WILL BLOCK UNTIL THERE ARE ENOUGH DESCRIPTORS AVAILABLE */
                while (dn > lpc_tx_ready(netif))
#if NO_SYS == 0
                                xSemaphoreTake(lpc_netifdata->xTXDCountSem, 0);
#else
                                msDelay(1);
#endif

I guess when congestion occurs the tx_free_descs (Number of free TX descriptors) goes to zero. The code is becoming stuck in the above loop and never exits. Since I am running with NO_SYS == 1, there is no mechanism for the tx descriptors to be freed up is there? Will the DMA transfers automatically do this?

Can you suggest a solution to this problem?

Are there any plans to further develop/test the lwip library to make it more robust?

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Johnny D on Tue Jun 10 03:24:40 MST 2014
Hi Kevin,

Did NXP ever come up with a reliable solution to this problem?

Many thanks,

John.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by wellsk on Mon Aug 19 16:12:00 MST 2013
This lockup problem seems to be causing snags with a number of users - sorry about that :(
I'll look at your change (and the other suggested changes) soon and provide an update for this - and provide an alternate driver with the more traditional copied pbuf style buffers similar to the release in FreeRTOS.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by gregd on Thu Aug 15 06:02:00 MST 2013
I had this same problem and posted the original question about the following code from lpc18xx_43xx_emac.c locking up when NO_SYS = 1 and (dn > lpc_tx_ready(netif)):

/* Zero-copy TX buffers may be fragmented across multiple payload
   chains. Determine the number of descriptors needed for the
   transfer. The pbuf chaining can be a mess! */
dn = (u32_t) pbuf_clen(p);

/* Wait until enough descriptors are available for the transfer. */
/* THIS WILL BLOCK UNTIL THERE ARE ENOUGH DESCRIPTORS AVAILABLE */
while (dn > lpc_tx_ready(netif))
#if NO_SYS == 0
{xSemaphoreTake(lpc_netifdata->xTXDCountSem, 0); }
#else
{msDelay(1); }
#endif

I have been using the old version of lwip that was created to work with the lpc43xx PDL. I had modified the code shortly after the original post to simply return an error if this condition occurs:

#if NO_SYS == 0
while (dn > lpc_tx_ready(netif))
    {xSemaphoreTake(lpc_netifdata->xTXDCountSem, 0); }
#else
    if (dn > lpc_tx_ready(netif))
        return ERR_MEM;
#endif

This version of the code seems to work very well. I have not experienced any lockups and have not had any communications issues to really speak of either. We have stress tested the code very thoroughly and have very few hiccups in communication, especially any that we would point directly to lwip. We have two different application protocols that are running randomly in 16 simultaneous TCP server sessions and see very few issues of any kind. If this is causing a problem, it at least seems to recover with no long term ill effects. I have only tested with NO_SYS = 1 so I can't comment as to the reliability of the version with RTOS.

I have recently switched to using the new LPCOpen v1.03 and noticed that the code is still implemented with the original problem. Can someone please review my change and see if this would be a good candidate to include in the new v2.0 version of LPCOpen? I am only using DHCP and TCP raw mode in my application, so there may be other ramifications that I am not thinking of, but this seems to resolve a very severe problem that will completely lock up the code and require power cycling.

Thanks,
Greg Dunn

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Johnny D on Mon Jul 15 08:15:17 MST 2013
Hi Kevin, thanks for the advice.

I set everything back to how it was before, except with the descriptors also in AHB SRAM and the DMA underflows stopped happening. I have a feeling we are maxing out the bandwidth of our SDRAM so much on the M4 core that it does have a negative effect on the Ethernet. There shouldn't be any PBUF_ROM type pbufs in this system and certainly none located in SPI Flash because our M0 core is completely unaware of the SPI Flash (it doesn't exist in the M0 scatter file).

It all seems relatively stable now so for now I am happy, although it would be nice to know how to properly handle a DMA underflow should it ever happen.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by wellsk on Fri Jul 12 08:31:00 MST 2013

Hi Johnny,

You should be ok with all the descriptors in SDRAM or IRAM. The descriptors themselves won't cause an underflow condition - the underflow condition occurs when the EMAC DMA engine needs TX data to send (and the transfer is larger than the EMAC FIFO) and can't get it fast enough to keep the packet on the wire uninterrupted.

This issue is related to the buffer location used for the ethernet - its memory transfer rate is slower than the ethernet requires. Look closely at the address of the ethernet buffers (pbuf->payload) and make sure they are located in IRAM or SDRAM. It's very possible you might be trying to send a PBUF_ROM type pbuf with its payload located in read-only (slow) SPI FLASH memory. In this case, the Ethernet will underflow.

If you can't easily use the LWIP_DEBUG capability, you might try sticking a few printfs in the driver in the TX function to dump the payload addresses for the packets to send.
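A minimal sketch of that kind of payload dump, for illustration only. `struct pbuf_stub` is a hypothetical stand-in that mirrors just the `next`, `payload` and `len` fields of lwIP's `struct pbuf`, and `dump_pbuf_chain` is an invented helper name, not part of the driver. In the real TX function the same loop would walk the actual pbuf chain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for lwIP's struct pbuf (only the fields used here). */
struct pbuf_stub {
    struct pbuf_stub *next;   /* next buffer in the chain, or NULL */
    void *payload;            /* start of this buffer's data */
    uint16_t len;             /* length of this buffer's data */
};

/* Print one line per buffer in the chain and return the number of buffers,
   which is also the number of TX descriptors a zero-copy send would need. */
static unsigned dump_pbuf_chain(const struct pbuf_stub *p)
{
    unsigned n = 0;

    for (; p != NULL; p = p->next) {
        assert(p->payload != NULL);   /* a queued buffer should have data */
        printf("pbuf[%u] payload=%p len=%u\n", n, p->payload, (unsigned) p->len);
        n++;
    }
    return n;
}
```

Comparing the printed addresses against the memory map shows at a glance whether any payload sits outside IRAM or SDRAM.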

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Johnny D on Fri Jul 12 02:10:14 MST 2013

Thanks for the assistance. I can't post any of the project here, company regulations would prevent that.

When I posted this problem, I was only using the first 32 KB of AHB SRAM (address 0x20000000 to 0x20008000) for the lwip heap. In lwipopts.h I had MEM_SIZE defined as (31 * 1024). In lpc_emac_config.h I had LPC_NUM_BUFF_TXDESCS and LPC_NUM_BUFF_RXDESCS defined as 10 (so 20 in total). Hopefully, the heap being in AHB SRAM means the DMA should be able to access it quickly enough.

If you are saying that 7 tx descriptors should be enough, that leads me to believe there is a problem elsewhere in the code.

What I have done since is steal some more of the AHB SRAM from the other core, so I now have 61 KB to play with. I have set MEM_SIZE to (58 * 1024), which leaves me with 2028 bytes free. I realised that because of the way our project has to work, the M0 had put the lpc_enetdata struct into external SDRAM. I have now relocated this struct into the spare 2028 bytes in AHB SRAM. I believe this may have been the cause of the DMA underflow because this struct contains the actual descriptors. It wasn't enough just to have the pbuf memory in SRAM.

I will now put everything back to how it was before, except with the lpc_enetdata struct in AHB SRAM and see what happens.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by wellsk on Thu Jul 11 12:52:00 MST 2013

Descriptors should be re-queued for transmit on underflow - this sounds like an oversight and should get a bug tracker issue (http://www.lpcware.com/node/add/ct-plus-task) created for it. However, you'll need to determine the reason for the underflow, as re-queueing won't help if that underlying issue persists.

>dn = 2 (number of descriptors needed)

Are you using the HTTP server example? I've seen cases where the pbuf chain contains a buffer that is a ROM buffer in SPIFLASH or slower memory. The Ethernet DMA will attempt to queue a buffer from any memory location in a descriptor, but if the memory is too slow, the ethernet will underflow.

A mechanism was added to copy buffers located in slower memory to a faster pbuf prior to sending, using the LPC_CHECK_SLOWMEM and LPC_SLOWMEM_ARRAY definitions. If you set LPC_CHECK_SLOWMEM to 1 and set up an array of memory addresses/ranges with slower memory using LPC_SLOWMEM_ARRAY, the driver will automatically handle pbuf copying for buffers if it's needed.

Here's an example setup for external SPIFI FLASH:

<pre>#define LPC_CHECK_SLOWMEM 1
#define SPIFI_BASE_ADDR 0x14000000
#define XFLASH_BASE_ADDR 0x1C000000
#define LPC_SLOW_MEM_SIZE ((4 * 1024 * 1024) - 1)

/* Array of slow memory address ranges for LPC_CHECK_SLOWMEM */
#define LPC_SLOWMEM_ARRAY {{SPIFI_BASE_ADDR, (SPIFI_BASE_ADDR + ((4 * 1024 * 1024) - 1))}}</pre>
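To make the idea of the range array concrete, here is a rough sketch of the kind of address test such a table allows. How the driver actually consumes LPC_SLOWMEM_ARRAY is not shown in this thread, so `mem_range_t`, `slow_ranges` and `is_slow_mem` are hypothetical names used only for illustration. The base address and size are the ones from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPIFI_BASE_ADDR   0x14000000u
#define LPC_SLOW_MEM_SIZE ((4u * 1024u * 1024u) - 1u)

/* Hypothetical range type: start and end are both inclusive addresses. */
typedef struct {
    uint32_t start;
    uint32_t end;
} mem_range_t;

static const mem_range_t slow_ranges[] = {
    { SPIFI_BASE_ADDR, SPIFI_BASE_ADDR + LPC_SLOW_MEM_SIZE },
};

/* Return 1 if any byte of the buffer [addr, addr + len) lies in a slow range. */
static int is_slow_mem(uint32_t addr, uint32_t len)
{
    uint32_t last;
    size_t i;

    assert(len > 0u);
    last = addr + (len - 1u);   /* last byte of the buffer */

    for (i = 0; i < sizeof slow_ranges / sizeof slow_ranges[0]; i++) {
        if (addr <= slow_ranges[i].end && last >= slow_ranges[i].start) {
            return 1;           /* buffer overlaps this slow range */
        }
    }
    return 0;
}
```

A payload for which this returns 1 would be a candidate for copying into a faster pbuf before it is queued to the DMA.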

>I am reluctant to increase LPC_NUM_BUFF_TXDESCS because (a) I don't have any more SRAM available and (b) it would only make this bug happen less often. At this point of development I want to make it happen more often so I can fix it properly.

It looks like you have 7 now, that's more than enough. You have 7 'in use', but likely not reclaimed correctly due to underflow.

>I tried enabling EMAC_DEBUG but the amount of trace that comes out is such that it slows the whole process down enough to cripple the processor, so it's not representative of the normal operating conditions.

True, but the output is useful for debugging this specific issue. It gives an idea of where the buffers are being sent from. I wouldn't keep this one once LWIP is operating correctly. I can help debug this to an extent, but would need to see the messages the LWIP EMAC driver produces at run-time. Maybe you can post your project files?

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Johnny D on Thu Jul 11 05:47:00 MST 2013

It looks like the ethernet has underflowed.

DMA_STAT register = 0x006804E5 (Transmit interrupt | Transmit buffer unavailable | Transmit underflow | Early transmit interrupt).
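For reference, a small decoder sketch for that value. The bit positions below are an assumption based on the DesignWare-style DMA status layout, which is consistent with the four flags listed above; they should be checked against the user manual. Under that assumption the value also has bits 6 and 7 set (receive interrupt, receive buffer unavailable), and the transmit process state field in bits 22:20 reads 6, which in that layout denotes a suspended transmit process.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions (DesignWare-style DMA status register). */
#define ST_TI   (1u << 0)    /* transmit interrupt */
#define ST_TU   (1u << 2)    /* transmit buffer unavailable */
#define ST_UNF  (1u << 5)    /* transmit underflow */
#define ST_RI   (1u << 6)    /* receive interrupt */
#define ST_RU   (1u << 7)    /* receive buffer unavailable */
#define ST_ETI  (1u << 10)   /* early transmit interrupt */

/* Receive and transmit process state fields (3 bits each). */
static unsigned rx_state(uint32_t stat) { return (stat >> 17) & 0x7u; }
static unsigned tx_state(uint32_t stat) { return (stat >> 20) & 0x7u; }

/* Print the flags that are set in a raw status value. */
static void print_dma_stat(uint32_t stat)
{
    assert((stat & 0xC0000000u) == 0u);   /* top two bits are not decoded here */
    printf("DMA_STAT=0x%08lX:%s%s%s%s%s%s rx_state=%u tx_state=%u\n",
           (unsigned long) stat,
           (stat & ST_TI)  ? " TI"  : "",
           (stat & ST_TU)  ? " TU"  : "",
           (stat & ST_UNF) ? " UNF" : "",
           (stat & ST_RI)  ? " RI"  : "",
           (stat & ST_RU)  ? " RU"  : "",
           (stat & ST_ETI) ? " ETI" : "",
           rx_state(stat), tx_state(stat));
}
```

Calling `print_dma_stat(0x006804E5u)` lists every flag in the table as set.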

Looking at the current TX descriptor, it still has bit 31 set (indicating it is owned by the DMA?).

tx_reclaim_idx = 6,

tx_fill_idx = 6,

tx_free_descs = 0

dn = 2 (number of descriptors needed)

txdesc[6].CTRLSTAT = 0xF0D00000

Interestingly, txdesc[4] and txdesc[5] have bit 31 clear, suggesting they might be free to use again?

The datasheet indicates that if an underflow occurs, the driver must explicitly issue a Transmit Poll Demand after rectifying the suspension cause. I have tried the following code in place of the while loop but it still doesn't work, as bit 31 never gets cleared by the DMA:

<pre>if ( dn > lpc_tx_ready(netif) ) {
    // check for tx underflow
    if ( LPC_ETHERNET->DMA_STAT & DMA_ST_UNF ) {
      LPC_ETHERNET->DMA_STAT &= ~(DMA_ST_UNF);
      LPC_ETHERNET->DMA_TRANS_POLL_DEMAND = 1;

      while ( dn > lpc_tx_ready(netif) ) {
        lpc_tx_reclaim(netif);
      }
    }
}</pre>
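One thing that may be worth checking against the user manual: on many Ethernet DMA controllers the status bits are write-one-to-clear. If that applies to DMA_STAT, which is an assumption here and not something confirmed in this thread, then a read-modify-write with `&= ~flag` writes ones to every other pending bit and a zero to the flag itself, so the underflow flag would stay set. The sketch below models such a register in plain C to show the difference. `w1c_reg_t`, `w1c_read` and `w1c_write` are hypothetical names for the model, not hardware accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ST_UNF (1u << 5)   /* assumed position of the transmit underflow flag */

/* Software model of a write-one-to-clear status register. */
typedef struct {
    uint32_t value;
} w1c_reg_t;

static uint32_t w1c_read(const w1c_reg_t *r)
{
    assert(r != NULL);
    return r->value;
}

/* Writing a 1 to a bit clears it; writing a 0 leaves it unchanged. */
static void w1c_write(w1c_reg_t *r, uint32_t written)
{
    assert(r != NULL);
    r->value &= ~written;
}
```

In this model, `w1c_write(&reg, w1c_read(&reg) & ~ST_UNF)` clears every pending flag except the underflow flag, while `w1c_write(&reg, ST_UNF)` clears only the underflow flag.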

I am reluctant to increase LPC_NUM_BUFF_TXDESCS because (a) I don't have any more SRAM available and (b) it would only make this bug happen less often. At this point of development I want to make it happen more often so I can fix it properly.

I tried enabling EMAC_DEBUG but the amount of trace that comes out is such that it slows the whole process down enough to cripple the processor, so it's not representative of the normal operating conditions.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by wellsk on Wed Jul 10 08:35:15 MST 2013
<pre>    /* Zero-copy TX buffers may be fragmented across multiple payload
       chains. Determine the number of descriptors needed for the
       transfer. The pbuf chaining can be a mess! */
    dn = (u32_t) pbuf_clen(p);

    /* Wait until enough descriptors are available for the transfer. */
    /* THIS WILL BLOCK UNTIL THERE ARE ENOUGH DESCRIPTORS AVAILABLE */
    while (dn > lpc_tx_ready(netif))
#if NO_SYS == 0
    {xSemaphoreTake(lpc_netifdata->xTXDCountSem, 0); }
#else
    {msDelay(1); }
#endif</pre>

For the code snippet above, the msDelay is probably a bad design choice on our part. Without reclaiming TX descriptors somewhere else, this can loop forever if there are not enough descriptors on the initial call. Calling lpc_tx_reclaim() is a good workaround for this.
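As a rough illustration of that workaround, the sketch below simulates a TX descriptor ring in plain C and waits for free descriptors by reclaiming inside the loop, with a bounded number of polls so it cannot spin forever. It is not the driver code: `tx_ring_t`, `tx_reclaim`, `tx_wait_for_descs`, `TX_DESC_COUNT` and `max_polls` are hypothetical, and the only hardware detail assumed is that bit 31 of the descriptor status word marks DMA ownership.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_DESC_COUNT 4u           /* hypothetical ring size */
#define OWN_BIT       (1u << 31)   /* descriptor still owned by the DMA */

typedef struct {
    uint32_t ctrlstat[TX_DESC_COUNT];  /* stand-in for descriptor status words */
    uint32_t reclaim_idx;              /* oldest descriptor not yet reclaimed */
    uint32_t free_descs;               /* descriptors available to the driver */
} tx_ring_t;

/* Free every in-use descriptor that the DMA has handed back (OWN bit clear). */
static void tx_reclaim(tx_ring_t *r)
{
    assert(r != NULL);
    while (r->free_descs < TX_DESC_COUNT &&
           (r->ctrlstat[r->reclaim_idx] & OWN_BIT) == 0u) {
        r->reclaim_idx = (r->reclaim_idx + 1u) % TX_DESC_COUNT;
        r->free_descs++;
    }
}

/* Return 0 once dn descriptors are free, or -1 if that does not happen
   within max_polls retries. The caller can then drop the frame or report
   an error instead of blocking forever. */
static int tx_wait_for_descs(tx_ring_t *r, uint32_t dn, uint32_t max_polls)
{
    uint32_t i;

    if (dn > TX_DESC_COUNT) {
        return -1;   /* the chain can never fit in the ring, so do not wait */
    }
    for (i = 0; i <= max_polls; i++) {
        tx_reclaim(r);
        if (dn <= r->free_descs) {
            return 0;
        }
        /* a real driver would delay here, e.g. msDelay(1) */
    }
    return -1;
}
```

The early check on `dn` also covers the case where a fragmented chain needs more descriptors than the ring contains, where no amount of waiting would help.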

There are a few things that could be happening here.

1) The pbuf chain might be a non-contiguous chained pbuf that needs more descriptors than are actually available. Is the value of 'dn' greater than the number defined by LPC_NUM_BUFF_TXDESCS? Try increasing the LPC_NUM_BUFF_TXDESCS value - this will also have a slight increase in required memory, but will allow a larger float of free TX descriptors.

2) Is the ethernet actually transmitting the packets or just holding them? Holding them might indicate another problem. You can try enabling LWIP_DEBUG and EMAC_DEBUG to get EMAC driver status messages on the UART to get an idea of what might be happening. Posting that here would be very helpful.

lpcware · ‎06-15-2016

Content originally posted in LPCWare by Johnny D on Wed Jul 10 01:29:43 MST 2013

I have also run into the same problem. There is absolutely no way it will ever leave that while loop unless you are using NO_SYS == 0.

I don't suppose you ever found a solution to this? I have tried calling lpc_tx_reclaim() within that while loop but that didn't work.
