LwIP sample program failure


4,567 Views
lpcware
NXP Employee
NXP Employee
Content originally posted in LPCWare by tamirmichael on Mon Jun 11 10:59:57 MST 2012
Hello,

I have downloaded the port of lwIP for LPC17xx and hit a program failure some time after the controller (LPC1788) starts handling bus events
(I ran the TCP echo sample, but I suspect that any sample will demonstrate this):

    /* all pbufs in a chain are referenced at least once */
    LWIP_ASSERT("pbuf_free: p->ref > 0", p->ref > 0);

This assertion is in pbuf.c, line 627 of version 1.4.0 (function "pbuf_free"). The failure occurs with both dynamically and statically allocated heap after at most several minutes of running, whether or not the board is being pinged.
How can this happen? Did you encounter this too? What can be done about it?
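
For context on the assertion: lwIP pbufs are reference counted, and `pbuf_free` drops one reference and releases the buffer once the count reaches zero. The assertion fires when `pbuf_free` is called on a pbuf whose count is already zero, which usually points to a double free or to a buffer being handled from two contexts at once. The following is a simplified host-side model, not lwIP code; `model_pbuf`, `model_ref`, and `model_free` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct pbuf: only the reference count is modelled. */
struct model_pbuf {
    unsigned ref;
};

/* Take an extra reference, in the spirit of lwIP's pbuf_ref(). */
static void model_ref(struct model_pbuf *p)
{
    assert(p != NULL);
    p->ref++;
}

/* Drop one reference.
 * Returns  1 if this call released the buffer,
 *          0 if other references remain,
 *         -1 where lwIP would trip the "p->ref > 0" assertion. */
static int model_free(struct model_pbuf *p)
{
    assert(p != NULL);
    if (p->ref == 0) {
        return -1;          /* already released: double free */
    }
    p->ref--;
    return (p->ref == 0) ? 1 : 0;
}
```

In this model, a buffer with two owners survives the first `model_free`, is released by the second, and a third call reports the condition that the real assertion guards against.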
27 Replies

Content originally posted in LPCWare by tamirmichael on Mon Jun 25 02:11:18 MST 2012
It is very kind of you to share your changes. Thank you.
Either way, I would expect the linker to automatically map variables that do not fit in IRAM to DRAM. I will make another attempt first with your code, then I will make an attempt based on my sources.
Regarding memory usage: I have been working successfully with uIP so far (but covet the socket interface of lwIP + improved performance...) so I'm used to a tiny footprint. It is hard to argue with facts - your program keeps on functioning - but it would be interesting to know how much memory is actually consumed? What is the minimum RAM footprint?

Content originally posted in LPCWare by SeleneSW on Mon Jun 25 01:33:58 MST 2012
Hi,

good news, the demo program is still running after the week-end!

tamirmichael,

I'm using the tcpecho_freertos demo integrated in my own project and running on a proprietary board.
I started from the new sources posted by Kevin (http://sw.lpcware.com/?p=lwip_lpc.git&a=summary)
and modified them as indicated.

lwipopts.h:
#define PBUF_POOL_SIZE            256
#define MEM_SIZE (4096*1024)

lpc_emac_config.h:
#define LPC_NUM_BUFF_RXDESCS 128
#define LPC_NUM_BUFF_TXDESCS 128

Then I created a new DRAM section called LWIP_RAM by modifying my linker script.
Finally, I modified some variable declarations in LwIP to force their allocation in the
new section (see attachments).

After that the program started to function correctly, but now I am a little bit confused:
if over 4MB of memory are needed to perform a simple echo function, can we call it
"Light Weight IP" ?
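
As a rough way to put the 4 MB figure in perspective, the sketch below estimates how much memory a fully queued receive ring needs. It relies on the 1536-byte receive pbuf size stated elsewhere in this thread and assumes that those pbufs are drawn from the heap sized by `MEM_SIZE`; pbuf headers, alignment, and heap bookkeeping are ignored. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size of each receive pbuf, as stated for this driver elsewhere in the thread. */
#define RX_PBUF_BYTES 1536u

/* Bytes of payload needed to keep every RX descriptor backed by a pbuf.
 * Ignores pbuf headers, alignment padding, and heap bookkeeping. */
static size_t rx_queue_bytes(size_t num_rxdescs)
{
    return num_rxdescs * RX_PBUF_BYTES;
}

/* Percentage of the heap (MEM_SIZE) taken by a full RX queue, rounded down.
 * Assumes the RX pbufs are allocated from that heap. */
static unsigned rx_queue_percent(size_t num_rxdescs, size_t mem_size)
{
    assert(mem_size > 0);
    return (unsigned)((rx_queue_bytes(num_rxdescs) * 100u) / mem_size);
}
```

Under these assumptions, 128 descriptors account for 192 KiB, about 4% of a 4 MiB heap, while 3 descriptors account for 4608 bytes, about 37% of a 12 KiB heap. That suggests the 4 MB setting is generous headroom rather than a measured minimum.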


Content originally posted in LPCWare by wellsk on Fri Jun 22 15:28:08 MST 2012
Hmm, I'm really not sure what's happening. With the FLASH build, you should be able to turn on LWIP debugging and get some output messages from the UART on failure. Maybe you'll catch something I can trace? (Define LWIP_DEBUG in your compiler arguments.) With this defined, you won't need a debugger attached to see the error output on the UART.

You can trace pbuf related functions by setting the following define in lwipopts.h when LWIP_DEBUG is enabled.
<code>
#define PBUF_DEBUG                      LWIP_DBG_ON
</code>

The MAC driver also has a special define you can enable to trace some of its events. Continuous failures to allocate a pbuf and queue a receive descriptor would indicate that memory is very tight.
<code>
#define UDP_LPC_EMAC                    LWIP_DBG_ON
</code>

http://www.lpcware.com/content/project/lightweight-ip-lwip-networking-stack/configuring-lwip/enablin...
Note: Enabling debug will drastically impact performance.
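
Put together, a debug configuration along these lines might look like the fragment below. This is a sketch, not the shipped `lwipopts.h`: `LWIP_DBG_MIN_LEVEL` and `LWIP_DBG_TYPES_ON` are standard lwIP options that are not mentioned in this thread, and the values shown for them are what I believe the lwIP defaults to be, written out only to make the filtering explicit.

```c
/* lwipopts.h, debug build only (sketch) */
#ifndef LWIP_DEBUG
#define LWIP_DEBUG                      1   /* or pass -DLWIP_DEBUG to the compiler */
#endif

/* Standard lwIP filters (assumed defaults): show all levels and all types. */
#define LWIP_DBG_MIN_LEVEL              LWIP_DBG_LEVEL_ALL
#define LWIP_DBG_TYPES_ON               LWIP_DBG_ON

/* Per-module switches named in this thread. */
#define PBUF_DEBUG                      LWIP_DBG_ON   /* trace pbuf functions */
#define UDP_LPC_EMAC                    LWIP_DBG_ON   /* trace LPC EMAC driver events */
```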

Knowing the type of error (or the lack of one) is important in diagnosing the failure. In the zero-copy implementation, a failure to allocate a pbuf for the receive queue is not necessarily fatal if other receive descriptors still have pbufs associated with them. The driver will attempt to allocate them later, but you will see a warning message like this:
<code>
lpc_rx_queue: could not allocate RX pbuf index 4...
</code>

A failure to allocate at least 1 pbuf when all receive descriptors are used would be bad. This results in a situation where no more descriptors are queued and the MAC will reject receive packets from that point on (until more descriptors/pbufs are queued). Receive pbufs are allocated in the receive task whenever a packet is received. The new pbuf is immediately associated with the new descriptor before the current packet is processed. If it can't allocate it, it goes on and will attempt to allocate on the next received packet (possibly allocating and queuing more than 1 pbuf). Receive packets are always allocated with the PBUF_DMA type with size 1536 bytes and are contiguous in memory.
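
The queuing behaviour described above can be illustrated with a small host-side model. This is a toy sketch and not the driver's code: `rx_model`, `rx_requeue`, `rx_ready`, and `rx_packet` are hypothetical names, and a simple counter stands in for the pbuf allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_RXDESCS 3           /* same count as the FLASH example's RX descriptors */

/* Toy model: each descriptor either has a buffer queued or it does not. */
struct rx_model {
    bool   queued[NUM_RXDESCS]; /* true if the descriptor has a pbuf attached */
    size_t free_bufs;           /* buffers the allocator can still hand out */
};

/* Try to attach a buffer to every empty descriptor.
 * Returns how many descriptors were (re)queued by this call. */
static size_t rx_requeue(struct rx_model *m)
{
    size_t queued_now = 0;
    assert(m != NULL);
    for (size_t i = 0; i < NUM_RXDESCS; i++) {
        if (!m->queued[i] && m->free_bufs > 0) {
            m->free_bufs--;     /* stands in for a successful pbuf allocation */
            m->queued[i] = true;
            queued_now++;
        }
    }
    return queued_now;
}

/* Number of descriptors the MAC can currently receive into. */
static size_t rx_ready(const struct rx_model *m)
{
    size_t n = 0;
    assert(m != NULL);
    for (size_t i = 0; i < NUM_RXDESCS; i++) {
        n += m->queued[i] ? 1u : 0u;
    }
    return n;
}

/* A packet arrives: the MAC consumes one queued descriptor.
 * Returns false if nothing was queued, i.e. the packet is rejected. */
static bool rx_packet(struct rx_model *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < NUM_RXDESCS; i++) {
        if (m->queued[i]) {
            m->queued[i] = false;
            return true;
        }
    }
    return false;
}
```

In the model, a failed allocation is harmless while `rx_ready` is still above zero, because a later `rx_requeue` can catch up on more than one descriptor. Once `rx_ready` reaches zero, every `rx_packet` call fails until buffers become available again.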

If it's any easier, I can provide a copy based pbuf implementation of the driver that frees no pbufs and only allocates pbufs for the receive buffer copy. This is a full static implementation of the descriptors and buffers, but is slower than the zero-copy approach. It won't help with memory constrained issues though.

Content originally posted in LPCWare by tamirmichael on Fri Jun 22 14:22:30 MST 2012
Mario,

Can you tell me what software you are using? I simply added memory, using the DRAM variant made available as indicated above, and it eventually failed. Did you run the .hex file provided here?

Content originally posted in LPCWare by SeleneSW on Fri Jun 22 08:04:14 MST 2012
I have stopped the test. It was alive and kicking after about 370,000 echoes...
I think the problem is solved, but I'm going to run more tests, leaving it working over the whole weekend.
See you on Monday.

Content originally posted in LPCWare by tamirmichael on Fri Jun 22 03:47:43 MST 2012
Mine failed after about 2 hours...

Content originally posted in LPCWare by SeleneSW on Fri Jun 22 03:14:56 MST 2012
I have modified my sources to allocate the pbuf areas in external SDRAM, and the new firmware is running OK.
I will keep it running for at least 3 hours and then I will report.

Thank you for the help.

Mario

Content originally posted in LPCWare by tamirmichael on Fri Jun 22 01:22:59 MST 2012
Kevin,

I have adjusted the latest state of affairs to accommodate more memory - somehow my board refused to boot with the .hex file you provided (so there is, unfortunately, no extra debug information available in the image that I test now). Either way, it seems a little odd that a system would run out of memory sitting idle (but of course, the network interface does process packets). I would expect that there is still a memory/resource "leak" somewhere...

Content originally posted in LPCWare by tamirmichael on Fri Jun 22 01:02:15 MST 2012
Thanks. I have downloaded "dram_uart_example.hex". I will start a test right away.

Content originally posted in LPCWare by wellsk on Thu Jun 21 11:19:32 MST 2012
>Test failed again. I did not have a debugger connected
I wasn't able to repeat the failure here with the current FLASH variant of the ea1788_tcpecho_freertos project.
I opened multiple (3+) flood ping sessions to the EA1788 board and ran them for an hour - the board never lost ping response, although it did occasionally lose some packets or get increased latency times due to loading. (This is with only 3 TX and RX descriptors being used for the driver, so it's to be expected).

The FLASH example is very memory limited with the following configuration:
From lwipopts.h...
<code>
#define PBUF_POOL_SIZE                  6
#define MEM_SIZE                        (12*1024)
</code>
From lpc_emac_config.h...
<code>
#define LPC_NUM_BUFF_RXDESCS 3
#define LPC_NUM_BUFF_TXDESCS 3
</code>

It -is- very possible to get LWIP into a state where it's internally 'out of memory' and no more pbufs can be allocated for the receive descriptor queue. If this happens, the code will continue to run, but the MAC controller will accept no more packets, because no descriptors can be queued while no buffers can be allocated! Packet fragmentation will definitely do this (ping -f -s 5000 <IP address>). When LWIP allocates space for copied packet fragments, it draws on the same memory as normal buffers - and the pbuf allocation will eventually fail. Only adding more memory will fix this.
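
To see why the fragmentation test is so demanding, it helps to count the fragments per echo request. The sketch below assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and that the `-s` value counts ICMP data bytes (as it does in common `ping` implementations); `icmp_echo_fragments` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define IP_HEADER_BYTES   20u   /* IPv4 header without options */
#define ICMP_HEADER_BYTES 8u    /* ICMP echo header */

/* Number of IPv4 fragments needed for an ICMP echo carrying `data_bytes`
 * of payload on a link with the given MTU. */
static size_t icmp_echo_fragments(size_t data_bytes, size_t mtu)
{
    size_t per_fragment;
    size_t ip_payload = data_bytes + ICMP_HEADER_BYTES;

    assert(mtu >= IP_HEADER_BYTES + 8u);
    /* Non-final fragments must carry a multiple of 8 bytes. */
    per_fragment = ((mtu - IP_HEADER_BYTES) / 8u) * 8u;
    return (ip_payload + per_fragment - 1u) / per_fragment;
}
```

With these assumptions, a 5000-byte ping needs 4 fragments per request, which already exceeds the 3 receive descriptors of the FLASH configuration before any reassembly memory is counted.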

The FLASH variant example uses the internal RAM only. Ethernet buffers and descriptors can only be located in peripheral RAM on the 17xx, which is only 32K. This isn't a lot for LWIP. A DRAM variant is also available that uses DRAM for buffer/pbuf/descriptor allocation from a huge memory pool.
Here are the DRAM variant settings, this also boots from FLASH...
<code>
#define PBUF_POOL_SIZE                  256
#define MEM_SIZE                        (4096*1024)
#define LPC_NUM_BUFF_RXDESCS 128
#define LPC_NUM_BUFF_TXDESCS 128
</code>

I've attached both versions (FLASH/IRAM and FLASH/DRAM) of the example here for the EA1788 board. I've also attached a debug version of the DRAM version here that should output messages to the UART port similar to the following. If LWIP fails, you should see an error message you can re-post here.
<code>
lpc_low_level_input: Packet received: a0027110, size 102 (index=109)
pbuf_alloc(length=1536)
pbuf_alloc(length=1536) == a0026af4
lpc_rxqueue_pbuf: pbuf packet queued: a0026af4 (free desc=0)
pbuf_header: old a0027120 new a002712e (-14)
pbuf_free(a0027110)
pbuf_free: deallocating a0027110
lpc_low_level_input: Packet received: a002772c, size 102 (index=110)
pbuf_alloc(length=1536)
pbuf_alloc(length=1536) == a0027110
lpc_rxqueue_pbuf: pbuf packet queued: a0027110 (free desc=0)
pbuf_header: old a002773c new a002774a (-14)
pbuf_free(a002772c)
pbuf_free: deallocating a002772c
lpc_low_level_input: Packet received: a0027d48, size 102 (index=111)
</code>

Content originally posted in LPCWare by tamirmichael on Thu Jun 21 06:10:44 MST 2012
Test failed again. I did not have a debugger connected to the target but it is easy to reproduce - just let any of the test programs run pings (or not) just like above.

Content originally posted in LPCWare by tamirmichael on Thu Jun 21 05:29:32 MST 2012
I am conducting an additional test to make sure what I have seen is indeed a legitimate failure.

Content originally posted in LPCWare by tamirmichael on Thu Jun 21 04:58:13 MST 2012
Test failed (ping reply no longer generated). I did not have a debugger connected to the target.

Content originally posted in LPCWare by tamirmichael on Thu Jun 21 03:58:50 MST 2012
Thanks for the effort. I have downloaded the corrected program and just started what I hope will be a very long test. I will report back.

Content originally posted in LPCWare by wellsk on Wed Jun 20 14:35:48 MST 2012
I have uploaded the LPC17xx and LPC43xx ports with larsjep's updates and several other fixes for both the 17xx and 43xx platforms.
You can pull the changes from the GIT repo at: http://sw.lpcware.com/?p=lwip_lpc.git&a=summary

Content originally posted in LPCWare by SeleneSW on Tue Jun 19 02:54:43 MST 2012
I have the same problem with the "ea1788_http_freertos" demo.
I tried with the patches described above but with no success.

Any news about this thread?

Content originally posted in LPCWare by tamirmichael on Thu Jun 14 00:46:05 MST 2012
I will try this fix ASAP but I must deal with some bush fires first...
Thanks for the efforts, I will report the results once I have something.

Content originally posted in LPCWare by wellsk on Wed Jun 13 09:11:51 MST 2012
I've entered a bug tracker issue for this case.
http://www.lpcware.com/content/bugtrackerissue/lwip-17xx-freertos-examples-use-wrong-input-function

Content originally posted in LPCWare by wellsk on Wed Jun 13 08:17:43 MST 2012
It looks like you are using FreeRTOS and the failure occurred on the input path. I'd keep the TX pbuf patch in, but it looks like I might have another problem with the example code itself using the wrong input path for an RTOS. (Sorry :()

When used with FreeRTOS, the 'ethernet_input' function should be replaced with the 'tcpip_input' function. This is correct for the LPC18xx/43xx examples, but it seems I got a little too comfortable with cut-and-paste.

In the example code, can you make the change indicated below and give it a try?

<code>
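  /* Add netif interface for lpc17xx_8x */
  memset(&lpc_netif, 0, sizeof(lpc_netif));
  /* Old call, kept for reference: ethernet_input is the wrong input path under an RTOS */
//  if (!netif_add(&lpc_netif, &ipaddr, &netmask, &gw, NULL, lpc_enetif_init, ethernet_input))
  /* New call: hand received packets to the stack through tcpip_input */
  if (!netif_add(&lpc_netif, &ipaddr, &netmask, &gw, NULL, lpc_enetif_init, tcpip_input))
    LWIP_ASSERT("Net interface failed to initialize\r\n", 0);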
</code>
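
Some background on why the input function matters: in lwIP's threaded mode, `tcpip_input` passes the packet to the tcpip thread through a mailbox instead of processing it in the caller's context, whereas `ethernet_input` processes the packet immediately in whichever task calls it. The sketch below models only that handoff idea. It is single-threaded, uses hypothetical names (`model_mbox`, `model_tcpip_input`, `model_tcpip_thread_poll`), and leaves out the RTOS synchronization that the real mailbox provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MBOX_SLOTS 4

/* Hypothetical fixed-size mailbox of packet ids (ring buffer). */
struct model_mbox {
    int    slot[MBOX_SLOTS];
    size_t head;                /* next slot to read */
    size_t count;               /* packets waiting */
};

/* Counts packets handled on the stack side of the mailbox. */
static int processed_in_stack_context = 0;

/* Receive-task side: enqueue only, never touch protocol state. */
static bool model_tcpip_input(struct model_mbox *mb, int packet_id)
{
    assert(mb != NULL);
    if (mb->count == MBOX_SLOTS) {
        return false;           /* mailbox full: the caller keeps ownership */
    }
    mb->slot[(mb->head + mb->count) % MBOX_SLOTS] = packet_id;
    mb->count++;
    return true;
}

/* Stack-thread side: drain the mailbox and process each packet here, so
 * protocol state is only touched from one context. Returns packets handled. */
static size_t model_tcpip_thread_poll(struct model_mbox *mb)
{
    size_t handled = 0;
    assert(mb != NULL);
    while (mb->count > 0) {
        int packet_id = mb->slot[mb->head];
        mb->head = (mb->head + 1) % MBOX_SLOTS;
        mb->count--;
        (void)packet_id;        /* real code would run the input path here */
        processed_in_stack_context++;
        handled++;
    }
    return handled;
}
```

The point of the design is that the receive task never runs protocol code itself. Whether calling `ethernet_input` directly from the receive task is what corrupts the pbuf reference count in this thread is not established here; it is the suspicion that motivates the change.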

Content originally posted in LPCWare by tamirmichael on Wed Jun 13 01:06:40 MST 2012
I managed to make the program fail with the patch applied - here is the stack trace:

pbuf_free (called by lpc17_emac.c, line 477)
lpc_enetif_input (called by lpc17_emac.c, line 787)
vPacketReceiveTask

It took me something like 45 minutes of continuous pinging (I stopped and resumed shortly before the failure) to cause this.
I hope this helps.