
Low network throughput

Discussion created by lpcware Employee on Jun 15, 2016
Latest reply on Jun 15, 2016 by lpcware
Content originally posted in LPCWare by swiebertje on Fri Oct 24 03:41:14 MST 2014
Hi there,

I'm running the lpcopen 'iperf_server' example on a custom LPC1837 board in order to test the network throughput.
Throughput is very low in both 100 Mbit and 10 Mbit modes.

100Mbit full-duplex:


iperf -i 5 -c 10.31.4.147 -m
------------------------------------------------------------
Client connecting to 10.31.4.147, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 10.31.4.146 port 1429 connected with 10.31.4.147 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  2.62 MBytes  4.40 Mbits/sec
[  3]  5.0-10.0 sec  2.50 MBytes  4.19 Mbits/sec
[  3]  0.0-10.3 sec  5.25 MBytes  4.29 Mbits/sec
[  3] MSS and MTU size unknown (TCP_MAXSEG not supported by OS?)


10Mbit full-duplex:


iperf -i 5 -c 10.31.4.147 -m
------------------------------------------------------------
Client connecting to 10.31.4.147, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 10.31.4.146 port 2105 connected with 10.31.4.147 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   256 KBytes   419 Kbits/sec
[  3]  5.0-10.0 sec   128 KBytes   210 Kbits/sec
[  3]  0.0-16.9 sec   512 KBytes   249 Kbits/sec
[  3] MSS and MTU size unknown (TCP_MAXSEG not supported by OS?)


Ping tests:


Pinging 10.31.4.147 with 32 bytes of data:
Reply from 10.31.4.147: bytes=32 time<1ms TTL=255
Reply from 10.31.4.147: bytes=32 time<1ms TTL=255
Reply from 10.31.4.147: bytes=32 time<1ms TTL=255
Reply from 10.31.4.147: bytes=32 time<1ms TTL=255

Ping statistics for 10.31.4.147:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms



The tests were performed over a direct cable connection to a Windows 7 desktop with a 1 Gbit network card.
I've tested both with and without auto-negotiation; there was no noticeable difference in performance.

The PHY is an LAN8720A connected over the RMII interface and clocked by an external 50 MHz oscillator that drives both the PHY and the MAC.

The RMII pin muxing is essentially the same as in the lpcopen example:


STATIC const PINMUX_GRP_T pinmuxing[] = {
/* RMII pin group */
{0x1, 19, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_INBUFF_EN | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC0)},
{0x0, 1,  (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC6)},
{0x1, 18, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC3)},
{0x1, 20, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC3)},
{0x1, 17, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_INBUFF_EN | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC3)},
{0x7, 7,  (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC6)},
{0x1, 16, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_INBUFF_EN | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC7)},
{0x1, 15, (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_INBUFF_EN | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC3)},
{0x0, 0,  (SCU_MODE_HIGHSPEEDSLEW_EN | SCU_MODE_PULLUP | SCU_MODE_INBUFF_EN | SCU_MODE_ZIF_DIS | SCU_MODE_FUNC2)},
};



The lwIP configuration is the same as in the lpcopen example, except that I use lwIP's internal memory allocator.


/*
* @brief LWIP build option override file
*
* @note
* Copyright(C) NXP Semiconductors, 2012
* All rights reserved.
*
* @par
* Software that is described herein is for illustrative purposes only
* which provides customers with programming information regarding the
* LPC products.  This software is supplied "AS IS" without any warranties of
* any kind, and NXP Semiconductors and its licensor disclaim any and
* all warranties, express or implied, including all implied warranties of
* merchantability, fitness for a particular purpose and non-infringement of
* intellectual property rights.  NXP Semiconductors assumes no responsibility
* or liability for the use of the software, conveys no license or rights under any
* patent, copyright, mask work right, or any other intellectual property rights in
* or to any products. NXP Semiconductors reserves the right to make changes
* in the software without notification. NXP Semiconductors also makes no
* representation or warranty that such application will be suitable for the
* specified use without further testing or modification.
*
* @par
* Permission to use, copy, modify, and distribute this software and its
* documentation is hereby granted, under NXP Semiconductors' and its
* licensor's relevant copyrights in the software, without fee, provided that it
* is used in conjunction with NXP Semiconductors microcontrollers.  This
* copyright, permission, and disclaimer notice must appear in all copies of
* this code.
*/

#ifndef __LWIPOPTS_H_
#define __LWIPOPTS_H_

/* Standalone build */
#define NO_SYS                          1

/* Use LWIP timers */
#define NO_SYS_NO_TIMERS                0

/* No lightweight protection needed in a standalone (NO_SYS) build */
#define SYS_LIGHTWEIGHT_PROT            0

/* 32-bit alignment */
#define MEM_ALIGNMENT                   4

/* pbuf buffers in pool. In zero-copy mode, these buffers are
   located in peripheral RAM. In copied mode, they are located in
   internal IRAM */
#define PBUF_POOL_SIZE                  7

/* No padding needed */
#define ETH_PAD_SIZE                    0

#define IP_SOF_BROADCAST                1
#define IP_SOF_BROADCAST_RECV           1

/* The ethernet FCS is generated and checked in hardware. The IP, TCP,
   and UDP checksums are still computed in software. */
#define CHECKSUM_GEN_IP                 1
#define CHECKSUM_GEN_UDP                1
#define CHECKSUM_GEN_TCP                1
#define CHECKSUM_CHECK_IP               1
#define CHECKSUM_CHECK_UDP              1
#define CHECKSUM_CHECK_TCP              1
#define LWIP_CHECKSUM_ON_COPY           1

/* Use LWIP version of htonx() to allow generic functionality across
   all platforms. If you are using the Cortex Mx devices, you might
   be able to use the Cortex __rev instruction instead. */
#define LWIP_PLATFORM_BYTESWAP          0

/* Non-static memory, used with DMA pool */
#define MEM_SIZE                        (12 * 1024)

/* Raw IP PCB support enabled */
#define LWIP_RAW                        1

/* DHCP is ok, UDP is required with DHCP */
#define LWIP_DHCP                       0
#define LWIP_UDP                        1

/* Hostname can be used */
#define LWIP_NETIF_HOSTNAME             1

#define LWIP_BROADCAST_PING             1

/* MSS should match the hardware packet size */
#define TCP_MSS                         1460
#define TCP_SND_BUF                     (2 * TCP_MSS)

#define LWIP_SOCKET                     0
#define LWIP_NETCONN                    0
#define MEMP_NUM_SYS_TIMEOUT            300

#define LWIP_STATS                      0
#define LINK_STATS                      0
#define LWIP_STATS_DISPLAY              0

/* There are more *_DEBUG options that can be selected.
   See opts.h. Make sure that LWIP_DEBUG is defined when
   building the code to use debug. */
#define TCP_DEBUG                       LWIP_DBG_OFF
#define ETHARP_DEBUG                    LWIP_DBG_OFF
#define PBUF_DEBUG                      LWIP_DBG_OFF
#define IP_DEBUG                        LWIP_DBG_OFF
#define TCPIP_DEBUG                     LWIP_DBG_OFF
#define DHCP_DEBUG                      LWIP_DBG_OFF
#define UDP_DEBUG                       LWIP_DBG_OFF

/* This define is custom to the LPC EMAC driver. Enable it to
   get debug messages from the driver. */
#define UDP_LPC_EMAC                    LWIP_DBG_OFF

/* Required for malloc/free */
#include <stdlib.h>

#endif /* __LWIPOPTS_H_ */
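For reference, these are the overrides I am planning to experiment with next: a larger send buffer and receive window so that more than two segments can be in flight per RTT. The values are guesses on my part, not taken from the lpcopen example, and need to fit in the available RAM:

```c
/* Experimental TCP tuning overrides (values are guesses, RAM permitting) */
#define TCP_MSS                         1460
#define TCP_WND                         (8 * TCP_MSS)
#define TCP_SND_BUF                     (8 * TCP_MSS)
/* lwIP requires the segment queue to cover the send buffer */
#define TCP_SND_QUEUELEN                (4 * TCP_SND_BUF / TCP_MSS)
#define PBUF_POOL_SIZE                  16
#define MEM_SIZE                        (24 * 1024)
```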



lpcopen version: lpcopen-lpc18xx-2.12
toolchain version: gcc version 4.7.4 20140401 (release) [ARM/embedded-4_7-branch revision 209195] (GNU Tools for ARM Embedded Processors)
iperf version: 2.0.5 (08 Jul 2010) pthreads

I've also attached the Wireshark traces.

Any ideas about where I should look to improve the throughput?

Regards.

Original Attachment has been moved to: iperf_100mbit.pcapng.gz

Original Attachment has been moved to: iperf_10mbit.pcapng.gz
