LX2160ARDB ipsec disconnection issue


398 Views
jackho
Contributor II

Hi,

We are running IPsec tests on an LX2160ARDB without DPDK. The test conditions are as follows:

Strongswan (5.8.4)

Linux kernel (4.19)

Once the IPsec tunnel is established, we use iperf3 to test the tunnel's downlink performance.

The data throughput is about 1.5 to 2 Gbits/sec.

After about 30 minutes to 1 hour of testing (the time is random, not fixed), the IPsec tunnel seems to start dropping packets and can no longer send or receive packets. When we check the tunnel with ipsec status, the tunnel is still there, but even ping packets are not sent out.

By the way, we also tested it on an EVB board (kernel version 5.4), and it has the same issue.

Any suggestions about this issue?

Thank you.

Jack

0 Kudos
4 Replies

335 Views
jackho
Contributor II

Hi @andrei_skok,

Sorry for the typo: we use LSDK 2004.

BR,

Jack 

0 Kudos

329 Views
jackho
Contributor II

Hi @andrei_skok,

It seems I have no permission to reply in the case, so I am replying here.

  • The MC version is shown in the boot log below:

U-Boot 2019.10 (Oct 12 2021 - 11:09:54 +0800)

SoC:  unknown (0x87361020)

Clock Configuration:

       CPU0(A72):2200 MHz  CPU1(A72):2200 MHz  CPU2(A72):2200 MHz  

       CPU3(A72):2200 MHz  CPU4(A72):2200 MHz  CPU5(A72):2200 MHz  

       CPU6(A72):2200 MHz  CPU7(A72):2200 MHz  CPU8(A72):2200 MHz  

       CPU9(A72):2200 MHz  CPU10(A72):2200 MHz  CPU11(A72):2200 MHz  

       CPU12(A72):2200 MHz  CPU13(A72):2200 MHz  CPU14(A72):2200 MHz  

       CPU15(A72):2200 MHz  

       Bus:      750  MHz  DDR:      3200 MT/s

Reset Configuration Word (RCW):

       00000000: 5883833c 24580058 00000000 00000000

       00000010: 00000000 0c010000 00000000 00000000

       00000020: 016001a0 00002580 00000000 1800164e

       00000030: 08000000 00000000 00000000 00000000

       00000040: 00000000 00000000 00000000 00000000

       00000050: 00000000 00000000 00000000 00000000

       00000060: 00000000 00000000 00027088 00000000

       00000070: 00b30030 00150022

Model: NXP Layerscape LX2160ARDB Board

Board: unknown-RDB, Board version: @, boot from eMMC

FPGA: v0.0

SERDES1 Reference: Clock1 = 161.13MHz Clock2 = 161.13MHz

SERDES2 Reference: Clock1 = 100MHz Clock2 = 100MHz

SERDES3 Reference: Clock1 = 100MHz Clock2 = 100MHz

VID: failed to select VDD Page 0

VID: Couldn't read sensor abort VID adjustment

core voltage not adjusted

DRAM:  15.9 GiB

DDR    15.9 GiB (DDR4, 64-bit, CL=22, ECC off)

Using SERDES1 Protocol: 19 (0x13)

Using SERDES2 Protocol: 5 (0x5)

Using SERDES3 Protocol: 0 (0x0)

SERDES3[PRTCL] = 0x0 is not valid

MMC:   FSL_SDHC: 0, FSL_SDHC: 1

Loading Environment from MMC... OK

EEPROM: Read failed.

In:    serial_pl01x

Out:   serial_pl01x

Err:   serial_pl01x

Net:   PCA: failed to select proper channel

Could not get PHY for FSL_MDIO0: addr 4

Failed to connect

Could not get PHY for FSL_MDIO0: addr 1

Failed to connect

Could not get PHY for FSL_MDIO0: addr 2

Failed to connect

PCIe0: pcie@3400000 disabled

PCIe1: pcie@3500000 disabled

PCIe2: pcie@3600000 Root Complex: no link

PCIe3: pcie@3700000 disabled

PCIe4: pcie@3800000 disabled

PCIe5: pcie@3900000 disabled

DPMAC3@xgmii

Warning: DPMAC3@xgmii (eth0) using random MAC address - ce:b2:70:e8:0e:71

, DPMAC4@xgmii

Warning: DPMAC4@xgmii (eth1) using random MAC address - b2:34:6b:97:5e:dc

, DPMAC5@25g-aui

Warning: DPMAC5@25g-aui (eth2) using random MAC address - de:c6:89:80:14:f6

, DPMAC6@25g-aui

Warning: DPMAC6@25g-aui (eth3) using random MAC address - be:33:ec:55:16:59

switch to partitions #0, OK

mmc1(part 0) is current device

MMC read: dev # 1, block # 20480, count 4608 ... 4608 blocks read: OK

MMC read: dev # 1, block # 28672, count 2048 ... 2048 blocks read: OK

crc32+ 

fsl-mc: Booting Management Complex ... SUCCESS

fsl-mc: Management Complex booted (version: 10.20.4, boot status: 0x1)

Hit any key to stop autoboot:  0 

  • We use LSDK2012.
  • Yes, this issue happens with both TCP and UDP traffic.
  • Our LX2160 hardware board uses a 2.5G PHY Ethernet port. The IPsec connection is as follows:

                =>Lx2160 <------->Router<-------->Internet<--------> Security Gateway

  • The IPsec configuration is as follows:

# ipsec.conf - strongSwan IPsec configuration file

# basic configuration

config setup

# strictcrlpolicy=yes

# uniqueids = no

  strictcrlpolicy=no

conn %default

# automatically generated by tenpin

  keyingtries=0

  dpdaction=clear

  dpddelay=120s

  auto=start

  keyexchange=ikev2

  reauth=no

  mobike=no

  rekeymargin=1m

  ikelifetime=24h

  keylife=10h

  leftupdown=/usr/local/bin/gnb/ASK/ipsecScript/updown.tenpin

  replay_window=64

conn conn-1

  keyexchange=ikev2

  ike=aes128-sha1-modp1024

  esp=aes128-sha1

  right=211.xx.xx.xx

  leftid="xxxxxxxxxxxx.askey.com.tw" 

  rightid=%any

  leftcert=sc.crt.pem

  leftsendcert=if-asked

  authby=pubkey

  leftdns=%config

  rightsubnet="172.29.0.0/22"

  left=%defaultroute

  leftsourceip=%config

  leftdns=%config

  fragmentation=yes

  leftikeport=500

  rightikeport=500

  type=tunnel          

  • The iperf3 test command is as follows:

=>iperf3 -c 172.29.0.x -t 60000 -R -i 10 -P8 -p 52xx

  • This issue happens when GRO is enabled.

If you need more information, please let me know.

BR,

Jack

0 Kudos

371 Views
andrei_skok
NXP TechSupport

Did you get anything on the RX error frame queue? That should give you an indication.
If the issue is indeed "port_rx_out_of_buffers_discard", you may want to check whether there is a memory leak, i.e. whether buffers are released and returned to the buffer pool properly.
Depending on the code base and the interface (GbE vs 10G), you can adjust the number of buffer pointers in the buffer pool and the number of buffers the Ethernet driver pre-allocates. That assumes there is no memory leak in the system.

0 Kudos

364 Views
jackho
Contributor II

Hi @andrei_skok

Yes, this issue happens on the RX (receive) side.

1. We enabled XFRM driver debug mode. Below are the error counters:

root@localhost:~# cat /proc/net/xfrm_stat
XfrmInError 0
XfrmInBufferError 0
XfrmInHdrError 0
XfrmInNoStates 1
XfrmInStateProtoError 68056
XfrmInStateModeError 0
XfrmInStateSeqError 202837092
XfrmInStateExpired 0
XfrmInStateMismatch 0
XfrmInStateInvalid 0
XfrmInTmplMismatch 0
XfrmInNoPols 0
XfrmInPolBlock 0
XfrmInPolError 0
XfrmOutError 0
XfrmOutBundleGenError 0
XfrmOutBundleCheckError 0
XfrmOutNoStates 0
XfrmOutStateProtoError 0
XfrmOutStateModeError 0
XfrmOutStateSeqError 0
XfrmOutStateExpired 0
XfrmOutPolBlock 0
XfrmOutPolDead 0
XfrmOutPolError 0
XfrmFwdHdrError 0
XfrmOutStateInvalid 0
XfrmAcquireError 0
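
For context on the counters above: XfrmInStateSeqError is the counter the kernel increments when an inbound packet fails the sequence-number (anti-replay) check. The sketch below shows a 64-packet sliding-window check in the style of RFC 4303, to illustrate how late or reordered packets can be rejected with a small window such as replay_window=64. It is not the kernel's implementation; the struct and function names are hypothetical, and extended sequence numbers and 32-bit wraparound are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's xfrm replay code. */
struct replay_state {
    uint32_t top;     /* highest sequence number accepted so far */
    uint64_t bitmap;  /* bit i set => sequence number (top - i) was seen */
    uint32_t window;  /* window size in packets, 1..64 */
};

/* Returns true if seq is accepted (and records it), false if it would be
 * rejected as a duplicate or as older than the window. */
static bool replay_check_and_update(struct replay_state *st, uint32_t seq)
{
    assert(st->window >= 1 && st->window <= 64);

    if (seq == 0)
        return false;                 /* ESP sequence numbers start at 1 */

    if (seq > st->top) {              /* new highest: slide the window */
        uint32_t shift = seq - st->top;
        st->bitmap = (shift < 64) ? ((st->bitmap << shift) | 1u) : 1u;
        st->top = seq;
        return true;
    }

    uint32_t diff = st->top - seq;
    if (diff >= st->window)
        return false;                 /* older than the window: rejected */
    if (st->bitmap & (1ULL << diff))
        return false;                 /* already seen: rejected */

    st->bitmap |= 1ULL << diff;       /* late but inside the window */
    return true;
}
```

With the highest accepted number at 100 and a window of 64, sequence number 37 is still accepted while 36 is rejected as too old. Whether reordering is what drives the counter in this setup is not established in this thread.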

2. We used ftrace to trace this issue. It seems the issue happens in the GRO code, in net/core/gro_cells.c.

In the gro_cells_receive function below, skb_queue_len normally returns the correct queue length. When this issue happens, skb_queue_len seems to return a wrong length: the value is 4294967274, which is a negative number read as unsigned. netdev_max_backlog is set to 5000, so packets start being dropped. I think the problem may be that this_cpu_ptr gets the wrong cell, so skb_queue_len gets the wrong queue length. Any suggestions about this issue?

int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
{
.....
  cell = this_cpu_ptr(gcells->cells);

  if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
drop:
    atomic_long_inc(&dev->rx_dropped);
    kfree_skb(skb);
    res = NET_RX_DROP;
    goto unlock;
  }
.......
}
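
The value 4294967274 is consistent with an unsigned 32-bit counter that has gone 22 below zero (2^32 - 22). In the kernel, the queue length is an unsigned 32-bit field and netdev_max_backlog is a signed int, so once the length wraps, the comparison is always true and every packet is dropped. Below is a minimal user-space sketch of that arithmetic, assuming the counter was decremented more often than it was incremented (this thread does not confirm the cause). fake_queue and the helper names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's sk_buff_head: only the length field matters. */
struct fake_queue {
    uint32_t qlen;  /* unsigned 32-bit, like qlen in struct sk_buff_head */
};

/* Models a dequeue that is not matched by an earlier enqueue.
 * Unsigned arithmetic wraps: 0 - 1 becomes 4294967295. */
static void unbalanced_dequeue(struct fake_queue *q)
{
    q->qlen--;
}

/* Same shape as the check in gro_cells_receive(): unsigned length against
 * a signed limit. C converts the limit to unsigned for the comparison;
 * the cast only makes that conversion visible. */
static bool would_drop(const struct fake_queue *q, int max_backlog)
{
    assert(max_backlog >= 0);
    return q->qlen > (uint32_t)max_backlog;
}

/* Reads the counter as a signed quantity to show how far below zero
 * it has gone. */
static int64_t qlen_as_signed(const struct fake_queue *q)
{
    int64_t v = (int64_t)q->qlen;
    return (v > INT32_MAX) ? v - ((int64_t)1 << 32) : v;
}
```

Starting from an empty queue, 22 unbalanced dequeues leave qlen at 4294967274, which reads as -22 when signed and makes would_drop() return true for a limit of 5000. If this is what happens here, the wrapped counter would keep the cell dropping packets until it is reset, which would match a tunnel that stays up but passes no traffic.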

Thank you.

Jack

 

0 Kudos