LX2160ARDB IPsec disconnection issue

jackho · 09-22-2021

Hi,

We are doing IPsec testing on the LX2160ARDB without DPDK. Test conditions:

strongSwan 5.8.4

Linux kernel 4.19

When the IPsec tunnel is established, we use iperf3 to test the tunnel's downlink performance.

The data throughput is about 1.5 to 2 Gbit/s.

After about 30 minutes to 1 hour of testing (the time is random, not fixed), the IPsec tunnel starts dropping packets and can no longer send or receive any. When we check with ipsec status, the tunnel is still there, but even ping packets are not sent out.

By the way, we also tested it on the EVB board (kernel 5.4), and it has the same issue.

Any suggestions about this issue?

Thank you.

Jack

Lee_1 · 08-01-2023

Hi

Have you solved this problem? I have the same problem and would like to know how to deal with it.

Thank you very much

lee

jackho · 10-12-2021

Hi @andrei_skok,

Sorry for the typo: the LSDK version we use is 2004.

BR,

Jack

jackho · 10-12-2021

Hi @andrei_skok

It seems I have no permission to reply in the case, so I am replying here.

The MC version is shown in the boot log below:

U-Boot 2019.10 (Oct 12 2021 - 11:09:54 +0800)
SoC: unknown (0x87361020)
Clock Configuration:
CPU0(A72):2200 MHz CPU1(A72):2200 MHz CPU2(A72):2200 MHz
CPU3(A72):2200 MHz CPU4(A72):2200 MHz CPU5(A72):2200 MHz
CPU6(A72):2200 MHz CPU7(A72):2200 MHz CPU8(A72):2200 MHz
CPU9(A72):2200 MHz CPU10(A72):2200 MHz CPU11(A72):2200 MHz
CPU12(A72):2200 MHz CPU13(A72):2200 MHz CPU14(A72):2200 MHz
CPU15(A72):2200 MHz
Bus: 750 MHz DDR: 3200 MT/s
Reset Configuration Word (RCW):
00000000: 5883833c 24580058 00000000 00000000
00000010: 00000000 0c010000 00000000 00000000
00000020: 016001a0 00002580 00000000 1800164e
00000030: 08000000 00000000 00000000 00000000
00000040: 00000000 00000000 00000000 00000000
00000050: 00000000 00000000 00000000 00000000
00000060: 00000000 00000000 00027088 00000000
00000070: 00b30030 00150022
Model: NXP Layerscape LX2160ARDB Board
Board: unknown-RDB, Board version: @, boot from eMMC
FPGA: v0.0
SERDES1 Reference: Clock1 = 161.13MHz Clock2 = 161.13MHz
SERDES2 Reference: Clock1 = 100MHz Clock2 = 100MHz
SERDES3 Reference: Clock1 = 100MHz Clock2 = 100MHz
VID: failed to select VDD Page 0
VID: Couldn't read sensor abort VID adjustment
core voltage not adjusted
DRAM: 15.9 GiB
DDR 15.9 GiB (DDR4, 64-bit, CL=22, ECC off)
Using SERDES1 Protocol: 19 (0x13)
Using SERDES2 Protocol: 5 (0x5)
Using SERDES3 Protocol: 0 (0x0)
SERDES3[PRTCL] = 0x0 is not valid
MMC: FSL_SDHC: 0, FSL_SDHC: 1
Loading Environment from MMC... OK
EEPROM: Read failed.
In: serial_pl01x
Out: serial_pl01x
Err: serial_pl01x
Net: PCA: failed to select proper channel
Could not get PHY for FSL_MDIO0: addr 4
Failed to connect
Could not get PHY for FSL_MDIO0: addr 1
Failed to connect
Could not get PHY for FSL_MDIO0: addr 2
Failed to connect
PCIe0: pcie@3400000 disabled
PCIe1: pcie@3500000 disabled
PCIe2: pcie@3600000 Root Complex: no link
PCIe3: pcie@3700000 disabled
PCIe4: pcie@3800000 disabled
PCIe5: pcie@3900000 disabled
DPMAC3@xgmii
Warning: DPMAC3@xgmii (eth0) using random MAC address - ce:b2:70:e8:0e:71
, DPMAC4@xgmii
Warning: DPMAC4@xgmii (eth1) using random MAC address - b2:34:6b:97:5e:dc
, DPMAC5@25g-aui
Warning: DPMAC5@25g-aui (eth2) using random MAC address - de:c6:89:80:14:f6
, DPMAC6@25g-aui
Warning: DPMAC6@25g-aui (eth3) using random MAC address - be:33:ec:55:16:59
switch to partitions #0, OK
mmc1(part 0) is current device
MMC read: dev # 1, block # 20480, count 4608 ... 4608 blocks read: OK
MMC read: dev # 1, block # 28672, count 2048 ... 2048 blocks read: OK
crc32+
fsl-mc: Booting Management Complex ... SUCCESS
fsl-mc: Management Complex booted (version: 10.20.4, boot status: 0x1)
Hit any key to stop autoboot: 0

We use LSDK2012.
Yes, this issue happens with both TCP and UDP traffic.
Our LX2160 hardware board uses a 2.5G PHY Ethernet port. The IPsec connection is as follows:

=>Lx2160 <------->Router<-------->Internet<--------> Security Gateway

IPsec configuration (ipsec.conf):

# ipsec.conf - strongSwan IPsec configuration file

# basic configuration

config setup
    # strictcrlpolicy=yes
    # uniqueids = no
    strictcrlpolicy=no

conn %default
    # automatically generated by tenpin
    keyingtries=0
    dpdaction=clear
    dpddelay=120s
    auto=start
    keyexchange=ikev2
    reauth=no
    mobike=no
    rekeymargin=1m
    ikelifetime=24h
    keylife=10h
    leftupdown=/usr/local/bin/gnb/ASK/ipsecScript/updown.tenpin
    replay_window=64

conn conn-1
    keyexchange=ikev2
    ike=aes128-sha1-modp1024
    esp=aes128-sha1
    right=211.xx.xx.xx
    leftid="xxxxxxxxxxxx.askey.com.tw"
    rightid=%any
    leftcert=sc.crt.pem
    leftsendcert=if-asked
    authby=pubkey
    leftdns=%config
    rightsubnet="172.29.0.0/22"
    left=%defaultroute
    leftsourceip=%config
    leftdns=%config
    fragmentation=yes
    leftikeport=500
    rightikeport=500
    type=tunnel

The iperf3 test command is:

=>iperf3 -c 172.29.0.x -t 60000 -R -i 10 -P8 -p 52xx

This issue happens when GRO is enabled.
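Since the problem is tied to GRO, it may help to log the GRO state of each interface involved before and during a run. Below is a minimal sketch using the legacy SIOCETHTOOL ioctl with ETHTOOL_GGRO, which reads the same feature that `ethtool -k <iface>` lists as generic-receive-offload. The helper name gro_enabled() is hypothetical, and the interface name you pass depends on your setup.

```c
#define _DEFAULT_SOURCE /* exposes struct ifreq in <net/if.h> under strict -std modes */
#include <assert.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if GRO is enabled on ifname,
 * 0 if it is disabled, and -1 on any error (e.g. no such interface). */
static int gro_enabled(const char *ifname)
{
	struct ethtool_value ev = { .cmd = ETHTOOL_GGRO };
	struct ifreq ifr;
	int fd, rc;

	memset(&ifr, 0, sizeof(ifr));
	snprintf(ifr.ifr_name, sizeof(ifr.ifr_name), "%s", ifname);
	ifr.ifr_data = (char *)&ev; /* the kernel fills ev.data with the GRO flag */

	/* Any socket serves as a handle for SIOCETHTOOL requests. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	rc = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);
	if (rc < 0)
		return -1;
	return ev.data ? 1 : 0;
}
```

For example, `printf("GRO: %d\n", gro_enabled("eth0"));` where "eth0" is only a placeholder name. The matching ETHTOOL_SGRO command (or `ethtool -K <iface> gro off`) turns the feature off, which is one way to compare the GRO-on and GRO-off cases under the same iperf3 load.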

If you need more information, please let me know.

BR,

Jack

andrei_skok · 10-05-2021

Did you get anything on the RX error frame queue? That should give you some indication.
If the issue is indeed "port_rx_out_of_buffers_discard", you may want to check whether there is a memory leak, i.e. whether buffers are properly released/returned to the buffer pool.
Depending on the code base and the interface (GbE vs. 10G), you can adjust the number of buffer pointers in the buffer pool and the number of buffers the Ethernet driver pre-allocates. That assumes there is no memory leak in the system.
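To illustrate why the pool size only helps when there is no leak, here is a toy model. The struct and function names are hypothetical and do not correspond to the DPAA2 driver or MC API: each received frame takes one buffer from the pool, released buffers go back, and a buffer that is never returned is lost for good. Once the pool is empty, every further frame is counted as an out-of-buffers discard.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an RX buffer pool (hypothetical, not a real driver API). */
struct bpool_model {
	unsigned int free_bufs;                    /* buffer pointers currently in the pool */
	unsigned long long out_of_buffers_discard; /* frames dropped because the pool was empty */
};

/* A frame arrives: it needs one buffer from the pool, or it is discarded. */
static bool bpool_rx_frame(struct bpool_model *p)
{
	if (p->free_bufs == 0) {
		p->out_of_buffers_discard++;
		return false;
	}
	p->free_bufs--;
	return true;
}

/* Software is done with the frame and returns its buffer to the pool. */
static void bpool_release(struct bpool_model *p)
{
	p->free_bufs++;
}

/* Feeds frames 1..n through the pool. A received frame's buffer is
 * returned unless leak_every != 0 and the frame index is a multiple
 * of leak_every, in which case that buffer is leaked. */
static void bpool_run(struct bpool_model *p, unsigned long n, unsigned long leak_every)
{
	for (unsigned long i = 1; i <= n; i++) {
		if (!bpool_rx_frame(p))
			continue;
		if (leak_every != 0 && i % leak_every == 0)
			continue; /* buffer never returned: leaked */
		bpool_release(p);
	}
}
```

With 1024 buffers and no leak, 200,000 frames pass without a single discard. With one leaked buffer per 100 frames, the pool is empty after frame 102,400 and all remaining 97,600 frames are discarded; doubling the pool to 2048 buffers only pushes the first discard past frame 200,000. A slow leak in this model therefore fails after some time rather than immediately.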

jackho · 10-05-2021

Hi @andrei_skok

Yes, this issue happens on the RX (receive) side.

1. We enabled XFRM driver debug mode. Below are the error counters:

root@localhost:~# cat /proc/net/xfrm_stat
XfrmInError 0
XfrmInBufferError 0
XfrmInHdrError 0
XfrmInNoStates 1
XfrmInStateProtoError 68056
XfrmInStateModeError 0
XfrmInStateSeqError 202837092
XfrmInStateExpired 0
XfrmInStateMismatch 0
XfrmInStateInvalid 0
XfrmInTmplMismatch 0
XfrmInNoPols 0
XfrmInPolBlock 0
XfrmInPolError 0
XfrmOutError 0
XfrmOutBundleGenError 0
XfrmOutBundleCheckError 0
XfrmOutNoStates 0
XfrmOutStateProtoError 0
XfrmOutStateModeError 0
XfrmOutStateSeqError 0
XfrmOutStateExpired 0
XfrmOutPolBlock 0
XfrmOutPolDead 0
XfrmOutPolError 0
XfrmFwdHdrError 0
XfrmOutStateInvalid 0
XfrmAcquireError 0
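To follow these counters over time, for example to see whether XfrmInStateSeqError keeps growing once traffic stalls (per the kernel's xfrm_proc documentation, it counts inbound packets whose sequence number is out of the replay window), a small helper that prints only the non-zero entries of /proc/net/xfrm_stat may be handy. This is a sketch assuming the one-counter-per-line `Name value` format shown above; the function name is hypothetical, and the file is only present when the kernel is built with CONFIG_XFRM_STATISTICS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads "Name value" lines (the /proc/net/xfrm_stat
 * format) from in, prints every non-zero counter to out, and returns how
 * many counters were non-zero, or -1 if a line could not be parsed. */
static int print_nonzero_xfrm_stats(FILE *in, FILE *out)
{
	char name[64];
	unsigned long long value;
	int nonzero = 0;
	int rc;

	while ((rc = fscanf(in, "%63s %llu", name, &value)) == 2) {
		if (value != 0) {
			fprintf(out, "%s %llu\n", name, value);
			nonzero++;
		}
	}
	/* fscanf() returns EOF at a clean end of input; anything else is a parse error. */
	return rc == EOF ? nonzero : -1;
}
```

Typical use is to call it periodically with `fopen("/proc/net/xfrm_stat", "r")` as input and stdout as output, and compare successive snapshots.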

2. We used ftrace to trace this issue, and it seems to happen in the GRO code, in the gro_cells_receive() function in net/core/gro_cells.c shown below.

Normally skb_queue_len() returns the correct queue length, but when this issue occurs it returns a wrong value: 4294967274, which is -22 when read as a signed 32-bit number (the queue length is an unsigned __u32, so a negative length wraps around). netdev_max_backlog is set to 5000, so the function starts dropping packets. I think the problem may be that this_cpu_ptr() returns the wrong cell, so skb_queue_len() reads the wrong queue length. Any suggestions about this issue?

int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
{
	...
	/* GRO cell of the CPU currently running this code */
	cell = this_cpu_ptr(gcells->cells);

	/* skb_queue_len() returns a __u32; netdev_max_backlog is 5000 here */
	if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
drop:
		atomic_long_inc(&dev->rx_dropped);
		kfree_skb(skb);
		res = NET_RX_DROP;
		goto unlock;
	}
	...
}
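The drop condition above compares the __u32 returned by skb_queue_len() with the int netdev_max_backlog. The plain user-space sketch below (not kernel code; the function names are hypothetical) reproduces that arithmetic: a queue length that has gone 22 below zero is stored as 4294967274, and because the comparison is done in unsigned arithmetic it is larger than 5000, so every incoming packet would take the drop path. This explains why the drops persist once the counter has wrapped, but not why it wraps in the first place.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Mirrors: skb_queue_len(&cell->napi_skbs) > netdev_max_backlog
 * qlen stands for the __u32 queue length, max_backlog for the int sysctl.
 * For a non-negative max_backlog this is an ordinary unsigned comparison. */
static int gro_cell_would_drop(uint32_t qlen, int max_backlog)
{
	return qlen > (uint32_t)max_backlog;
}

/* Reinterprets a 32-bit counter as two's complement without relying on
 * implementation-defined unsigned-to-signed conversion. */
static int64_t as_signed32(uint32_t v)
{
	return v >= UINT32_C(0x80000000) ? (int64_t)v - INT64_C(4294967296) : (int64_t)v;
}

/* Prints the raw value, its signed reading, and the resulting decision. */
static void show(uint32_t qlen, int max_backlog)
{
	printf("qlen=%" PRIu32 " (signed view %" PRId64 ") -> %s\n", qlen,
	       as_signed32(qlen), gro_cell_would_drop(qlen, max_backlog) ? "drop" : "queue");
}
```

Calling `show((uint32_t)-22, 5000)` reports the value 4294967274 with a signed view of -22 and the decision "drop", while a normal length such as 100 is queued.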

Thank you.

Jack