Hi,
We are doing IPsec testing on an LX2160ARDB without DPDK. The test conditions are as follows:
strongSwan (5.8.4)
Linux kernel (4.19)
Once the IPsec tunnel is established, we use iperf3 to test the tunnel's downlink performance.
The data throughput is about 1.5 to 2 Gbits/sec.
After about 30 minutes to 1 hour of testing (the time is random, not fixed), the IPsec tunnel seems to start dropping packets and can no longer send or receive anything. When we check with ipsec status, the tunnel is still there, but even ping packets are not sent out.
By the way, we also tested on an EVB board (kernel version 5.4), and it has the same issue.
Any suggestions about this issue?
Thank you.
Jack
Hi
Have you solved this problem? I have the same problem as you and want to know how to deal with it.
Thank you very much
lee
Hi @andrei_skok
It seems I don't have permission to reply in the case, so I am replying here.
U-Boot 2019.10 (Oct 12 2021 - 11:09:54 +0800)
SoC: unknown (0x87361020)
Clock Configuration:
CPU0(A72):2200 MHz CPU1(A72):2200 MHz CPU2(A72):2200 MHz
CPU3(A72):2200 MHz CPU4(A72):2200 MHz CPU5(A72):2200 MHz
CPU6(A72):2200 MHz CPU7(A72):2200 MHz CPU8(A72):2200 MHz
CPU9(A72):2200 MHz CPU10(A72):2200 MHz CPU11(A72):2200 MHz
CPU12(A72):2200 MHz CPU13(A72):2200 MHz CPU14(A72):2200 MHz
CPU15(A72):2200 MHz
Bus: 750 MHz DDR: 3200 MT/s
Reset Configuration Word (RCW):
00000000: 5883833c 24580058 00000000 00000000
00000010: 00000000 0c010000 00000000 00000000
00000020: 016001a0 00002580 00000000 1800164e
00000030: 08000000 00000000 00000000 00000000
00000040: 00000000 00000000 00000000 00000000
00000050: 00000000 00000000 00000000 00000000
00000060: 00000000 00000000 00027088 00000000
00000070: 00b30030 00150022
Model: NXP Layerscape LX2160ARDB Board
Board: unknown-RDB, Board version: @, boot from eMMC
FPGA: v0.0
SERDES1 Reference: Clock1 = 161.13MHz Clock2 = 161.13MHz
SERDES2 Reference: Clock1 = 100MHz Clock2 = 100MHz
SERDES3 Reference: Clock1 = 100MHz Clock2 = 100MHz
VID: failed to select VDD Page 0
VID: Couldn't read sensor abort VID adjustment
core voltage not adjusted
DRAM: 15.9 GiB
DDR 15.9 GiB (DDR4, 64-bit, CL=22, ECC off)
Using SERDES1 Protocol: 19 (0x13)
Using SERDES2 Protocol: 5 (0x5)
Using SERDES3 Protocol: 0 (0x0)
SERDES3[PRTCL] = 0x0 is not valid
MMC: FSL_SDHC: 0, FSL_SDHC: 1
Loading Environment from MMC... OK
EEPROM: Read failed.
In: serial_pl01x
Out: serial_pl01x
Err: serial_pl01x
Net: PCA: failed to select proper channel
Could not get PHY for FSL_MDIO0: addr 4
Failed to connect
Could not get PHY for FSL_MDIO0: addr 1
Failed to connect
Could not get PHY for FSL_MDIO0: addr 2
Failed to connect
PCIe0: pcie@3400000 disabled
PCIe1: pcie@3500000 disabled
PCIe2: pcie@3600000 Root Complex: no link
PCIe3: pcie@3700000 disabled
PCIe4: pcie@3800000 disabled
PCIe5: pcie@3900000 disabled
DPMAC3@xgmii
Warning: DPMAC3@xgmii (eth0) using random MAC address - ce:b2:70:e8:0e:71
, DPMAC4@xgmii
Warning: DPMAC4@xgmii (eth1) using random MAC address - b2:34:6b:97:5e:dc
, DPMAC5@25g-aui
Warning: DPMAC5@25g-aui (eth2) using random MAC address - de:c6:89:80:14:f6
, DPMAC6@25g-aui
Warning: DPMAC6@25g-aui (eth3) using random MAC address - be:33:ec:55:16:59
switch to partitions #0, OK
mmc1(part 0) is current device
MMC read: dev # 1, block # 20480, count 4608 ... 4608 blocks read: OK
MMC read: dev # 1, block # 28672, count 2048 ... 2048 blocks read: OK
crc32+
fsl-mc: Booting Management Complex ... SUCCESS
fsl-mc: Management Complex booted (version: 10.20.4, boot status: 0x1)
Hit any key to stop autoboot: 0
=> LX2160 <-------> Router <--------> Internet <--------> Security Gateway
# ipsec.conf - strongSwan IPsec configuration file
# basic configuration
config setup
    # strictcrlpolicy=yes
    # uniqueids = no
    strictcrlpolicy=no

conn %default
    # automatically generated by tenpin
    keyingtries=0
    dpdaction=clear
    dpddelay=120s
    auto=start
    keyexchange=ikev2
    reauth=no
    mobike=no
    rekeymargin=1m
    ikelifetime=24h
    keylife=10h
    leftupdown=/usr/local/bin/gnb/ASK/ipsecScript/updown.tenpin
    replay_window=64

conn conn-1
    keyexchange=ikev2
    ike=aes128-sha1-modp1024
    esp=aes128-sha1
    right=211.xx.xx.xx
    leftid="xxxxxxxxxxxx.askey.com.tw"
    rightid=%any
    leftcert=sc.crt.pem
    leftsendcert=if-asked
    authby=pubkey
    leftdns=%config
    rightsubnet="172.29.0.0/22"
    left=%defaultroute
    leftsourceip=%config
    leftdns=%config
    fragmentation=yes
    leftikeport=500
    rightikeport=500
    type=tunnel
=>iperf3 -c 172.29.0.x -t 60000 -R -i 10 -P8 -p 52xx
If you need more information, please let me know.
BR,
Jack
Do you get anything on the RX error frame queue? That should give you some indication.
If the issue is indeed "port_rx_out_of_buffers_discard", you may want to check whether there is a memory leak, i.e. whether buffers are being released/returned to the buffer pool properly.
Depending on the code base and the interface (GbE vs. 10G), you can adjust the number of buffer pointers in the buffer pool and the number of buffers the Ethernet driver pre-allocates. That assumes there is no memory leak in the system.
Hi @andrei_skok
Yes, this issue happens on the RX (receive) side.
1. We enabled XFRM driver debug mode. Below are the error counters:
root@localhost:~# cat /proc/net/xfrm_stat
XfrmInError 0
XfrmInBufferError 0
XfrmInHdrError 0
XfrmInNoStates 1
XfrmInStateProtoError 68056
XfrmInStateModeError 0
XfrmInStateSeqError 202837092
XfrmInStateExpired 0
XfrmInStateMismatch 0
XfrmInStateInvalid 0
XfrmInTmplMismatch 0
XfrmInNoPols 0
XfrmInPolBlock 0
XfrmInPolError 0
XfrmOutError 0
XfrmOutBundleGenError 0
XfrmOutBundleCheckError 0
XfrmOutNoStates 0
XfrmOutStateProtoError 0
XfrmOutStateModeError 0
XfrmOutStateSeqError 0
XfrmOutStateExpired 0
XfrmOutPolBlock 0
XfrmOutPolDead 0
XfrmOutPolError 0
XfrmFwdHdrError 0
XfrmOutStateInvalid 0
XfrmAcquireError 0
2. We used ftrace to trace this issue. It seems the problem occurs in the GRO code, in net/core/gro_cells.c, in the gro_cells_receive function shown below.
Normally skb_queue_len returns the correct queue length, but when this issue occurs it returns a wrong length: the value is a negative number, which shows up as 4294967274 when read as unsigned. netdev_max_backlog is set to 5000, so the check succeeds and packets start being dropped. I think the problem may be that this_cpu_ptr gets the wrong cell, so skb_queue_len gets the wrong queue length. Any suggestions about this issue?
int gro_cells_receive(struct gro_cells *gcells, struct sk_buff *skb)
{
    /* ..... */
    cell = this_cpu_ptr(gcells->cells);
    if (skb_queue_len(&cell->napi_skbs) > netdev_max_backlog) {
drop:
        atomic_long_inc(&dev->rx_dropped);
        kfree_skb(skb);
        res = NET_RX_DROP;
        goto unlock;
    }
    /* ....... */
}
Thank you.
Jack