Packet reordering on LS1043ARDB

thorstenhorstma
Contributor I

Hi,

we observe massive Ethernet packet reordering on our LS1043-based custom board, which runs an SDK 2.0-based Linux. To verify that the issue is not limited to our design/configuration, I tested it on the LS1043ARDB reference system, where I can confirm the same behavior. It can easily be reproduced, for example, by sending some ICMP packets back-to-back with ping:

Host (192.168.137.238): 

$ ping -f -c20 -l20 192.168.137.243

LS1043ARDB (192.168.137.243): 

tcpdump icmp | grep request
[ 2765.909316] device fm1-mac9 entered promiscuous mode
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fm1-mac9, link-type EN10MB (Ethernet), capture size 262144 bytes
06:43:42.533445 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 1, length 64
06:43:42.533447 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 2, length 64
06:43:42.533452 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 3, length 64
06:43:42.533457 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 4, length 64
06:43:42.533471 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 5, length 64
06:43:42.533471 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 8, length 64
06:43:42.533480 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 9, length 64
06:43:42.533481 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 12, length 64
06:43:42.533488 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 13, length 64
06:43:42.533489 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 16, length 64
06:43:42.533496 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 17, length 64
06:43:42.533498 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 18, length 64
06:43:42.533500 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 6, length 64
06:43:42.533504 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 7, length 64
06:43:42.533505 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 19, length 64
06:43:42.533509 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 10, length 64
06:43:42.533517 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 14, length 64
06:43:42.533518 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 11, length 64
06:43:42.533526 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 15, length 64
06:43:42.533527 IP 192.168.137.238 > 192.168.137.243: ICMP echo request, id 6550, seq 20, length 64

As you can see, the ICMP echo requests are received out of order. I'm wondering why this happens. I guess it is related to the DPAA queues, but shouldn't the hash algorithm place the packets of a single flow into the same queue in this case?

I can reproduce this issue with other protocols such as raw Ethernet or UDP as well. I understand that none of these protocols guarantees packet ordering, but I'm surprised that this is the default behavior, since most protocol implementations treat this kind of reordering as packet loss.
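
For reference, here is a quick way to flag the reordering automatically on the target (a sketch assuming tcpdump and GNU coreutils' sort are available; the interface name is the one from the capture above):

# capture 20 echo requests and check that the sequence numbers are monotonic
tcpdump -l -n -c 20 -i fm1-mac9 'icmp[icmptype] == icmp-echo' 2>/dev/null \
  | grep -o 'seq [0-9]*' | awk '{print $2}' \
  | sort -n -c && echo "in order" || echo "reordered"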

Can someone help me to understand the root cause of this issue? Is there a way to prevent this reordering?

Thank you in advance,

  -Thorsten

yipingwang
NXP TechSupport

Unlike TCP, UDP is an unreliable higher-level protocol; we have also encountered out-of-order delivery in iperf UDP testing on the LS1043ARDB.
You can use core-affined queues to implement order preservation. Please use one of the following default FMC policy files:
/etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/policy_ipv4.xml
/etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/policy_ipv6.xml

Please execute the following FMC command before any networking-related operations:
# fmc -c /etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/config.xml -p /etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/policy_ipv4.xml -a

This is because, in DPAA1, each interface by default uses one pool channel shared across all software portals as well as the dedicated channel of each CPU. In the Linux kernel, PCD frame queues use the dedicated channels. Please refer to the section "5. Dedicated and Pool Channels Usage in Linux Kernel" in https://community.nxp.com/docs/DOC-329916 for details.


You need to use multiple flows: after applying the FMC policy, each flow is bound to one core, so all four cores are only utilized when there are multiple flows. In a real scenario, one user application typically uses one flow.
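
For example, to exercise all four cores you could generate several parallel UDP flows from the host (a sketch assuming iperf is installed on both sides; the address is taken from your capture):

# on the LS1043ARDB (server side)
iperf -s -u

# on the host: four parallel UDP streams; their distinct source ports
# hash to different core-affined queues
iperf -c 192.168.137.243 -u -P 4 -b 100M -t 30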


Have a great day,
TIC

bpe
NXP Employee

What you are observing is expected. By default, packets are distributed among queues served by different cores for better load balancing, so egress packet order is not preserved. However, you don't have to disable SMP or send all packets to a single queue to preserve packet order. It is sufficient to identify flows in the ingress traffic and send each flow to a single frame queue. A more detailed discussion can be found here:

https://freescale.sdlproducts.com/LiveContent/content/en-US/QorIQ_SDK/GUID-FD20D104-969A-41D8-9BF3-3...

The QorIQ SDK offers a ready-made configuration that follows the second approach discussed in the document above, namely hash-distributing packets among core-affined queues. To enable it, run the FMC tool with the default config.xml (-c) and policy_ipv4.xml (-p) files.
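
On the LS1043ARDB with the default SDK layout that is, for example (paths as shipped with the SDK rootfs; adjust if yours differs):

# fmc -c /etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/config.xml -p /etc/fmc/config/private/ls1043ardb/RR_FQPP_1455/policy_ipv4.xml -a

Afterwards, "ethtool -S <interface>" shows how received frames are spread across the per-CPU counters (the exact counter names depend on the dpaa_eth driver version), which is a quick way to verify that a single flow now stays on a single core.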

More information on the FMC tool can be found at this link:

https://freescale.sdlproducts.com/LiveContent/content/en-US/QorIQ_SDK/GUID-B58EBCD5-3559-4D6D-AF31-A...


Have a great day,
Platon

adeel
Contributor III

Hi Thorsten,

I think this is normal behaviour for DPAA, because Ethernet queues are processed by multiple CPUs at the same time. If one CPU hasn't finished processing a frame while another CPU already has, you will see reordering. A quick test to prove this theory would be to restrict Linux to a single core and see whether the reordering disappears.
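
For example, you could take the secondary cores offline at runtime and rerun the ping flood (a sketch assuming CPU hotplug is enabled in your kernel; booting with maxcpus=1 has the same effect):

# leave only CPU0 online
for c in 1 2 3; do echo 0 > /sys/devices/system/cpu/cpu$c/online; done
# ... repeat the ping/tcpdump test ...
# bring the cores back afterwards
for c in 1 2 3; do echo 1 > /sys/devices/system/cpu/cpu$c/online; done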

Best regards,

Adeel

aliimran
Contributor I

Adeel,

We are also running into this issue. Disabling SMP in Linux works around it, but the throughput drops through the floor.

The problem only happens at high data rates: there is no reordering if I force the 10 GbE port to run at 1 GbE. 

With the reordering, TCP/IP throughput drops to ~200-300 Mbit/s because TCP treats the reordering as packet loss and retransmits the data.
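
One knob that may soften this on a Linux sender is the TCP reordering threshold, which controls how many out-of-order segments TCP tolerates before it assumes loss (a sketch only; I have not verified that it recovers full line rate):

# on the sending host
sysctl -w net.ipv4.tcp_reordering=30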

Is there really no fix for this? How should I run TCP/IP over the 10 GbE port?

adeel
Contributor III

I think that on a 10G link a performance drop of 300 Mbit/s is not that bad. You could try to configure DPAA to skip parallel processing of packets, but that will also impact performance.

Perhaps DPAA already has a packet-reordering feature, since I remember it can parse packets. Maybe it can reorder the frames before handing them to the TCP/IP stack?

aliimran
Contributor I

You misunderstand: the throughput drops to a total of ~300 Mbit/s. So instead of getting 10,000 Mbit/s, I get 300 Mbit/s.
