Hi All.
I'm developing a userspace DPDK application using LSDK 19.09, running on LS1046A, and using pktgen-dpdk running on a PC to generate test traffic. I count the received packets in my code, using the value returned by rte_eth_rx_burst(), and also read the stats from DPDK/DPAA.
In packets per second, the DPAA rx stats approximately match the pktgen-dpdk tx rate (within ~0.1%, with some second-to-second variation). But the packet count from rte_eth_rx_burst() is consistently around 10% lower. This happens for data rates of ~80Mb/s, 300Mb/s and 700Mb/s. My application forwards all packets, and the pktgen receive count matches my transmit count, which matches my rte_eth_rx_burst() count. I can't figure out where my packets are going.
The count of available buffers returned by rte_mempool_avail_count() stays reasonably constant, so I'm not leaking buffers.
None of the DPDK/DPAA stats rte_eth_stats_get() or rte_eth_xstats_get() show any packet errors. (e.g. missed, mbuf allocation, fcs, undersized ....). The port is running in promiscuous mode.
Packets are normal size with no mbuf chaining. A cumulative count of mbuf.nb_segs matches my code packet count.
There is one place in my code where I receive packets:
nb_rxd = rte_eth_rx_burst(port, 0, app.mbuf_rx.array, n_toread);
port_statistics[port].rx += nb_rxd;
Could rte_eth_rx_burst() be reading more than I ask for? And/or returning an incorrect count? Any other ways I could be losing packets?
I have one of my 4 1GbE ports setup for linux, and 3 ports available in userspace. I wonder if some of my packets could be leaking into the linux driver? They don't show up in the stats from ifconfig.
Cheers,
Mark
Edit: The DPDK example app l2fwd also exhibits about a 10% packet loss in my test setup. It does not, unfortunately, show the DPDK/DPAA collected stats.
I am using LSDK 19.09 update 311219. It does not have the patch. Thank you, this sounds promising. I'll be able to run tests later today.
Cheers,
Mark
Thanks for the patch. It applied OK. But the packet loss behaviour remains.
I'll try LSDK_2004
flexbuilder fails to build DPDK in LSDK2004 (I have raised a support case).
Are there any other patches or known issues that might be applicable to my original packet loss problem? Should I raise a support case?
I need to discuss this problem with AE team.
Same problem with LSDK2004 - exactly as per my original post. Should I raise a support issue?
I want to clarify the issue that seems to be happening here:
You use a PC running pktgen; pktgen generates a number of packets, call it N, that are injected into an LS1046 port controlled by your DPDK application.
You observed, based on the counts from rte_eth_rx_burst, that the Rx side of the DPDK application received approximately 10% fewer packets than the N sent from the PC; call this number M. Your application forwards the packets returned by rte_eth_rx_burst, and the number of packets transmitted by the application equals the number of rte_eth_rx_burst packets, which is M.
Is this true?
As a general rule the simplified flow for a packet in ls1046:
--->MEMAC--->FMAN RX port--->Frame queue--->cpu
You could try from a separate console:
-check if the frames are reaching the FMAN RX port. This can be achieved by running this command:
find /sys/devices/ -name 'port_frame' -exec cat {} \;
FM Port not configured...
fm0-port-rx2 counter: 0
fm0-port-tx3 counter: 0
FM Port not configured...
fm0-port-rx6 counter: 0
fm0-port-tx7 counter: 0
fm0-port-rx4 counter: 4205723
fm0-port-tx5 counter: 0
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
fm0-port-tx2 counter: 0
fm0-port-tx6 counter: 0
fm0-port-rx3 counter: 0
fm0-port-tx4 counter: 647
fm0-port-oh1 counter: 0
fm0-port-rx7 counter: 0
fm0-port-rx5 counter: 0
FM Port not configured...
FM Port not configured...
FM Port not configured...
-check if the frames reach the MAC.
Example:
-suppose that the mac in discussion is : e8000 -you can take the offset either from RM or from dts.
-compute absolute address : e8000 + 1a00000 = 1ae8000.
-read memac counters from 0x120 -0x124
./iomem r32:4 0x1ae8120
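The address arithmetic in the example above can be sketched in C. This is only an illustration: the base 0x1a00000, the MAC offset 0xe8000 and the counter offset 0x120 are the values quoted in this post, while the helper names and the 4 KiB page size are assumptions. Tools that read registers through /dev/mem typically map the page containing the register and then index into it, so the sketch also splits the address that way.

```c
#include <stdint.h>

/* Assumed page size for mmap()-style access through /dev/mem. */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Absolute address of a MAC register:
 * block base + MAC offset + register offset. */
static uint64_t memac_reg_addr(uint64_t fman_base, uint64_t mac_off,
                               uint64_t reg_off)
{
    return fman_base + mac_off + reg_off;
}

/* Start of the page that contains addr (what would be passed to mmap). */
static uint64_t page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PAGE_SIZE_ASSUMED - 1u);
}

/* Byte offset of addr inside its page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (uint64_t)(PAGE_SIZE_ASSUMED - 1u);
}
```

With the values from this post, memac_reg_addr(0x1a00000, 0xe8000, 0x120) gives 0x1ae8120, which splits into page 0x1ae8000 and offset 0x120.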
Hi Yiping Wang,
You use a PC running pktgen; pktgen generates a number of packets, call it N, that are injected into an LS1046 port controlled by your DPDK application.
You observed, based on the counts from rte_eth_rx_burst, that the Rx side of the DPDK application received approximately 10% fewer packets than the N sent from the PC; call this number M. Your application forwards the packets returned by rte_eth_rx_burst, and the number of packets transmitted by the application equals the number of rte_eth_rx_burst packets, which is M.
Is this true?
Yes. The app is forwarding all (M) packets received from rte_eth_rx_burst.
The DPAA stats read with rte_eth_stats_get() show N packets received.
I have fm1-mac1 for Linux use.
My dpdk ports in use are
port 0 fm1-mac5 0xe8000 ... :2d:8f
port 1 fm1-mac6 0xea000 ... :2d:90
fman counts before the run:
# find /sys/devices/ -name 'port_frame' -exec cat {} \;
fm0-port-rx5 counter: 0
fm0-port-tx4 counter: 0
FM Port not configured...
FM Port not configured...
fm0-port-tx0 counter: 1365
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
fm0-port-tx7 counter: 0
fm0-port-rx4 counter: 47279872
FM Port not configured...
fm0-port-rx0 counter: 6800
fm0-port-tx5 counter: 43045909
fm0-port-oh1 counter: 0
FM Port not configured...
fm0-port-rx7 counter: 0
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
root@localhost:~#
pktgen (on PC) sends 5,398,464 pkts, receives 4,914,070 pkts
from my DPDK app:
total packets sent 4,914,412
I don't currently have a cumulative packet count for the DPAA
stats or packet receive. I do have a derived "packets per second"
which shows receive from rte_eth_rx_burst at around 10% less than
the DPAA receive rate.
fman counts after the run:
~# find /sys/devices/ -name 'port_frame' -exec cat {} \;
fm0-port-rx5 counter: 0
fm0-port-tx4 counter: 0
FM Port not configured...
FM Port not configured...
fm0-port-tx0 counter: 2218
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
fm0-port-tx7 counter: 0
fm0-port-rx4 counter: 52678336
FM Port not configured...
fm0-port-rx0 counter: 7651
fm0-port-tx5 counter: 47960321
fm0-port-oh1 counter: 0
FM Port not configured...
fm0-port-rx7 counter: 0
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
FM Port not configured...
root@localhost:~#
fman count differences:
fm0-port-rx4 5,398,464
fm0-port-tx5 4,914,412
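For reference, the before/after subtraction can be written as a small C helper. This is a sketch under the assumption that the port counters are free-running 32-bit values (not confirmed in this thread); the function names are made up for illustration.

```c
#include <stdint.h>

/* Difference between two readings of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wrap-around. */
static uint32_t counter_delta32(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Frames lost between two stages, as a percentage of the upstream count.
 * Returns 0 when nothing was lost or when upstream is zero. */
static double loss_percent(uint32_t upstream, uint32_t downstream)
{
    if (upstream == 0 || downstream >= upstream)
        return 0.0;
    return 100.0 * (double)(upstream - downstream) / (double)upstream;
}
```

Fed with the readings above, counter_delta32(47279872, 52678336) is 5398464 and counter_delta32(43045909, 47960321) is 4914412; loss_percent on those two deltas comes out just under 9%.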
So it seems the frames are being counted by the DPAA stats
that are made available by the DPDK function rte_eth_stats_get(),
but are not reaching the FMAN Rx Port?
Edit (MarkC): I just realised that the fman count _does_ match the DPAA stats.
But the frames are not reaching rte_eth_rx_burst() in DPDK/userspace
Can you give me some more pointers on how to read
memory for the memac counters? Can this be done from
the command line? I have tried the app "devmem2" but
only get zeroes:
root@localhost:~# devmem2 0x1ae8120 w
/dev/mem opened.
Memory mapped at address 0xffff97cbc000.
Value at address 0x1AE8120 (0xffff97cbc120): 0x0
root@localhost:~# devmem2 0x1aa8120 w
/dev/mem opened.
Memory mapped at address 0xffff960b5000.
Value at address 0x1AA8120 (0xffff960b5120): 0x0
root@localhost:~#
Cheers,
Mark
As a general comment you should rely on FMAN counters (MAC and FMAN RX port then FMAN TX port) then you should check if your app has the exact num of frames as the ones received on FMAN.
And if you do not have a counter in your app then you should debug based on the below steps.
The main point is to see if what_was_sent_from_pc == FMAN RX and debug if this is not true.
The counter that shows the FMAN RX it's this one:
fm0-port-rx4 counter: 52678336
To see the mac counters do as follows:
find / -name "mac_rx_stats"
Rerun with traffic (for instance, send 100000 frames from the PC so that the numbers are easy to compare, then check the other side) and check rx4 and mac4; they should be equal. (The number of frames that reached mac4 must equal fm0-port-rx4.)
A complete packet flow in hardware is (in your case PCD does not exist; ignore it):
Rx MAC -> Port BMI -> PCD -> BMI ->QMI
if the number of frames in mac4 != fm0-port-rx4 then you must check port BMI port counters:
-in the dts check the base address of fman rx port that corresponds to mac 0xe8000:
it must be :
port@8c000 {
cell-index = <0xc>;
compatible = "fsl,fman-v3-port-rx", "fsl,fman-port-1g-rx";
reg = <0x8c000 0x1000>;
phandle = <0x2e>;
};
-in sysfs go to:
/sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/statistics/port_rx_out_of_buffers_discard
-check this counter. For example if the buffer pool of the port is depleted then frames will not reach the fm0-port-rx4 and port_rx_out_of_buffers_discard is incremented ; in this case mac4 != fm0-port-rx4
If everything is ok in the BMI it means that the frames should be enqueued by FMAN in the frame queue RX.(this goes through QMI) You can check QMI /sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/fm_port_qmi_regs
and here you check fmqm_pnetfc (Counts the total number of the enqueue operation which the QMI performed for a specific port)
If everything is OK after the above steps, that is, mac4 == fm0-port-rx4, then you should check whether
mac4 == fm0-port-rx4 == what_was_sent_from_pc. If not, that is, (mac4 == fm0-port-rx4) != what_was_sent_from_pc, then either the PC is not reporting correctly, or the PC reports correctly but some frames are discarded, possibly by mac4.
If everything is ok after the above step then you should check what happens
at app level because the number of frames seen on rx4 should be the num of frames that reach cpu.
I will stop at this point; to summarize, we need to see where is the issue in this sequence:
Rx MAC -> Port BMI -> BMI ->QMI ->CPU
Note: the only counters you should rely on are the HW counters and the ones retrieved by your application.
I do not want to rely on DPDK dpaa stats in the debug session.
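The stage-by-stage comparison described above can be sketched as a small C helper that walks the counters in pipeline order and reports the first stage that saw fewer frames than the one before it. The function name and the array layout (one count per stage, in the order sender, Rx MAC, port BMI, QMI, application) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* counts[] holds one frame count per stage, in pipeline order.
 * Returns the index of the first stage whose count is lower than the
 * previous stage's count, or -1 if no stage loses frames. */
static int first_lossy_stage(const uint64_t *counts, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (counts[i] < counts[i - 1])
            return (int)i;
    }
    return -1;
}
```

For example, with hypothetical counts {100000, 100000, 100000, 90000, 90000} the helper returns 3, pointing at the fourth stage as the place to start looking.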
Hi Yiping Wang,
I had previously edited my post May13 10:32pm. I initially
misinterpreted the counts. In fact, fm0-port-rx4 has the same count as
the pktgen traffic generator. It's just that the frames are not being
returned/counted by rte_eth_rx_burst()
i.e. for pktgen tx -> wire -> Rx MAC -> FMAN Rx -> Frame Queue -> cpu
(rte_eth_rx_burst())
pktgen tx == fm0-port-rx4 != cpu (rte_eth_rx_burst())
So I presume I don't need to look at the mac counter? Addressing your
suggestions in your post:
my port 0 is fm1-mac5 0xe8000, which has its base address in
qoriq-fman3-0-1g-4.dtsi
I have attached this file as in my build, unchanged from LSDK2004. The
address matches your post, but there is no
phandle = <0x2e>;
entry in the structure.
I have attached some files with captured counts from another test run.
I think
cat
/sys/devices/platform/soc/soc:fsl,dpaa/soc:fsl,dpaa:ethernet@0/net/fm1-mac1/mac_rx_stats
is giving me stats for the port used by linux.
I couldn't find stats for my userspace receiver port. There is no "net"
directory under
/sys/devices/platform/soc/soc:fsl,dpaa/soc:fsl,dpaa:ethernet@4
But perhaps there is enough info in the attached files:
pre-run_DPAA_count_may15.txt the various counters you suggested,
before the test run
pktgen_summary_may15.txt captured output from pktgen after the run
hyprfire_may15.txt printed stats from my app.
Line 4 is the interesting one.
post-run_DPAA_count_may15.txt the various counters you suggested,
after the test run
( Note: 1,000,000 = 0xf4240, 911,886 = 0xdea0e)
In summary:
pktgen sends 1,000,000 frames
fm0-port-rx4 counter: 1000000
port_rx_out_of_buffers_discard: 0
from fm_port_qmi_regs:
0xFFFFC38C2AD4D420: 0x000dea0e fmqm_pnetfc
(0xdea0e = 911,886)
my app (hyprfire) receives 911,886 frames from rte_eth_rx_burst()
the fmqm_pnetfc is interesting.
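To avoid converting the hex by hand, a line from the fm_port_qmi_regs dump can be parsed with a few lines of C. This is a sketch that assumes every register line has the form "address: value name", as in the dump quoted above; the function name is made up.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parses one line of the form "<address>: <value> <name>" (hex numbers).
 * If the register name matches, stores the value and returns 1;
 * otherwise returns 0 and leaves *value untouched. */
static int parse_reg_line(const char *line, const char *name, uint32_t *value)
{
    unsigned long long addr;
    unsigned int val;
    char reg[64];

    if (sscanf(line, "%llx: %x %63s", &addr, &val, reg) != 3)
        return 0;
    if (strcmp(reg, name) != 0)
        return 0;
    *value = (uint32_t)val;
    return 1;
}
```

Applied to "0xFFFFC38C2AD4D420: 0x000dea0e fmqm_pnetfc" with the name "fmqm_pnetfc", it yields 911886.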
Cheers,
Mark
Keeping the same number of frames 1,000,000 can you check :
cat /sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/statistics/*
Let's see what are the values for each statistics; these statistics contain the qmi and bmi regs.
We need to trace where are the missing frames.
The statistics contain:
port_dealloc_buf port_enq_total port_rx_bad_frame port_rx_large_frame
port_discard_frame port_frame port_rx_filter_frame port_rx_out_of_buffers_discard
And the mapping in HW:
BMI:
rfrc (Number of frames received on the Rx port)
rbfc (Bad Frames Counter - Error cause could be bad CRC, MAC FIFO overflow, coding error, etc.)
rfdc (Frames Discard Counter - frames that were not able to enter the receive queue system due to the WRED algorithm. Other reasons for enqueue reject may be tail drop, out of service FQ, etc.)
rodc (Out of Buffers Discard Counter - Number of received frames that were discarded due to lack of external buffers.)
rfldec (Frames List DMA Error Counter - Number of received frames that were discarded due to WRED algorithm, and not able to release their buffers due to DMA error on the scatter/gather list read.)
RFFC - number of frames received on the Rx port that were filtered out by the parse and classify modules of the FMan.
rfdc - port discard frame
RBDC - Rx Buffers Deallocate Counter
QMI:
pnetfc
These QMI /BMI are mapped with statistics and found in bmi/qmi_port_regs.
I think it's enough to dump the statistics.
Hi Yiping Wang,
Thank you for your suggestions re stats to look at. I think I'm making
real progress. The missing packets are counted with RFFC - number of
frames filtered out by parse & classify in FMan. My post run stats
attached. 1000000 frames received by MAC, 88114 filtered by Fman,
911886 received in my code from rte_eth_rx_burst(). The numbers add up
=> happiness! (at least partial)
I had set the port to promiscuous, using "rte_eth_promiscuous_enable()"
Can you suggest how I might find the reason for the filtering?
Can you point me at other settings I need to make to force promiscuous
receive? Is there something available in the DPDK API?
Cheers,
Mark
The filter counter is not related to promisc. From the reference manual:
The FMBM_RFFC register, counts the number of frames received on the Rx port that were filtered out by the parse and classify modules of the FMan.
Those frames are discarded and not shown to receive queues, unless FMBM_RCFG[FDOVR] is set, in which case the frames are enqueued on the queue configured in FMBM_REFQID[EFQID].
There are multiple situations that could trigger this counter, and the situations are specified by the FMBM_RFSDM:
Rx Frame Status Discard Mask Register
For any bit set to ‘1’ in the frame status word, if the corresponding bit in FMBM_RFSDM is set, the frame is discarded. In addition, FMBM_RFFC counter is incremented. Note that if FMBM_RCFG[FDOVR] is set and the corresponding bit in the FMBM_RFSEM is set, the frame is enqueued to EFQID. If any of the events described above does not occur, the frame can continue processing and thus can be enqueued to FQID as selected by the classification process
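The discard rule quoted above is a bitwise AND between the frame status word and the mask. The C sketch below models only that one rule; it deliberately leaves out the FMBM_RCFG[FDOVR] / FMBM_RFSEM override, and the function name is an assumption. No real bit assignments are implied: the values in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if at least one bit set in the frame status word is also set in
 * the discard mask (FMBM_RFSDM). Per the text quoted above, such a frame
 * is discarded and counted in FMBM_RFFC, unless the FDOVR override
 * (not modelled here) redirects it to the error queue. */
static bool matches_discard_mask(uint32_t frame_status, uint32_t rfsdm)
{
    return (frame_status & rfsdm) != 0;
}
```

With a hypothetical status word 0x00000004, a mask of 0x0000000c matches and a mask of 0x00000008 does not, so clearing the relevant mask bit is enough to stop that status bit from triggering a discard.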
Can you identify those frames that are filtered to be able to send only them?
(It seems that the filtering is applied for some particular frames that generate errors when they traverse the PCD) Also - do you know if any PCD is applied? for instance any hashing or classification?
You can check whether, for instance, the D from PCD is applied (D means distribution, which from the FMAN HW perspective is KeyGen: key generation used for hashing and classification):
cat /sys/devices/platform/soc/1a00000.fman/fm_kg_regs
-if the value starts with 8 that means that D is applied.
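In code, "starts with 8" amounts to testing the most significant bit of the first register in the dump. A minimal C sketch, assuming the enable bit is bit 31 of a 32-bit value (inferred from the hint above, not checked against the reference manual); the macro and function names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the enable bit: the top bit of the 32-bit word. */
#define KG_EN_BIT_ASSUMED 0x80000000u

/* True if the top bit is set, i.e. the leading hex digit is 8 or higher. */
static bool keygen_enabled(uint32_t reg_value)
{
    return (reg_value & KG_EN_BIT_ASSUMED) != 0;
}
```

Note that any leading hex digit from 8 to f has this bit set, so the test is slightly more general than a literal "starts with 8".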
My initialisation code is based on that for the DPDK example app l2fwd. I don't think it attempts to set up any parse/classify/distribute configuration. l2fwd exhibits the same frame loss as my app. I will look more closely into port initialisation. From my fm_kg_regs below, I note that KeyGen is enabled:
FmPcdKgRegs Regs (0xFFFFA8CFFCFC1000)
----------------------------------------
0xFFFFA8CFFCFC1000: 0x80000028 fmkg_gcr
0xFFFFA8CFFCFC100C: 0x00000000 fmkg_eer
0xFFFFA8CFFCFC1010: 0xc0000000 fmkg_eeer
0xFFFFA8CFFCFC101C: 0x00000000 fmkg_seer
0xFFFFA8CFFCFC1020: 0x00000000 fmkg_seeer
0xFFFFA8CFFCFC1024: 0x00000000 fmkg_gsr
0xFFFFA8CFFCFC1028: 0x01d6cbee fmkg_tpc
0xFFFFA8CFFCFC102C: 0x00000000 fmkg_serc
0xFFFFA8CFFCFC1040: 0x00000000 fmkg_fdor
0xFFFFA8CFFCFC1044: 0x00000000 fmkg_gdv0r
0xFFFFA8CFFCFC1048: 0x00000000 fmkg_gdv1r
0xFFFFA8CFFCFC1064: 0x00000000 fmkg_feer
0xFFFFA8CFFCFC11FC: 0x02008011 fmkg_ar
-------
And from the RM:
When EN = 1, the KeyGen is active. It gets jobs, processes packets, and generates results. When EN is cleared, the KeyGen performs a graceful disable sequence. It stops accepting new packets for processing the bus and runs existing packets to completion. During that sequence, the BSY bit can still be asserted. When there are no more packets in progress, the BSY bit is cleared.
If Keygen is not enabled, will frames still flow?
I don't yet know where in DPDK that the FM is configured.
I'll start looking, but any pointers you could give me would be most appreciated. My intent is that all frames should be received. My hope was that any errored frames would be counted by the DPDK stats or xstats.
My generated traffic is a captured packet stream. I don't have any easy way of identifying the problem frames.
Cheers,
Mark
I'll check my generated traffic source (a pcap file) with wireshark
My pcap file (captured by a colleague) had a capture limit of 64byte!!!! Based on a count of filtered vs received packets, it seems the missing packets are those that were truncated in the original capture, so that the frame length indicated in the header does not match the actual number of bytes on the wire. I would have hoped there would have been a stats counter in xstats that would show these?
The stat name list returned by rte_eth_xstats_get_names() is:
rx_good_packets, tx_good_packets, rx_good_bytes, tx_good_bytes, rx_missed_errors, rx_errors, tx_errors, rx_mbuf_allocation_errors, rx_q0packets, rx_q0bytes, rx_q0errors, tx_q0packets, tx_q0bytes, rx_align_err, rx_valid_pause, rx_fcs_err, rx_vlan_frame, rx_frame_err, rx_drop_err, rx_undersized, rx_oversize_err, rx_fragment_pkt, tx_valid_pause, tx_fcs_err, tx_vlan_frame, rx_undersized
My xstats counts are
xstats: 1000000 0 64367208 0 0 0 0 0 0 0 0 0 0 0 0 0 15418 0 0 0 0 0 0 0 0 0
which only show counts for rx good packets & bytes, and rx_vlan_frame.
Sad, because these stats would have helped me greatly.
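Matching a long row of values to names by position is error-prone, so here is a small C sketch that prints only the non-zero counters next to their names. It is plain C with no DPDK dependency: it assumes the names and values have already been copied into two parallel arrays (in a real application they would come from rte_eth_xstats_get_names() and rte_eth_xstats_get()), and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints every non-zero counter as "name: value".
 * names[i] and values[i] must describe the same statistic.
 * Returns the number of counters printed. */
static size_t print_nonzero_stats(const char *const *names,
                                  const uint64_t *values, size_t n)
{
    size_t shown = 0;

    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0) {
            printf("%s: %llu\n", names[i], (unsigned long long)values[i]);
            shown++;
        }
    }
    return shown;
}
```

Run over the 26 names and values listed above, it would print just three lines: rx_good_packets, rx_good_bytes and rx_vlan_frame.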
I have replayed an alternative captured stream, and again get a large number of filtered packets. Is there a (DPDK accessible?) way of disabling any such filtering?
I will make some changes to the rfsdm register to inhibit filtering on various errors. I note this register is written with a hard-coded value in dpdk/drivers/bus/dpaa/base/fman/fman_hw.c
Is your problem solved or you are still facing issues?
Currently the DPDK DPAA PMD programs the FMAN by default to drop all error packets. However, this can be changed, and you can get these packets in userspace by initializing the error queues. Look for the RTE_LIBRTE_DPAA_DEBUG_DRIVER flag. Currently the code only initializes the error queues; it does not poll them.
My immediate problem is basically solved - I modified the rfsdm register so that errored packets were not discarded, but delivered on the main queue.
But I would ideally like to see (sometime in the future? next release?) a count of the discarded errored packets in the DPDK accessible extended stats. This would have saved me a lot of time.
Cheers,
Mark