Why is my DPDK rx packet count different from DPAA stats?


mark_callaghan
Contributor III

Hi All.

I'm developing a userspace DPDK application using LSDK 19.09, running on LS1046A, and using pktgen-dpdk running on a PC to generate test traffic. I count the received packets in my code, using the value returned by rte_eth_rx_burst(), and also read the stats from DPDK/DPAA.

In packets per second, the DPAA rx stats approximately match the pktgen-dpdk tx rate (within ~0.1%, with some second-to-second variation). But the packet count from rte_eth_rx_burst() is consistently around 10% lower. This happens at data rates of ~80Mb/s, 300Mb/s and 700Mb/s. My application forwards all packets, and the pktgen receive count matches my transmit count, which matches my rte_eth_rx_burst() count. I can't figure out where my packets are going.

The count of available buffers returned by rte_mempool_avail_count() stays reasonably constant, so I'm not leaking buffers.

None of the DPDK/DPAA stats from rte_eth_stats_get() or rte_eth_xstats_get() show any packet errors (e.g. missed, mbuf allocation, FCS, undersized, ...). The port is running in promiscuous mode.

Packets are normal size with no mbuf chaining. A cumulative count of mbuf.nb_segs matches my code packet count.

There is one place in my code where I receive packets:

        nb_rxd = rte_eth_rx_burst(port, 0, app.mbuf_rx.array, n_toread);

        port_statistics[port].rx += nb_rxd;

Could rte_eth_rx_burst() be reading more than I ask for? And/or returning an incorrect count? Are there any other ways I could be losing packets?

I have one of my four 1GbE ports set up for Linux, and three ports available in userspace. I wonder if some of my packets could be leaking into the Linux driver? They don't show up in the stats from ifconfig.

Cheers,

Mark

Edit: The DPDK example app l2fwd also exhibits about a 10% packet loss in my test setup. It does not, unfortunately, show the DPDK/DPAA collected stats.

25 Replies

mark_callaghan
Contributor III

I am using LSDK 19.09 update 311219. It does not have the patch. Thank you, this sounds promising. I'll be able to run tests later today.

Cheers,

Mark

mark_callaghan
Contributor III

Thanks for the patch. It applied OK. But the packet loss behaviour remains.

mark_callaghan
Contributor III

I'll try LSDK_2004

mark_callaghan
Contributor III

flexbuilder fails to build DPDK in LSDK2004 (I have raised a support case).

Are there any other patches or known issues that might be applicable to my original packet loss problem? Should I raise a support case?

yipingwang
NXP TechSupport

I need to discuss this problem with the AE team.

mark_callaghan
Contributor III

Same problem with LSDK2004 - exactly as per my original post. Should I raise a support issue?

yipingwang
NXP TechSupport

I want to clarify the issue that seems to be happening here:

You use a PC running pktgen; pktgen generates a number of packets, call it N, that are injected into an LS1046 port controlled by your DPDK application.

Based on the counters from rte_eth_rx_burst, you observed that the Rx side of the DPDK application receives approximately 10% fewer packets than the N sent from the PC; call that number M. Your application forwards the rte_eth_rx_burst packets, and the number of packets transmitted by the application equals the number of rte_eth_rx_burst packets, which is M.

Is this true?

As a general rule, the simplified flow for a packet in the LS1046 is:

--->MEMAC--->FMAN RX port--->Frame queue--->cpu

You could try the following from a separate console:

- Check whether the frames are reaching the FMAN RX port. This can be done by running this command:

find /sys/devices/  -name 'port_frame' -exec cat {} \;
        FM Port not configured...
        fm0-port-rx2 counter: 0
        fm0-port-tx3 counter: 0
        FM Port not configured...
        fm0-port-rx6 counter: 0
        fm0-port-tx7 counter: 0
        fm0-port-rx4 counter: 4205723
        fm0-port-tx5 counter: 0
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        fm0-port-tx2 counter: 0
        fm0-port-tx6 counter: 0
        fm0-port-rx3 counter: 0
        fm0-port-tx4 counter: 647
        fm0-port-oh1 counter: 0
        fm0-port-rx7 counter: 0
        fm0-port-rx5 counter: 0
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...

- Check whether the frames reach the MAC.

Example:

- Suppose the MAC in question is at offset 0xe8000 (you can take the offset either from the RM or from the DTS).
- Compute the absolute address: 0xe8000 + 0x1a00000 = 0x1ae8000.
- Read the mEMAC counters at offsets 0x120-0x124:

./iomem r32:4 0x1ae8120
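
The address arithmetic in this example can be sketched in C as below. This is only an illustration: the helper name memac_reg_addr and the macro name are made up here, while the FMan base 0x1a00000, the MAC offset 0xe8000 and the counter offset 0x120 are the values quoted in the steps above.

```c
#include <stdint.h>

/* FMan base address in the SoC memory map, as quoted in the example. */
#define FMAN_BASE_ADDR 0x1a00000u

/* Hypothetical helper: absolute physical address of a MAC register,
 * given the MAC's offset inside the FMan block and the register's
 * offset inside the MAC block. */
uint32_t memac_reg_addr(uint32_t mac_offset, uint32_t reg_offset)
{
    return FMAN_BASE_ADDR + mac_offset + reg_offset;
}
```

Calling memac_reg_addr(0xe8000, 0x120) yields 0x1ae8120, the address passed to iomem above.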

mark_callaghan
Contributor III

Hi Yiping Wang,

You use a PC running pktgen; pktgen generates a number of packets, call it N, that are injected into an LS1046 port controlled by your DPDK application.

Based on the counters from rte_eth_rx_burst, you observed that the Rx side of the DPDK application receives approximately 10% fewer packets than the N sent from the PC; call that number M. Your application forwards the rte_eth_rx_burst packets, and the number of packets transmitted by the application equals the number of rte_eth_rx_burst packets, which is M.

Is this true?


Yes. The app is forwarding all (M) packets received from rte_eth_rx_burst.
The DPAA stats read with rte_eth_stats_get() show N packets received.



I have fm1-mac1 for Linux use.
My dpdk ports in use are
port 0 fm1-mac5  0xe8000   ... :2d:8f
port 1 fm1-mac6 0xea000    ... :2d:90

fman counts before the run:

# find /sys/devices/  -name 'port_frame' -exec cat {} \;
        fm0-port-rx5 counter: 0
        fm0-port-tx4 counter: 0
        FM Port not configured...
        FM Port not configured...
        fm0-port-tx0 counter: 1365
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        fm0-port-tx7 counter: 0
        fm0-port-rx4 counter: 47279872
        FM Port not configured...
        fm0-port-rx0 counter: 6800
        fm0-port-tx5 counter: 43045909
        fm0-port-oh1 counter: 0
        FM Port not configured...
        fm0-port-rx7 counter: 0
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
root@localhost:~#

pktgen (on PC) sends 5,398,464 pkts, receives 4,914,070 pkts

from my DPDK app:
total packets sent 4,914,412
I don't currently have a cumulative packet count for the DPAA
stats or packet receive. I do have a derived "packets per second"
which shows receive from  rte_eth_rx_burst at around 10% less than
the DPAA receive rate.

fman counts after the run:

~# find /sys/devices/  -name 'port_frame' -exec cat {} \;
        fm0-port-rx5 counter: 0
        fm0-port-tx4 counter: 0
        FM Port not configured...
        FM Port not configured...
        fm0-port-tx0 counter: 2218
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        fm0-port-tx7 counter: 0
        fm0-port-rx4 counter: 52678336
        FM Port not configured...
        fm0-port-rx0 counter: 7651
        fm0-port-tx5 counter: 47960321
        fm0-port-oh1 counter: 0
        FM Port not configured...
        fm0-port-rx7 counter: 0
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
        FM Port not configured...
root@localhost:~#

fman count differences:
fm0-port-rx4  5,398,464
fm0-port-tx5  4,914,412
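
For anyone repeating this before/after comparison, a minimal C sketch of the subtraction follows. The function names are made up for illustration, and treating the counters as free-running 32-bit values is an assumption of the sketch, not something stated in this thread.

```c
#include <stdint.h>

/* Frames counted between two readings of a hardware counter.
 * Assumption: the counter is a free-running 32-bit value, so unsigned
 * subtraction still gives the right delta across a single wrap. */
uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return after - before;
}

/* Frames lost between two stages, in parts per ten thousand of the
 * frames that entered (integer maths, so 896 means 8.96%). */
uint32_t loss_per_10000(uint32_t in, uint32_t out)
{
    if (in == 0 || out > in)
        return 0;
    return (uint32_t)(((uint64_t)(in - out) * 10000u) / in);
}
```

With the readings above, counter_delta(47279872, 52678336) gives 5398464 for fm0-port-rx4 and counter_delta(43045909, 47960321) gives 4914412 for fm0-port-tx5; loss_per_10000(5398464, 4914412) then returns 896, i.e. a loss of roughly 9%.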

So it seems the frames are being counted by the DPAA stats
that are made available by the DPDK function rte_eth_stats_get(),
but are not reaching the FMAN Rx Port?

Edit (MarkC): I just realised that the fman count _does_ match the DPAA stats.
But the frames are not reaching rte_eth_rx_burst() in DPDK/userspace.

Can you give me some more pointers on how to read
memory for the memac counters? Can this be done from
the command line? I have tried the app "devmem2" but
only get zeroes:
root@localhost:~# devmem2 0x1ae8120 w
/dev/mem opened.
Memory mapped at address 0xffff97cbc000.
Value at address 0x1AE8120 (0xffff97cbc120): 0x0
root@localhost:~# devmem2 0x1aa8120 w
/dev/mem opened.
Memory mapped at address 0xffff960b5000.
Value at address 0x1AA8120 (0xffff960b5120): 0x0
root@localhost:~#

Cheers,
Mark

yipingwang
NXP TechSupport

As a general comment, you should rely on the FMAN counters (MAC, then FMAN RX port, then FMAN TX port), and then check whether your app sees exactly the number of frames received on the FMAN.

If you do not have a counter in your app, you should debug using the steps below.

The main point is to check whether what_was_sent_from_pc == FMAN RX, and to debug if this is not true.

The counter that shows the FMAN RX is this one:

fm0-port-rx4 counter: 52678336

To see the MAC counters, do as follows:

find  / -name "mac_rx_stats"

Rerun with traffic (for simplicity, send a round number of frames from the PC, for instance 100000, and then check the other side) and compare rx4 and mac4; they should be equal. (The number of frames that reached mac4 must equal fm0-port-rx4.)

A complete packet flow in hardware is (in your case there is no PCD; ignore it):

Rx MAC -> Port BMI -> PCD -> BMI ->QMI

If the number of frames in mac4 != fm0-port-rx4, then you must check the port BMI counters:

- In the DTS, check the base address of the FMAN RX port that corresponds to MAC 0xe8000.

It must be:

 port@8c000 {
         cell-index = <0xc>;
         compatible = "fsl,fman-v3-port-rx", "fsl,fman-port-1g-rx";
         reg = <0x8c000 0x1000>;
         phandle = <0x2e>;
 };

- In sysfs, go to:

/sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/statistics/port_rx_out_of_buffers_discard

- Check this counter. For example, if the buffer pool of the port is depleted, frames will not reach fm0-port-rx4 and port_rx_out_of_buffers_discard is incremented; in this case mac4 != fm0-port-rx4.

If everything is OK in the BMI, the frames should be enqueued by the FMAN into the RX frame queue (this goes through the QMI). You can check the QMI in /sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/fm_port_qmi_regs

and there check fmqm_pnetfc (it counts the total number of enqueue operations that the QMI performed for a specific port).

If everything is OK after the above steps, that is, mac4 == fm0-port-rx4, then you should check whether

mac4 == fm0-port-rx4 == what_was_sent_from_pc; if not, then (mac4 == fm0-port-rx4) != what_was_sent_from_pc, which suggests that either the PC is not reporting correctly, or the PC reports correctly but some frames are discarded, possibly by mac4.

If everything is OK after the above step, then you should check what happens at app level, because the number of frames seen on rx4 should be the number of frames that reach the CPU.

I will stop at this point; to summarize, we need to see where the issue is in this sequence:

Rx MAC -> Port BMI -> BMI ->QMI ->CPU

Note: the only counters you should rely on are the HW counters and the ones retrieved by your application.

I do not want to rely on the DPDK dpaa stats in the debug session.
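
The stage-by-stage comparison described above can be sketched in C as follows. It is a rough illustration only: the struct and function names are made up, and the counts would have to be collected by hand from the sysfs counters and from the application.

```c
#include <stddef.h>
#include <stdint.h>

/* One stage of the receive path and the frame count observed there. */
struct stage_count {
    const char *name;
    uint64_t frames;
};

/* Returns the index of the first stage whose count is lower than the
 * count of the stage before it, i.e. the first place where frames go
 * missing. Returns -1 if no stage shows a drop. */
int first_lossy_stage(const struct stage_count *stages, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (stages[i].frames < stages[i - 1].frames)
            return (int)i;
    }
    return -1;
}
```

For instance, with made-up counts of 100000 at "sent from PC", "Rx MAC" and "FMAN RX port" but 90000 at "application", the function returns 3, pointing at the last hop as the one to investigate.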

mark_callaghan
Contributor III

Hi Yiping Wang,

I had previously edited my post (May 13, 10:32pm). I initially misinterpreted the counts. In fact, fm0-port-rx4 has the same count as the pktgen traffic generator. It's just that the frames are not being returned/counted by rte_eth_rx_burst()

i.e. for pktgen tx -> wire -> Rx MAC -> FMAN Rx -> Frame Queue -> cpu (rte_eth_rx_burst())

pktgen tx == fm0-port-rx4  != cpu (rte_eth_rx_burst())

So I presume I don't need to look at the mac counter? Addressing the suggestions in your post:

my port 0 is fm1-mac5  0xe8000, which has its base address in qoriq-fman3-0-1g-4.dtsi

I have attached this file as in my build, unchanged from LSDK2004. The address matches your post, but there is no

    phandle = <0x2e>;

entry in the structure.

I have attached some files with captured counts from another test run.

I think

cat /sys/devices/platform/soc/soc:fsl,dpaa/soc:fsl,dpaa:ethernet@0/net/fm1-mac1/mac_rx_stats

is giving me stats for the port used by linux.

I couldn't find stats for my userspace receiver port. There is no "net" directory under

/sys/devices/platform/soc/soc:fsl,dpaa/soc:fsl,dpaa:ethernet@4

But perhaps there is enough info in the attached files:

pre-run_DPAA_count_may15.txt     the various counters you suggested, before the test run
pktgen_summary_may15.txt         captured output from pktgen after the run
hyprfire_may15.txt               printed stats from my app. Line 4 is the interesting one.
post-run_DPAA_count_may15.txt    the various counters you suggested, after the test run

(Note: 1,000,000 = 0xf4240, 911,886 = 0xdea0e)

In summary:

pktgen sends 1,000,000 frames
fm0-port-rx4 counter: 1000000
port_rx_out_of_buffers_discard: 0
from fm_port_qmi_regs:
0xFFFFC38C2AD4D420: 0x000dea0e          fmqm_pnetfc
(0xdea0e = 911,886)
my app (hyprfire) receives 911,886 frames from rte_eth_rx_burst()

The fmqm_pnetfc value is interesting.
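
Since the register dumps print hex values, a small parser saves converting them by hand. The sketch below assumes each line has the "address: value name" layout of the fmqm_pnetfc line above; the function name parse_reg_line is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Parses one line of a register dump laid out as
 * "<address>: <hex value>   <register name>".
 * On success stores the value and the name and returns 1; returns 0 if
 * the line does not match. name must have room for at least 32 bytes. */
int parse_reg_line(const char *line, uint32_t *value, char name[32])
{
    unsigned long long addr; /* parsed but not used by this sketch */
    unsigned int val;

    if (sscanf(line, "%llx: %x %31s", &addr, &val, name) != 3)
        return 0;
    *value = (uint32_t)val;
    return 1;
}
```

Fed the fmqm_pnetfc line above, it stores 911886 (0xdea0e) as the value and "fmqm_pnetfc" as the name.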

Cheers,

Mark

yipingwang
NXP TechSupport

Keeping the same number of frames (1,000,000), can you check:

cat /sys/bus/platform/devices/soc/1a00000.fman/1a8c000.port/statistics/*

Let's see the value of each statistic; these statistics reflect the QMI and BMI registers.

We need to trace where the missing frames are going.

The statistics contain:

port_dealloc_buf                port_enq_total                  port_rx_bad_frame               port_rx_large_frame
port_discard_frame              port_frame                      port_rx_filter_frame            port_rx_out_of_buffers_discard

And the mapping in HW:

BMI:

- rfrc: number of frames received on the Rx port.
- rbfc: Bad Frames Counter. The error cause could be bad CRC, MAC FIFO overflow, coding error, etc.
- rfdc: Frames Discard Counter (port discard frame). Frames that were not able to enter the receive queue system due to the WRED algorithm. Other reasons for enqueue reject may be tail drop, out of service FQ, etc.
- rodc: Out of Buffers Discard Counter. Number of received frames that were discarded due to lack of external buffers.
- rfldec: Frames List DMA Error Counter. Number of received frames that were discarded due to the WRED algorithm and were not able to release their buffers due to a DMA error on the scatter/gather list read.
- RFFC: number of frames received on the Rx port that were filtered out by the parse and classify modules of the FMan.
- RBDC: Rx Buffers Deallocate Counter.

QMI:

- pnetfc

These QMI/BMI registers are mapped to the statistics and can be found in bmi/qmi_port_regs.

I think it's enough to dump the statistics.

mark_callaghan
Contributor III

Hi Yiping Wang,

Thank you for your suggestions re stats to look at. I think I'm making real progress. The missing packets are counted by RFFC, the number of frames filtered out by parse & classify in the FMan. My post-run stats are attached: 1000000 frames received by the MAC, 88114 filtered by the FMan, 911886 received in my code from rte_eth_rx_burst(). The numbers add up => happiness! (at least partial)
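
The "numbers add up" check can be written down as a tiny C helper. This is a sketch with made-up names; which drop counters need to be included for a given setup is an assumption left to the caller.

```c
#include <stdint.h>

/* Frames that neither reached the application nor appear in the drop
 * counters supplied. Zero means the numbers add up; a positive value
 * means some frames are still unexplained. */
int64_t unaccounted_frames(uint64_t received_by_port,
                           uint64_t filtered,
                           uint64_t other_drops,
                           uint64_t delivered_to_app)
{
    return (int64_t)received_by_port - (int64_t)filtered
         - (int64_t)other_drops - (int64_t)delivered_to_app;
}
```

With the figures from this run, unaccounted_frames(1000000, 88114, 0, 911886) is 0; leaving out the filter counter would instead report 88114 unexplained frames.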

I had set the port to promiscuous, using "rte_eth_promiscuous_enable()".

Can you suggest how I might find the reason for the filtering?

Can you point me at other settings I need to make to force promiscuous receive? Is there something available in the DPDK API?

Cheers,

Mark

yipingwang
NXP TechSupport

The filter counter is not related to promisc. From the reference manual:

 

The FMBM_RFFC register, counts the number of frames received on the Rx port that were filtered out by the parse and classify modules of the FMan.

Those frames are discarded and not shown to receive queues, unless FMBM_RCFG[FDOVR] is set, in which case the frames are enqueued on the queue configured in FMBM_REFQID[EFQID].

 

 

Multiple situations can trigger this counter; they are specified by FMBM_RFSDM:

Rx Frame Status Discard Mask Register

For any bit set to 1 in the frame status word, if the corresponding bit in FMBM_RFSDM is set, the frame is discarded. In addition, FMBM_RFFC counter is incremented. Note that if FMBM_RCFG[FDOVR] is set and the corresponding bit in the FMBM_RFSEM is set, the frame is enqueued to EFQID. If any of the events described above does not occur, the frame can continue processing and thus can be enqueued to FQID as selected by the classification process
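
One way to read the paragraph quoted above is as the small decision function below. This is a sketch of that reading only, not the hardware's documented behaviour: the enum, the function name and the exact precedence between the masks are assumptions, and any RFSEM handling outside the discard case is not modelled.

```c
#include <stdint.h>

enum rx_frame_fate {
    RX_FRAME_TO_FQID,   /* continues processing, normal queue          */
    RX_FRAME_TO_EFQID,  /* enqueued to the queue selected by EFQID     */
    RX_FRAME_DISCARDED  /* dropped; the text says FMBM_RFFC increments */
};

/* status: frame status word; rfsdm / rfsem: the two mask registers;
 * fdovr: non-zero if FMBM_RCFG[FDOVR] is set. */
enum rx_frame_fate classify_rx_status(uint32_t status, uint32_t rfsdm,
                                      uint32_t rfsem, int fdovr)
{
    uint32_t discard_bits = status & rfsdm;

    if (discard_bits == 0)
        return RX_FRAME_TO_FQID;     /* no masked status bit is set */
    if (fdovr && (discard_bits & rfsem) != 0)
        return RX_FRAME_TO_EFQID;    /* override: keep the frame    */
    return RX_FRAME_DISCARDED;
}
```

Under this reading, clearing a bit in FMBM_RFSDM means a frame with only that status bit set is no longer discarded and continues to the normal queue.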

 

 

Can you identify the frames that are filtered, so that you can send only those?

(It seems that the filtering is applied to some particular frames that generate errors when they traverse the PCD.) Also, do you know if any PCD is applied, for instance any hashing or classification?

You can check, for instance, whether the D from PCD is applied (D means distribution, which from the FMAN HW perspective is KeyGen, the key generation used for hashing and classification):

 

 

cat /sys/devices/platform/soc/1a00000.fman/fm_kg_regs

 

-if the value starts with 8 that means that D is applied.
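
Expressed as a bit test, "starts with 8" amounts to checking the top bit of the 32-bit value. The sketch below assumes the enable flag is the most significant bit of the register being read; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: the enable flag is the most significant bit, which is
 * why an enabled value "starts with 8" when printed as 8 hex digits. */
#define KG_ENABLE_BIT 0x80000000u

/* Returns 1 if the enable bit is set in the register value, else 0. */
int keygen_distribution_applied(uint32_t reg_value)
{
    return (reg_value & KG_ENABLE_BIT) != 0;
}
```

For example, a value of 0x80000028 returns 1, while 0x00000028 returns 0. Note that the bit test also accepts values starting with 9 or a-f, which a literal "starts with 8" check would not.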

mark_callaghan
Contributor III

My initialisation code is based on that for the DPDK example app l2fwd. I don't think it attempts to set up any parse/classify/distribute configuration. l2fwd exhibits the same frame loss as my app. I will look more closely into port initialisation. From my fm_kg_regs below, I note that KeyGen is enabled:

FmPcdKgRegs Regs (0xFFFFA8CFFCFC1000)
----------------------------------------

0xFFFFA8CFFCFC1000: 0x80000028          fmkg_gcr
0xFFFFA8CFFCFC100C: 0x00000000          fmkg_eer
0xFFFFA8CFFCFC1010: 0xc0000000          fmkg_eeer
0xFFFFA8CFFCFC101C: 0x00000000          fmkg_seer
0xFFFFA8CFFCFC1020: 0x00000000          fmkg_seeer
0xFFFFA8CFFCFC1024: 0x00000000          fmkg_gsr
0xFFFFA8CFFCFC1028: 0x01d6cbee          fmkg_tpc
0xFFFFA8CFFCFC102C: 0x00000000          fmkg_serc
0xFFFFA8CFFCFC1040: 0x00000000          fmkg_fdor
0xFFFFA8CFFCFC1044: 0x00000000          fmkg_gdv0r
0xFFFFA8CFFCFC1048: 0x00000000          fmkg_gdv1r
0xFFFFA8CFFCFC1064: 0x00000000          fmkg_feer
0xFFFFA8CFFCFC11FC: 0x02008011          fmkg_ar
-------

And from the RM:

When EN = 1, the KeyGen is active. It gets jobs, processes packets, and generates results. When EN is
cleared, the KeyGen performs a graceful disable sequence. It stops accepting new packets for processing
the bus and runs existing packets to completion. During that sequence, the BSY bit can still be asserted.
When there are no more packets in progress, the BSY bit is cleared.

If Keygen is not enabled, will frames still flow?

I don't yet know where in DPDK that the FM is configured.

I'll start looking, but any pointers you could give me would be most appreciated. My intent is that all frames should be received. My hope was that any errored frames would be counted by the DPDK stats or xstats.

My generated traffic is a captured packet stream. I don't have any easy way of identifying the problem frames.

Cheers,

Mark

mark_callaghan
Contributor III

I'll check my generated traffic source (a pcap file) with wireshark

mark_callaghan
Contributor III

My pcap file (captured by a colleague) had a capture limit of 64 bytes! Based on a count of filtered vs received packets, it seems the missing packets are those that were truncated in the original capture, so that the frame length indicated in the header does not match the actual number of bytes on the wire. I would have hoped there would be a stats counter in xstats that would show these.
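
Truncation of this kind is visible in the capture file itself: in the classic pcap format, each record header stores both the captured length and the original length. The C sketch below counts records where the two differ. It assumes a classic (not pcapng) file already loaded into memory and written in the host's byte order; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAP_MAGIC      0xa1b2c3d4u  /* classic pcap, microsecond stamps */
#define PCAP_GLOBAL_HDR 24u          /* size of the file header          */
#define PCAP_RECORD_HDR 16u          /* ts_sec, ts_usec, incl_len, orig_len */

static uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);         /* host byte order assumed */
    return v;
}

/* Counts records whose captured length is shorter than the original
 * length. Stores the total record count in *total. Returns -1 if the
 * buffer is not a host-order classic pcap or a record overruns it. */
long count_truncated_records(const uint8_t *buf, size_t len, long *total)
{
    size_t off = PCAP_GLOBAL_HDR;
    long truncated = 0;

    *total = 0;
    if (len < PCAP_GLOBAL_HDR || read_u32(buf) != PCAP_MAGIC)
        return -1;

    while (len - off >= PCAP_RECORD_HDR) {
        uint32_t incl_len = read_u32(buf + off + 8);
        uint32_t orig_len = read_u32(buf + off + 12);

        off += PCAP_RECORD_HDR;
        if (incl_len > len - off)
            return -1;               /* record runs past the buffer */
        if (incl_len < orig_len)
            truncated++;
        (*total)++;
        off += incl_len;
    }
    return truncated;
}
```

Running something like this over a capture before replaying it shows how many frames will go out shorter than their headers claim.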

The stat name list returned by rte_eth_xstats_get_names() is:

rx_good_packets,  tx_good_packets,  rx_good_bytes,  tx_good_bytes,  rx_missed_errors,  rx_errors,  tx_errors,  rx_mbuf_allocation_errors,  rx_q0packets,  rx_q0bytes,  rx_q0errors,  tx_q0packets,  tx_q0bytes,  rx_align_err,  rx_valid_pause,  rx_fcs_err,  rx_vlan_frame,  rx_frame_err,  rx_drop_err,  rx_undersized,  rx_oversize_err,  rx_fragment_pkt,  tx_valid_pause,  tx_fcs_err,  tx_vlan_frame,  rx_undersized

My xstats counts are

xstats:  1000000       0 64367208       0       0       0       0       0       0       0       0       0       0       0       0       0   15418       0       0       0       0       0       0       0       0       0

which only show counts for rx good packets & bytes, and rx_vlan_frame.
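
Reading a long row of zeros against a separate list of names is error-prone, so it can help to print only the non-zero counters with their names. The sketch below is standalone C with a made-up function name; in a DPDK application the names and values would come from rte_eth_xstats_get_names() and rte_eth_xstats_get(), which this sketch does not call.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints every counter whose value is non-zero as "name: value" and
 * returns how many were printed. names[i] labels values[i]. */
size_t print_nonzero_stats(FILE *out, const char *const *names,
                           const uint64_t *values, size_t n)
{
    size_t printed = 0;

    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0) {
            fprintf(out, "%s: %" PRIu64 "\n", names[i], values[i]);
            printed++;
        }
    }
    return printed;
}
```

Applied to the 26 names and values above, this would print just three lines: rx_good_packets, rx_good_bytes and rx_vlan_frame.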

Sad, because these stats would have helped me greatly.

mark_callaghan
Contributor III

I have replayed an alternative captured stream, and again get a large number of filtered packets. Is there a (DPDK accessible?) way of disabling any such filtering?

mark_callaghan
Contributor III

I will make some changes to the rfsdm register to inhibit filtering on various errors. I note this register is written with a hard-coded value in dpdk/drivers/bus/dpaa/base/fman/fman_hw.c

hemantagrawal
NXP Employee

Is your problem solved, or are you still facing issues?

Currently the DPDK-based DPAA PMD by default programs the FMAN to drop all error packets. However, this can be changed, and you can get the packets in userspace by initializing the error queues. Look for the RTE_LIBRTE_DPAA_DEBUG_DRIVER flag. Currently the code only initializes the error queues; it does not poll them.

mark_callaghan
Contributor III

My immediate problem is basically solved - I modified the rfsdm register so that errored packets were not discarded, but delivered on the main queue.

But I would ideally like to see (sometime in the future? next release?) a count of the discarded errored packets in the DPDK accessible extended stats. This would have saved me a lot of time.

Cheers,

Mark
