Hemant Agrawal

DPDK Performance on Layerscape devices

Blog post created by Hemant Agrawal on Jun 11, 2019

The following settings help get the best DPDK performance on Layerscape devices.

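  • Disable hardware prefetch (U-Boot). The mask 0xFE covers cores 1-7 and leaves core 0 unchanged. Save the environment so the setting survives the reset:

setenv hwconfig 'fsl_ddr:bank_intlv=auto;core_prefetch:disable=0xFE'

saveenv

qixis reset altbank (resets the board and boots from the alternate bank; if you boot from bank 0, run only 'qixis reset')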
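The value after core_prefetch:disable= appears to be a per-core bitmask (bit n set means prefetch is disabled on core n), which is why 0xFE matches cores 1-7, the same cores isolated for DPDK further down. Assuming that interpretation, a small hypothetical helper (not part of U-Boot or DPDK) can compute the mask for a different set of DPDK cores:

```shell
#!/bin/sh
# Hypothetical helper: build a per-core bitmask, assuming bit n of the
# core_prefetch:disable value stands for core n.
core_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))   # set the bit for this core
    done
    printf '0x%X\n' "$mask"
}

# Cores 1-7, matching the hwconfig example above: prints 0xFE
core_mask 1 2 3 4 5 6 7
```

If DPDK runs only on cores 1-3, for instance, the helper prints 0xE, which would then replace 0xFE in the hwconfig string.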

 

  • Kernel boot arguments (U-Boot): add the parameters below to bootargs (or othbootargs).

Once the kernel has booted, make sure they appear in the output of 'cat /proc/cmdline':

- Use 1 GB hugepages:

default_hugepagesz=1024m hugepagesz=1024m hugepages=6 (6 is an example; reserve as many pages as your application needs)

- Isolate the CPUs that run DPDK from the kernel scheduler, so the user-space DPDK threads run without kernel interference:

isolcpus=1-7

- Suppress RCU stall warnings and NMI watchdog messages:

nmi_watchdog=0 rcupdate.rcu_cpu_stall_suppress=1
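Checking /proc/cmdline by eye is error-prone. The function below is a hypothetical helper (not part of DPDK or the NXP scripts) that looks for each expected parameter as a whole, space-separated token, so that, for example, isolcpus=1-70 is not mistaken for isolcpus=1-7:

```shell
#!/bin/sh
# Hypothetical helper: print each expected boot parameter that is missing
# from a kernel command line; return non-zero if any are missing.
# $1 is the command line, the remaining arguments are exact tokens to find.
missing_bootargs() {
    cmdline=$1
    shift
    status=0
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                        # present as a whole token
            *) echo "missing: $param"; status=1 ;;
        esac
    done
    return "$status"
}

# Example on the board (values from the list above):
# missing_bootargs "$(cat /proc/cmdline)" default_hugepagesz=1024m \
#     hugepagesz=1024m hugepages=6 isolcpus=1-7 nmi_watchdog=0 \
#     rcupdate.rcu_cpu_stall_suppress=1 && echo "all boot parameters present"
```

Passing the command line as an argument, rather than reading /proc/cmdline inside the function, keeps the check usable on a saved copy of the bootargs before rebooting.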

 

  • Run the performance script (Linux). It allows all DPDK applications to run at real-time (RT) priority:

source /usr/local/dpdk/enable_performance_script.sh

(Make sure core 0 is not part of the DPDK coremask/lcores, because core 0 also runs the Linux OS services.)
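The core 0 rule can be checked before launching an application. As a sketch, assuming the application is started with a hex coremask passed via -c (not an --lcores list), a hypothetical guard only needs to test bit 0 of the mask; the function name and the COREMASK variable are illustrative, not part of DPDK:

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) if a hex DPDK coremask such as
# 0x3 includes core 0, i.e. if bit 0 is set.
coremask_uses_core0() {
    [ $(( $1 & 1 )) -ne 0 ]
}

# Example guard; COREMASK defaults to 0xFE (cores 1-7) for illustration.
COREMASK=${COREMASK:-0xFE}
if coremask_uses_core0 "$COREMASK"; then
    echo "coremask $COREMASK includes core 0; pick DPDK cores from 1-7" >&2
else
    echo "coremask $COREMASK avoids core 0"
fi
```

Shell arithmetic accepts the 0x prefix directly, so the mask can be passed exactly as it would be given to -c.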

 

  • If you also use some DPAA2 interfaces from the kernel, set the affinity of all DPIO portal interrupts to core 0, so that no interrupts disturb the user-space DPDK threads (Linux):

cat /proc/interrupts (find the dpio interrupts and their IRQ numbers)

echo 1 > /proc/irq/<irq number>/smp_affinity (the hex mask 1 makes core 0 serve the DPIO interrupt)

Repeat the above command for every DPIO portal.
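Doing this by hand for every portal is tedious. Below is a minimal sketch of automating it, assuming the DPIO interrupts can be recognised by the string "dpio" in /proc/interrupts (check the exact naming on your kernel first). The function names and the optional file/directory arguments, added so the script can be tried on copies of the files, are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list the IRQ numbers whose /proc/interrupts line
# mentions "dpio". Only numeric IRQs are kept, since entries such as
# "IPI0:" have no /proc/irq/<n> directory.
dpio_irqs() {
    awk 'tolower($0) ~ /dpio/ && $1 ~ /^[0-9]+:$/ { sub(/:$/, "", $1); print $1 }' \
        "${1:-/proc/interrupts}"
}

# Write the hex mask 1 (core 0) to smp_affinity of every DPIO interrupt.
# $1: interrupts file (default /proc/interrupts)
# $2: IRQ directory   (default /proc/irq); needs root on a real board.
pin_dpio_irqs_to_core0() {
    irq_dir=${2:-/proc/irq}
    for irq in $(dpio_irqs "$1"); do
        echo 1 > "$irq_dir/$irq/smp_affinity" || return 1
        echo "IRQ $irq -> core 0"
    done
}

# Typical use on the board, as root:  pin_dpio_irqs_to_core0
```

Re-running it after loading or unloading DPAA2 kernel drivers is harmless, since it simply rewrites the same mask.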

 

  • To achieve higher performance on a single interface, use multiple Rx queues with packet distribution across cores.

For example, to run testpmd in multi-queue mode, use the command-line option '--rxq=<x>' to create x Rx queues per port.

For 2 queues, pass --rxq=2:

./testpmd -c 0x3 -n 1 -- -i --nb-cores=1 --portmask=0x10 --port-topology=chained --rxq=2

  Note 1: The default l2fwd example application does not support multiple queues with packet distribution.

  Note 2: With multiple queues, use an adequate number of flows per port (e.g. 1K flows per port) so that the flows distribute evenly across the cores.
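When scaling to more queues, the coremask, --nb-cores and --rxq values have to stay consistent. The hypothetical helper below (not part of DPDK) prints a testpmd command modelled on the one above, assuming one forwarding core per Rx queue, core 1 as the main lcore (DPDK uses the first core in the mask as main lcore by default) and cores 2 to N+1 for forwarding, so that core 0 stays with Linux. On an 8-core device this allows up to 6 queues.

```shell
#!/bin/sh
# Hypothetical helper: print a testpmd command line with N Rx queues and
# N forwarding cores. Assumed layout: core 1 = main lcore, cores 2..N+1 forward.
testpmd_cmd() {
    nq=$1
    # (1 << (N + 1)) - 1 sets bits 0..N; shifting left by one gives bits 1..N+1.
    mask=$(( ((1 << (nq + 1)) - 1) << 1 ))
    printf './testpmd -c 0x%X -n 1 -- -i --nb-cores=%d --portmask=0x10 --port-topology=chained --rxq=%d\n' \
        "$mask" "$nq" "$nq"
}

testpmd_cmd 2
```

For 2 queues it prints a command with -c 0xE, --nb-cores=2 and --rxq=2. The flows from Note 2 still have to come from the traffic generator; the queue count alone does not spread a single flow across cores.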
