MX6 PCIe Full Duplex Bandwidth Capability

653 Views
rkeefe01
Contributor I

Hi,

I have an FPGA endpoint connected to the MX6 as Gen1 x1. The logic in the FPGA is the DMA master, performing DMA reads and writes through the MX6 root complex to host memory. Independently, each DMA channel sustains respectable bandwidth of at least 1.5 Gbit/s on average. When I interleave reads and writes, however, the read bandwidth drops severely, to about 700 Mbit/s.

I have learned that the MX6's PCIe Device Control register (PCIE_RC_DConR) settings can affect the flow control update rate. For example, changing the Max_Payload_Size from 128 to 256 bytes improved the sustained bandwidth of the DMA write operations by increasing the FC credit update rate. Could a similar setting somewhere improve the simultaneous read/write capability?
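
For reference, the fields I am describing are the Max_Payload_Size Supported bits in the PCIe Device Capabilities register and the Max_Payload_Size bits in Device Control. A rough sketch of decoding them from the sysfs config space (the 0000:01:00.0 BDF is only a placeholder; lspci -vv reports the same information under DevCap/DevCtl):

```python
#!/usr/bin/env python3
"""Sketch: decode Max_Payload_Size (MPS) settings from a PCIe device's
config space via sysfs. The BDF below is a placeholder, not an actual
topology; run as root so the full config space is readable."""

import struct

CONFIG = "/sys/bus/pci/devices/0000:01:00.0/config"  # placeholder BDF

CAP_PTR_OFFSET = 0x34   # capabilities pointer in the config header
PCIE_CAP_ID = 0x10      # PCI Express capability ID

def find_pcie_cap(cfg):
    """Walk the capability linked list until the PCI Express capability."""
    ptr = cfg[CAP_PTR_OFFSET] & 0xFC
    while ptr:
        cap_id, nxt = cfg[ptr], cfg[ptr + 1]
        if cap_id == PCIE_CAP_ID:
            return ptr
        ptr = nxt & 0xFC
    raise RuntimeError("PCI Express capability not found")

with open(CONFIG, "rb") as f:
    cfg = f.read()

cap = find_pcie_cap(cfg)
devcap = struct.unpack_from("<I", cfg, cap + 0x04)[0]  # Device Capabilities
devctl = struct.unpack_from("<H", cfg, cap + 0x08)[0]  # Device Control

print("MPS supported: ", 128 << (devcap & 0x7), "bytes")         # bits [2:0]
print("MPS programmed:", 128 << ((devctl >> 5) & 0x7), "bytes")  # bits [7:5]
```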

2 Replies

517 Views
Yuri
NXP Employee

Hello,

Look at the following:

https://community.nxp.com/thread/312322

Have a great day,

Yuri

------------------------------------------------------------------------------

Note: If this post answers your question, please click the Correct Answer button. Thank you!


517 Views
rkeefe01
Contributor I

Hi Yuri.

Thanks for the timely response. I understand from the link you sent that the HW limit for MAX_PAYLOAD_SIZE (MPS) is 128 bytes, and that exceeding this results in AXI decomposition, which has an adverse effect on PCIe bandwidth. The MX6 User Guide describes this adverse effect in 48.4.4.1.1.1 Decomposition Side-Effects: "Decomposition degrades the PCIe link performance because it increases the amount of TLP header overhead and uses up extra PCIe TAGS. Decomposition uses up extra header FIFO locations which will reduce the bridge's bus offloading ability in some cases."

Is there any way to characterize this degradation? For instance, the MX6 User Guide, section 48.6.2 Effective Throughput, indicates that increasing MPS generally improves bandwidth for the same reason that decomposition degrades it: TLP header overhead. That is, if decomposition degrades performance simply by increasing the TLP header overhead, then it seems it merely reverses any improvement gained by increasing MPS. Is this the extent of the performance limit resulting from decomposition?
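
For reference, a back-of-envelope version of that TLP overhead argument (assuming the usual per-TLP cost of start/end framing, sequence number, a 3DW header, and LCRC, and ignoring DLLP traffic such as ACKs and FC updates, so these are not the RM's exact figures):

```python
#!/usr/bin/env python3
"""Rough model of Gen1 x1 link efficiency vs. payload size per TLP.
Overhead = 2 B framing + 2 B sequence number + 12 B (3DW) header + 4 B LCRC.
DLLPs (ACK/NAK, UpdateFC) are ignored, so these numbers are optimistic."""

GEN1_X1_GBPS = 2.5 * (8 / 10)   # 2.5 GT/s with 8b/10b encoding -> 2.0 Gbit/s
TLP_OVERHEAD = 2 + 2 + 12 + 4   # bytes of non-payload per memory-write TLP

for payload in (64, 128, 256, 512):
    eff = payload / (payload + TLP_OVERHEAD)
    print(f"{payload:4d} B payload per TLP: {eff:5.1%} efficient, "
          f"~{GEN1_X1_GBPS * eff:4.2f} Gbit/s of payload")
```

By that model, decomposing each 256-byte write back into 128-byte TLPs gives up the roughly six percentage points of efficiency gained by the larger MPS, which matches the "reverses any improvement" reading above; the extra tag and header FIFO usage the RM mentions would come on top of that.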

Interestingly, in my situation, setting the MPS in the Device Control Register to 256 seems to have improved the throughput of my FPGA-based single-channel DMA master's MWr requests. Prior to increasing MPS, the flow control credit (FC) update rate was relatively slow. After updating MPS to 256 (as confirmed via lspci), the FC update rate increased and the DMA MWr bandwidth improved sufficiently for my application.
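
To illustrate what I mean about the FC update rate (with purely hypothetical credit counts and intervals, not measured i.MX6 values), a sketch of the ceiling that posted-data credits impose on sustained MWr throughput when the pool is only replenished by each UpdateFC DLLP:

```python
#!/usr/bin/env python3
"""Illustrative model (hypothetical numbers, not measured i.MX6 values):
if the DMA master drains the advertised posted-data credits and must then
wait for the next UpdateFC DLLP, sustained MWr payload bandwidth is capped
by credits-per-interval. One PCIe data credit covers 16 bytes (4 DW)."""

CREDIT_BYTES = 16  # one data flow-control credit = 4 DW of payload

def fc_limited_gbps(posted_data_credits, update_interval_us):
    """Upper bound on payload throughput for a credit pool refilled once
    per flow-control update interval."""
    bytes_per_interval = posted_data_credits * CREDIT_BYTES
    return bytes_per_interval * 8 / (update_interval_us * 1e3)  # Gbit/s

for interval_us in (1.0, 2.0, 5.0):
    print(f"32 credits, UpdateFC every {interval_us:3.1f} us: "
          f"<= {fc_limited_gbps(32, interval_us):4.2f} Gbit/s of payload")
```

With numbers in that range, a faster update cadence can move the ceiling from below my roughly 1.5 Gbit/s requirement to comfortably above it, which would be consistent with what I observed after the MPS change.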

If an MPS of 256 initiates decomposition, which should have an adverse effect on bandwidth in general, can you help me understand why my MPS increase seemingly resulted in a DMA MWr bandwidth improvement?

Regards,
Rich
