I'm using the i.MX8MM on a custom board with a PCIe x1 Gen2 link to an FPGA. The FPGA acts as the endpoint, and I'm currently trying to maximize the throughput of the link. Using the FPGA vendor's closed-source PCIe IP core, I've measured the following DMA throughput:
Read requests, data moving CPU -> device [2^26 bytes]: 165.0387 ms -> 3252.9991 Mbit/s
Posted write requests, data moving device -> CPU [2^26 bytes]: 328.6304 ms -> 1633.6619 Mbit/s
The FPGA vendor isn't able to reproduce the low throughput for posted write requests, so I'm asking here whether the i.MX8MM is known for "bad" write performance on the PCIe bus. As far as I can tell, the read throughput of 3.25 Gbit/s is nearly ideal for 128-byte payloads (4 Gbit/s * 128 B/148 B ≈ 3.46 Gbit/s, assuming ~20 bytes of TLP overhead per 128-byte payload). Furthermore, I have verified that the DMA logic on the FPGA is capable of delivering enough data and is therefore not responsible for this bottleneck. Can you share some benchmark results, or are there any configuration tweaks to optimize for write throughput?
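For reference, here is how I arrive at these numbers. This is just a sanity-check sketch; the 20-byte per-TLP overhead (framing + sequence number + 3DW header + LCRC) and the 4 Gbit/s post-8b/10b line rate for Gen2 x1 are my assumptions, not vendor-provided figures:

```python
# Sanity check of the measured throughput and the theoretical ceiling.
# Assumptions: Gen2 x1 = 5 GT/s with 8b/10b encoding -> 4 Gbit/s raw;
# 128-byte payload + ~20 bytes TLP overhead -> 148 bytes on the wire.

SIZE_BITS = 2**26 * 8               # transfer size: 2^26 bytes

def mbit_per_s(seconds):
    return SIZE_BITS / seconds / 1e6

read_mbps  = mbit_per_s(0.1650387)  # measured read time
write_mbps = mbit_per_s(0.3286304)  # measured posted-write time

LINE_RATE_MBPS = 4000               # 4 Gbit/s after 8b/10b
ceiling_mbps = LINE_RATE_MBPS * 128 / 148

print(f"read:    {read_mbps:.1f} Mbit/s ({read_mbps/ceiling_mbps:.0%} of ceiling)")
print(f"write:   {write_mbps:.1f} Mbit/s ({write_mbps/ceiling_mbps:.0%} of ceiling)")
print(f"ceiling: {ceiling_mbps:.1f} Mbit/s")
```

With these assumptions the read path sits at roughly 94% of the protocol ceiling, while the posted writes reach only about 47%, which is why I suspect the write side specifically.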