Hello,
We are developing an embedded system based on the i.MX8QM. The custom board has a co-processor connected over PCIe, and we use the DMA subsystem to write data to and read data from the PCIe bus. The following block diagram shows the data flow.
The issue is that dma_map_sg()/dma_unmap_sg() take considerably longer than the actual DMA operation (dma_async_issue_pending()), so the overall throughput (bandwidth) is lower than we expect.
We have been struggling with this issue without success so far. Are there any suggestions, guidance, or things we should check? Please don't hesitate to ask if you need more information, such as source code or details of the HW blocks.
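To make the impact concrete, here is a minimal sketch in plain user-space C of how the map/unmap time reduces the effective bandwidth of one transfer. The function names and the timing parameters are hypothetical and only illustrate the arithmetic; they are not part of any kernel API, and real values would have to be measured on the target.

```c
#include <stdio.h>

/*
 * Effective bandwidth of one transfer, in MB/s (10^6 bytes per second).
 * All parameters are hypothetical inputs measured by the caller:
 *   bytes    - payload size of the transfer
 *   map_us   - time spent in dma_map_sg(), in microseconds
 *   dma_us   - time spent in the DMA transfer itself, in microseconds
 *   unmap_us - time spent in dma_unmap_sg(), in microseconds
 */
double effective_bw_mbps(double bytes, double map_us, double dma_us,
                         double unmap_us)
{
    double total_us = map_us + dma_us + unmap_us;

    if (total_us <= 0.0)
        return 0.0;

    /* bytes per microsecond is numerically equal to MB/s */
    return bytes / total_us;
}

/* Fraction of the total time that is spent on mapping and unmapping. */
double map_overhead_fraction(double map_us, double dma_us, double unmap_us)
{
    double total_us = map_us + dma_us + unmap_us;

    if (total_us <= 0.0)
        return 0.0;

    return (map_us + unmap_us) / total_us;
}
```

With purely illustrative numbers (a 1 MiB transfer, 300 us to map, 200 us for the DMA, 300 us to unmap), effective_bw_mbps() gives about 1310.72 MB/s, while the DMA alone would reach 5242.88 MB/s, and map_overhead_fraction() reports that 75% of the time goes to mapping and unmapping.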
Thank you in advance!
Hello @Ethan42,
The following is our system engineer's suggestion.
I think dma_map_sg() performance can be affected by CMA. On a CMA-based kernel, DMA memory that is not in use can be used by other applications in the system. When PCIe starts to use the DMA memory, the kernel has to move that data out of the CMA region, which costs time.
We have a reference patch to reserve DMA memory from CMA, which the customer can try. It is based on the 4.14.98 kernel and can be ported to a 5.x kernel.
https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/How-to-get-rid-of-CMA/ta-p/1123287
Could you check and try this suggestion?