We are developing a system on our custom i.MX8QM-based board that transfers large amounts of data (several MB to several GB) to a co-processor through PCIe DMA.
When we tested DMA performance using our device driver for the co-processor, we found significant overhead in dma_map_sg(), which maps the physical addresses of the buffer pages to DMA bus addresses.
The DMA flow is as follows:
1. The user application allocates a buffer (malloc, std::vector, etc.).
2. The application passes the buffer pointer to the driver via a write or read syscall.
3. The syscall handler in our custom PCI co-processor driver pins the pages of the buffer in kernel space.
4. The driver prepares a scatter-gather list and fills it with the page numbers of the pinned pages.
5. The driver calls dma_map_sg() with the scatter-gather list to map the physical addresses of the pinned pages to DMA bus addresses.
6. The driver runs the DMA transfer with the filled scatter-gather list.
In step 5, we found that swiotlb allocates a new bounce buffer and copies the data into it, which causes the significant performance overhead. According to our analysis, this happens because the DMA can only access addresses that fit in 32 bits (below 4 GB).
So we want to know:
1. Why can the DMA only access addresses that fit in 32 bits? Is this a limitation of the Linux kernel or of the i.MX8?
2. It is very hard for a device driver to allocate a buffer of several KB to MB below 4 GB. Is there any way to avoid using the swiotlb bounce buffer? For example:
- Using a hardware IOMMU
- Lifting the 32-bit address limitation
- Other approaches
Thanks in advance.
Best regards
Heecheol Yang.
@hcyang1012
Hello,
I am afraid it is not currently possible to use the expanded memory.
Regards,
Yuri.
@hcyang1012
Hello,
The address area above 1_0000_0000, which is described as "Used for Mapped IO
external flash devices, extensions space for PCIe", is not supported and should generally
be removed from the Reference Manual(s).
Regards,
Yuri.