[i.MX8QM] Streaming DMA on CMA area

hcyang1012 · 02-27-2022

Hello,

I am writing a device driver for our co-processor, which communicates with the i.MX8QM processor over the PCIe bus.

At first, I implemented streaming DMA operations to write/read user processes' data to/from the device. That is, a user-allocated buffer (via malloc()) is pinned in kernel space for DMA.

However, I changed the DMA logic so that the user process mmaps a block (about 512MB) of memory from CMA. The reason is that a user-allocated buffer MUST be copied to a bounce buffer, because the i.MX8QM can only access the 32-bit memory area, and this results in serious copy overhead. To be specific:

1. The user process mmaps a 512MB buffer from a DMABUF heap (/dev/dma_heap/).

2. A custom memory allocator provides an API that manages allocations within the buffer, like malloc/free.

3. Using this API, the user process gets a buffer of any size (from several bytes to MBs) and requests DMA on it via our device driver.

4. The device driver does DMA.
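
As a rough illustration of step 2, the sub-allocator could look something like the sketch below. This is not our actual allocator: the names (pool, pool_alloc, pool_free), the 64-byte alignment and the first-fit policy are all assumptions made for the example. The bookkeeping lives in ordinary heap memory, so the allocator itself never touches the mapped region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_ALIGN 64u /* assumed cache-line size; not taken from the i.MX8QM manual */

/* One contiguous range of the pool, described outside the pool itself. */
struct pool_seg {
    size_t off, len;
    int used;
    struct pool_seg *next;
};

struct pool {
    uint8_t *base;         /* start of the mmap()ed region */
    struct pool_seg *head; /* segments, sorted by offset */
};

static int pool_init(struct pool *p, void *base, size_t size)
{
    assert((uintptr_t)base % POOL_ALIGN == 0); /* mmap() results are page-aligned */
    struct pool_seg *s = malloc(sizeof(*s));
    if (!s)
        return -1;
    s->off = 0;
    s->len = size - size % POOL_ALIGN; /* ignore a partial trailing line */
    s->used = 0;
    s->next = NULL;
    p->base = base;
    p->head = s;
    return 0;
}

/* First fit: take the first free segment that is large enough, split the rest off. */
static void *pool_alloc(struct pool *p, size_t n)
{
    if (n == 0 || n > SIZE_MAX - POOL_ALIGN)
        return NULL;
    n = (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    for (struct pool_seg *s = p->head; s; s = s->next) {
        if (s->used || s->len < n)
            continue;
        if (s->len > n) {
            struct pool_seg *t = malloc(sizeof(*t));
            if (!t)
                return NULL;
            t->off = s->off + n;
            t->len = s->len - n;
            t->used = 0;
            t->next = s->next;
            s->len = n;
            s->next = t;
        }
        s->used = 1;
        return p->base + s->off;
    }
    return NULL;
}

/* Mark the segment free and merge it with free neighbours on either side. */
static void pool_free(struct pool *p, void *ptr)
{
    size_t off = (size_t)((uint8_t *)ptr - p->base);
    struct pool_seg *prev = NULL;
    for (struct pool_seg *s = p->head; s; prev = s, s = s->next) {
        if (s->off != off || !s->used)
            continue;
        s->used = 0;
        if (s->next && !s->next->used) {
            struct pool_seg *t = s->next;
            s->len += t->len;
            s->next = t->next;
            free(t);
        }
        if (prev && !prev->used) {
            prev->len += s->len;
            prev->next = s->next;
            free(s);
        }
        return;
    }
}
```

Here base would be the address returned by mmap() on the 512MB buffer, and the pointers returned by pool_alloc() are what get handed to the driver in step 3.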

Note that the user process gets a large (512MB) coherent buffer from the DMABUF heap and then sub-allocates small buffers from it to use freely. But this results in significant performance degradation, because the CPU can't use its cache on the coherent buffer.

What I really want is to allow DMA on a user-allocated (via malloc) buffer without any unnecessary performance overhead, such as bounce-buffer copying or the cache being disabled by coherent memory.

I think the only way is to mmap from the DMABUF heap in a very fine-grained manner. That is, instead of using a custom allocator on a large mmapped block from CMA, I would invoke mmap() whenever a user wants to call malloc() and munmap() whenever the user wants to call free() for that buffer. But I am seriously afraid of the system call overhead.

Does anyone have any ideas?

Thanks in advance.

Best regards

Heecheol Yang.

i.MX 8 Family | i.MX 8QuadMax (8QM) | 8QuadPlus

Linux