Hi Yuri,
Thanks for the examples. This is good information. However, it does not resolve the problem.
I looked at the kernel sources to see how the IPU and PXP drivers implement memory allocation. They still use dma_alloc_coherent(), so the memory buffers are non-cacheable.
Moreover, the purpose of USERPTR in V4L2 is to let applications pass buffers allocated in user space directly, either with malloc() or statically. Using the PXP or IPU to perform the allocation is not really how it was intended to be used.
Also, I'm not sure my problem is clear. I don't have to use USERPTR (MMAP is fine), but buffers allocated with dma_alloc_coherent() give very poor performance.
What I need is a way to allocate the video buffers that still gives good performance when they are accessed from user space.
I wrote a DDR benchmark application to show the issue. The code allocates a buffer of 4MB and reads it several times in a loop.
When the buffer is allocated with malloc(), it takes 49.8 ms to read 10MB.
However, when using IPU_ALLOC, it takes 338.4 ms (6.8x longer!)
root@imx6qsabresd:~# ./ddr_benchmark
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 49827333
root@imx6qsabresd:~# ./ddr_benchmark_ipu
USRP: alloc bufs offset 0x24b00000 size 4149248
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 338425669
Buffer allocated using malloc()
-------------------------------
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 396050646
Busy cycles count: 240362999
Read accesses count: 7001555
Write accesses count: 9174
Read bytes count: 447536316
Write bytes count: 293466
Avg. Read burst size: 63
Avg. Write burst size: 31
Read: 426.80 MB/s / Write: 0.28 MB/s Total: 427.08 MB/s
Utilization: 11%
Bus Load: 60%
Bytes Access: 63
Buffer allocated using IPU_ALLOC
---------------------------------
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 396043446
Busy cycles count: 256915721
Read accesses count: 8802699
Write accesses count: 14923
Read bytes count: 78915456
Write bytes count: 252306
Avg. Read burst size: 8
Avg. Write burst size: 16
Read: 75.18 MB/s / Write: 0.24 MB/s Total: 75.42 MB/s
Utilization: 1%
Bus Load: 64%
Bytes Access: 8
The benchmark code is attached.
To select IPU_ALLOC, uncomment line 12.
Build command: arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a9 -mfloat-abi=hard ddr_benchmark.c -o ddr_benchmark