Hello experts,
On my iMX6DL with Linux 3.14.28 I have a camera connected using MIPI. The video stream needs to be processed by the CPU, but the buffers are non-cacheable (using V4L2_MEMORY_MMAP).
Is there a solution for this issue?
Some ideas:
1. Can I replace the dma_alloc_coherent() call used to allocate the buffers with another call that makes them cacheable?
2. Is there an example of how to get V4L2_MEMORY_USERPTR working properly? (I couldn't...)
Maybe a unit test used to verify the driver implementation?
Any help would be appreciated.
Sincerely,
Erez
Please try the fix ENGR00234387 to support V4L2_MEMORY_USERPTR.
https://bitbucket.org/devonit/linux-2.6-imx/branch/imx_3.0.35_1.1.0
As for an example:
http://linuxtv.org/downloads/v4l-dvb-apis/capture-example.html
Have a great day,
Yuri
Hi Yuri,
Thanks for the reply.
The link you sent is from 2012, and it looks like this code is already implemented in 3.14.28.
Anyway -- it doesn't seem to work.
When using USERPTR, the user allocates buffers in user-space, and passes a pointer via m.userptr member of v4l2_buffer. However, none of the mxc-capture code in drivers/media/platform/mxc/capture/ accesses this struct member.
Can you please check?
Regards,
Erez
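For reference, the user-space side of USERPTR streaming described above can be sketched roughly as follows. This is only the generic V4L2 API usage, assuming VIDIOC_REQBUFS was already issued with V4L2_MEMORY_USERPTR; the helper names fill_userptr_buffer() and alloc_and_queue() are hypothetical, and nothing here implies that the mxc capture driver actually honors m.userptr.

```c
#define _POSIX_C_SOURCE 200112L
#include <linux/videodev2.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>

/* Describe a user-allocated capture buffer in a struct v4l2_buffer.
 * (Hypothetical helper, not part of any driver or library.) */
void fill_userptr_buffer(struct v4l2_buffer *buf, unsigned int index,
                         void *mem, size_t length)
{
    memset(buf, 0, sizeof(*buf));
    buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf->memory = V4L2_MEMORY_USERPTR;
    buf->index = index;
    buf->m.userptr = (unsigned long)mem;   /* virtual address of the buffer */
    buf->length = (__u32)length;
}

/* Allocate page-aligned memory, rounded up to whole pages, and queue it.
 * Returns the buffer on success, NULL on failure. */
void *alloc_and_queue(int fd, unsigned int index, size_t length)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = (length + page - 1) / page * page;
    void *mem = NULL;
    struct v4l2_buffer buf;

    if (posix_memalign(&mem, page, padded) != 0)
        return NULL;

    fill_userptr_buffer(&buf, index, mem, padded);
    if (ioctl(fd, VIDIOC_QBUF, &buf) < 0) {
        free(mem);
        return NULL;
    }
    return mem;
}
```

A driver that supports USERPTR has to read buf->m.userptr in its QBUF handling; if it never touches that member, the queued address cannot be used.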
Can you try the patch below?
diff --git a/drivers/media/video/mxc/capture/mxc_v4l2_capture.c b/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
index 9130388..dddf670 100644
--- a/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
+++ b/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
@@ -324,8 +324,10 @@ static int mxc_v4l2_prepare_bufs(cam_data *cam, struct v4l2_buffer *buf)
{
pr_debug("In MVC:mxc_v4l2_prepare_bufs\n");
if (buf->index < 0 || buf->index >= FRAME_NUM || buf->length <
- PAGE_ALIGN(cam->v2f.fmt.pix.sizeimage)) {
+ cam->v2f.fmt.pix.sizeimage) {
pr_err("ERROR: v4l2 capture: mxc_v4l2_prepare_bufs buffers "
"not allocated,index=%d, length=%d\n", buf->index,
buf->length);
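One possible reading of this patch: it only relaxes the length check, so a user buffer whose length equals sizeimage is no longer rejected when sizeimage is not a multiple of the page size. The sketch below reproduces just that comparison in user space, ignoring the index check. It assumes 4 KiB pages, and the function names and the 720x480, 2 bytes per pixel frame size are hypothetical illustration values.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Round n up to the next multiple of the page size,
 * like the kernel's PAGE_ALIGN() macro. */
size_t sketch_page_align(size_t n)
{
    return (n + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/* Length check as it reads before the patch. */
int accepted_before_patch(size_t buf_length, size_t sizeimage)
{
    return buf_length >= sketch_page_align(sizeimage);
}

/* Length check as it reads after the patch. */
int accepted_after_patch(size_t buf_length, size_t sizeimage)
{
    return buf_length >= sizeimage;
}
```

With sizeimage = 720 * 480 * 2 = 691200 bytes, the page-aligned size is 692224, so a buffer of exactly 691200 bytes fails the old check and passes the new one. The patch does not make the driver read m.userptr.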
Hi Yuri,
Removing the 'PAGE_ALIGN' macro?
Is that the whole patch? I don't see how that would make a difference.
With USERPTR the user-space application provides the virtual address of the video buffer in buf.m.userptr. This member is ignored in the function, so how can it work?
From what I know, for USERPTR the V4L2 driver should take the virtual address and then verify that the physical pages are contiguous (if not, the driver should rearrange the pages). I think this is done by vb2_get_contig_userptr() in videobuf2-memops.c.
Regards,
Erez
From app team :
"
User space needs to allocate physically contiguous memory, for example with IPU alloc or another method.
I attached the V4L2 unit test code, mxc_v4l2_output.c and mx6s_v4l2_capture.c; one uses IPU alloc, the other uses PXP alloc.
Search for memalloc in this code.
"
Regards,
Yuri.
Hi Yuri,
Thanks for the examples. This is good information. However, it does not resolve the problem.
I looked at the kernel sources to see how the IPU and PXP drivers implement the memory allocation. They still use dma_alloc_coherent(), so the memory buffers are non-cacheable.
Moreover, the purpose of USERPTR in V4L2 is to allow users to pass buffers allocated in user-space directly by malloc or statically. Using PXP or IPU to perform the allocation is not really the way it was intended.
Also, I'm not sure my problem is clear. I don't have to use USERPTR (MMAP is okay), but using dma_alloc_coherent() gives very bad performance.
What I need is a way to allocate the video buffers and to have good performance in user-space.
I wrote a DDR benchmark application to show the issue. The code allocates a buffer of 4MB and reads it several times in a loop.
When the buffer is allocated with malloc() it takes 49.8 ms to read 10MB.
However, when using IPU_ALLOC, it takes 338.4 ms (6.8x longer!)
root@imx6qsabresd:~# ./ddr_benchmark
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 49827333
root@imx6qsabresd:~# ./ddr_benchmark_ipu
USRP: alloc bufs offset 0x24b00000 size 4149248
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 338425669
Buffer allocated using malloc()
-------------------------------
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 396050646
Busy cycles count: 240362999
Read accesses count: 7001555
Write accesses count: 9174
Read bytes count: 447536316
Write bytes count: 293466
Avg. Read burst size: 63
Avg. Write burst size: 31
Read: 426.80 MB/s / Write: 0.28 MB/s Total: 427.08 MB/s
Utilization: 11%
Bus Load: 60%
Bytes Access: 63
Buffer allocated using IPU_ALLOC
---------------------------------
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 396043446
Busy cycles count: 256915721
Read accesses count: 8802699
Write accesses count: 14923
Read bytes count: 78915456
Write bytes count: 252306
Avg. Read burst size: 8
Avg. Write burst size: 16
Read: 75.18 MB/s / Write: 0.24 MB/s Total: 75.42 MB/s
Utilization: 1%
Bus Load: 64%
Bytes Access: 8
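The derived figures in the two profiles appear to follow directly from the raw counters. The sketch below shows formulas that reproduce the printed values; they are inferred from the numbers above and not taken from the MMDC profiling tool's source, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Raw counters as printed by the profiler (hypothetical struct). */
struct mmdc_counters {
    uint64_t total_cycles;
    uint64_t busy_cycles;
    uint64_t read_accesses;
    uint64_t read_bytes;
    unsigned measure_ms;
};

/* Read bandwidth in MB/s, with 1 MB taken as 1024 * 1024 bytes. */
double mmdc_read_mbps(const struct mmdc_counters *c)
{
    return (double)c->read_bytes / (1024.0 * 1024.0)
           / ((double)c->measure_ms / 1000.0);
}

/* Average bytes per read access, truncated to an integer. */
unsigned mmdc_avg_read_burst(const struct mmdc_counters *c)
{
    return (unsigned)(c->read_bytes / c->read_accesses);
}

/* Percentage of cycles in which the bus was busy, truncated. */
unsigned mmdc_bus_load_pct(const struct mmdc_counters *c)
{
    return (unsigned)(c->busy_cycles * 100u / c->total_cycles);
}
```

Read this way, the IPU_ALLOC run issues more read accesses than the malloc() run (8802699 versus 7001555) but moves far fewer bytes, because each access transfers about 8 bytes instead of about 63. That pattern would be consistent with uncached single-word reads rather than cache-line fills.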
The benchmark code is attached.
To select IPU_ALLOC, uncomment line 12.
Build command: arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a9 -mfloat-abi=hard ddr_benchmark.c -o ddr_benchmark
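The attachment is not reproduced in this thread. As a rough illustration only, a read benchmark of the kind described (read a buffer several times in a loop and time it) might look like the sketch below. The function name, the 32-bit word reads, and the use of volatile to keep the compiler from dropping the loads are assumptions, not the original ddr_benchmark.c.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read `words` 32-bit words from buf, `passes` times, and return the
 * elapsed time in nanoseconds. The running sum is stored in *dummy so
 * the reads have an observable result. */
uint64_t read_buffer_ns(const volatile uint32_t *buf, size_t words,
                        unsigned passes, uint32_t *dummy)
{
    struct timespec t0, t1;
    uint32_t sum = 0;
    unsigned p;
    size_t i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (p = 0; p < passes; p++)
        for (i = 0; i < words; i++)
            sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *dummy = sum;
    return (uint64_t)((int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                      + (int64_t)(t1.tv_nsec - t0.tv_nsec));
}
```

Calling this once on a 4MB buffer from malloc() and once on a buffer obtained from the driver's allocator, with the same word count and number of passes, gives the two timings to compare.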
From app team :
"The current driver uses a coherent mapping.
For the customer's use case, which needs cacheable buffers so that the CPU can process the captured data, they need to implement this new feature on their own."
Regards,
Yuri.
Hi Yuri,
I found an interesting discussion about user-space performance for buffers allocated with kmalloc ("kmalloc memory slower than malloc").
That led me to check the mmap function in mxc_v4l2_capture.c, and there I found a solution.
To get normal performance in user-space I made the following changes:
in mxc_mmap() -- comment out: vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
in mxc_allocate_frame_buf() -- replace the dma_alloc_coherent() allocation with kmalloc()
in mxc_free_frame_buf() -- replace dma_free_coherent() with kfree().
I still use V4L2_MEMORY_MMAP.
With this change I can see normal memory loads:
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 396071850
Busy cycles count: 155735196
Read accesses count: 2254063
Write accesses count: 4653481
Read bytes count: 138436492
Write bytes count: 211011038
Avg. Read burst size: 61
Avg. Write burst size: 45
Read: 132.02 MB/s / Write: 201.24 MB/s Total: 333.26 MB/s
Utilization: 14%
Bus Load: 39%
Bytes Access: 50
Thanks for the support!
Regards,
Erez