V4L2_MEMORY_USERPTR example for iMX6

erezsteinberg
Contributor IV

Hello experts,

On my iMX6DL with Linux 3.14.28 I have a camera connected using MIPI. The video stream needs to be processed by the CPU, but the buffers are non-cacheable (using V4L2_MEMORY_MMAP).

Is there a solution for this issue?

Some ideas -

1. Can I replace the dma_alloc_coherent() calls that allocate the buffers with another call that makes them cacheable?

2. Is there an example of how to get V4L2_MEMORY_USERPTR working properly? (I couldn't...)

   Maybe a unit-test used to verify the Driver implementation?

Any help would be appreciated.

Sincerely,

Erez

Yuri
NXP Employee

  Please try the ENGR00234387 fix to support V4L2_MEMORY_USERPTR.


https://bitbucket.org/devonit/linux-2.6-imx/branch/imx_3.0.35_1.1.0

https://bitbucket.org/devonit/linux-2.6-imx/commits/4eed6a080a4d4a453d976a4fb2dc6f5d90fbb827?at=mast...

As for an example :

http://linuxtv.org/downloads/v4l-dvb-apis/capture-example.html


Have a great day,
Yuri

erezsteinberg
Contributor IV

Hi Yuri,

Thanks for the reply.

The link you sent is from 2012, and it looks like this code is already implemented in 3.14.28.

Anyway -- it doesn't seem to work.

When using USERPTR, the user allocates buffers in user-space and passes a pointer via the m.userptr member of struct v4l2_buffer. However, none of the mxc-capture code in drivers/media/platform/mxc/capture/ accesses this struct member.

Can you please check?

Regards,

Erez

Yuri
NXP Employee

Can you try this patch?

diff --git a/drivers/media/video/mxc/capture/mxc_v4l2_capture.c b/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
index 9130388..dddf670 100644
--- a/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
+++ b/drivers/media/video/mxc/capture/mxc_v4l2_capture.c
@@ -324,8 +324,10 @@ static int mxc_v4l2_prepare_bufs(cam_data *cam, struct v4l2_buffer *buf)
 {
        pr_debug("In MVC:mxc_v4l2_prepare_bufs\n");
        if (buf->index < 0 || buf->index >= FRAME_NUM || buf->length <
-                       PAGE_ALIGN(cam->v2f.fmt.pix.sizeimage)) {
+                       cam->v2f.fmt.pix.sizeimage) {
                pr_err("ERROR: v4l2 capture: mxc_v4l2_prepare_bufs buffers "
                        "not allocated,index=%d, length=%d\n", buf->index,
                        buf->length);

erezsteinberg
Contributor IV

Hi Yuri,

Removing the 'PAGE_ALIGN' macro?

Is that the whole patch? I don't see how that would make a difference.

With USERPTR the user-space application provides the virtual address of the video buffer in buf.m.userptr. This member is ignored in the function -- so how can it work?

From what I know, for userptr the v4l2 driver should copy the virtual address, and then verify the physical pages are contiguous (if not, the driver should rearrange the pages). I think this is done by vb2_get_contig_userptr() in videobuf2-memops.c.

Regards,

Erez

Yuri
NXP Employee

From app team :

"

User space needs to allocate physically contiguous memory, for example with IPU alloc or another method.

I attached the v4l2 unit test code, mxc_v4l2_output.c and mx6s_v4l2_capture.c; one uses IPU alloc, the other uses PxP alloc.

Search for memalloc in this code.

"

Regards,

Yuri.

erezsteinberg
Contributor IV

Hi Yuri,

Thanks for the examples. This is good information. However, it does not resolve the problem.

I looked at the kernel sources to see how the IPU and PXP drivers implement the memory allocation. They still use dma_alloc_coherent(), so the memory buffers are non-cacheable.

Moreover, the purpose of USERPTR in V4L2 is to allow users to pass buffers allocated in user-space directly by malloc or statically. Using PXP or IPU to perform the allocation is not really the way it was intended.

Also -- I'm not sure my problem is clear. I don't have to use USERPTR (MMAP is okay), but using dma_alloc_coherent() gives very bad performance.

What I need is a way to allocate the video buffers and to have good performance in user-space.

I wrote a DDR benchmark application to show the issue. The code allocates a buffer of 4MB and reads it several times in a loop.

When the buffer is allocated with malloc(), it takes 49.8 ms to read 10MB.

However, when using IPU_ALLOC, it takes 338.4 ms (6.8x longer!)

root@imx6qsabresd:~# ./ddr_benchmark
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 49827333

root@imx6qsabresd:~# ./ddr_benchmark_ipu
USRP: alloc bufs offset 0x24b00000 size 4149248
Test start
Test complete (dummy 0)
Time taken (nanoseconds): 338425669

Buffer allocated using malloc()
-------------------------------
MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 396050646
Busy cycles count: 240362999
Read accesses count: 7001555
Write accesses count: 9174
Read bytes count: 447536316
Write bytes count: 293466
Avg. Read burst size: 63
Avg. Write burst size: 31
Read: 426.80 MB/s /  Write: 0.28 MB/s  Total: 427.08 MB/s
Utilization: 11%
Bus Load: 60%
Bytes Access: 63

Buffer allocated using IPU_ALLOC
---------------------------------
MMDC new Profiling results:
***********************
Measure time: 1001ms
Total cycles count: 396043446
Busy cycles count: 256915721
Read accesses count: 8802699
Write accesses count: 14923
Read bytes count: 78915456
Write bytes count: 252306
Avg. Read burst size: 8
Avg. Write burst size: 16
Read: 75.18 MB/s /  Write: 0.24 MB/s  Total: 75.42 MB/s
Utilization: 1%
Bus Load: 64%
Bytes Access: 8
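To help read the two profiles, here is a small sketch of how the derived figures appear to follow from the raw counters. The formulas are inferred from the numbers printed above, not taken from the profiling tool's source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MiB/s: bytes moved, divided by the measurement window. */
double mmdc_mib_per_sec(uint64_t bytes, unsigned int measure_ms)
{
    assert(measure_ms > 0);
    return ((double)bytes / (1024.0 * 1024.0)) / ((double)measure_ms / 1000.0);
}

/* Average burst size: bytes per bus access, truncated to an integer. */
uint64_t mmdc_avg_burst(uint64_t bytes, uint64_t accesses)
{
    assert(accesses > 0);
    return bytes / accesses;
}

/* Bus load: busy cycles as a truncated percentage of total cycles. */
uint64_t mmdc_bus_load_pct(uint64_t busy_cycles, uint64_t total_cycles)
{
    assert(total_cycles > 0);
    return busy_cycles * 100u / total_cycles;
}
```

With the malloc() counters, 447536316 bytes over 7001555 read accesses gives a burst size of 63 bytes. The IPU_ALLOC counters (78915456 bytes over 8802699 accesses) give 8 bytes. The uncached mapping issues more read accesses yet moves far fewer bytes per access, which matches the drop from 426.80 MB/s to 75.18 MB/s.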

The benchmark code is attached.

To select IPU_ALLOC, uncomment line 12.

Build command: arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a9 -mfloat-abi=hard ddr_benchmark.c -o ddr_benchmark
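The attachment is not reproduced in this thread. Here is a minimal sketch of a read benchmark of this kind, assuming a zero-filled 4 MB buffer and CLOCK_MONOTONIC timing. The names read_buffer and time_reads_ns are hypothetical, and the IPU allocation path is left out.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (4u * 1024u * 1024u)  /* 4 MB test buffer */

/* Read every 32-bit word of buf `passes` times and return the sum.
 * The volatile qualifier keeps the compiler from dropping the loads. */
uint32_t read_buffer(const volatile uint32_t *buf, size_t bytes,
                     unsigned int passes)
{
    uint32_t sum = 0;

    assert(buf != NULL);
    for (unsigned int p = 0; p < passes; p++)
        for (size_t i = 0; i < bytes / sizeof(uint32_t); i++)
            sum += buf[i];
    return sum;
}

/* Time `passes` full reads of buf; returns elapsed nanoseconds and
 * stores the checksum in *dummy so the result is observably used. */
int64_t time_reads_ns(const uint32_t *buf, size_t bytes,
                      unsigned int passes, uint32_t *dummy)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *dummy = read_buffer(buf, bytes, passes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

A main() would allocate BUF_SIZE bytes with malloc(), clear them with memset(), call time_reads_ns() and print the checksum and elapsed time, much like the "Test complete (dummy 0)" and "Time taken" lines above. Swapping the malloc() call for a mapping of a driver-allocated buffer gives the second measurement.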

Yuri
NXP Employee

From app team :

"The current driver uses a coherent mapping.

For a use case that needs cacheable buffers so that the CPU can
process the captured data, the customer needs to implement
this new feature on their own."

Regards,

Yuri.

erezsteinberg
Contributor IV

Hi Yuri-

I found an interesting discussion about user-space performance of buffers allocated with kmalloc ("kmalloc memory slower than malloc").

That led me to check the mmap function in mxc_v4l2_capture.c, and there I found a solution.

To get normal performance in user-space I made the following changes:

in mxc_mmap()   -- comment out:   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

in mxc_allocate_frame_buf()   --  replace the dma_alloc_coherent() allocation with kmalloc()

in mxc_free_frame_buf() -- replace dma_free_coherent() with kfree().

I still use V4L2_MEMORY_MMAP.

With this change I can see normal memory loads:

MMDC new Profiling results:
***********************
Measure time: 1000ms
Total cycles count: 396071850
Busy cycles count: 155735196
Read accesses count: 2254063
Write accesses count: 4653481
Read bytes count: 138436492
Write bytes count: 211011038
Avg. Read burst size: 61
Avg. Write burst size: 45
Read: 132.02 MB/s /  Write: 201.24 MB/s  Total: 333.26 MB/s
Utilization: 14%
Bus Load: 39%
Bytes Access: 50

Thanks for the support!

Regards,

Erez

kennywang
Contributor III

Hi  Erez

Did you solve the issue? I modified the functions as you described, but it is still slower. Would you share your code?

Thanks ,

Kenny

yeti425
Contributor I

Hi Erez

I'm having a similar issue with memory access times from the user space. Would it be possible for you to share the source code for the three functions please?

Many thanks

Andy
