imx6ull v4l2 slow memcpy for captured memory


4,493 Views
firex
Contributor I

hi.


I'm using the EVK with an i.MX6ULL and an mt9j003 sensor connected. I developed a driver to support the mt9j003, and it works together with the mx6s_capture driver. The kernel is linux-imx-4.9.88. I had to update the kernel because of bugs in the Ethernet driver.

Now I can capture images and save them to a file or transfer them over USB to a host PC. But my frame rate is too low (about 2 FPS, where I expected 4 FPS). So I measured the time needed for every operation. I captured the test pattern generated by the mt9j003 at full resolution, 10 Mpix. Capturing itself ran at about 4 FPS as expected; the bottleneck was the memcpy. For 10 Mbyte it needs over 250 ms! To access the video buffers I used the mmap method.


Reading many threads on the internet confirmed my suspicion that memory allocated via mmap is not cached and that access to it is very slow. I didn't find any solution for that problem, only a proposed workaround: use the USERPTR method. I tested this, but it does not work as expected. First I got errno -22 on the VIDIOC_QBUF call. After I replaced the malloc with memalign like:
//        buffers[n_buffers].start = malloc(buffer_size);
        buffers[n_buffers].start = memalign(page_size,buffer_size);
I got another error, -14 (bad address), and the message "contiguous mapping is too small 4096/10444800". Probably the user-allocated memory is fragmented in physical memory and DMA can't work with this type of memory.
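For readers following along, the USERPTR attempt described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `alloc_page_aligned` and `fill_userptr_buffer` are hypothetical, and the buffer type `V4L2_BUF_TYPE_VIDEO_CAPTURE` is assumed rather than taken from the post. Page alignment only fixes the virtual address; it does not make the pages physically contiguous, which is what the "contiguous mapping is too small" message appears to be complaining about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/videodev2.h>

/* Hypothetical helper: allocate at least `size` bytes, page aligned and
 * rounded up to whole pages. Stores the rounded length in *rounded.
 * Returns NULL on failure. */
static void *alloc_page_aligned(size_t size, size_t *rounded)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t pg = (size_t)page;
    size_t len = (size + pg - 1) / pg * pg;
    void *p = NULL;

    if (posix_memalign(&p, pg, len) != 0)
        return NULL;
    *rounded = len;
    return p;
}

/* Hypothetical helper: fill a v4l2_buffer for VIDIOC_QBUF with
 * user-pointer I/O. The capture buffer type is an assumption. */
static void fill_userptr_buffer(struct v4l2_buffer *buf, unsigned index,
                                void *start, size_t length)
{
    memset(buf, 0, sizeof *buf);
    buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf->memory = V4L2_MEMORY_USERPTR;
    buf->index = index;
    buf->m.userptr = (unsigned long)(uintptr_t)start;
    buf->length = (__u32)length;
}
```

The filled structure would then be passed to `ioctl(fd, VIDIOC_QBUF, &buf)`; whether the driver accepts it depends on how its buffer allocator handles user pointers.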

Now I don't know what I can do to get a fast memcpy of the captured frame, because 250 ms for 10 Mbyte is only 40 Mbyte/s. Maybe an 8051 is faster than the imx6...

0 Kudos
8 Replies

3,565 Views
igorpadykov
NXP Employee

Hi Andrej

one can look at the memcpy improvement suggestions in

imx6 running slow 

or try the SDMA memory copy example

mxc_sdma_memcopy_test.c\module_test - imx-test - i.MX Driver Test Application Software 

Best regards
igor

0 Kudos

3,565 Views
firex
Contributor I

hi Igor,

thank you for your answer. The first link may be helpful, but my problem is a little different. My system runs at the expected speed, and a copy between malloc'ed memory runs up to 10 times faster than a copy from mmap'ed memory allocated by /dev/video0.

About the second link: this is a module that tests m2m transfers. On write, the example driver initializes 4 wbufs and starts a DMA transfer from the wbufs to the rbufs. On read, the data is compared. I don't see how it can help me copy frames captured by video4linux to other user-allocated memory with the same performance as a copy from user-allocated memory to user-allocated memory.

Maybe I didn't describe my problem well enough? Sorry for my bad English.

Best Regards
Andrej.

0 Kudos

3,565 Views
igorpadykov
NXP Employee

Hi Andrej

as your task is specific to memcpy improvements, it may be recommended to use

NXP Professional Services|NXP 

Best regards
igor

0 Kudos

3,565 Views
firex
Contributor I

Hi Igor,

thank you for the answer. I don't know what is specific about my question. I only want to get the captured image from the v4l device, do some operations on the image and send it over USB to the PC. Sending to the PC works as expected. Image capturing too; only copying the captured v4l frame to user memory runs 10 times slower than copying user memory to user memory.
I'll try to explain it another way:

void *buf1, *buf2;                    // I have two buffers
const size_t size = 10 * 1024 * 1024; // both buffers are 10 MB

// case 1: both buffers come from malloc

buf1 = malloc(size);
buf2 = malloc(size);
memcpy(buf1, buf2, size); // this operation takes approx. 20 ms

// case 2: the source buffer is mapped from the v4l2 device

buf1 = malloc(size);
xioctl(fd, VIDIOC_QUERYBUF, &v4l_buf_struct); // query the v4l2 buffer (allocated earlier by VIDIOC_REQBUFS) to get its mmap offset
buf2 = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, v4l_buf_struct.m.offset); // map the memory allocated in v4l2
memcpy(buf1, buf2, size); // this operation needs 250 ms!!!
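To put both cases on the same yardstick, each memcpy could be wrapped in a small timing helper like the sketch below. The function names (`elapsed_ms`, `throughput_mb_per_s`, `time_memcpy_ms`) are made up for illustration and are not part of any NXP or V4L2 API; the only assumption is a POSIX system with `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0 +
           (double)(end.tv_nsec - start.tv_nsec) / 1.0e6;
}

/* Throughput in MB/s (1 MB = 1,000,000 bytes) for `bytes` moved in `ms`. */
static double throughput_mb_per_s(size_t bytes, double ms)
{
    assert(ms > 0.0);
    return ((double)bytes / 1.0e6) / (ms / 1000.0);
}

/* Time a single memcpy of n bytes and return the elapsed milliseconds. */
static double time_memcpy_ms(void *dst, const void *src, size_t n)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}
```

With the numbers quoted in this thread, `throughput_mb_per_s` gives 40 MB/s for 10 MB in 250 ms and about 500 MB/s for 10 MB in 20 ms. A single run is noisy, so averaging several copies would give a fairer comparison.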

Best regards

Andrej

0 Kudos

1,792 Views
Jack9
Contributor I
Dear firex:
Have you solved this problem?
0 Kudos

3,565 Views
igorpadykov
NXP Employee

Hi Andrej

what is specific are the requirements of your task (in particular the memcpy implementation); NXP provides software which does not meet them. Performance requirements, as they are board specific, are usually supported through NXP Professional Services.

Best regards
igor

0 Kudos

3,565 Views
firex
Contributor I

Hi Igor,

thank you for the fast answer. I can confirm that it is the memcpy operation that causes the problem, but only with memory allocated by the mx6s_capture module. :-) I guess this module is from Freescale/NXP, so I tried asking NXP :-)

The mx6s_capture module provides the v4l2 interface. This module returns memory that performs badly when copied to userspace. I could try to implement a for-loop that copies the memory over incremented pointers, but I guess the result will be the same.

I think the task of copying the captured image to userspace is nothing special. ;-)
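The for-loop mentioned above might look something like the following sketch. The name `copy_words` is hypothetical, and the code assumes both pointers are 4-byte aligned and the buffers do not overlap. Whether it would beat the library memcpy on the mmap'ed capture buffer is not established anywhere in this thread; it is only the experiment being described.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes in 32-bit words, then copy any remaining tail bytes one
 * at a time. Assumes dst and src are 4-byte aligned and do not overlap. */
static void copy_words(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst % 4) == 0 && ((uintptr_t)src % 4) == 0);

    uint32_t *d = dst;
    const uint32_t *s = src;
    size_t words = n / sizeof(uint32_t);

    for (size_t i = 0; i < words; i++)
        d[i] = s[i];

    /* Tail: 0 to 3 leftover bytes after the last whole word. */
    unsigned char *db = (unsigned char *)(d + words);
    const unsigned char *sb = (const unsigned char *)(s + words);
    for (size_t i = 0; i < n % sizeof(uint32_t); i++)
        db[i] = sb[i];
}
```

Timing this against memcpy on the same mmap'ed buffer would show whether the access pattern matters or whether the memory type itself is the limit.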

Best regards

Andrej.

0 Kudos

3,565 Views
Yuri
NXP Employee

Hello,

  the following may be helpful: https://community.nxp.com/message/536900 

Regards,

Yuri.

0 Kudos