Encode frames with vpu directly from framebuffer

alexandershashk · ‎09-04-2014

Hello,

I have a task to broadcast an X11 desktop as an H.264 stream, but capturing the desktop at full HD using only the X11 API is terribly slow. As a workaround, I want to point the VPU directly at the framebuffer:

get fb_info with ioctl()
mmap() fb
pass fb pointer to vpu.
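
A minimal sketch of the first two steps, assuming the standard Linux fbdev interface from <linux/fb.h>. The struct fb_map type and the fb_* helper names are made up for illustration and are not part of any library. The VPU hand-off itself is left out, since it depends on the VPU library in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: everything needed to read a mapped framebuffer. */
struct fb_map {
    int fd;
    uint8_t *base;                 /* start of the mmap()ed region */
    size_t length;                 /* bytes mapped (fix.smem_len) */
    struct fb_var_screeninfo var;  /* resolution, bpp, panning offsets */
    struct fb_fix_screeninfo fix;  /* line length, memory start and size */
};

/* Size in bytes of the visible frame, including any row padding. */
size_t fb_visible_bytes(const struct fb_fix_screeninfo *fix,
                        const struct fb_var_screeninfo *var)
{
    return (size_t)fix->line_length * var->yres;
}

/* Byte offset of the visible frame inside the mapping (accounts for panning). */
size_t fb_visible_offset(const struct fb_fix_screeninfo *fix,
                         const struct fb_var_screeninfo *var)
{
    return (size_t)var->yoffset * fix->line_length
         + (size_t)var->xoffset * (var->bits_per_pixel / 8);
}

/* Open the device, query its geometry and map it read-only.
 * Returns 0 on success, -1 on any failure. */
int fb_map_open(const char *path, struct fb_map *out)
{
    assert(path != NULL && out != NULL);

    out->fd = open(path, O_RDONLY);
    if (out->fd < 0)
        return -1;

    if (ioctl(out->fd, FBIOGET_VSCREENINFO, &out->var) < 0 ||
        ioctl(out->fd, FBIOGET_FSCREENINFO, &out->fix) < 0) {
        close(out->fd);
        return -1;
    }

    out->length = out->fix.smem_len;
    void *p = mmap(NULL, out->length, PROT_READ, MAP_SHARED, out->fd, 0);
    if (p == MAP_FAILED) {
        close(out->fd);
        return -1;
    }
    out->base = p;
    return 0;
}

/* Undo fb_map_open(). */
void fb_map_close(struct fb_map *m)
{
    assert(m != NULL);
    munmap(m->base, m->length);
    close(m->fd);
}
```

Calling fb_map_open("/dev/fb0", &m) and reading from m.base + fb_visible_offset(&m.fix, &m.var) would give the pixels currently on screen. Note that m.base is a user-space pointer; a hardware block would more likely need the physical address, which fbdev reports in fix.smem_start.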

Will this work and give me the best possible performance, or will I run into problems?

PS: I'm using a custom Wandboard implementation with Ubuntu 13.10, and my xorg.conf is pretty usual: the Vivante driver and /dev/fb0 as the framebuffer. GLX and EGL are functional, so the graphics stack seems OK. If you have any ideas on how X11 capturing can be improved, please share them.

Thanks

EricNelson · ‎09-04-2014

Hi Alexander,

This could work for you except for one thing... The VPU encoder(s) mostly want an array of frame-buffers, and the X11 drivers don't currently support even double-buffering.

The encoders generally need at least two, but often three or more frames to build "B"etween-frames inside a GOP (Group of Pictures).

Even if they did support this (as Wayland does), coordinating ownership of the buffers (releasing them back to the display when the owner is done with them) would be tricky at best.
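
To make the ownership problem concrete, here is a rough sketch of the bookkeeping involved. Every name in it (frame_pool, pool_acquire and so on) is hypothetical, and three frames is just an assumed pool size. It is single-threaded; a real pipeline would also need locking between the capture and encode sides.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_FRAMES 3   /* assumed pool size, not taken from any driver */

/* Who currently holds a frame. */
enum frame_owner { OWNER_FREE, OWNER_CAPTURE, OWNER_ENCODER };

struct frame_pool {
    enum frame_owner owner[POOL_FRAMES];
};

/* Mark every frame as free. */
void pool_init(struct frame_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_FRAMES; i++)
        p->owner[i] = OWNER_FREE;
}

/* Give the lowest-numbered free frame to `who`.
 * Returns its index, or -1 if every frame is in use. */
int pool_acquire(struct frame_pool *p, enum frame_owner who)
{
    assert(p != NULL && who != OWNER_FREE);
    for (int i = 0; i < POOL_FRAMES; i++) {
        if (p->owner[i] == OWNER_FREE) {
            p->owner[i] = who;
            return i;
        }
    }
    return -1;
}

/* Pass frame `idx` from one owner to another.
 * Returns 0 on success, -1 if `from` does not actually hold the frame. */
int pool_handoff(struct frame_pool *p, int idx,
                 enum frame_owner from, enum frame_owner to)
{
    assert(p != NULL && idx >= 0 && idx < POOL_FRAMES);
    if (p->owner[idx] != from)
        return -1;
    p->owner[idx] = to;
    return 0;
}

/* Return frame `idx` to the free list. */
void pool_release(struct frame_pool *p, int idx)
{
    assert(p != NULL && idx >= 0 && idx < POOL_FRAMES);
    p->owner[idx] = OWNER_FREE;
}
```

The tricky part mentioned above is the -1 cases: when every frame is still held by the encoder, whoever produces frames has to either wait or drop one.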

You might consider a simpler form of DMA'ing from the frame-buffer into your next encode buffer at each vertical sync (or perhaps every other to get 30fps from a 60fps display). If used properly, either the IPU or GPU can do the DMA transfer for you.
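
A sketch of what that loop could look like, under a few assumptions: the display driver honours the generic FBIO_WAITFORVSYNC ioctl from <linux/fb.h>, the encode buffers sit back to back in one ring, and memcpy() stands in for the IPU/GPU transfer, which is not shown. The function names are made up for illustration.

```c
#include <assert.h>
#include <linux/fb.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* 1 if this vsync should be captured: every `divisor`-th one.
 * A divisor of 2 turns a 60 Hz display into a 30 fps stream. */
int vsync_selected(uint64_t vsync_index, unsigned divisor)
{
    assert(divisor > 0);
    return vsync_index % divisor == 0;
}

/* Block until the next vertical sync. Returns 0 on success, -1 on failure.
 * Assumes the driver implements FBIO_WAITFORVSYNC. */
int fb_wait_vsync(int fb_fd)
{
    uint32_t crtc = 0;
    return ioctl(fb_fd, FBIO_WAITFORVSYNC, &crtc) < 0 ? -1 : 0;
}

/* Wait for `vsyncs` vertical syncs and copy the visible frame into the next
 * ring slot on each selected one. Stops early if waiting fails.
 * Returns the number of frames captured. */
unsigned capture_frames(int fb_fd, const uint8_t *visible, size_t frame_bytes,
                        uint8_t *ring, unsigned ring_frames,
                        unsigned vsyncs, unsigned divisor)
{
    assert(ring_frames > 0 && divisor > 0);
    unsigned captured = 0;

    for (unsigned i = 0; i < vsyncs; i++) {
        if (fb_wait_vsync(fb_fd) < 0)
            break;
        if (!vsync_selected(i, divisor))
            continue;

        uint8_t *dst = ring + (size_t)(captured % ring_frames) * frame_bytes;
        memcpy(dst, visible, frame_bytes);  /* stand-in for the DMA transfer */
        captured++;
    }
    return captured;
}
```

With divisor set to 2 and a three-frame ring, capture_frames() fills slots 0, 1, 2, 0, ... on every second vsync. The encoder would have to consume a slot before the ring wraps around and overwrites it.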

alexandershashk · ‎09-04-2014

Hi Eric,

Thank you for taking the time to answer. This is very useful advice. I was not aware that the user application has to manage extra buffers for encoding GOPs. I had assumed this was hidden from the user application.

A question about DMA. I clearly understand that DMA frees the CPU from moving bytes from one place to another, so the CPU can do something else while the DMA is working, but is DMA faster than memcpy on i.MX6? If the only benefit is that DMA frees the CPU and it actually takes the same time to finish as memcpy, then I'd prefer the simpler memcpy for my case.

EricNelson · ‎09-07-2014

Hi Alexander,

Yes. DMA will be faster for a number of reasons. CPU instruction overhead is the first, but the memory cache is equally important.

Video buffers (frame-buffer) are generally configured as non-cacheable to the CPU so a write to a pixel will occur immediately. As a consequence, memcpy will cause individual memory accesses to hit DDR. DMA operations will perform burst transfers, which saves some overhead across the CPU->DDR link.
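
To put a number on it, the sketch below works out the bandwidth a capture needs and times plain memcpy() for comparison. It assumes a 32 bits-per-pixel desktop, which was not stated in this thread, and the function names are made up. For a meaningful result the source pointer should be the mmap()ed frame-buffer; copying between ordinary cached buffers will look much faster than the real case.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Bytes in one frame, ignoring any row padding. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* Bytes per second that must be moved to capture `fps` frames a second. */
uint64_t capture_bytes_per_second(uint32_t width, uint32_t height,
                                  uint32_t bits_per_pixel, uint32_t fps)
{
    return frame_bytes(width, height, bits_per_pixel) * fps;
}

/* Time `rounds` memcpy() calls of `len` bytes each.
 * Returns throughput in MiB/s, or -1.0 if no time elapsed. */
double memcpy_mib_per_second(void *dst, const void *src,
                             size_t len, unsigned rounds)
{
    assert(dst != NULL && src != NULL && len > 0 && rounds > 0);
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < rounds; i++) {
        memcpy(dst, src, len);
        /* GCC/Clang barrier so the repeated copies are not optimised away. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return ((double)len * rounds / (1024.0 * 1024.0)) / secs;
}
```

For 1920x1080 at 32 bpp and 30 fps, capture_bytes_per_second() gives 248,832,000 bytes per second, or about 237 MiB/s. If memcpy_mib_per_second() measured against the real frame-buffer comes out near or below that figure, the CPU copy alone cannot keep up.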

Graphics & Display

i.MX6_All

i.MX6Quad

Linux