Encode frames with VPU directly from framebuffer

Solved

alexandershashk
Contributor II

Hello,


I need to broadcast an X11 desktop as an H.264 stream, but capturing the desktop at Full HD using only the X11 API is terribly slow. As a workaround, I want to point the VPU directly at the framebuffer:

  1. Get the framebuffer info with ioctl().
  2. mmap() the framebuffer.
  3. Pass the framebuffer pointer to the VPU.
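
Steps 1 and 2 could look roughly like the sketch below. It is a minimal illustration, not a tested i.MX6 implementation: the `fb_map` struct and the `fb_*` function names are hypothetical, while the ioctls and struct fields come from `<linux/fb.h>`. Step 3 is left out because it depends on the VPU library in use.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: everything needed to read the visible frame. */
struct fb_map {
    int fd;
    uint8_t *pixels;               /* start of the user-space mapping */
    size_t map_len;                /* bytes mapped (fix.smem_len) */
    struct fb_var_screeninfo var;  /* resolution, bits per pixel, pan offsets */
    struct fb_fix_screeninfo fix;  /* line_length (stride), smem_len */
};

/* Bytes occupied by the visible scanlines, including any row padding. */
size_t fb_visible_bytes(const struct fb_var_screeninfo *var,
                        const struct fb_fix_screeninfo *fix)
{
    assert(var != NULL && fix != NULL);
    return (size_t)fix->line_length * var->yres;
}

/* Byte offset of the first visible pixel, honouring the panning offsets. */
size_t fb_visible_offset(const struct fb_var_screeninfo *var,
                         const struct fb_fix_screeninfo *fix)
{
    assert(var != NULL && fix != NULL);
    return (size_t)var->yoffset * fix->line_length
         + (size_t)var->xoffset * (var->bits_per_pixel / 8);
}

/* Step 1: query the geometry. Step 2: map the framebuffer read-only.
 * Returns 0 on success, -1 on failure (nothing is left open on failure). */
int fb_open_map(const char *path, struct fb_map *out)
{
    assert(path != NULL && out != NULL);

    out->fd = open(path, O_RDONLY);
    if (out->fd < 0)
        return -1;

    if (ioctl(out->fd, FBIOGET_VSCREENINFO, &out->var) < 0 ||
        ioctl(out->fd, FBIOGET_FSCREENINFO, &out->fix) < 0) {
        close(out->fd);
        return -1;
    }

    out->map_len = out->fix.smem_len;
    void *p = mmap(NULL, out->map_len, PROT_READ, MAP_SHARED, out->fd, 0);
    if (p == MAP_FAILED) {
        close(out->fd);
        return -1;
    }
    out->pixels = p;
    return 0;
}

/* Undo fb_open_map(). */
void fb_close_map(struct fb_map *m)
{
    munmap(m->pixels, m->map_len);
    close(m->fd);
}
```

After a successful `fb_open_map("/dev/fb0", &m)`, the visible frame starts at `m.pixels + fb_visible_offset(&m.var, &m.fix)`. Note that `m.pixels` is a user-space virtual address. `<linux/fb.h>` documents `fix.smem_start` as the physical start of framebuffer memory, and it is an assumption here that a hardware block such as the VPU would need that address rather than the mmap() pointer.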

Will this work and give me the best possible performance, or will I run into problems?

PS: I'm using a custom Wandboard build with Ubuntu 13.10. My xorg.conf is fairly standard: the vivante driver with /dev/fb0 as the framebuffer. GLX and EGL both work, so the graphics stack seems fine. If you have any ideas for speeding up X11 capture, please share them.

Thanks

EricNelson
Senior Contributor II

Hi Alexander,

This could work for you except for one thing: the VPU encoder(s) mostly want an array of frame-buffers, and the X11 drivers don't currently support even double-buffering.

The encoders generally need at least two, and often three or more, frames to build B-frames ("between" frames) inside a GOP (Group of Pictures).

Even if they did support this (as Wayland does), coordinating ownership of the buffers (releasing them back to the display when the owner is done with them) would be tricky at best.

You might consider the simpler approach of DMA'ing from the frame-buffer into your next encode buffer at each vertical sync (or perhaps every other one, to get 30 fps from a 60 fps display). Used properly, either the IPU or the GPU can do the DMA transfer for you.
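
The pacing and buffer rotation described here can be sketched as follows. This is only an illustration under stated assumptions: the `enc_ring` struct, the function names and the choice of three slots are hypothetical, and `memcpy()` stands in for the IPU or GPU transfer, whose API is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENC_RING_SLOTS 3  /* assumption: three encode buffers in rotation */

/* Hypothetical ring of encoder input buffers, each frame_bytes long. */
struct enc_ring {
    uint8_t *slot[ENC_RING_SLOTS];
    size_t frame_bytes;
    unsigned next;         /* slot the next captured frame goes into */
    unsigned long vsyncs;  /* vertical syncs seen so far */
};

/* Capture on every divisor-th vsync: divisor 2 turns 60 Hz into 30 fps. */
int should_capture(unsigned long vsync_index, unsigned divisor)
{
    assert(divisor > 0);
    return vsync_index % divisor == 0;
}

/* Call once per vertical sync. Returns the index of the slot that was
 * filled, or -1 if this vsync was skipped. The memcpy() is a placeholder
 * for the DMA transfer done by the IPU or GPU. */
int on_vsync(struct enc_ring *r, const uint8_t *fb_pixels, unsigned divisor)
{
    assert(r != NULL && fb_pixels != NULL);

    unsigned long index = r->vsyncs++;
    if (!should_capture(index, divisor))
        return -1;

    unsigned filled = r->next;
    memcpy(r->slot[filled], fb_pixels, r->frame_bytes);
    r->next = (r->next + 1) % ENC_RING_SLOTS;
    return (int)filled;
}
```

In a real capture loop, each call to `on_vsync()` would follow a blocking wait for the vertical sync, for example the `FBIO_WAITFORVSYNC` ioctl from `<linux/fb.h>` where the framebuffer driver implements it. The returned slot index tells the encoder which buffer holds the newest frame, while the other slots remain untouched for use as reference frames.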

alexandershashk
Contributor II

Hi Eric,

Thank you for taking the time to answer. This is very useful advice. I was not aware that the user application has to manage extra buffers for encoding GOPs; I assumed this was hidden from the application.

A question about DMA. I understand that DMA frees the CPU from moving bytes from one place to another, so the CPU can do other work while the DMA is running. But is DMA actually faster than memcpy on the i.MX6? If the only benefit is that DMA frees the CPU, and the transfer takes the same time as memcpy, then I'd prefer the simpler memcpy for my case.

EricNelson
Senior Contributor II

Hi Alexander,

Yes. DMA will be faster for a number of reasons. CPU instruction overhead is the first, but the memory cache is equally important.

Video buffers (frame-buffers) are generally configured as non-cacheable to the CPU, so that a write to a pixel takes effect immediately. As a consequence, memcpy causes individual memory accesses to hit DDR. DMA operations perform burst transfers, which saves some overhead on the CPU->DDR link.
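
To put a rough number on the copy being discussed, the sketch below computes how many bytes have to be moved per second. The function name is hypothetical. The 1920x1080 size follows from the "fullhd" in the question and the 30 fps rate from the suggestion earlier in the thread, while the 32 bits-per-pixel format is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be copied per second to capture every frame once.
 * stride_bytes is the length of one scanline in bytes, padding included. */
uint64_t copy_bytes_per_second(uint32_t stride_bytes, uint32_t height,
                               uint32_t fps)
{
    assert(stride_bytes > 0 && height > 0);
    return (uint64_t)stride_bytes * height * fps;
}
```

With a stride of 1920 * 4 = 7680 bytes, `copy_bytes_per_second(7680, 1080, 30)` gives 248,832,000 bytes per second, and each of those bytes is read once from the frame-buffer and written once to the encode buffer. At 16 bits per pixel the figure would be half of that.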