Is it possible to use a buffer in GPU memory on the i.MX6 as an input to a hardware video encoder? I would prefer an H.264 encoder, but any other will do, too.
My use case is the following. I get a frame from a USB camera, but that image uses Bayer tiling for color information, which looks like image no. 2 at https://en.wikipedia.org/wiki/File:Colorful_spring_garden_Bayer.png . I need to convert it to a suitable format (most probably YUV420) before I can feed it to the video encoder. This conversion can easily be done with OpenGL ES shaders. Using the shaders, I end up with an OpenGL texture that needs to be handed to the video encoder. I could download the texture pixels to the CPU and then upload them to the GPU again for encoding, but this drops the frame rate significantly. Instead, I would like to tell the video encoder to use the existing texture in GPU memory.
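To make the shader step concrete, the fragment shader I have in mind is something along the following lines. This is only a minimal bilinear sketch (much simpler than the demosaicing I actually plan to use), and it assumes the raw frame is uploaded as a single-channel GL_LUMINANCE texture with an RGGB layout; the uniform and varying names are of course placeholders.

/* Minimal bilinear demosaicing sketch: raw frame as a single-channel texture,
 * RGGB Bayer layout, u_texel = 1.0 / texture size.  highp is assumed because
 * the per-pixel cell arithmetic needs full precision. */
static const char *bayer_frag_src =
    "precision highp float;\n"
    "uniform sampler2D u_bayer;  /* raw Bayer frame, one value per texel */\n"
    "uniform vec2 u_texel;       /* 1.0 / texture dimensions */\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "  vec2 cell = mod(floor(v_uv / u_texel), 2.0);  /* position in 2x2 cell */\n"
    "  float c = texture2D(u_bayer, v_uv).r;\n"
    "  float h = 0.5 * (texture2D(u_bayer, v_uv + vec2(u_texel.x, 0.0)).r\n"
    "                 + texture2D(u_bayer, v_uv - vec2(u_texel.x, 0.0)).r);\n"
    "  float v = 0.5 * (texture2D(u_bayer, v_uv + vec2(0.0, u_texel.y)).r\n"
    "                 + texture2D(u_bayer, v_uv - vec2(0.0, u_texel.y)).r);\n"
    "  float d = 0.25 * (texture2D(u_bayer, v_uv + u_texel).r\n"
    "                  + texture2D(u_bayer, v_uv - u_texel).r\n"
    "                  + texture2D(u_bayer, v_uv + vec2( u_texel.x, -u_texel.y)).r\n"
    "                  + texture2D(u_bayer, v_uv + vec2(-u_texel.x,  u_texel.y)).r);\n"
    "  vec3 rgb;\n"
    "  if (cell == vec2(0.0, 0.0))      rgb = vec3(c, 0.5 * (h + v), d); /* red site */\n"
    "  else if (cell == vec2(1.0, 1.0)) rgb = vec3(d, 0.5 * (h + v), c); /* blue site */\n"
    "  else if (cell.x == 1.0)          rgb = vec3(h, c, v);  /* green on red row */\n"
    "  else                             rgb = vec3(v, c, h);  /* green on blue row */\n"
    "  gl_FragColor = vec4(rgb, 1.0);\n"
    "}\n";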
Can I use the pointer from eglQueryImageFSL as an input to the video encoder, or will the image still go through the CPU?
Is UseEGLImage() implemented in the OpenMAX components of the video encoders? This is impossible to tell just by looking at the headers, and I have not bought an i.MX6 board yet, because I do not know whether this crucial feature is supported.
In case passing memory directly from one GPU task to another is not possible, could someone tell me how long it takes to download one full HD RGB image from the GPU and how much CPU time that takes?
Hi Kalle,
I've had a similar issue: delivering a rendered result through the video encoder and streaming it via TCP. What works for us, even at high frame rates, is our own plugin for GStreamer (which is available in almost any Linux environment).
Inside this plugin (output section):
1. Create a virtual framebuffer big enough for panning swaps (at least three times the size of the image).
2. Set up EGL on this framebuffer ( eglCreateWindowSurface(eglDisplay, eglConfig, vfb_id, NULL); ); here you only call eglSwapBuffers() instead of glReadPixels().
3. This framebuffer can now be fed either to the image processor (if the EGL output format does not fit the encoder; it is normally RGBA or similar), via the physical address given by the VFB subsystem, or directly to the video processor, again through the physical address.
In both cases you need to handle the output swapping manually: advance the triple-buffer panning swap simply by adding (lines rounded up to the next multiple of 32) * width * bpp per buffer; see the sketch after this list.
4. If you use the image processor, you need another buffer set for the transport from the IPU to the VPU: use hwbuffer to give the IPU and GPU the physical addresses (normally to build a GStreamer buffer). This saves a lot of CPU resources and is a clean, hardware-supported solution.
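To spell out the panning-swap arithmetic from point 3 (just a minimal sketch; the variable names and the 32-bit RGBA assumption are only examples):

#include <stddef.h>
#include <stdint.h>

/* Panning offset of buffer 0, 1 or 2 inside the (at least) triple-sized
 * virtual framebuffer: the visible line count is rounded up to the next
 * multiple of 32, as described in point 3. */
static size_t pan_offset(unsigned buffer_index, unsigned width,
                         unsigned height, unsigned bytes_per_pixel)
{
    unsigned aligned_lines = (height + 31u) & ~31u;  /* round up to 32 lines */
    return (size_t)buffer_index * aligned_lines * width * bytes_per_pixel;
}

/* Physical address handed to the IPU or VPU for a finished buffer
 * (fb_phys is the framebuffer's physical base address). */
static uint32_t finished_buffer_phys(uint32_t fb_phys, unsigned buffer_index,
                                     unsigned width, unsigned height)
{
    return fb_phys + (uint32_t)pan_offset(buffer_index, width, height, 4);
}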
Hint: if the video processor supports a 32 bpp interleaved YUV format (normally not a standard feature), you can use the EGL shader for the color-space conversion instead of the image processor.
Hope this helps a bit. Nevertheless, it is still a lot of work (it seems endless to me).
Hi Patrick,
Thanks for the idea of triple buffering! I have got the video pipeline working, from acquiring the image from the camera and demosaicing it to encoding, all in real time. The only problem is that once in a while a frame is not completely rendered when I start encoding it, as I am using only a single buffer. Triple buffering looks like it might be the solution; I will try it as soon as I return to this part of the project. I have already tried all the standard OpenGL functions for ensuring that rendering has finished, but I still get occasional glitches.
We accomplish this by using a virtual framebuffer driver which OpenGL can render into (same as a standard framebuffer), but from which we can also get a pointer to the physical memory address to pass on to the VPU.
Andre, I noticed that in another thread on a similar topic you suggested the use of a virtual framebuffer. So if I use OpenGL ES to render the textured plane into a virtual framebuffer and somehow pass the framebuffer's address to the video encoder, would that in essence let me skip the glReadPixels() step?
And what about the glTexDirectVIV() function? The sample code in the GPU SDK shows how to upload a texture directly (instead of using glTexImage2D). Could this also be used to download the texture faster than glReadPixels(), or could I give the pointer from glTexDirectVIV() directly to vpu_enc?
Yes, you are right: in this case you would be able to avoid glReadPixels, since the data that would go to the framebuffer will be available in this new virtual framebuffer, and you can use it for whatever you want. Unfortunately, I have been informed that we do not have this sample available yet.
Thanks for the information! If the virtual framebuffer driver in the GPU SDK Demo folder works as expected, I should be able to use mmap() on it without any additional code samples from Freescale.
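Just to check my understanding, this is roughly what I plan to do, assuming the virtual framebuffer behaves like a standard Linux framebuffer device (the /dev/fb1 node name is only a guess):

#include <fcntl.h>
#include <linux/fb.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* /dev/fb1 is only a placeholder for whatever node the virtual
     * framebuffer driver registers. */
    int fd = open("/dev/fb1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct fb_fix_screeninfo fix;
    struct fb_var_screeninfo var;
    if (ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0 ||
        ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0) {
        perror("ioctl"); close(fd); return 1;
    }

    /* fix.smem_start is the physical address of the framebuffer memory,
     * which should be what the VPU wants; mmap() gives the CPU-side view. */
    void *virt = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (virt == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    printf("phys 0x%lx, %u x %u, line length %u bytes\n",
           fix.smem_start, var.xres, var.yres, fix.line_length);

    munmap(virt, fix.smem_len);
    close(fd);
    return 0;
}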
Unfortunately the glTexImage2D function is a one-way-only function, so we use it to write the buffer, not to read it. I'm going to talk with Prabhu and see where I can find the virtual framebuffer code.
Andre Silva wrote:
Unfortunately the glTexImage2D function is a one-way-only function...
Did you mean that glTexDirectVIV() is one-way only? That is, if I use glTexDirectVIV() to get the pointer of a texture, can I only write to that pointer, not read from it?
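For reference, this is how I read the GPU SDK sample (the prototypes and the GL_VIV_YV12 token are, as far as I can tell, from the Vivante gl2ext.h, so please correct me if I have them wrong): the driver hands back pointers into the texture storage, the application fills them, and then invalidates.

#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>   /* GL_VIV_direct_texture on the Vivante driver */

static void upload_frame_direct(GLuint tex, const void *frame,
                                int width, int height)
{
    GLvoid *planes[3] = { 0 };   /* the driver fills in the plane pointer(s) */

    glBindTexture(GL_TEXTURE_2D, tex);
    /* Ask for direct pointers to the texture storage; GL_VIV_YV12 is one of
     * the formats used in the SDK sample. */
    glTexDirectVIV(GL_TEXTURE_2D, width, height, GL_VIV_YV12, planes);

    /* Fill the plane(s) -- e.g. memcpy from the camera buffer -- and then
     * tell the GPU that the texture contents changed.  Nothing here suggests
     * the pointers are meant to be read back, which is why I am asking. */
    (void)frame;  /* placeholder for the actual copy in this sketch */
    glTexDirectInvalidateVIV(GL_TEXTURE_2D);
}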
Since you're recommending OpenCL, could you specify how to handle zero copy into the GPU, or a method to use a texture from the GL context? Thank you; as far as I know this is currently missing, which makes OpenCL a non-starter.
I am open to all options, including OpenCL. For Bayer demosaicing, I am planning to use the code by Morgan McGuire. He wrote OpenGL shaders, but on the web page he also hosts a port to OpenCL (ported by Ron Woods). Still, I do not know how to get the buffer to the VPU directly and quickly. It is reassuring to know that this can be done, thanks!
Yes, I am planning to do only a color conversion from a Bayer mosaic into whatever works best with the video encoder (probably YUV420?). Everything else is just nice to have, but not necessary.
Thanks for the 2D GPU API document you provided. I read it through, but it remains unclear how I could use it for color conversion. I assume you were referring to section 8.6, "Filter BLT". Because of the brevity of the document, a few problems arise:
1) I see no way of inputting a Bayer-format image, unless "8-bit color index" can somehow specify it. The problem with Bayer is that colors are taken from neighboring pixels (with weights), which are not necessarily neighboring memory locations (they are on different rows, so gcvSURF_R8G8B8G8 probably cannot be used), and they are not in separate image planes (YUV style).
2) I still do not know how to give the output directly to the video encoder. By using gctUINT32_PTR?
3) "The GPU supports BT.601 YUV to RGB color conversion standards." This hints that the built-in color conversion feature might not be very flexible. For example, HD formats use BT.709, not BT.601.
4) I cannot just use blitting functions to regroup pixels, as every pixel in the input image influences the color of several pixels in the output image. If I simply regrouped each four-byte RGBG texel into an XRGB pixel, I would get an image four times smaller than the input, losing a lot of luminance information in the process. McGuire's shaders implement the Malvar-He-Cutler algorithm, which preserves the resolution of the luminance image and only subsamples the color information (perfect for a video encoder, right?).
5) It would not be very easy to re-implement McGuire's shaders using this API, as the document does not say how to access individual pixels, how to multiply and accumulate them in patterns, or what the cost of these operations would be. For decent demosaicing, each channel value of each output pixel has to be a weighted sum of neighboring pixels in the input image.
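To make point 3 concrete, the luma weights alone already differ between the two standards (coefficients from the respective ITU-R recommendations), so a converter hard-wired to BT.601 would give slightly wrong colors for HD content:

/* Luma weights only; the chroma matrices differ correspondingly. */
static float luma_bt601(float r, float g, float b)
{
    return 0.299f * r + 0.587f * g + 0.114f * b;    /* SD, BT.601 */
}

static float luma_bt709(float r, float g, float b)
{
    return 0.2126f * r + 0.7152f * g + 0.0722f * b; /* HD, BT.709 */
}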
In conclusion, although the 2D API is quite a comprehensive list of commonly needed procedures, I cannot implement custom functions like demosaicing on top of it with the knowledge I have of this processor right now. This is why OpenGL and OpenCL are so useful: they are very flexible and very portable, allowing me to use McGuire's code with very few modifications. I would gladly learn lower-level functions to implement demosaicing if good documentation were publicly available. Better yet, it would be nice if I could use the ready-made OpenGL or OpenCL code and feed the result directly to the video encoder. This would have the added benefit that, in the future, other i.MX users could easily implement solutions to their own custom needs. Alternatively, if there were a fast enough way to copy the frame to the CPU and back again, that might also work. Unfortunately, on most embedded platforms glTexImage2D() and glReadPixels() are very slow, though I have not tested them on the i.MX6 yet.
Hi Kalle,
I am not an Android guy, and I assume your work is done under that OS (since you mentioned OpenMAX).
This seems like a trivial task for GStreamer under Linux, but I am not sure whether you can switch OSes for your end product.
Leo
Thanks for the quick reply, Leonardo,
I am a Linux guy myself and would prefer a solution that uses Linux; however, Android is not out of the question if it solves my problem. The reason I mentioned OpenMAX is that I know OpenMAX components can pass their buffers around using the EGLImage buffer header, so that the buffer never leaves the GPU. At least that is what the standard says; in reality, most implementations return OMX_ErrorNotImplemented if I try to call UseEGLImage() on a video encoder component.
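For clarity, this is the call I mean, using the standard OMX_UseEGLImage macro from OMX_Core.h (the handle, port index and EGLImage here are placeholders); on the implementations I have tried, the OMX_ErrorNotImplemented branch is the one that gets taken.

#include <OMX_Core.h>

/* hEncoder is an already-obtained handle to the encoder component and
 * eglImage an EGLImageKHR created from the GL texture; port index 0 is
 * just a placeholder for the component's input port. */
static OMX_ERRORTYPE try_use_egl_image(OMX_HANDLETYPE hEncoder, void *eglImage)
{
    OMX_BUFFERHEADERTYPE *hdr = NULL;
    OMX_ERRORTYPE err = OMX_UseEGLImage(hEncoder, &hdr, 0, NULL, eglImage);

    if (err == OMX_ErrorNotImplemented) {
        /* This is what most encoder components I have seen return. */
    }
    return err;
}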
I am not aware of a standard way to move frames from one component to another in GStreamer that does not involve copying the buffer over to the CPU's memory and back to the GPU again. For example, I do not think one can directly link a glshader element's output pad to vpuenc's input pad unless vpuenc specifically supports this (and I do not see it in the code). You need to have gldownload between them, right?
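In other words, the only kind of pipeline I can see working today is something like the following (a sketch built with gst_parse_launch; the exact element names, caps and properties would need adjusting, and glshader would also need its shader source set), and it is exactly the gldownload step I would like to eliminate.

#include <gst/gst.h>

int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    /* The gldownload element in the middle is precisely the CPU round-trip
     * I want to avoid; element names come from gst-plugins-gl and
     * gst-fsl-plugins, and the camera's Bayer caps are glossed over here. */
    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src ! glupload ! glshader ! gldownload ! vpuenc ! "
        "filesink location=out.h264", &error);
    if (!pipeline) {
        g_printerr("parse error: %s\n", error->message);
        g_clear_error(&error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    g_usleep(10 * G_USEC_PER_SEC);          /* run for ten seconds */
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}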
I have been looking around for low-power SoCs with video encoding hardware, and efficient colorspace conversion seems to be an unsolved problem everywhere. The video codec is always very locked down (totally understandable), and there is very little an end user can do about it unless you have a multi-million-chip contract. At least Freescale has very good documentation and easily accessible software for developers, unlike Allwinner, Samsung or Qualcomm.
What keeps my hopes up right now is the fact that in the gst-fsl-plugins sources, memcpy is used to move data between the CPU and the VPU, which suggests that all the memory in the different parts of the SoC is addressed the same way, and that it might be possible to give an OpenGL ES texture's address to the VPU as input. If that is true, I could easily achieve what I need. ARM SoCs from other vendors hide direct memory access to their GPUs.
What I want to achieve is to run some OpenGL shaders (like this) on a video frame before encoding it with the VPU, and it all needs to happen in real time, at at least 18 FPS for a 1280x960 color image, without using up 50% of the CPU (and lots of electricity; I will run it on a battery). If you know a trivial way to do this in GStreamer, or even a non-trivial or hackish way to do it by some other means, I would really appreciate it. The i.MX6 might prove to be what I have been looking for.