Yes, I am planning to do only a color conversion from a Bayer mosaic into whatever works best with the video encoder (probably YUV420?). Everything else is just nice to have, but not necessary.
Thanks for the 2D GPU API document you provided. I read it through, but it remains unclear how I could use it for color conversion. I assume you were referring to section 8.6 "Filter BLT". Because of the brevity of the document, a few problems arise:
1) I see no way of inputting a Bayer-format image, unless "8-bit color index" can somehow specify it. The problem with Bayer is that each color is interpolated from neighboring pixels (with weights), which are not necessarily neighboring memory locations (they sit on different rows, so gcvSURF_R8G8B8G8 probably cannot be used), and they are not in separate image planes (YUV style).
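To make the addressing problem concrete, here is a small Python sketch (assuming an RGGB layout and a row-major 8-bit buffer; the function name is mine, purely for illustration) that computes where the four green neighbors of a red pixel actually live in memory:

```python
def green_neighbor_addresses(y, x, stride):
    """Row-major indices of the four green neighbors of the red pixel
    at (y, x) in an RGGB mosaic whose row pitch is `stride` bytes."""
    return [(y - 1) * stride + x,   # green above, on the previous row
            (y + 1) * stride + x,   # green below, on the next row
            y * stride + (x - 1),   # green to the left
            y * stride + (x + 1)]   # green to the right

print(green_neighbor_addresses(2, 2, 640))  # → [642, 1922, 1281, 1283]
```

Two of the four samples are a whole row pitch away from the center pixel, which is why a fixed in-row texel grouping like gcvSURF_R8G8B8G8 cannot cover them.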
2) I still do not know how to pass the output directly to the video encoder. Using a gctUINT32_PTR?
3) "The GPU supports BT.601 YUV to RGB color conversion standards." This hints that the built-in color conversion feature might not be very flexible. For example, HD formats use BT.709, not BT.601.
4) I cannot just use blitting functions to regroup pixels, because every pixel in the input image influences the color of several pixels in the output image. If I simply regrouped each four-byte RGBG texel into an XRGB pixel, I would get an image a quarter the size of the input, losing a lot of luminance information in the process. McGuire's shaders implement the Malvar-He-Cutler algorithm, which preserves the full resolution of the luminance image and only subsamples the color information (perfect for a video encoder, right?).
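A minimal sketch of that naive regrouping (assuming RGGB quad order; the helper is hypothetical) shows the resolution loss directly:

```python
def bin_rggb(mosaic):
    """Collapse each 2x2 RGGB quad of `mosaic` (a list of rows) into a
    single (R, G, B) pixel, averaging the two greens. The output has
    half the width and half the height of the input."""
    out = []
    for y in range(0, len(mosaic), 2):
        row = []
        for x in range(0, len(mosaic[0]), 2):
            r = mosaic[y][x]
            g = (mosaic[y][x + 1] + mosaic[y + 1][x]) / 2
            b = mosaic[y + 1][x + 1]
            row.append((r, g, b))
        out.append(row)
    return out

print(bin_rggb([[10, 20],
                [30, 40]]))   # → [[(10, 25.0, 40)]] – four samples in, one pixel out
```

Every 2x2 quad collapses to one pixel, so a 1920x1080 mosaic would come out as 960x540.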
5) It would not be easy to re-implement McGuire's shaders on top of this API, as the document does not say how to access individual pixels, how to multiply and accumulate them in patterns, or what the cost of these operations would be. For decent demosaicing, each channel value of each output pixel has to be a weighted sum of neighboring pixels in the input image.
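As an illustration of the kind of per-pixel arithmetic involved (a plain Python sketch, not the shader itself; the weights are the green-at-red kernel as I understand it from the Malvar-He-Cutler paper):

```python
def green_at_red(p, y, x):
    """Estimate the green value at the red site (y, x) of a row-major
    Bayer image `p` (a list of rows); (y, x) must be at least two
    pixels from every border. Nine inputs from five different rows."""
    return (4 * p[y][x]                             # the red sample itself
            + 2 * (p[y - 1][x] + p[y + 1][x]        # four adjacent greens
                   + p[y][x - 1] + p[y][x + 1])
            - (p[y - 2][x] + p[y + 2][x]            # four distant reds,
               + p[y][x - 2] + p[y][x + 2])) / 8    # negative weight

flat = [[100] * 5 for _ in range(5)]
print(green_at_red(flat, 2, 2))   # → 100.0 (a flat input stays flat)
```

Each output channel touches five rows of input, which is exactly the access pattern I cannot express with the BLT primitives in the document.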
In conclusion, although the 2D API is a fairly comprehensive list of commonly needed procedures, with the knowledge I currently have of this processor I cannot implement custom functions like demosaicing on top of it. This is why OpenGL and OpenCL are so useful: they are very flexible and very portable, allowing me to use McGuire's code with very few modifications. I would gladly learn lower-level functions to implement demosaicing, if good documentation were publicly available. Better yet, it would be nice if I could use the ready-made OpenGL or OpenCL code and feed the result directly to the video encoder. This would have the added benefit that, in the future, other i.MX users could easily implement solutions to their own custom needs. Alternatively, if there were a fast enough way to copy the frame to the CPU and back again, that might also work. Unfortunately, on most embedded platforms glTexImage2D() and glReadPixels() are very slow, though I have not tested this on the i.MX6 yet.