How to insert GPU shading between camera capture and preview window

DraganOstojic
Contributor V

I already have preview working from a gray scale camera. I capture each frame as 8-bit generic data and use the IPU to convert the 8-bit gray scale raw data into ARGB8888; to do that I use a custom color space conversion matrix and a YUV->RGB IPU conversion task. All capture buffers, including the one at the output of the IPU, are allocated from the native window. At this point I submit the IPU output to the preview window. I'd like to insert some additional GPU processing (custom shading) in between, i.e. after the IPU and before submitting to the preview window. My system is i.MX6, Android ICS 13.4.1.04. Can you please give me some ideas how this could be done? If I could look at some code examples that would be great.

1 Solution
DraganOstojic
Contributor V

After much trial and error I figured out how to integrate OpenGL ES rendering into the camera HAL. The reason this is useful is that the i.MX6 has a very powerful Vivante 3D GPU which can be used for all kinds of processing of camera video; for example, it can convert raw Bayer data in a shader. In my case, because the i.MX6 doesn't support 8-bit gray scale to RGBA8888 conversion natively, I use a pixel shader to do the conversion.

The performance numbers are quite impressive: processing a full 2592x1944 frame takes less than 19 ms. The Aptina MT9P031 sensor I'm using can output this resolution at 14 fps with a 96 MHz pixel clock, so at this GPU speed I can maintain that frame rate and still have time left over for the other processing we plan to do on each frame. The Vivante driver also has an OpenGL extension that allows direct mapping of the capture buffers into the GPU address space, so no copy operation is required before the GPU can do its job. The same thing could be done with the IPU (which I actually implemented), but the IPU takes ~150 ms, so this is much better.

One point to make: Android makes it difficult to get the ANativeWindow pointer without a hack, so I render full screen at the moment. Once I figure out how to get that pointer cleanly I'll post more detailed code.

If anybody is interested in more details, let me know.

clound920419
Contributor I

Hi @Dragan Ostojic, I have recently been working on supporting the MT9P006 image sensor on a Linux/MX6Q platform. The output from the MT9P006 is 12-bit Bayer raw data, and the image size is 640 x 480. To make things easy, we configure the CSI data width as 16-bit Bayer/generic data, and the input path is CSI->SMFC->IDMAC->Mem. I get the Bayer data and convert it to ARGB using an interpolation method, but that is really slow. As you mentioned, the Bayer format can be converted in a shader on the GPU, but I really don't know how to do it. Could you give me some advice about it, or share a full example? My email is liubin101015@outlook.com.

Thanks

dinhbang
Contributor II

Hi Dragan, I don't understand step 2: "Preview window is pre-connected to NATIVE_WINDOW_API_CAMERA so it has to be disconnected before step 3." So do I need to add "native_window_api_disconnect(aNativeWindow, NATIVE_WINDOW_API_CAMERA);" to the function? And at step 4, how do I map the capture buffers to GPU textures? I don't have experience with the GPU; I would really appreciate your help.

DraganOstojic
Contributor V

To render camera-captured images with the GPU to the preview window we need the ANativeWindow* pointer for the preview window. The camera HAL doesn't provide that pointer and in fact hides it. To get it, a new API has to be added to frameworks/base/services/camera/libcameraservice/CameraHardwareInterface.h as follows:

class CameraHardwareInterface : public virtual RefBase {
public:
    ...

    static ANativeWindow* getANativeWindow(struct preview_stream_ops* w)
    {
        return __to_anw(((struct camera_preview_window*)w)->user);
    }

This pointer will be retrieved from hardware/imx/mx6/libcamera/CameraHal.cpp when we create the EGL context. In addition, the following line has to be commented out in hardware/imx/mx6/libcamera/Android.mk so that warnings caused by including CameraHardwareInterface.h don't break the build:

#LOCAL_CPPFLAGS += -Werror

Preview window will be used for GPU rendering so it can't be used for allocating buffers for camera capture any more. To allocate capture buffers we need to create another native window from the camera HAL code. This window will never be used for displaying anything so buffers will be dequeued from it and never enqueued.

Note that for capture to work through EGL, all EGL initialization and OpenGL rendering has to be done from a single thread. One simple way to do this is to put EGL code into camera capture thread.

Overall steps are:

1. Create a native window and allocate capture buffers. No buffers should be allocated from the preview window, so skip that code in CameraHal.cpp.

2. Preview window is pre-connected to NATIVE_WINDOW_API_CAMERA so it has to be disconnected before step 3.

3. Perform the standard EGL initialization steps. Create the EGL context from the ANativeWindow* pointer to the camera preview window.

4. Map capture buffers to the GPU textures

5. Load shader program

6. On each captured buffer, render in the standard way.

7. Skip the code in CameraHal.cpp that submits captured buffers to the preview window and only re-queue the buffer to the camera device.

8. Upon camera termination, clean up allocated buffers and EGL context.

I'll follow up with code fragments for each of the above steps.

1. Create native window and allocate capture buffers

Replace original CameraHal::allocateBuffersFromNativeWindow() with the following code:

sp<SurfaceTexture>       mST;
sp<SurfaceTextureClient> mSTC;
sp<ANativeWindow>        mANW;
ANativeWindowBuffer*     anb[6];
unsigned long            imageWidth;
unsigned long            imageHeight;
unsigned long            imageSize;

imageWidth  = 640;
imageHeight = 480;
imageSize   = imageWidth * imageHeight;

mST  = new SurfaceTexture(123);
mSTC = new SurfaceTextureClient(mST);
mANW = mSTC;

native_window_set_usage(mANW.get(), GRALLOC_USAGE_SW_READ_OFTEN |
                                    GRALLOC_USAGE_SW_WRITE_OFTEN |
                                    GRALLOC_USAGE_FORCE_CONTIGUOUS |
                                    GRALLOC_USAGE_HW_TEXTURE);

// YV12 can be used when capturing from a gray scale camera; the U/V plane is a placeholder.
native_window_set_buffers_geometry(mANW.get(), imageWidth, imageHeight, HAL_PIXEL_FORMAT_YV12);
native_window_set_buffer_count(mANW.get(), captureBuffersNumber);

GraphicBufferMapper& mapper = GraphicBufferMapper::get();
Rect  rect(imageWidth, imageHeight);
void* pVaddr;

for (int i = 0; i < 6; i++)
{
    mANW->dequeueBuffer(mANW.get(), &anb[i]);

    buffer_handle_t*  buf_h  = &anb[i]->handle;
    private_handle_t* handle = (private_handle_t*)(*buf_h);
    mapper.lock(handle, GRALLOC_USAGE_SW_READ_OFTEN | GRALLOC_USAGE_SW_WRITE_OFTEN, rect, &pVaddr);

    mCaptureBuffers[i].virt_start = (unsigned char*)handle->base;
    mCaptureBuffers[i].phy_offset = handle->phys;
    mCaptureBuffers[i].length     = handle->size;
    mCaptureBuffers[i].native_buf = (void*)buf_h;
    mCaptureBuffers[i].refCount   = 0;
    mCaptureBuffers[i].buf_state  = WINDOW_BUFS_DEQUEUED;

    // When a buffer is mapped as a GPU texture, the YUV input is color space converted to RGBA8888 in the GPU.
    // The placeholder U/V values would interfere, so they need to be initialized to 128 to cancel out.
    void* uvPlane = mCaptureBuffers[i].virt_start + imageSize;
    memset(uvPlane, 128, imageSize / 2);
}

To clean up, replace the original CameraHal::freeBuffersToNativeWindow() with the following code:

GraphicBufferMapper& mapper = GraphicBufferMapper::get();

for (int i = 0; i < 6; i++)
{
    buffer_handle_t* buf_h = (buffer_handle_t*)mCaptureBuffers[i].native_buf;
    mapper.unlock(*buf_h);
    mANW->cancelBuffer(mANW.get(), anb[i]);

    mCaptureBuffers[i].buf_state  = WINDOW_BUFS_INVALID;
    mCaptureBuffers[i].refCount   = 0;
    mCaptureBuffers[i].native_buf = NULL;
    mCaptureBuffers[i].virt_start = NULL;
    mCaptureBuffers[i].length     = 0;
    mCaptureBuffers[i].phy_offset = 0;
}

mANW.clear();
mSTC.clear();
mST.clear();

2. Preview window is pre-connected to NATIVE_WINDOW_API_CAMERA so it has to be disconnected before step 3:

native_window_api_disconnect(aNativeWindow, NATIVE_WINDOW_API_CAMERA);

Upon camera termination, make sure you re-connect the preview window to the camera API, otherwise there will be an error message:

native_window_api_connect(aNativeWindow, NATIVE_WINDOW_API_CAMERA);

3. Perform the standard EGL initialization steps. Create the EGL context from the ANativeWindow* pointer to the camera preview window:

EGLint     numConfigs;
EGLint     majorVersion;
EGLint     minorVersion;
EGLConfig  eglConfig;
EGLContext eglContext;
EGLSurface eglSurface;
EGLDisplay eglDisplay;

static const EGLint contextAttribs[] =
{
    EGL_CONTEXT_CLIENT_VERSION, 2,
    EGL_NONE
};

static const EGLint configAttribs[] =
{
    EGL_SAMPLES,         0,
    EGL_RED_SIZE,        8,
    EGL_GREEN_SIZE,      8,
    EGL_BLUE_SIZE,       8,
    EGL_ALPHA_SIZE,      8,
    EGL_DEPTH_SIZE,      0,
    EGL_SURFACE_TYPE,    EGL_WINDOW_BIT,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
    EGL_NONE
};

EGLint w;
EGLint h;

eglBindAPI(EGL_OPENGL_ES_API);
eglDisplay = eglGetDisplay(EGL_DEFAULT_DISPLAY);
eglInitialize(eglDisplay, &majorVersion, &minorVersion);
eglGetConfigs(eglDisplay, NULL, 0, &numConfigs);
eglChooseConfig(eglDisplay, configAttribs, &eglConfig, 1, &numConfigs);

// Get the preview window through the new CameraHardwareInterface API and create the window surface from it.
ANativeWindow* anw = CameraHardwareInterface::getANativeWindow(mNativeWindow);
eglSurface = eglCreateWindowSurface(eglDisplay, eglConfig, anw, NULL);
eglContext = eglCreateContext(eglDisplay, eglConfig, EGL_NO_CONTEXT, contextAttribs);

eglMakeCurrent(eglDisplay, eglSurface, eglSurface, eglContext);

eglQuerySurface(eglDisplay, eglSurface, EGL_WIDTH,  &w);
eglQuerySurface(eglDisplay, eglSurface, EGL_HEIGHT, &h);


4. Map capture buffers to the GPU textures:


#define GL_GLEXT_PROTOTYPES
#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

EGLint imageAttrs[] =
{
    EGL_IMAGE_PRESERVED_KHR, EGL_TRUE,
    EGL_NONE
};

EGLImageKHR eglImage[6];
GLuint      texName[6];

glGenTextures(6, &texName[0]);

for (int i = 0; i < 6; i++)
{
    // Wrap each gralloc buffer in an EGLImage and bind it as the backing store of a GL texture.
    eglImage[i] = eglCreateImageKHR(eglDisplay, EGL_NO_CONTEXT, EGL_NATIVE_BUFFER_ANDROID,
                                    (EGLClientBuffer)anb[i], imageAttrs);

    glBindTexture(GL_TEXTURE_2D, texName[i]);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_2D, (GLeglImageOES)eglImage[i]);
}

5. Load shader program:

This is done in the standard way; there are examples in the Android source tree. A minimal sketch follows.
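For reference, here is a minimal sketch of the usual GLES2 compile/link sequence. The shaders and the attribute/uniform names (vPosition, vTexcoorIn, my_Sampler) are illustrative placeholders rather than the actual HAL code, and error checking is reduced to comments:

// Simple pass-through vertex shader (illustrative).
static const char* vertSrc =
    "attribute vec4 vPosition;\n"
    "attribute vec2 vTexcoorIn;\n"
    "varying   vec2 vTexcoor;\n"
    "void main() {\n"
    "    vTexcoor    = vTexcoorIn;\n"
    "    gl_Position = vPosition;\n"
    "}\n";

// Plain sampling fragment shader; replace the body with the gray scale correction
// (or de-mosaicing) discussed elsewhere in this thread.
static const char* fragSrc =
    "precision mediump float;\n"
    "uniform sampler2D my_Sampler;\n"
    "varying vec2 vTexcoor;\n"
    "void main() {\n"
    "    gl_FragColor = texture2D(my_Sampler, vTexcoor);\n"
    "}\n";

static GLuint loadShader(GLenum type, const char* src)
{
    GLuint shader = glCreateShader(type);
    glShaderSource(shader, 1, &src, NULL);
    glCompileShader(shader);

    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);   // on failure, inspect glGetShaderInfoLog()
    return shader;
}

// In the capture thread, after eglMakeCurrent():
GLuint program = glCreateProgram();
glAttachShader(program, loadShader(GL_VERTEX_SHADER,   vertSrc));
glAttachShader(program, loadShader(GL_FRAGMENT_SHADER, fragSrc));
glLinkProgram(program);                              // check GL_LINK_STATUS with glGetProgramiv()
glUseProgram(program);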


6. On each captured buffer, render in the standard way:

This is standard OpenGL ES code, except you need to pay attention to the following (a sketch that puts the pieces together follows after the list):

- get the index of the currently captured buffer in CameraHal::captureframeThread():

...

case CMESSAGE_TYPE_NORMAL:

                ret = mCaptureDevice->DevDequeue(&bufIndex);

- start rendering code with the following:

glBindTexture(GL_TEXTURE_2D, texName[bufIndex]);


- set up geometry, uniforms, etc.

- setup render target:

        glBindFramebuffer(GL_FRAMEBUFFER, 0);    

        glViewport(0, 0, w, h);

- render

- swap buffers

eglSwapBuffers(eglDisplay, eglSurface);
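To make the list above concrete, here is a rough per-frame sketch that draws the camera texture as a full-screen quad. It assumes the program and attribute/uniform names from the shader sketch in step 5 and the w/h values from eglQuerySurface() in step 3; treat it as an outline, not the exact HAL code:

// Illustrative per-frame draw; bufIndex comes from DevDequeue() as shown above.
static const GLfloat quadPos[] = {
    -1.0f, -1.0f,    1.0f, -1.0f,
    -1.0f,  1.0f,    1.0f,  1.0f,
};
static const GLfloat quadTex[] = {
     0.0f,  1.0f,    1.0f,  1.0f,
     0.0f,  0.0f,    1.0f,  0.0f,
};

glUseProgram(program);

GLint posLoc = glGetAttribLocation(program,  "vPosition");
GLint texLoc = glGetAttribLocation(program,  "vTexcoorIn");
GLint samLoc = glGetUniformLocation(program, "my_Sampler");

glBindFramebuffer(GL_FRAMEBUFFER, 0);            // render into the preview window surface
glViewport(0, 0, w, h);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texName[bufIndex]); // texture mapped from the capture buffer (step 4)
glUniform1i(samLoc, 0);

glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, 0, quadPos);
glVertexAttribPointer(texLoc, 2, GL_FLOAT, GL_FALSE, 0, quadTex);
glEnableVertexAttribArray(posLoc);
glEnableVertexAttribArray(texLoc);

glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

eglSwapBuffers(eglDisplay, eglSurface);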


7. Skip the code in CameraHal.cpp that submits captured buffers to the preview window and only re-queue the buffer to the camera device:

Upon completion of step 6, in CameraHal::previewshowFrameThread(), make sure to execute only the following code and skip the rest, which does mNativeWindow->enqueue_buffer() and mNativeWindow->dequeue_buffer():

case CMESSAGE_TYPE_NORMAL:
...
    buf_index = display_index;
    pInBuf->buf_state = WINDOW_BUFS_QUEUED;
    mEnqueuedBufs++;
    mCaptureBuffers[buf_index].buf_state = WINDOW_BUFS_DEQUEUED;
    ret = putBufferCount(&mCaptureBuffers[buf_index]);
    break;

8. Upon camera termination, clean up the allocated buffers and the EGL context.

This is standard code; an illustrative sketch follows.
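A possible teardown, assuming the objects created in steps 1, 3 and 4 (eglImage[], texName[], eglSurface, eglContext); run it from the same thread that did the EGL initialization:

// Destroy the EGLImages and textures created in step 4.
for (int i = 0; i < 6; i++)
    eglDestroyImageKHR(eglDisplay, eglImage[i]);
glDeleteTextures(6, texName);

// Release the context and surface created in step 3.
eglMakeCurrent(eglDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
eglDestroyContext(eglDisplay, eglContext);
eglDestroySurface(eglDisplay, eglSurface);
eglTerminate(eglDisplay);

// The capture buffers themselves go back through freeBuffersToNativeWindow() (step 1),
// and the preview window is re-connected to NATIVE_WINDOW_API_CAMERA (step 2).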


dinhbang
Contributor II

Dear Mr Dragan,

     Could you give more detail on steps 2, 3, 4 and 5? I don't know where you added the new code. My camera outputs a raw format and I can't preview it with the IPU of the i.MX6; my target is Android ICS. Please help me solve this issue. Thank you very much!

DraganOstojic
Contributor V

Hi Dinh Bang, I can help you get the GPU working.

Do you have a gray scale or color Bayer raw format camera?

dinhbang
Contributor II

Hi Mr Dragan,

     My camera is a 13 MP MIPI camera that only outputs the RAW10 format. I have configured the camera and edited the kernel files as in Comparing johnweber:wandboard_imx_3.0.35_4.0.0...mcmordie:wandboard_imx_3.0.35_4.0.0 · mcmordie/linu... . I use generic mode to receive the Bayer raw data into memory because the IPU doesn't support converting RAW10 to RGB/YUV on the fly. I found your discussion but I don't know where you added the new GPU configuration code in the HAL. Could you give more detail on steps 2, 3, 4 and 5? Sorry for my English, thank you!

DraganOstojic
Contributor V

Hi Dinh Bang, if you have your MIPI capture working that's great, and it's not difficult to get the GPU working from there. Just to let you know, I didn't do de-mosaicing because my camera is gray scale, so I can't help you with the actual de-mosaicing algorithm.


I'm not in office right now, but tomorrow I'll let you know the details.



dinhbang
Contributor II

Thanks for your help, I will try to apply your approach to my problem.

julienheyman
Contributor II

Hi Dragan,

I am quite interested to hear the specifics of how you set up your image chain to achieve this. I am especially interested in these two aspects:

- pixel shader to perform Bayer demosaicing on the GPU

- how to map capture buffers directly in GPU memory space

Any info you are willing to share on these will be greatly appreciated.

Regards,

Julien

DraganOstojic
Contributor V

Julien, I'll put up the steps and some code. In the meantime, with regard to your questions: I used Vivante-specific extensions to map the capture buffers into GPU space. The Android ICS version of these extensions as implemented in the GPU driver (imx6 13.4.1 + 13.4.1.04 patch) appears to support only YUV formats, so if you want to work with ICS you have to do some tricks to work around that. In addition, once a buffer is mapped as a texture, the GPU color space converts it on the fly, so your pixel shader actually samples an RGBA texture.

In my case I capture the gray scale camera as raw 8-bit, so I had to trick the GPU into thinking I captured a YUV planar format: I allocated 1.5x more memory and pre-filled the U/V planes with 128. Pre-setting U and V to 128 cancels them out once the YUV->RGB conversion takes place (U/V appear as c1*(U-128) and c2*(V-128) in the RGB components), so you get three identical values in the RGB positions. They still won't be quite right because the color space conversion from YUV to RGB transforms Y to c*(Y-16); I do this last correction step in my shader code. Once I get my original Y back, I'm done. In your case you would then proceed with de-mosaicing. I came across this shader demosaicing code, so you may want to take a look:

http://graphics.cs.williams.edu/papers/BayerJGT09/

julienheyman
Contributor II

Good info, thank you. I'll study the paper you provided, and look forward to your code snippets.

Regards,

Julien

DraganOstojic
Contributor V

Note on the shader code for gray scale camera:

The assumption is that the YUV->RGB conversion formula is as follows:

B = 1.164(Y - 16) + 2.018(U - 128)

G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)

R = 1.164(Y - 16) + 1.596(V - 128)

The U/V terms cancel out because those values are set to 128 before capture starts and they never change, since the gray scale camera input is captured only into the Y plane.

To recover the original 8-bit gray scale value that went into the Y position, do the following in the pixel shader:

vec4 tex = texture2D(my_Sampler, vTexcoor);

// Undo the hardware transform R = 1.164*(Y - 16); 'y' is the original Y value, normalized to [0..1].
float y = tex.x * (1.0/1.164) + (16.0/255.0);

gl_FragColor = vec4(y, y, y, 1.0);

This approach works for gray scale, but probably not as directly for Bayer color; I think that could be solved, though.

julienheyman
Contributor II

Hi Dragan,

I put this method into practice and the trick is great; however, there seems to be one problem: all input Y values less than 16 are clamped to zero by the hardware conversion (as per the formula you mention, and this is what I observe when retrieving the output pixels). For similar reasons, all input values greater than 235 end up clamped to 235 in the output too.

Did I miss something? Are these artefacts acceptable in your case because they are not that visible to the naked eye on "usual" images?

On a different note, since I render into an EGL pixel buffer, I still struggle with the performance of the required glReadPixels. Now that the input transfer is much faster thanks to the Vivante direct mapping extension, this output transfer is by far the performance bottleneck. Are you aware of any interesting options for getting the output pixels into a regular/user memory buffer without going through glReadPixels?

Thanks !

DraganOstojic
Contributor V

Hi Julien, thanks for pointing out the clamping; it appears that's expected when you do a YUV->RGB conversion. In my case this loss of precision has to be avoided, so I'll need to reconsider the YUV->RGB approach when handling gray scale camera input.

I guess you're rendering into a pixel buffer so that you can do your de-mosaicing and then copy into memory for display? To avoid the lengthy glReadPixels operation (which I don't believe has an efficient substitute on the Vivante GPU), you could do your shading in two passes: the first pass renders into a texture (attached to an FBO) where you do the de-mosaicing, and the second pass renders from that texture into the native window for display. Do you need code for that? I measured the performance of that setup: rendering 2592x1944 input -> 2592x1944 texture -> 640x480 native window takes ~70 ms/frame (for my gray scale processing, which is essentially expanding Y8 to RGBA). A rough sketch of the two-pass setup is below.
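In case it helps, here is a rough GLES2 sketch of the render-to-texture part of that two-pass setup; the names (fbo, rtTex) and sizes are illustrative, and the draw calls themselves are the same full-screen quad as in the single-pass case:

// Pass 1 target: a texture attached to an FBO (the full-resolution de-mosaiced image).
// Note: GL_RGBA/GL_UNSIGNED_BYTE render targets may rely on OES_rgb8_rgba8 support in the driver.
GLuint fbo, rtTex;

glGenTextures(1, &rtTex);
glBindTexture(GL_TEXTURE_2D, rtTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 2592, 1944, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, rtTex, 0);
// glCheckFramebufferStatus(GL_FRAMEBUFFER) should return GL_FRAMEBUFFER_COMPLETE here.

// Pass 1: de-mosaicing shader, source = camera texture, target = FBO.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, 2592, 1944);
// ... bind the camera texture and draw a full-screen quad ...

// Pass 2: pass-through shader, source = rtTex, target = the window surface.
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glViewport(0, 0, w, h);
glBindTexture(GL_TEXTURE_2D, rtTex);
// ... draw a full-screen quad, then eglSwapBuffers(eglDisplay, eglSurface) ...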

If you just want to put your final image into memory there is a way but I haven't tried it. It's covered in these threads:

https://community.freescale.com/thread/303338

https://github.com/Freescale/linux-module-virtfb

https://community.freescale.com/thread/309677

https://community.freescale.com/message/292215#29221

julienheyman
Contributor II

Hi Dragan,

I will probably have to drop the DirectVIV trick too, since the YUV/RGB clamping is not OK for me either. Too bad, but in the meantime I realized that getting the output data out is my real bottleneck (by far). I won't display the image; I just need to put the converted image in memory, and I was planning to investigate the virtual framebuffer option, so thank you for the associated pointers.

DraganOstojic
Contributor V

I found an issue with the Vivante texture mapping functions when the front and back facing cameras are switched, so I changed the approach to what Android uses in its SurfaceTexture code. Performance is not as good as with the Vivante functions but still OK.

Ivan_liu
NXP Employee

Hi Dragan,

The camera HAL in ICS 13.4.1 has CaptureFrameThread, PreviewShowFrameThread, EncodeFrameThread and TakePicThread.

CaptureFrameThread acquires a buffer from the V4L2 interface and posts it to PreviewShowFrameThread, EncodeFrameThread or TakePicThread as required.

You could add another thread, PostProcessThread, to handle buffers coming from CaptureFrameThread. PostProcessThread would then deliver the processed buffers to PreviewShowFrameThread, EncodeFrameThread or TakePicThread as required.

All threads share the buffers, which are managed by a buffer reference count.

You can refer to the CameraHal code in hardware/imx/mx6/libcamera, and use PreviewShowFrameThread as a reference when implementing PostProcessThread.

BRs,

Xiaowen
