i.MX6: camera image processing with 16 bit per color channel (IPU->GPU->VPU) under Linux



10,211 Views
markushalla
Contributor I

Hi,

I am planning to build a video camera solution based on the i.MX6 with the following data flow:

  1. Image capture (12-bit per pixel RAW Bayer data from the parallel CSI interface) via the IPU in generic mode to memory
  2. Image processing (noise reduction, defect correction, Bayer demosaicing and so on) with OpenGL or OpenCL on the GPU
  3. Image projection for lens distortion correction with OpenGL on the GPU
  4. Video compression of the projected frames in the VPU

The RAW image data is 5 MP at a frame rate of 15 Hz, so data transfer and processing time could be critical.

As I have seen in several other discussions, there are solutions for fast data mapping from the IPU to an OpenGL texture. There is also the possibility of fast data transfer from the GPU to the VPU via virtual framebuffer devices.

I do not want to use the IPU's integrated Bayer demosaic filter, as it only accepts 10-bit input data and also compresses it to 8 bits before I can apply further correction steps.

But if I interpret the OpenGL ES 2.0 capabilities of both GPU versions correctly, only textures with 8 bits per color channel are supported. I would need something like the GL_L16 and GL_RGB16 types.
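
The only workaround I can think of would be to split each 16-bit sample across two 8-bit channels (for example a GL_LUMINANCE_ALPHA texture uploaded as GL_UNSIGNED_BYTE) and recombine them in the fragment shader, roughly as in the untested sketch below, but that costs extra shader work and needs highp float support:

/* Untested sketch: one 16-bit sample packed into the luminance (low byte) and
   alpha (high byte) channels of a GL_LUMINANCE_ALPHA / GL_UNSIGNED_BYTE texture. */
precision highp float;
uniform sampler2D u_bayer16;   /* hypothetical packed Bayer texture */
varying vec2 v_texcoord;

void main()
{
    vec4 s = texture2D(u_bayer16, v_texcoord);
    /* s.r carries the low byte, s.a the high byte, both normalized to [0,1]. */
    float value = (s.a * 255.0 * 256.0 + s.r * 255.0) / 65535.0;
    gl_FragColor = vec4(value, value, value, 1.0);
}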

With OpenCL I would not have this problem with the texture data format, but there seems to be no possibility for a fast data transfer from the IPU to OpenCL and from OpenCL to OpenGL.

Are my assumptions correct? Do you see any possibility how I could solve this problem? With 8 bits per color channel there seems to be a fast and straightforward solution.

Many thanks,

Markus

11 Replies

2,616 Views
ghazal4usa
Contributor I

Dear Markus Halla,

Finally, I am now able to use the GPU and VPU simultaneously. It took a combination of techniques and already existing code.

I have written an example program in the Eclipse IDE and attached it for the community members to download and use as a starting point.

The attached program decodes a raw H.264 stream with the powerful VPU decoder and displays it via the Vivante GPU. The performance is awesome! Give it a raw H.264 encoded movie and process it in a GPU fragment shader, or with OpenCL if you need the processed frames as data.

It displays a 1080p frame in just about 17 ms (truly zero-copy). It is based on the mxc_vpu_test code in the i.MX6 kernel, so with a few modifications you will be able to take the feed from a camera, and the rest of the path is straightforward.
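
The core of the zero-copy path is the Vivante GL_VIV_direct_texture extension. A stripped-down sketch of the idea is below; the frame pointers are placeholders for whatever the VPU decoder hands you, the NV12 format is an assumption about how the decoder is configured, and on some BSPs the prototypes have to be fetched with eglGetProcAddress instead of relying on gl2ext.h:

/* Sketch only: map a decoded VPU frame directly as a GL texture (zero-copy).
 * frame_virt / frame_phys stand for the virtual and physical addresses of the
 * decoded frame as provided by the VPU decoder (placeholders, not a real API). */
#define GL_GLEXT_PROTOTYPES
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static void show_decoded_frame(GLuint tex, int width, int height,
                               void *frame_virt, GLuint frame_phys)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    /* Map the VPU output buffer as texture storage - no memcpy involved. */
    glTexDirectVIVMap(GL_TEXTURE_2D, width, height, GL_VIV_NV12,
                      &frame_virt, &frame_phys);
    /* Tell the driver the content changed, then draw a textured quad as usual. */
    glTexDirectInvalidateVIV(GL_TEXTURE_2D);
}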

Sincerely yours,

Ghazal.

2,616 Views
qqk
Contributor I

Hi Ghazal,

I've reviewed your Direct_Texture_Official_OK code; it is very helpful for learning more about VPU and GPU programming.
In your files there is a call in the function RenderInit() as below:

LoadShaders("/home/root/vs_es20t5.vert", "/home/root/ps_es20t5.frag");

But I cannot find the files vs_es20t5.vert and ps_es20t5.frag in your project.
I think they are OpenGL ES 2.0 shaders, aren't they?

Could you share them with me?
Thank you.
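
In the meantime, here is my guess at what a minimal pair of ES 2.0 pass-through shaders for that code could look like (these are only placeholders I wrote myself, not your original files):

/* vs_es20t5.vert (guessed minimal pass-through vertex shader) */
attribute vec4 a_position;
attribute vec2 a_texcoord;
varying vec2 v_texcoord;

void main()
{
    gl_Position = a_position;
    v_texcoord = a_texcoord;
}

/* ps_es20t5.frag (guessed minimal fragment shader sampling the texture) */
precision mediump float;
uniform sampler2D s_texture;
varying vec2 v_texcoord;

void main()
{
    gl_FragColor = texture2D(s_texture, v_texcoord);
}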



2,616 Views
davemcmordie
Contributor III

Hi Markus,

Suggest you read the following thread-- it contains details about how to do VPU processing on Android between capture and IPU:

https://community.freescale.com/thread/309817

I am curious about the IPU-integrated demosaic filter that you refer to. Do you have details on this? Various people have told me that the IPU does not support demosaic operations on the i.MX6. I believe the GPU is the only non-CPU-intensive option.

I haven't implemented GPU processing yet, so I can't comment on the transfer bottlenecks, but I believe that will be an issue.  I would be very interested to hear how you get on-- please post back when you've made some progress.

Best,

Dave

2,616 Views
markushalla
Contributor I

Hi Dave,

thanks for the link to the discussion thread. I had already read it before, but now I have figured out that there is no good-quality method to transfer Bayer data to the GPU with the "glTexDirectVIV" function. The method mentioned in https://community.freescale.com/thread/309817 works, but only for 8-bit Bayer data, and with the drawback of clipping values below 16.

As I understand the manuals, and according to the community posts, the IPU can demosaic 10-bit Bayer data to 8-bit RGB data. See https://community.freescale.com/message/309833#309833

I now plan the following data flow:

  1. Image capture (12-bit per pixel RAW Bayer data from the parallel CSI interface) via the IPU in generic mode to memory, with a 16-bit per pixel data format
  2. Data transfer by the CPU from memory into an OpenCL buffer
  3. Image processing (noise reduction, defect correction, Bayer demosaicing and so on) via OpenCL on the GPU, with conversion to the RGBA8888 data format
  4. Data transfer from OpenCL via the CPU directly into an OpenGL texture with the "glTexDirectVIV" function (a rough sketch of steps 2 to 4 follows after this list)
  5. Image projection for lens distortion correction with OpenGL on the GPU into a virtual framebuffer
  6. Video compression of the virtual framebuffer in the VPU
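
Here is a rough, host-side sketch of how I imagine steps 2 to 4; error handling is omitted, the "demosaic" kernel is only assumed to exist, and I have not tested this against the Vivante OpenCL implementation:

/* Rough, untested sketch of steps 2 to 4: upload the 16-bit Bayer frame to an
 * OpenCL buffer, run an (assumed) demosaic kernel that writes RGBA8888, and
 * copy the result into the GL texture storage obtained with glTexDirectVIV. */
#include <CL/cl.h>
#define GL_GLEXT_PROTOTYPES
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>

static void process_frame(cl_command_queue queue, cl_kernel demosaic,
                          cl_mem bayer_in, cl_mem rgba_out,
                          const unsigned short *bayer16, GLuint tex,
                          int width, int height)
{
    size_t frame_pixels = (size_t)width * height;
    size_t gws[2] = { (size_t)width, (size_t)height };
    void *texels = NULL;

    /* Step 2: CPU -> OpenCL, 16 bit per pixel Bayer data */
    clEnqueueWriteBuffer(queue, bayer_in, CL_TRUE, 0,
                         frame_pixels * sizeof(unsigned short),
                         bayer16, 0, NULL, NULL);

    /* Step 3: demosaic / corrections on the GPU, output RGBA8888 */
    clSetKernelArg(demosaic, 0, sizeof(cl_mem), &bayer_in);
    clSetKernelArg(demosaic, 1, sizeof(cl_mem), &rgba_out);
    clSetKernelArg(demosaic, 2, sizeof(int), &width);
    clSetKernelArg(demosaic, 3, sizeof(int), &height);
    clEnqueueNDRangeKernel(queue, demosaic, 2, NULL, gws, NULL, 0, NULL, NULL);

    /* Step 4: OpenCL -> CPU -> OpenGL texture via the direct texture extension */
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexDirectVIV(GL_TEXTURE_2D, width, height, GL_RGBA, &texels);
    clEnqueueReadBuffer(queue, rgba_out, CL_TRUE, 0,
                        frame_pixels * 4, texels, 0, NULL, NULL);
    glTexDirectInvalidateVIV(GL_TEXTURE_2D);
}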

What I still have to clarify is whether OpenGL can render YUV 4:2:0 data directly to the framebuffer, or whether I need an extra conversion step in between.

Regards,

Markus


2,616 Views
YixingKong
Senior Contributor IV

Markus

This discussion has been closed due to inactivity. If you still need help, please feel free to reply with an update to this discussion, or create another discussion.

Thanks,

Yixing


2,616 Views
YixingKong
Senior Contributor IV

Markus

Has your issue been resolved? If yes, we are going to close the discussion in 3 days. If you still need help, please feel free to contact Freescale.

Thanks,
Yixing


2,616 Views
davemcmordie
Contributor III

Hi again Markus,

I just read through that thread about four more times because you gave me a glimmer of hope on the 8-bit debayer.  The official word from Freescale is that the IPU used an Image Signal Processor to do the debayering on the i.MX5 and that the module was removed from the i.MX6.  The bottom line is that it cannot be done; I have a feeling this is because the IPU cannot buffer and handle more than one line of image data at a time (but this is just conjecture).  

I have implemented a basic block debayer in OpenCL, but it is nowhere near fast enough yet.  Debayering is turning into a severe limitation on these processors...
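
To give an idea of what I mean by a basic block debayer, the kernel is essentially along these lines (a simplified sketch assuming an RGGB pattern and 16-bit input, not our exact code):

/* Naive 2x2 block demosaic sketch (RGGB assumed): each work-item writes one
 * RGBA8888 pixel from the 2x2 Bayer quad it falls into. */
__kernel void demosaic(__global const ushort *bayer,
                       __global uchar4 *rgba,
                       int width, int height)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    /* Top-left corner of the 2x2 quad this pixel belongs to. */
    int qx = x & ~1;
    int qy = y & ~1;

    int r  = bayer[ qy      * width + qx    ];
    int g1 = bayer[ qy      * width + qx + 1];
    int g2 = bayer[(qy + 1) * width + qx    ];
    int b  = bayer[(qy + 1) * width + qx + 1];

    /* Reduce to 8 bits; the shift depends on how the 12-bit data sits in the
     * 16-bit word (here it is assumed to be left-justified). */
    uchar rr = (uchar)(r >> 8);
    uchar gg = (uchar)(((g1 + g2) / 2) >> 8);
    uchar bb = (uchar)(b >> 8);

    rgba[y * width + x] = (uchar4)(rr, gg, bb, (uchar)255);
}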

The only stone I have left unturned so far is using OpenGL ES 2.0 to do the job. It may be faster, but since I am not processing for display, I will still have the slow uploads and downloads to get the data back into memory.

Please post back if you make progress.

Dave


2,616 Views
ilangoodman
Contributor I

Dave, we're in the same boat now, trying to debayer images in OpenGL ES 2.0 on the i.MX6. This is rough, since our old image sensor (which output RGB, YUV, etc.) has been discontinued, and all viable alternatives only output Bayer. Did you ever get this working?


2,616 Views
davemcmordie
Contributor III

Hi Ilan,

We implemented a demosaic algorithm in OpenCL. The performance was less than spectacular, and at the time it resulted in graphics-driver-related crashes. We eventually shelved it in favour of a straight C++ implementation. IMHO the lack of a hardware demosaic on this chip is a major limitation, given that the platform is targeted at industrial applications rather than consumer smartphones.

The bottleneck at the time was the upload and download of the buffers, which were not zero-copy DMA transfers in OpenCL but rather seemed to tie up the CPU. My next step was going to be to migrate the code to OpenGL ES 2.0, as there was some hope of doing zero-copy transfers there.

Attached is the code we were using. I am sure it includes dependencies you don't have, but it should be a fairly complete starting point.

Best,


Dave


2,616 Views
ilangoodman
Contributor I

Thanks Dave, that's really helpful info. We're investigating the OpenGL method. We did implement it in C++, but it's still too slow. I agree this is a major limitation. In fact, it would be a problem even for consumer smartphones, since most new image sensors output only raw Bayer, as far as I can tell.

Ilan Goodman

Chief Technology Officer

Park Assist
