imx-gpu-viv - Slow rendering with OpenGl Extension glTexDirectVIVMap()


3,175 Views
chl
Contributor I

i.MX8M Mini using the imx-gpu-viv OpenGL framework.

Simple off-screen rendering using an OpenGL framebuffer object.

I want to be able to get direct access to the result of OpenGL rendering for further image processing by CPU and G2D GPU, without having to use glReadPixels() to copy the result to a G2D buffer.

Using the glTexDirectVIVMap OpenGL extension works but the rendering is slower than rendering to a default/internally allocated buffer, even though the memory is being allocated from the same CMA area.

If I change the texture allocation for the FBO from ..

 glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

To use a buffer allocated by libg2d ...

struct g2d_buf *g2db = g2d_alloc(4*w*h, /*cached*/ 0);
void    *vaddr = g2db->buf_vaddr;
uint32_t paddr = g2db->buf_paddr;

glTexDirectVIVMap(GL_TEXTURE_2D, w, h, GL_RGBA, &vaddr, &paddr);
glTexDirectInvalidateVIV(GL_TEXTURE_2D);
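In both cases the texture is then attached as the FBO colour attachment in the usual way; a sketch for completeness, assuming `fbo` and `tex` were created earlier with glGenFramebuffers()/glGenTextures() (those names are not from the snippet above):

```c
/* Sketch: attach the (VIV-mapped or internally allocated) texture as the
 * FBO's colour buffer.  `fbo` and `tex` are assumed to exist already. */
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, tex, 0);
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    /* handle incomplete-framebuffer error */
}
```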

Then rendering is approx 20% slower.

Any ideas why this would be the case?

Any ideas on a better way of getting virtual and physical addresses of the OpenGL Texture / Surface backing store?

0 Kudos
9 Replies

2,871 Views
Bio_TICFSL
NXP TechSupport
NXP TechSupport

1. I built your test. When I run it with useVIVMap (./vivtest anyargument) I get a segmentation fault.

./vivtest with no arguments runs, but doesn't render anything; it just prints the rendering time.

 

2. I also feel we are failing to communicate. In your test case no real data is involved. I don't think it is the right way to compare rendering performance when nothing is rendered.

If you really want to render something, you at least need to copy some data into your image, like

 glTexImage2D(GL_TEXTURE_2D, 0, format, header.ImageWidth, header.ImageHeight, 0, format, GL_UNSIGNED_BYTE, planes[index]);

// something like memcpy(planes[index], srcdata, size) is needed first.

The planes data needs to be copied, and the time to copy it should be counted in your rendering time.

With everything empty, there is not much point in comparing performance.

 

And in my modified test the data passed to glTexImage2D() never changes, so the CPU makes good use of its cache; in a real case, where the data changes constantly, it will be even worse.

0 Kudos

2,871 Views
Bio_TICFSL
NXP TechSupport
NXP TechSupport

In your simple application, as I said, there is no data change for the image; the image is static. It doesn't make sense to compare these two cases. glTexDirectVIVMap() is for dynamically changing frames: you only need to map the frame with a valid physical address, with no need to memcpy any data into the image buffer. That is where it saves time.

If you are using glTexImage2D() with dynamically changing frame data, you need a memcpy for every frame, and memcpy is time consuming. If your image data is static, it doesn't make sense to use glTexDirectVIVMap().

Regards

0 Kudos

2,871 Views
chl
Contributor I

It seems I'm failing to communicate the issue clearly.

I want direct access to the result of rendering as I need to do further image processing of this using the CPU and G2D.

Ignore the fact that in the test app the scene doesn't change - that is purely to keep the code small for the purposes of this discussion. It has no bearing on the fact that I am seeing different rendering times when rendering to an internally allocated FBO texture compared to an externally allocated FBO texture. In both cases the source textures are allocated in the same way - it is only the render surface that is allocated differently.

In a typical OpenGL app the scene is rendered directly into a DRM GEM memory buffer that is given to the display hardware driver (LCDIF on the imx8mm) for scanout - The GPU3D renders into the GEM buffer and the LCDIF reads from the same buffer.  There is no copying of the buffer. I want to access that same data without it going to the LCDIF and without having to copy it.

So I want to configure OpenGL to use a render buffer that I can get the physical address of.  I actually want both the physical address, for use with libg2d, and a virtual address so I can use NEON SIMD operations on the buffer.
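The kind of CPU-side post-processing intended can be sketched as a plain scan over the mapped RGBA buffer (the function name is hypothetical; in the real application `pixels` would be the vaddr of the g2d/GEM buffer, and the loop could be rewritten with NEON intrinsics):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical CPU-side pass over the mapped render buffer: compute the
 * average red value of a w*h RGBA8888 image.  In the real application
 * `pixels` would point at the buffer returned by g2d_alloc() (or a mapped
 * DRM GEM buffer); here any w*h*4-byte buffer works. */
static uint32_t average_red(const uint8_t *pixels, size_t w, size_t h)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < w * h; i++)
        sum += pixels[4 * i];   /* R is byte 0 in GL_RGBA byte order */
    return (uint32_t)(sum / (w * h));
}
```

With an uncached buffer (g2d_alloc(..., 0)) a byte-wise loop like this is slow; that is one reason to want a proper virtual mapping whose cacheability you control.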

Ideally I would be able to get a DMABUF fd from OpenGL for the internally allocated render buffer.  But there doesn't appear to be an OES extension for this.  So I tried allocating a DRM GEM buffer and configuring it as the render surface via the glTexDirectVIVMap() extension.  This "works", I get the results of rendering in the allocated buffer.  But it is slower (the subject of this discussion).  I suspect that there is some internal copying going on in the library or driver.  In theory this shouldn't be necessary.

NOTE: in the example code I allocate a G2D buffer rather than a DRM GEM buffer - this is simply because it makes the example smaller.  Both libg2d and Linux DRM allocate from the CMA memory.  Both schemes "work" (I get the expected rendered scene in the buffer) and both schemes suffer the same slow down.

0 Kudos

2,871 Views
Bio_TICFSL
NXP TechSupport
NXP TechSupport

Hello,

Are you able to send us the code?

Regards

0 Kudos

2,871 Views
chl
Contributor I

The following code (see GitHub gist link) shows the problem. It has been stripped down to keep it small for this example, so it doesn't do anything useful beyond setting up the context and running the render steps repeatedly to measure the render time.

vivtest.c · GitHub 

With no args it uses an internally allocated framebuffer.
With any argument it uses a libg2d-allocated buffer imported with glTexDirectVIVMap(). All other processing / OpenGL config is the same; the only difference is how the render buffer is allocated/imported.

# ./vivtest 
frame useconds 7623, fps 131.18
# ./vivtest useVIVMap
frame useconds 13826, fps 72.33
#
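(The fps figure in that output is just the reciprocal of the measured per-frame time; a trivial helper with a hypothetical name, matching the numbers above:)

```c
/* fps from the per-frame render time in microseconds,
 * as printed by the test app: 7623 us -> ~131.18 fps */
static double fps_from_useconds(long frame_us)
{
    return 1e6 / (double)frame_us;
}
```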

The only difference between those two runs is this code (which allocates the FBO Render texture)

    if (useVIVMap) {
        struct g2d_buf *g2db;
        void *vaddr;
        uint32_t paddr;

        g2db = g2d_alloc(4*w*h, 0);
        vaddr = (uint8_t*)g2db->buf_vaddr;
        paddr = g2db->buf_paddr;
        glTexDirectVIVMap(GL_TEXTURE_2D, w, h, GL_RGBA, &vaddr, &paddr);
        glTexDirectInvalidateVIV(GL_TEXTURE_2D);
    } else {
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    }
0 Kudos

2,871 Views
Bio_TICFSL
NXP TechSupport
NXP TechSupport

Hello Chris,

With the glTexDirectVIV extension it should be faster, not slower. Can you share your application so we can see what happens?

Regards

0 Kudos

2,872 Views
chl
Contributor I

What is the best way to post example code here ? (224 lines of C)

0 Kudos

2,872 Views
Bio_TICFSL
NXP TechSupport
NXP TechSupport

And how do you measure the time when rendering?

When using glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

you need to copy the data into your buffer whenever the data changes. This should be counted towards your rendering time for each frame, unless you are using a static image, in which case you don't need a memcpy for every frame. If that is the case, there is no point comparing the time consumed with glTexDirectVIVMap().

glTexDirectVIVMap() is meant for textures whose data changes every frame, for example frames from a video decoder or from camera capture. You save the memcpy time by directly mapping the frame to the texture.

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL); alone doesn't transfer any data yet, so it doesn't make sense to just compare the time consumed by these two APIs.

 

If you include all the data copying in the rendering time and glTexDirectVIVMap() still consumes more time, then there is a performance issue with the GPU; in that case you can share your application with us for investigation.

Regards

0 Kudos

2,870 Views
chl
Contributor I

What you say about including memcpy time is true for source textures (inputs).

I am using glTexDirectVIVMap() to set up the render buffer (the output) so that I can access the result of the GPU rendering directly, without having to use glReadPixels().

0 Kudos