[OpenGL] Texel fetch takes too much time on imx6 GPU (GC2000)

Contributor III

I have a question regarding OpenGL ES 2.0 on the i.MX6 Quad.

I have an OpenGL app that does multiple texel fetches (up to 12) to calculate each pixel in the fragment shader.

The fragment shader code is as follows:

varying vec4 gh_TexCoord;
uniform sampler2D source;

void main(void) {
    vec4 tmp1;
    vec4 tmp2;

    // Fetches 1-4: one texture2D call per channel, swapping r and b.
    tmp1.b = float(texture2D(source, vec2(gh_TexCoord)).r);
    tmp1.g = float(texture2D(source, vec2(gh_TexCoord)).g);
    tmp1.r = float(texture2D(source, vec2(gh_TexCoord)).b);
    tmp1.a = float(texture2D(source, vec2(gh_TexCoord)).a);

    // Fetches 5-8: same coordinate again, accumulated onto tmp1.
    tmp2.b = tmp1.b + float(texture2D(source, vec2(gh_TexCoord)).r);
    tmp2.g = tmp1.g + float(texture2D(source, vec2(gh_TexCoord)).g);
    tmp2.r = tmp1.r + float(texture2D(source, vec2(gh_TexCoord)).b);
    tmp2.a = tmp1.a + float(texture2D(source, vec2(gh_TexCoord)).a);

    // Fetches 9-12: same coordinate again, accumulated into the output.
    gl_FragColor.b = tmp2.b + float(texture2D(source, vec2(gh_TexCoord)).r);
    gl_FragColor.g = tmp2.g + float(texture2D(source, vec2(gh_TexCoord)).g);
    gl_FragColor.r = tmp2.r + float(texture2D(source, vec2(gh_TexCoord)).b);
    gl_FragColor.a = tmp2.a + float(texture2D(source, vec2(gh_TexCoord)).a);
}


As you can see, I do multiple texture fetches and simple additions to calculate the final output color. This is just an example and does not do anything useful. gh_TexCoord is a varying that is used in the vertex shader to calculate the position of the pixel. The vertex calculation is also straightforward and does not involve any complex calculations. I have set up the vertex data as 4 points so that I can call glDrawArrays to draw a triangle fan that forms a rectangular plane. The size of the output buffer is 1920x1080.

My issue is that the texture2D calls take a lot of processing time on the i.MX6 GPU. If I comment out a few of the texture2D calls above, the draw time drops considerably. In fact, each texture2D call costs me about 3 ms of processing time per frame, which seems too much. What could be the reason for this? Is it because of the small cache in the Vivante GPU? I'm fairly new to OpenGL, so any suggestions are welcome. If you need any more info for debugging, I can provide it.

Additional info :


No. of texture2D calls per frame | glDrawArrays + glFinish time (ms) for full HD frame

Board : i.MX6 SabreLite Quad from Boundary Devices.

kernel : 3.0.35

Galcore version :

Contributor III

The texture accesses are not even dependent in the actual shader that I am using. They are all unique, and the coordinates are calculated in the vertex shader stage and passed as varyings, so they should be prefetched by the time the fragment shader executes. But they are not. This is why I tested by fetching the pixel at the same coordinate repeatedly. Even though the same pixel is fetched multiple times, it is not cached, and each fetch adds more processing time.

Has anybody else faced issues like this while running shaders with lots of texture accesses? It would be very helpful if someone could share their app's performance data so that I can use it as a benchmark.

NXP TechSupport

Hi Dilip,

The main reason is that you are using an older, buggy GPU driver version. We strongly suggest upgrading to the latest BSP and GPU driver version. Notable changes in the latest drivers that are relevant to this case:

- General: cache operations have been corrected for texture handling

- Kernel: Disable non-paged memory cache as it is not used by command buffers now.

- Kernel: Refine MMU cache flush implementation. Query flag with command->mutexQueue acquired to avoid race condition.

- General: Add dynamic stream cache support.

- EGL: Correct error conditions in glEGLImageTargetRenderbufferStorageOES and glEGLImageTargetTexture2DOES.

- OGL: Add missing surface resolve in glFramebufferTexture2DOES().

- EGL: Improve PIXMAP rendering performance by resolving PIXMAP to texture directly

- Confirm a texture bound to the framebuffer is correct when it is shared with other context.

- Adjusted an optimization for texture 2D EGL image

- Fixed bugs when resolving to pixmap and refined glFinish() for stability.

Unfortunately, there is no patch that brings these GPU driver fixes to 3.0.35. Please try 3.14.28 with the latest GPU driver.

Hope this helps

Contributor III

Hi Alfred Bio_TICFSL,

I'm using the latest kernel 3.14.28 with galcore version now. But I'm afraid there is still no difference in performance. Could there be something wrong with my shader code? What else could cause such a drastic drop in performance with each pixel fetch from the texture? Are there any optimizations that I could try to achieve better performance for pixel fetches?

Contributor III

How do I use the APIs fbGetDisplayByIndex and fbCreateWindow in the 3.14.28 version of the BSP? I tried linking the OpenGL ES 2.0 application against the fsl-image-gui-x11-imx6qdlsolo rootfs as well as the fsl-image-gui-fb-imx6qdlsolo rootfs. Both generate an implicit declaration warning. But I see these APIs being used in gpu-sdk-2.1. Maybe I am missing some compiler flags to enable them? I already tried -DEGL_API_FB, but that doesn't seem to help.

Contributor III

I'm sorry about the above post. I compiled against the libEGL in the fsl-image-gui-fb-imx6qdlsolo rootfs and tried to execute on the fsl-image-gui-x11-imx6qdlsolo rootfs, which is why the APIs did not work. I switched the rootfs on the device, and now I'm able to use the APIs and the OpenGL app is working fine. I'll update with the performance difference in a while.

Contributor III

Thank you for the info. I'm trying out the fsl-L3.14.28_1.0.0_iMX6qdls_Bundle images on a sabre-sdb board now. I'll let you know when I have any updates.