Bad performance IMX6DL + Wayland + QT + OpenGLES

simonvanveerdeg
Contributor I

Hi,

My setup:

3.10.53 BSP (Wayland/Weston)

IMX6 DualLite

VSync enabled (to avoid tearing), with multi-buffering

I have created two applications:

An application that uses Qt to render a 1080 texture as fast as possible. This only gets 23 fps.

An application that does the same without Qt. This gets only 25 fps.

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

I just got the board. Will leave a Yocto build running to be able to test tomorrow.

Cheers,

Hugo

HugoOsornio
NXP Employee

Hey everyone, on my DL board the averages are 38fps.

Cheers,

Hugo

simonvanveerdeg
Contributor I

Yes, that is the same as what we get.

But when we are also decoding, it decreases to around 20 fps.

Cheers

Simon

HugoOsornio
NXP Employee

Hello simonvanveerdeghem​, Rodrigue

Using the following image:

Linux imx6dlsabresd 3.14.28-1.0.0_ga+g91cf351 #1 SMP PREEMPT Mon Jun 15 12:13:50 MST 2015 armv7l GNU/Linux

I can get up to 40 fps on the i.MX6DL Sabre SD board with either a single-buffer or a multi-buffer approach, also using the barco-wallpaper example.

199 frames in 5 seconds: 39.799999 fps

200 frames in 5 seconds: 40.000000 fps

200 frames in 5 seconds: 40.000000 fps

200 frames in 5 seconds: 40.000000 fps

200 frames in 5 seconds: 40.000000 fps

200 frames in 5 seconds: 40.000000 fps
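
When comparing runs, it can help to average these counter lines rather than reading them off one by one. A minimal sketch, assuming the application prints lines in exactly the format shown above; `avg_fps` and the log file name are hypothetical and not part of barco-wallpaper:

```shell
#!/bin/sh
# avg_fps: read "<N> frames in <S> seconds: <X> fps" lines on stdin and
# print the mean of the reported fps values with two decimals.
# (Hypothetical helper; the line format is taken from the log above.)
avg_fps() {
    awk '/frames in .* seconds:/ { sum += $(NF - 1); n++ }
         END { if (n > 0) printf "%.2f\n", sum / n; else print "no samples" }'
}

# Usage, assuming the counter output was saved to fps.log:
#   avg_fps < fps.log
```

Lines that do not match the counter format are ignored, so the filter can be fed the complete application output.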


I am attaching the application, as I made a slight modification to it. (It may not work on your end, as I used the poky 1.7 toolchain to generate it.)

In the function:

void SquircleRenderer::paint()

I modified the texture filtering scheme to nearest to avoid unnecessary texel operations:

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

Also, for the VPU stuff, I am still waiting for the Barco recipes to test, but in the meantime, can you please try raising the VPU frequency?

Set MX6_VPU_352M to YES in menuconfig.

To enter menuconfig from Yocto, run:

bitbake linux-imx -c menuconfig

Symbol: MX6_VPU_352M [=n]
Type  : boolean
Prompt: MX6 VPU 352M
  Location:
    -> Device Drivers
      -> MXC support drivers
        -> MXC VPU(Video Processing Unit) support
(3)       -> Support for MXC VPU(Video Processing Unit) (MXC_VPU [=y])
  Defined at drivers/mxc/vpu/Kconfig:22
  Depends on: ARCH_MXC [=y] && MXC_VPU [=y]
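
After saving the configuration, it may be worth confirming that the option really ended up in the kernel .config before rebuilding. A small sketch, assuming the usual Kconfig convention that a symbol FOO is written as CONFIG_FOO=y; the helper name `kconfig_enabled` and the .config path are placeholders:

```shell
#!/bin/sh
# kconfig_enabled SYMBOL FILE: succeed if SYMBOL is set to y in FILE.
# (Hypothetical helper; SYMBOL is given without the CONFIG_ prefix.)
kconfig_enabled() {
    grep -q "^CONFIG_$1=y\$" "$2"
}

# Usage (the location of .config depends on your build directory):
#   kconfig_enabled MX6_VPU_352M /path/to/kernel/build/.config \
#       && echo "MX6_VPU_352M is enabled"
```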

simonvanveerdeg
Contributor I

Hello,

Enabling MX6_VPU_352M doesn't help to get more fps.

I actually see my fps drop by 2 when I enable the option.

HugoOsornio
NXP Employee

Hello guys, I just dug into the MPU specs for the Dual Lite.

You have a different GPU, my board has the GC2000 and yours has the GC880.

I will secure a Dual Lite Board and rerun my tests to check the actual difference.

Cheers,

Hugo

simonvanveerdeg
Contributor I

Do you have a Dual Lite board yet?

And have you checked the difference?

Cheers,

Simon

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

For Wayland I am using:

An i.MX6Q Sabre SD with the 3.14.28-1.0.0_ga+g91cf351 BSP release.

For FB I am using the same setup; I just compiled my Yocto image for the FB backend, and I use the eglfs platform when running the application:

./barco-wallpaper --platform eglfs

Cheers,

Hugo

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

I compiled and ran your barco-wallpaper application:

Without using the GAL2D to composite, I had 34 fps running the app.

Using the GAL2D to composite, I had 43.4 fps.

I am now building for FB to get the results. And, answering your question: as far as I know, if multiple framebuffers are created on an FB backend, the total FB size will be 3 times larger but still contiguous, and the IPU will select which section to display without additional composition steps.
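
The contiguous layout described above can be worked through numerically. A sketch under assumptions the thread does not state: a 1920x1080 mode at 4 bytes per pixel with three buffers; `fb_layout` is a hypothetical helper, not a driver interface:

```shell
#!/bin/sh
# fb_layout WIDTH HEIGHT BYTES_PER_PIXEL BUFFERS
# Print the size of one frame, the total contiguous allocation, and the
# byte offset at which each buffer starts inside that allocation.
fb_layout() {
    frame=$(( $1 * $2 * $3 ))
    echo "frame bytes: $frame"
    echo "total bytes: $(( frame * $4 ))"
    i=0
    while [ "$i" -lt "$4" ]; do
        echo "buffer $i offset: $(( frame * i ))"
        i=$(( i + 1 ))
    done
}

# Assumed example values: 1920x1080, 32 bits per pixel, 3 buffers.
fb_layout 1920 1080 4 3
```

With these assumed values one frame is 8294400 bytes and the whole allocation is 24883200 bytes. Switching the displayed buffer then only means changing the scan-out start offset, which is why no extra composition pass is needed. On a Linux fbdev device this is typically done by changing the vertical pan offset.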

Cheers,

Hugo

simonvanveerdeg
Contributor I

Hi Victor Hugo Osornio Lopez

What is your setup to get this higher fps?

I only get 38 fps with gal2d on the BSP with the i.MX6DL and FB_MULTI_BUFFER=2.

And what is your setup for the FB backend? Which QPA platform are you using?

Cheers,

Simon

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

I just tested using the FB backend using the barco-wallpaper application.

It runs at 60FPS using a double buffer approach.

Cheers,

Hugo

rendy
NXP Employee

I'm reposting some information and questions from Hugo:

I just finished running the trace tests:

Find below some details.

1.- Qt5 applications use stencil masks and stencil functions to draw the window outlines, borders and messages. Tomorrow I will remove the non-required details to strip all unnecessary bits from our test. In the attached traces, you can see that I captured a complete frame, beginning and ending with the eglSwapBuffers function.

The Qt5 application uses a rendering window smaller than the 1920x1080 resolution: 1890x820. The outlines and unnecessary data are adding noise to my traces (and overhead to the GPU as well). I will recompile the Qt5 apps to remove that.

In contrast, the application I compiled using a textured cube at 1920x1080 runs at 42 fps.

2.- Which renderer is the customer using: the GAL2D compositor or the OpenGL one? Some work can be taken off the 3D core's back by using the GAL2D compositor.

3.- Can Rodrigue provide his gstreamer application?

4.- What is the expected frame rate that we must strive for? 60fps or 30fps?

Cheers,

Hugo

simonvanveerdeg
Contributor I

1.- Qt5 applications use stencil masks and stencil functions to draw the window outlines, borders and messages. Tomorrow I will remove the non-required details to strip all unnecessary bits from our test. In the attached traces, you can see that I captured a complete frame, beginning and ending with the eglSwapBuffers function.

The Qt5 application uses a rendering window smaller than the 1920x1080 resolution: 1890x820. The outlines and unnecessary data are adding noise to my traces (and overhead to the GPU as well). I will recompile the Qt5 apps to remove that.

In contrast, the application I compiled using a textured cube at 1920x1080 runs at 42 fps.

The textured cube application does not change every pixel of the window.

My application uploads a texture to a fullscreen window.

2.- Which renderer is the customer using: the GAL2D compositor or the OpenGL one? Some work can be taken off the 3D core's back by using the GAL2D compositor.

We are using the OpenGL compositor (use_gal2d = 0).

3.- Can Rodrigue provide his gstreamer application?

Which application do you mean?

4.- What is the expected frame rate that we must strive for? 60fps or 30fps?

We should get 50/60fps

Rodrigue
NXP Employee

Last Friday I discussed this with the GPU drivers team: Wayland is expected to be slower than rendering to the simple FB backend.

I also confirmed the slowness with full-screen OpenGL applications on our end. I did notice a performance increase, however, when using the GAL2D engine as compositor and enabling 2 or 3 framebuffers.

Can you please execute this quick test on your end?

/etc/init.d/weston stop

export FB_MULTI_BUFFER=2

weston --tty=1 --use-gal2d=1 &

Then run your application and compare the fps with and without the previous scheme.
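
To make the comparison repeatable, the three commands above can be wrapped in a small script. A sketch only: it uses exactly the commands and flags quoted above, while the `run` and `restart_weston` helpers and the `DRY_RUN` switch are hypothetical additions for illustration:

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# restart_weston GAL2D BUFFERS
#   GAL2D:   1 = composite with GAL2D, 0 = composite with OpenGL ES
#   BUFFERS: value exported as FB_MULTI_BUFFER
restart_weston() {
    run /etc/init.d/weston stop
    export FB_MULTI_BUFFER="$2"
    run weston --tty=1 "--use-gal2d=$1" &
}

# Usage: call "restart_weston 0 2", run the application and note the fps,
# then call "restart_weston 1 2" and run the application again.
```

Setting DRY_RUN=1 first prints the commands without touching the running compositor, which is a cheap way to check the script before using it on the target.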

What is your target Frame Rate?

simonvanveerdeg
Contributor I

With --use-gal2d=0:

I get 23 fps (on the BSP with the wallpaper example)

I get 15 fps (on our own build system with our own application)

With --use-gal2d=1:

I get 38 fps (on the BSP with the wallpaper example)

I get 15 fps (on our own build system with our own application)

For the example the performance increases. For my application it does not yet (I will investigate this).

It's strange that I have to set "use-gal2d=1" if I want to do 3D stuff!?

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

Summarizing what I mentioned on the call:

The Wayland display protocol requires clients to render into their own offscreen buffers; the compositor then composites those buffers into the final FB that is shown on screen.

If you use --use-gal2d=0, the GPU3D (GC2000) composites the final framebuffer. As you are already using the GPU3D to generate the offscreen buffers, you may overload the GPU and make it work slower.

If you use --use-gal2d=1, the GAL2D composites the final framebuffer: the GPU3D generates the offscreen buffers with the 3D content, and the GAL2D then composites them onto the final framebuffer, thus reducing the load on the GPU3D.
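
To put the frame rates reported in this thread in perspective, they can be converted into time per frame. A sketch; 23 and 38 fps are the Weston results quoted earlier for the OpenGL and GAL2D compositors, 60 fps is the target, and `frame_ms` is a hypothetical helper:

```shell
#!/bin/sh
# frame_ms FPS: print the time available per frame, in milliseconds.
frame_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'
}

for fps in 23 38 60; do
    echo "$fps fps -> $(frame_ms "$fps") ms per frame"
done
```

The gap between the two Weston configurations would be consistent with the extra composition pass loading the 3D core, although other factors may contribute as well.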

My next steps are:

Use your layers to replicate your application, rendy

Were you able to replicate it?

Once I replicate it, I will run GPU traces on it, so I can identify all the operations that are being queued. I am worried about the stencil masks being used.

As an additional note:

I ran the Qt logo application on a bare framebuffer backend (without Wayland) and got 60 fps. This further confirms that Wayland is inherently slower than a bare FB.

Cheers,

Hugo

simonvanveerdeg
Contributor I

Hello HugoOsornio​, rendy

Okay, I am looking into why our own application doesn't get the same result as the example, and also into our Buildroot implementation, because the same example gets 38 fps on the BSP but only 30 fps on Buildroot. But if we can get 60 fps it would be very nice.

About your additional note: can you also run the wallpaper example that I sent you on DirectFB? Do you also get 60 fps? And is this on the BSP, or how did you configure DirectFB for EGL?

Cheers

Simon

HugoOsornio
NXP Employee

Hello simonvanveerdeghem

I should be able to run it on FB as well. However, an important remark is needed here: I am not using DirectFB; I am using bare FB, also known as FBDEV. This improves performance because it is not a window manager: all content is rendered directly to the final framebuffers instead of being rendered to surfaces or windows that are later composited.

I will provide the fps results at our Thursday meeting.

Cheers,

Hugo

simonvanveerdeg
Contributor I

Okay,

Is that also with multi buffering? To avoid tearing?

Cheers,

Simon

rendy
NXP Employee

Hi, what's the resolution? 1920x1080?
