Bug in Vivante i.MX6 Wayland drivers when destroying windows

cola
Contributor I

We are getting the following client-side crash with our compositor when destroying windows. Apparently, the proxy object passed to wl_proxy_add_listener is a null pointer (proxy=0x0), although it never should be.

We are using SDK release 5.0.11.p8.4.

Thread 2 (Thread 3662.3673):
#0  wl_proxy_add_listener (proxy=0x0, implementation=0x74b621cc <gcsWL_FRAME_LISTENER>, data=data@entry=0x5e424d0c) at /usr/src/debug/wayland/1.9.0-r0/wayland-1.9.0/src/wayland-client.c:464
No locals.
#1  0x74b53544 in wl_callback_add_listener (data=0x5e424d0c, listener=0x74b621cc <gcsWL_FRAME_LISTENER>, wl_callback=<optimized out>) at /home/bamboo/automation/3.14.52-1.1.1/graphics_pkg/temp_build_dir/build-imx6qsabresd/tmp/sysroots/imx6qsabresd/usr/include/wayland-client-protocol.h:317
No locals.
#2  gcoOS_SetDisplayVirtualEx (Display=<optimized out>, Window=0x5e424c84, Context=0x5e424d0c, Surface=<optimized out>, Offset=0, X=0, Y=0) at gc_hal_user_wayland.c:1692
        swapInterval = -1
        ret = <optimized out>
        i = <optimized out>
        wl_window = 0x5e424c84
        egl_buffer = 0x5e424d0c
        wl_buffer = 0x5e457d40
        display = 0x19d608
#3  0x74a843ac in veglSetDisplayFlip (Display=Display@entry=0x19d6bc, Surface=<optimized out>, BackBuffer=BackBuffer@entry=0x5e4255dc) at gc_egl_platform.c:249
        status = <optimized out>
#4  0x74a7ec6c in veglSwapWorker (Display=0x19d6bc) at gc_egl_swap.c:741
        display = 0x19d6bc
        displayWorker = 0x5e4255cc
        currWorker = 0x5e4255cc
        bStop = 0
        __user_ptr__ = <synthetic pointer>
#5  0x75a4cf5c in start_thread (arg=0x65556440) at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:335
        pd = 0x65556440
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {2123239463, 1853849923, 1700095040, 2130704288, 0, 338, 0, 0, 2130704288, 1700093820, 0 <repeats 54 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        robust = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x75d42408 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:86 from /opt/sdk/cebis/sysroots/cortexa9hf-neon-mel-linux-gnueabi/lib/libc.so.6
No locals.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

chingling_wang
NXP Employee

Can you share how to set up this environment? I don't know how to get both the eglfs and wayland plugins working at the same time, so I can only test in our available environment. Alternatively, you could give me your images so that we can concentrate on the GPU side. Building a user-flavored image myself is always a painful experience, usually with lots of failures to overcome every time.

And what do you see when you run dmesg | grep galcore?

Thanks

cola
Contributor I

chinglingwang, you actually do not have to set up both, since the QtWayland Compositor from the example is a fully self-contained Wayland compositor that creates the wayland-0 socket. However, the QtWayland Compositor renders on EGL via the Qt platform abstraction EGLFS.

So, the setup is as follows:

1. disable Weston (or any other compositor) that is running on your image

2. run the compositor from the example with the parameter "-platform eglfs" (note the single dash; there is a typo in Sanjeev's post); this command then creates a Wayland compositor and you should see a colored background; this compositor creates the default Wayland socket to which Wayland client applications can bind

3. run the client application with the parameter "-platform wayland", which then binds to the default Wayland socket that is created by the compositor

Moreover, check in your Qt configuration that the "eglfs" option is actually enabled for your build.
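
The three steps above could be scripted roughly as follows. This is only a sketch under a few assumptions: Weston is managed by systemd, the example binaries are called compositor and client and sit in the current directory, and XDG_RUNTIME_DIR falls back to /var/run. The helper names wait_for_socket and run_test_setup are made up for illustration.

```sh
#!/bin/sh
# Poll until the compositor has created its Wayland socket.
# $1: socket path, $2: timeout in seconds (default 10, an arbitrary choice)
wait_for_socket() {
    sock="$1"
    remaining="${2:-10}"
    while [ "$remaining" -gt 0 ]; do
        [ -S "$sock" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    # Final check; the return status says whether the socket exists.
    [ -S "$sock" ]
}

# Hypothetical wrapper for the three setup steps.
run_test_setup() {
    export XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/var/run}"
    # Step 1: make sure no other compositor is running (assumes systemd).
    systemctl stop weston 2>/dev/null || true
    # Step 2: start the QtWayland compositor on top of EGLFS.
    ./compositor -platform eglfs &
    # Do not start the client before the wayland-0 socket exists.
    wait_for_socket "$XDG_RUNTIME_DIR/wayland-0" || return 1
    # Step 3: start the client, which connects to the default socket.
    ./client -platform wayland
}
```

Calling run_test_setup on the target starts both programs in order. The polling loop is only there so that the client does not try to connect before the compositor has created the socket.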

chingling_wang
NXP Employee

I tried stopping Weston:

systemctl stop weston

Then I ran XDG_RUNTIME_DIR=/var/run ./compositor -platform eglfs & and still got an error: This application failed to start because it could not find or load the Qt platform plugin "eglfs"

I also tried export QT_QPA_PLATFORM=eglfs, with the same error.
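
One way to see which platform plugins an image actually ships is to list the libq*.so files in the platforms subdirectory of the Qt plugin directory. A small sketch of that check follows; the default path /usr/lib/qt5/plugins is an assumption that depends on the distribution, and the function names are hypothetical.

```sh
#!/bin/sh
# Print the names of the installed Qt platform plugins, one per line.
# $1: Qt plugin directory (the default below is an assumption)
list_qpa_plugins() {
    plugin_dir="${1:-/usr/lib/qt5/plugins}"
    for f in "$plugin_dir"/platforms/libq*.so; do
        [ -e "$f" ] || continue      # glob matched nothing
        name="${f##*/libq}"          # strip directory and "libq" prefix
        printf '%s\n' "${name%.so}"  # strip ".so" suffix
    done
}

# Succeed if the named platform plugin is installed.
# $1: plugin name, e.g. eglfs; $2: optional Qt plugin directory
has_qpa_plugin() {
    list_qpa_plugins "$2" | grep -Fqx "$1"
}
```

For example, has_qpa_plugin eglfs || echo "eglfs plugin missing" reports whether libqeglfs.so is present. Setting QT_DEBUG_PLUGINS=1 when starting the compositor additionally makes Qt print diagnostics about the plugin files it tries to load.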

It looks like the eglfs option is not enabled in our build. I ran into many "not accessible" errors when I built my own Yocto image.

And it may seem silly, but I don't know how to enable both eglfs and wayland in a Yocto build.

Could you share your image with me so that we can concentrate on the GPU side? I don't think it is a good idea to take pains to set up your environment.

Thanks

sanjeevsharma
Contributor IV

Do you mean the complete Yocto image environment?

@Andreas Cord-Landwehr, please provide the image if possible.

cola
Contributor I

We have to look at some legal topics to check if that is possible.

chingling_wang
NXP Employee

Yes.  

In our build, we only have the wayland Qt5 plugin, not eglfs.

The eglfs plugin is for the FB backend. Wayland accesses the frame buffer through the Weston compositor, which makes more sense.

If both eglfs and wayland exist, two processes will interact with the frame buffer, which doesn't make sense.

Do you have any special reason to run your applications using both the eglfs plugin and the wayland plugin? Can't you run all applications with the wayland plugin?

Since we don't build this way, you need to either show me how to enable both in the Yocto build or give me the SD card image you are using so that I can reproduce the issue.

Thanks

cola
Contributor I

chinglingwang, I think there is a big misunderstanding. Let me try to explain the big picture.

The scenario we are looking at:

1. there is a device with i.MX6 that provides

2. there are multiple applications with multiple windows that we want to render. Thus, we need a compositor to compose them next to each other. With today's technologies it only makes sense to use a Wayland compositor (no X11).

3. there are certain functionalities that we want to have in the compositor, which are not supported by Weston. That is the reason why we are NOT using Weston. Instead, we are using ANOTHER Wayland compositor (note that Weston is not the only Wayland compositor, but one of many different existing ones). Our choice is to use the QtWayland Compositor (actually, we are extending the basic QtWayland compositor with the features we need).

Now, the graphics stack is as follows:

- Vivante provides EGL

- the QtWayland compositor renders directly on EGL via EGLFS; the QtWayland compositor acts as the Wayland compositor for the system

- the applications let their windows be rendered by the QtWayland compositor

Does that make sense to you, or do you see any problems with it?

There are many projects around that have the stack exactly like this.

The following two PACKAGECONFIG options for the qtbase Yocto recipe should activate eglfs support:

PACKAGECONFIG_GL = "gles2"
PACKAGECONFIG += "eglfs"

You can check that the options are correctly picked up by looking into the config.summary file in the Yocto build folder for package qtbase.
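
The check of config.summary can be done with a one-line grep. The sketch below assumes the summary format used by Qt 5.8 and later, in which each feature is listed as a name followed by a row of dots and yes or no; the function name eglfs_enabled is hypothetical, and the location of config.summary inside the Yocto work directory has to be looked up for your build.

```sh
#!/bin/sh
# Succeed if the given config.summary reports the EGLFS backend as enabled.
# $1: path to the config.summary file of the qtbase build
eglfs_enabled() {
    # Matches a line such as "  EGLFS ......... yes", but not sub-entries
    # like "EGLFS i.Mx6 ..." because dots must follow the name directly.
    grep -Eq '^[[:space:]]*EGLFS[[:space:]]*\.+[[:space:]]*yes[[:space:]]*$' "$1"
}
```

For example, eglfs_enabled path/to/config.summary && echo "eglfs is enabled" prints the message only if the option was picked up.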

Best regards,

Andreas

sanjeevsharma
Contributor IV

Hello All,

Is there anyone from NXP who can look into this issue and help?

Regards

Sanjeev Sharma

PrabhuSundarara
NXP Employee

Hi Sanjeev,

Could you please try our latest 6.2.4P1 release with Weston 4.0? There are a lot of improvements in EGL related to buffer management.

Maybe you can send the test binary; then it will be quick to test.

sanjeevsharma
Contributor IV

Hi Prabhu,

Did you get time to run the test binary on the 6.2.4P1 release?

I would really appreciate your feedback here.

Regards
Sanjeev Sharma

chingling_wang
NXP Employee

Hi, Sharma,

How can I unzip the attached binary? tar, bzip2 and Windows unzip are not working.

I can clone your GitHub repository, but could you give me instructions on how to build it for imx6q?

Thanks

sanjeevsharma
Contributor IV

Hi, Wang,

I am attaching the unpacked binaries for your reference. Please try them and let me know.

Regards

Sanjeev Sharma

chingling_wang
NXP Employee

Are you running these commands?

XDG_RUNTIME_DIR=/var/run compositor -platform eglfs  &

XDG_RUNTIME_DIR=/var/run client -platform wayland

This is the wayland backend; it cannot find or load the eglfs plugin, so I got an error.

I then tried it this way:

./compositor   -platform wayland   &

./client   -platform wayland

When I touched the red rect, a yellow rect appeared, and there was no crash. The GPU driver I used is 6.2.4.p2, with the latest sum xwayland SD card image.

sanjeevsharma
Contributor IV

Hi Wang,

Where can we download GPU version 6.2.4.p2?

chingling_wang
NXP Employee

I can only run this way:

./compositor  &

./client

My system does not recognize XDG_RUNTIME_DIR=/var/run, the default Qt plugin is wayland, and --platform eglfs returns an error.

Running it my way, I saw the red rectangle. Touching it produced a green one, and touching the green one turned it yellow and made it disappear. I tried many times, with no crash.

p2 is not released yet; the date is Nov 14. If you cannot wait, I can check whether there are patches that fix this issue and build GPU p1 with those patches. If that does not work, I can build the GPU p2 driver binaries and give you the p2 GPU kernel driver code, so that you can replace the GPU libs and rebuild your kernel image with the p2 GPU kernel driver.

 

I just tried p1; it also works OK, with no segfault, and behaves the same as p2.

The GPU version: Galcore version 6.2.4.150331

And the BSP is NXP i.MX Release Distro 4.9.88-2.1.0 imx6qpdlsolox ttymxc0.

Where did you download the BSP with p1?

sanjeevsharma
Contributor IV

I really appreciate your prompt reply.

However, your environment still differs from ours, so the problem cannot be reproduced.

As I said, the compositor binary you are running is our own compositor, created with the Compositor API of QtWayland (Qt 5.9). It represents the server side of Wayland. This type of use case does not use the wayland platform plugin; generally it uses eglfs. To reproduce the problem, please enable --platform eglfs on your end to run the compositor.

In your case, you are running a nested compositor, because you are running the QtWayland compositor binary with the wayland platform plugin while the default Weston compositor runs in the background.

In our case, we don't use the default Weston compositor; instead, we use our own Wayland compositor based on Qt 5.9.

sanjeevsharma
Contributor IV

Thanks, Wang. Only the client has to run with the wayland plugin.

## Test Setup
- run compositor:

XDG_RUNTIME_DIR=/var/run compositor -platform eglfs  &

- run client: 

XDG_RUNTIME_DIR=/var/run client -platform wayland

Note: Compositor is based on QtWayland Qt 5.9.

## Trigger the Crash
- touching the red window creates another green window
- touching a green window destroys the latest green window

To trigger the crash, just click on the red window, then the green one (if it has not crashed yet, repeat this).
After a couple of attempts, I get the crash.

We have tried 6.2.4P1 on our end. The compositor runs fine, but the client gets stuck after one or two click iterations; we don't see any crash, but the client no longer accepts clicks (input).

sanjeevsharma
Contributor IV

Thanks Prabhu,

Please find the attached test binaries.

# XDG_RUNTIME_DIR=/var/run compositor -platform eglfs
# XDG_RUNTIME_DIR=/var/run client -platform wayland

By repeatedly tapping the red rectangle and the newly created yellow rectangle with your finger, you can reproduce the crash.

Note: the compositor is based on QtWayland.

Thanks in advance.

chingling_wang
NXP Employee

Hi, Sharma,

How can I unzip the attached binary? tar, bzip2 and Windows unzip are not working.

Thanks

sanjeevsharma
Contributor IV

I discussed this issue with the Qt community on the IRC channel #qt-labs and described it to them in detail. According to Thiago Macieira (Software Architect at Intel; Open Source advocate), the bug is not in Qt code; the Vivante GPU driver seems to have some issues. The crash is caused by a race condition between the main thread and thread 2 (the thread that crashed). Thread 2 looks like it is about to destroy the EGLSurface for this wl_surface, which has already been destroyed.

He pointed out that the main thread doing a usleep() indicates that this software package has an issue.

In the backtrace, he pointed to the section below as the source of the issue, mainly the gcoOS_Delay call in frame #2.

#1 0x75dc7420 in usleep (useconds=useconds@entry=10000) at /usr/src/debug/glibc/2.24-r0/git/sysdeps/posix/usleep.c:32
        ts = {tv_sec = 0, tv_nsec = 10000000}
#2 0x755d7574 in gcoOS_Delay (Os=Os@entry=0x0, Delay=Delay@entry=10) at gc_hal_user_os.c:4033
        __user__ = 1 '\001'
        __user_ptr__ = <synthetic pointer>
#3 0x75503cf8 in _DestroySurfaceObjects (Thread=Thread@entry=0x3e2f4, Surface=Surface@entry=0x6636bc7c) at gc_egl_surface.c:1123
        i = <optimized out>
        status = gcvSTATUS_OK
#4 0x75503f80 in _DestroySurface (Thread=Thread@entry=0x3e2f4, Surface=Surface@entry=0x6636bc7c) at gc_egl_surface.c:2316

In my opinion, code that needs to support multi-threading properly cannot rely on inserted delays/sleeps. Since the crashing thread is running code from the same package, which is not doing multi-threading correctly, the problem looks like improper multi-threading.
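
To look at both threads involved in a suspected race like this, it helps to dump the stacks of all threads from a core file rather than only the crashing one. A minimal sketch, assuming gdb is available with matching debug symbols; the function name dump_all_threads and the file names in the usage example are placeholders.

```sh
#!/bin/sh
# Before reproducing the crash, allow core files in the shell that starts
# the client:  ulimit -c unlimited
# (where the core file ends up depends on /proc/sys/kernel/core_pattern)

# Print full backtraces, including locals, for every thread in a core file.
# $1: path to the crashed binary, $2: path to the core file
dump_all_threads() {
    gdb -batch -ex 'thread apply all bt full' "$1" "$2"
}
```

For example, dump_all_threads ./client core > all-threads.txt captures the main thread and the swap worker thread in one file, which makes it easier to see what each was doing at the time of the crash.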

Any input would be highly appreciated.
