Segmentation fault when calling eglMakeCurrent

charlesung
Contributor III

We are using OpenVG with X11 on the i.MX6 and have run into a problem where eglMakeCurrent occasionally crashes with a segmentation fault. We are not sure of the cause; the input parameters appear to be valid at the time of the call. Below is the call stack when it dies:

==4346== Process terminating with default action of signal 11 (SIGSEGV)
==4346==  Access not within mapped region at address 0x394
==4346==    at 0x5E9DF64: gcoSURF_ReferenceSurface (gc_hal_user_surface.c:12505)
==4346==    by 0x5F90CC7: _CreateSurfaceObjects (gc_egl_surface.c:604)
==4346==    by 0x5F917CF: veglResizeSurface (gc_egl_surface.c:1389)
==4346==    by 0x5F8D34B: veglMakeCurrent (gc_egl_context.c:2508)
==4346==    by 0x5F8DD43: eglMakeCurrent (gc_egl_context.c:2633)

I have also attached the valgrind output that shows the steps that lead to this.

Original Attachment has been moved to: eglmakecurrent_bad.txt.zip

charlesung
Contributor III

The moment you change something, the problem may stop showing up. As we mentioned, it is very unpredictable.

chingling_wang
NXP Employee

Hi, Charles,

Could you try the attached GPU driver from the 4.1.15-1.1.1 release and see whether you can still reproduce the segmentation fault?

I can no longer reproduce it with ./part3, but I cannot confirm that it fixes the issue, since the problem is so unpredictable.

This is not a confirmed solution.

charlesung
Contributor III

I will give that a try when I have a moment, but it is very unlikely that the behavior will change unless you guys explicitly put in a fix for this.

By the way, there is really nothing magic about the part3 app. It is just one of the apps that shows the problem with that particular version of the driver and the QPA code. All I did to verify this was run these simple Qt example apps; some of them run into the seg fault. You can easily do the same on your end.

chingling_wang
NXP Employee

From a gdb trace, I caught the place where the segmentation fault happens. veglResizeSurface() destroys and re-creates the surface objects, and for some timing reason Surface->renderTarget is still NULL when it shouldn't be. Incrementing the reference count of the NULL Surface->renderTarget then causes the segmentation fault.

I changed the code to try to make sure the surface objects are fully destroyed before _CreateSurfaceObjects runs, but I am not sure the change fixes the issue. I cannot reproduce it any more, which is why I am asking you to try it.
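A minimal sketch of the "destroy finishes before create" ordering, assuming hypothetical demo_surface/demo_target types (the driver's real surface structures and locking are not shown in this thread): the old render target is fully released before its replacement is created, and the rebuild and every lookup run under one mutex, so no caller can see a half-rebuilt surface.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the Vivante driver's real types. */
typedef struct {
    int refcount;
} demo_target;

typedef struct {
    pthread_mutex_t lock;       /* serializes rebuilds and lookups */
    demo_target *render_target; /* NULL only if creation failed */
} demo_surface;

/* Drop one reference; free the target when the count reaches zero. */
static void demo_target_release(demo_target *t)
{
    if (t != NULL && --t->refcount == 0)
        free(t);
}

/* Rebuild the render target: finish destroying the old one before creating
 * the new one, and do both under the lock. Returns 0 on success, -1 on failure. */
static int demo_resize_surface(demo_surface *s)
{
    int rc = 0;

    pthread_mutex_lock(&s->lock);

    /* 1. Destroy: release the surface's reference to the old target. */
    demo_target_release(s->render_target);
    s->render_target = NULL;

    /* 2. Create: allocate the replacement and store the surface's reference. */
    demo_target *t = malloc(sizeof *t);
    if (t == NULL) {
        rc = -1; /* render_target stays NULL; callers must check rc */
    } else {
        t->refcount = 1;
        s->render_target = t;
    }

    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Take an extra reference under the same lock; returns NULL if there is no target. */
static demo_target *demo_acquire_target(demo_surface *s)
{
    pthread_mutex_lock(&s->lock);
    demo_target *t = s->render_target;
    if (t != NULL)
        t->refcount++;
    pthread_mutex_unlock(&s->lock);
    return t;
}
```

Because demo_acquire_target() takes the same lock as the rebuild, a caller either gets a fully created target (with its own reference) or NULL, never a target that is in the middle of being replaced.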

Can you give me the names of the simple Qt example apps you used to trigger the seg fault? I just need an easier way to reproduce the issue.

Thanks

charlesung
Contributor III

I was mainly using example code inside qtbase/examples/widgets/tutorials. I am assuming you already have Qt source somewhere. If not, you will have to download it first.

chingling_wang
NXP Employee

Since you already have the apps, can you package the executables for me so that we can simply run them to reproduce this issue? Right now, after I rebuilt the GPU driver, the seg fault no longer happens. If we cannot reproduce it reliably, how can we fix it?

charlesung
Contributor III

Just tried your driver. It still seg faults. I saw it with the concentriccircles app.

chingling_wang
NXP Employee

I sort of know how to reproduce the issue now: just add or remove some debug output in the GPU driver, and it eventually happens.

I found something in the code that can lead to a NULL surface being used in the GPU code. After changing it, I have tried many times and have not seen the seg fault so far.

Can you build the GPU driver yourself? If so, I can give you the patch.

Anyway, I attached the driver; please try it and see whether you still hit the issue.

If you see the seg fault again, start the application under gdb:

gdb --args your-application-name arg1 arg2 ...

When gdb starts, type run at the (gdb) prompt.

When it seg faults, type bt to get a backtrace and send it to me.

Thanks

charlesung
Contributor III

Hi ChingLing,

Can you do us a favor? Since our actual code is based on the jethro branch, can you apply your patch to that version of the GPU driver instead and then provide us with the patched driver?

Thanks.

chingling_wang
NXP Employee

What is the jethro branch? My fix is based on our rel_imx_4.1.15_1.0.0_5.0.11.p8.3_ga tag; shouldn't that be the same as what you have? Where do you get this so-called jethro, from the Yocto community?

sebastient
Contributor V

Jethro refers to the Yocto 2.0 release branch. The official Freescale i.MX6 kernel for this release is 3.14.52_1.1.0_ga, and the details are in the following link. We will require the driver to be built against this kernel version, and we'll be using the X11 flavour. If the OpenVG driver is affected, we need the 2D flavour of this driver. Thank you.

meta-fsl-arm/linux-imx_3.14.52.bb at jethro · Freescale/meta-fsl-arm · GitHub

chingling_wang
NXP Employee

So, you updated your GPU driver to 8.4? The link points to the upgraded 3.14.52_1.1.0_ga release, where the GPU driver is 8.4.

I attached the GPU driver package with the seg fault fix.

charlesung
Contributor III

Somehow libGAL.so is not compatible, so for now I am replacing all the .so files except libGAL.so, and that seems to work.

chingling_wang
NXP Employee

That may mean the kernel doesn't match. For the link you gave, the GPU driver is updated to 8.4, so the kernel needs to be rebuilt.

It seems to me your current GPU driver is 8.3; when I built from that source before, you never had a problem. Now I have updated to 8.4.

charlesung
Contributor III

I believe it is using p7.4 according to the kernel-module-imx-gpu-viv_5.0.11.p7.4+fslc.bb.

We have tested this problem on two platforms: a custom board running 3.14.52_1.1.0_ga with p7.4, and a Sabre board running 4.1.15-1.1.0_ga with p8.4.

So ultimately we want the fix on 3.14.52_1.1.0_ga with p7.4. So far, though, the 3.14.52_1.1.0_ga plus p8.4+patch combination seems to be okay, apart from the libGAL.so issue I mentioned.

chingling_wang
NXP Employee

I attached the GPU package for p7.4 plus the seg fault patch. It is very close to 8.3, with no GPU kernel difference; 8.4 has some kernel changes.

We have so many releases and the tags are so long that I easily get confused.

charlesung
Contributor III

This works. Thanks.

charlesung
Contributor III

I have not seen the problem so far, so this is likely a fix. Thanks for looking into this. Really appreciate it.

charlesung
Contributor III

I have just sent you the executables.

chingling_wang
NXP Employee

I debugged this further using gdb and got the backtrace. It looks like the surface is sometimes not ready (still NULL) when it is referenced while creating the surface objects, which results in the seg fault. I submitted a ticket asking the GPU driver engineers for help.

When the segmentation fault happens, it fails at gcoSURF_ReferenceSurface(Surface->renderTarget) in gc_egl_surface.c, lines 603-604:

    Surface->renderTarget = Surface->renderListCurr->surface;
    gcmERR_BREAK(gcoSURF_ReferenceSurface(Surface->renderTarget));

For some reason, renderListCurr->surface is still NULL, and incrementing its reference count results in the segmentation fault.

What do you think could result in a NULL surface? It looks like a timing issue: there is no guarantee that the surface is ready.

(gdb) bt
#0  gcoSURF_ReferenceSurface (Surface=0x0) at gc_hal_user_surface.c:12505
#1  0x75886cc8 in _CreateSurfaceObjects (Thread=Thread@entry=0x4a794, Surface=Surface@entry=0x13c8bc,
    ResolveFormat=ResolveFormat@entry=gcvSURF_R5G6B5) at gc_egl_surface.c:604
#2  0x758877d0 in veglResizeSurface (Surface=0x13c8bc, Surface@entry=0x7efff5cc, Width=1024, Height=720,
    ResolveFormat=gcvSURF_R5G6B5, BitsPerPixel=16) at gc_egl_surface.c:1389
#3  0x7588334c in veglMakeCurrent (Dpy=Dpy@entry=0x126a5c, Draw=Draw@entry=0x13c8bc, Read=Read@entry=0x13c8bc, Ctx=Ctx@entry=0x1295e4)
    at gc_egl_context.c:2508
#4  0x75883d44 in eglMakeCurrent (Dpy=0x126a5c, Draw=Draw@entry=0x13c8bc,
    Read=Read@entry=0x13c8bc, Ctx=0x1295e4) at gc_egl_context.c:2633
#5  0x757287cc in QVgQpaContext::makeCurrent (this=0x129458, surface=surface@entry=0x13c8bc) at qvgqpacontext.cpp:91
#6  0x75725b60 in QVgQpaBackingStore::beginPaint (this=0x13df08, region=...)
    at qvgqpabackingstore.cpp:2359
#7  0x76bb40dc in ?? () from /usr/lib/libQt5Widgets.so.5
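At the failing lines 603-604 of gc_egl_surface.c, a defensive guard would turn the crash into an error return. This is a sketch only: the demo_* types, the status codes and the DEMO_ERR_BREAK macro are hypothetical and merely imitate the gcmERR_BREAK style of the snippet; they are not the driver's real definitions. A guard like this hides the symptom but does not fix the timing problem that leaves renderListCurr->surface NULL.

```c
#include <stddef.h>

/* Hypothetical status codes; the real driver uses its own status type. */
typedef enum { DEMO_OK = 0, DEMO_ERR_NULL_SURFACE = -1 } demo_status;

typedef struct { int refcount; } demo_surf;
typedef struct { demo_surf *surface; } demo_render_list;
typedef struct {
    demo_render_list *render_list_curr;
    demo_surf *render_target;
} demo_egl_surface;

/* Local macro in the style of gcmERR_BREAK: store the result in the
 * caller's `status` variable and break out of the enclosing loop on error. */
#define DEMO_ERR_BREAK(expr) \
    status = (expr);         \
    if (status != DEMO_OK) break

/* Refuse to reference a NULL surface instead of dereferencing it. */
static demo_status demo_reference_surface(demo_surf *surf)
{
    if (surf == NULL)
        return DEMO_ERR_NULL_SURFACE;
    surf->refcount++;
    return DEMO_OK;
}

static demo_status demo_create_surface_objects(demo_egl_surface *s)
{
    demo_status status = DEMO_OK;
    do {
        /* Mirrors lines 603-604: pick the current render target, then reference it. */
        s->render_target = (s->render_list_curr != NULL)
                               ? s->render_list_curr->surface
                               : NULL;
        DEMO_ERR_BREAK(demo_reference_surface(s->render_target));
    } while (0);
    return status;
}
```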

chingling_wang
NXP Employee

Since there are so many eglMakeCurrent() calls in your .txt, that only makes sense if you have different windows. Otherwise, one eglMakeCurrent() call is enough.
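If the display, surfaces and context passed to eglMakeCurrent() are already the current binding, the call can be skipped. A minimal sketch of such a caching wrapper, assuming a single rendering thread; demo_binding, demo_make_current and the injected make_current_fn are hypothetical, and the function pointer stands in for the real eglMakeCurrent() so the logic can run without an EGL implementation. With a real EGL stack, the current binding can also be queried with eglGetCurrentContext() and eglGetCurrentSurface(EGL_DRAW).

```c
#include <stdbool.h>
#include <stddef.h>

/* Opaque handles standing in for EGLDisplay / EGLSurface / EGLContext. */
typedef void *demo_handle;

typedef bool (*make_current_fn)(demo_handle dpy, demo_handle draw_surf,
                                demo_handle read_surf, demo_handle ctx);

/* Last binding successfully established on this (single) thread. */
typedef struct {
    demo_handle dpy, draw_surf, read_surf, ctx;
    bool valid;
} demo_binding;

/* Call through to make_current only when the binding actually changes. */
static bool demo_make_current(demo_binding *cur, make_current_fn make_current,
                              demo_handle dpy, demo_handle draw_surf,
                              demo_handle read_surf, demo_handle ctx)
{
    if (cur->valid && cur->dpy == dpy && cur->draw_surf == draw_surf &&
        cur->read_surf == read_surf && cur->ctx == ctx)
        return true; /* already current: nothing to do */

    if (!make_current(dpy, draw_surf, read_surf, ctx)) {
        cur->valid = false; /* binding state unknown after a failure */
        return false;
    }
    cur->dpy = dpy;
    cur->draw_surf = draw_surf;
    cur->read_surf = read_surf;
    cur->ctx = ctx;
    cur->valid = true;
    return true;
}

/* Test stub standing in for eglMakeCurrent(): counts calls and always succeeds. */
static int demo_stub_calls = 0;

static bool demo_stub_make_current(demo_handle dpy, demo_handle draw_surf,
                                   demo_handle read_surf, demo_handle ctx)
{
    (void)dpy; (void)draw_surf; (void)read_surf; (void)ctx;
    demo_stub_calls++;
    return true;
}
```

Only a change of window (or context) reaches the underlying call; repeated requests for the same binding return immediately.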

I ran the ./part3 you sent before and saw the segmentation fault, but I have a question about the log on the screen:

First eglMakeCurrent call:

Has to call eglMakeCurrent******************************************
eglDisplay = 31386212, surface = 31420740, context = 31397372

Second eglMakeCurrent call:

Expose region 0,0,1024,720
In qvgqpawindow.cpp - isAlertState
Has to call eglMakeCurrent******************************************
eglDisplay = 31386212, surface = 31475924, context = 31397372
Segmentation fault

Why is the context in the second call the same as in the first?

eglDisplay and surface can be the same, but the context must be different if they belong to different threads.

Here the surfaces are different (they are different windows), so how can they have the same context?

After this, the segmentation fault happens.
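For comparison, the EGL specification lets one context be made current with different surfaces over time, but only on one thread at a time: eglMakeCurrent() fails with EGL_BAD_ACCESS if the context is current to some other thread. The model below captures only that context-ownership rule; demo_ctx, demo_bind and demo_release are hypothetical names, threads are plain integer IDs, and the matching rule for draw/read surfaces bound in other threads is left out.

```c
#include <stddef.h>

#define DEMO_NO_THREAD (-1)

/* Hypothetical bookkeeping for one context. */
typedef struct {
    int owner_thread;    /* thread the context is current on, or DEMO_NO_THREAD */
    const void *surface; /* surface it is currently bound to, if any */
} demo_ctx;

typedef enum { DEMO_SUCCESS, DEMO_BAD_ACCESS } demo_result;

/* Make ctx current on thread tid with the given surface. */
static demo_result demo_bind(demo_ctx *ctx, int tid, const void *surface)
{
    if (ctx->owner_thread != DEMO_NO_THREAD && ctx->owner_thread != tid)
        return DEMO_BAD_ACCESS; /* current to some other thread */
    ctx->owner_thread = tid;    /* same thread: switching surfaces is allowed */
    ctx->surface = surface;
    return DEMO_SUCCESS;
}

/* Release ctx from thread tid, like binding EGL_NO_CONTEXT on that thread. */
static void demo_release(demo_ctx *ctx, int tid)
{
    if (ctx->owner_thread == tid) {
        ctx->owner_thread = DEMO_NO_THREAD;
        ctx->surface = NULL;
    }
}
```

In this model the same context value appearing with two different windows is only an error when the two binds come from different threads while the first is still current.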
