We are using OpenVG with X11 on the i.MX6 and have run into a problem: eglMakeCurrent occasionally crashes with a segmentation fault. We are not sure of the cause; the input parameters appear to be valid at the time of the call. Below is the call stack at the point of the crash.
==4346== Process terminating with default action of signal 11 (SIGSEGV)
==4346== Access not within mapped region at address 0x394
==4346== at 0x5E9DF64: gcoSURF_ReferenceSurface (gc_hal_user_surface.c:12505)
==4346== by 0x5F90CC7: _CreateSurfaceObjects (gc_egl_surface.c:604)
==4346== by 0x5F917CF: veglResizeSurface (gc_egl_surface.c:1389)
==4346== by 0x5F8D34B: veglMakeCurrent (gc_egl_context.c:2508)
==4346== by 0x5F8DD43: eglMakeCurrent (gc_egl_context.c:2633)
I have also attached the valgrind output that shows the steps that lead to this.
Original Attachment has been moved to: eglmakecurrent_bad.txt.zip
The moment you change something, the problem may stop showing up. As we mentioned, it is very unpredictable.
I will give that a try when I have a moment but it is very unlikely that the behavior will change without you guys explicitly putting in a fix for this.
By the way, there is really nothing magic about the part3 app. It is just one of the apps that shows the problem with that particular version of the driver and the qpa code. All I did when verifying this is to run all these simple Qt example apps and some of them will run into the seg fault. You can easily do the same on your end.
From the gdb trace, I found where the segmentation fault happens. veglResizeSurface() destroys and then recreates the surface objects. For some timing reason, Surface->renderTarget is still null when it should not be, and incrementing the reference count of a null Surface->renderTarget results in the segmentation fault.
I changed the code to try to make sure destroySurfaceObject finishes before createSurfaceObject runs. I am not sure whether my change fixes the issue, because I can no longer reproduce it. That is why I am asking you to try it.
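For illustration only, here is a minimal sketch of the kind of null guard described above. It does not use the real driver code: surf_t, render_node_t, egl_surface_t, sketch_status, surf_reference, surf_release and create_surface_objects are all hypothetical stand-ins, and the actual patch to the GPU driver may look quite different. The idea is simply to refuse to take a reference on a null surface and report an error, rather than dereference it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; NOT the Vivante definitions. */
typedef struct {
    int refcount;
} surf_t;

typedef struct {
    surf_t *surface;            /* may still be NULL if the buffer is not ready */
} render_node_t;

typedef struct {
    render_node_t *render_list_curr;
    surf_t *render_target;
} egl_surface_t;

typedef enum {
    SKETCH_OK = 0,
    SKETCH_INVALID_OBJECT = -1
} sketch_status;

/* Take a reference, rejecting NULL instead of dereferencing it. */
sketch_status surf_reference(surf_t *s)
{
    if (s == NULL) {
        return SKETCH_INVALID_OBJECT;
    }
    s->refcount++;
    return SKETCH_OK;
}

/* Drop a reference; the caller must hold one. */
void surf_release(surf_t *s)
{
    assert(s != NULL && s->refcount > 0);
    s->refcount--;
}

/* Same shape as the two statements that crash (assign the render target,
 * then reference it), but the target is only published after the
 * reference has actually been taken. */
sketch_status create_surface_objects(egl_surface_t *es)
{
    if (es == NULL || es->render_list_curr == NULL) {
        return SKETCH_INVALID_OBJECT;
    }

    surf_t *target = es->render_list_curr->surface;
    sketch_status st = surf_reference(target);
    if (st != SKETCH_OK) {
        es->render_target = NULL;   /* leave the surface in a known state */
        return st;
    }

    es->render_target = target;
    return SKETCH_OK;
}
```

A guard like this only turns the crash into an error return. If the underlying cause is an ordering problem between destroying and recreating the surface objects, that still has to be fixed separately.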
Can you give me the names of the simple Qt example apps you used to run into the seg fault? I just need an easier way to reproduce the issue.
Thanks
I was mainly using example code inside qtbase/examples/widgets/tutorials. I am assuming you already have Qt source somewhere. If not, you will have to download it first.
Can you package the executables for me so that we can simply run them to reproduce this issue, since you already have the apps? Right now, after I rebuilt the GPU driver, the seg fault no longer happens. If we cannot reproduce it reliably, how can we fix it?
Just tried your driver. It still seg faults. I saw it with the concentriccircles app.
I sort of know how to reproduce the issue: just add or delete some debug output in the GPU driver, and it will eventually happen.
I found something in the code that can lead to a null surface being used in the GPU code. After the change, I tried many times and so far have not seen the seg fault.
Can you build the GPU driver yourself? If so, I can give you the patch.
Anyway, I have attached the driver. Please try it and see whether you still see the issue.
If you see the seg fault again, run the application under gdb:
gdb --args your-application-name arg1 arg2 ...
When gdb starts, type run at the (gdb) prompt.
When the seg fault occurs, type bt and send me the backtrace.
Thanks
Hi ChingLing,
Can you do us a favor? Since our actual code is based on the jethro branch, can you apply your patch on that version of the gpu driver instead and then provide us the patched driver?
Thanks.
What is the jethro branch? My fix is based on our rel_imx_4.1.15_1.0.0_5.0.11.p8.3_ga tag; is that the same as what you have? Where do you get this jethro branch from, the Yocto community?
Jethro refers to the Yocto 2.0 release branch. The official Freescale i.MX6 kernel for this release is 3.14.52_1.1.0_ga, and the details can be found at the following link. We will require the driver to be built against this kernel version, and we'll be using the X11 flavour. If the OpenVG driver is affected, we need the 2D flavour of this driver. Thank you.
meta-fsl-arm/linux-imx_3.14.52.bb at jethro · Freescale/meta-fsl-arm · GitHub
Somehow the libGAL.so is not compatible, so what I am doing now is replacing all the .so files except libGAL.so, and that seems to work.
It may mean that the kernel doesn't match. For the link you gave, the GPU is updated to 8.4, so the kernel needs to be rebuilt.
It seems to me your current GPU version is 8.3; when I built using that source before, you never had a problem. Now I have updated to 8.4.
I believe it is using p7.4 according to the kernel-module-imx-gpu-viv_5.0.11.p7.4+fslc.bb.
We have tested this problem on two platforms: first on a custom board running 3.14.52_1.1.0_ga and p7.4, then on a Sabre running 4.1.15-1.1.0_ga and p8.4.
So eventually, we want the fix to be on 3.14.52_1.1.0_ga and p7.4. But so far it seems to be okay running this 3.14.52_1.1.0_ga and p8.4+patch combo except the issue that I mentioned on libGAL.so.
This works. Thanks.
I have not seen the problem so far so this is likely a fix. Thanks for looking into this. Really appreciate it.
I have just sent you the executables.
I debugged this further using gdb and got the backtrace. It looks like the surface is sometimes not ready (still null) when it is referenced while creating the surface objects, which results in the seg fault. I have submitted a ticket asking the GPU driver engineers for help.
When the segmentation fault occurs, it fails at gcoSURF_ReferenceSurface(Surface->renderTarget) in gc_egl_surface.c:
0603 Surface->renderTarget = Surface->renderListCurr->surface;
0604 gcmERR_BREAK(gcoSURF_ReferenceSurface(Surface->renderTarget));
For some reason, renderListCurr->surface is still null, and incrementing its reference count results in the segmentation fault.
What do you think could result in a null surface? It looks like a timing issue: nothing guarantees that the surface is ready.
(gdb) bt
#0 gcoSURF_ReferenceSurface (Surface=0x0) at gc_hal_user_surface.c:12505
#1 0x75886cc8 in _CreateSurfaceObjects (Thread=Thread@entry=0x4a794, Surface=Surface@entry=0x13c8bc, ResolveFormat=ResolveFormat@entry=gcvSURF_R5G6B5) at gc_egl_surface.c:604
#2 0x758877d0 in veglResizeSurface (Surface=0x13c8bc, Surface@entry=0x7efff5cc, Width=1024, Height=720, ResolveFormat=gcvSURF_R5G6B5, BitsPerPixel=16) at gc_egl_surface.c:1389
#3 0x7588334c in veglMakeCurrent (Dpy=Dpy@entry=0x126a5c, Draw=Draw@entry=0x13c8bc, Read=Read@entry=0x13c8bc, Ctx=Ctx@entry=0x1295e4) at gc_egl_context.c:2508
#4 0x75883d44 in eglMakeCurrent (Dpy=0x126a5c, Draw=Draw@entry=0x13c8bc, Read=Read@entry=0x13c8bc, Ctx=0x1295e4) at gc_egl_context.c:2633
#5 0x757287cc in QVgQpaContext::makeCurrent (this=0x129458, surface=surface@entry=0x13c8bc) at qvgqpacontext.cpp:91
#6 0x75725b60 in QVgQpaBackingStore::beginPaint (this=0x13df08, region=...) at qvgqpabackingstore.cpp:2359
#7 0x76bb40dc in ?? () from /usr/lib/libQt5Widgets.so.5
Since you have so many eglMakeCurrent() calls in your .txt, that only makes sense if you have different windows. Otherwise, one eglMakeCurrent() call is enough.
I ran the ./part3 you sent before and saw the segmentation fault, but I have a question about the log printed on the screen.
First eglMakeCurrent call:
Has to call eglMakeCurrent******************************************
eglDisplay = 31386212, surface = 31420740, context = 31397372
Second eglMakeCurrent call:
Expose region 0,0,1024,720
In qvgqpawindow.cpp - isAlertState
Has to call eglMakeCurrent******************************************
eglDisplay = 31386212, surface = 31475924, context = 31397372
Segmentation fault
Why is the context in the second call the same as in the first?
The eglDisplay and surface can be the same, but the context must be different if they belong to different threads.
Here the surfaces are different, so they are different windows. How can they have the same context?
After this, the segmentation fault happens.