[iMX6Q] CPU bus hang while using PCIex & 2D GPU

grzegorzprajsne · ‎06-25-2014

Hello,

We are developing a custom board based on the iMX6Q CPU, but we have run into some issues with GPU support in X11 (at least, everything points in that direction in our opinion).

Our setup is:

iMX6Q (industrial and automotive, issue occurs on both versions, 800 MHz)
1 or 2GB of RAM (issue occurs on both versions)
Renesas uPD720202 PCIex USB3.0 host controller
12.1" 1280x800 LCD display connected to the iMX via LVDS
Debian 7.0 (armhf)
Kernel 3.10.17_GA
U-Boot 2013.07
GPU libraries 4.6.9p13 (from GA release of 3.10.17)

Problem description:

We see a CPU hang within a few minutes when doing the following things at the same time:

running Firefox (compiled) or Iceweasel (from the Debian package) in fullscreen mode, displaying a web page created by our customer (the page is nothing special, mostly static, with just a few fields refreshing every second or so);
doing some kind of USB3.0 communication via PCIex and uPD720202 (like dd if=/dev/zero of=/dev/sda; sda being the USB3.0 stick in that port);
Xorg using EXA/DRI driver (Driver "vivante" in xorg.conf);

After the hang, JTAG is not available, we see no kernel dump or any other kind of logging, and clocks are still being emitted. On the display, the image slowly fades away and is replaced by a strange random pattern. This should point to some kind of AXI bus hang, right? When we do the two things separately (PCIex usage and displaying the web page), the issue does not occur.

Our findings so far:

the board does not crash when Xorg uses the fb driver instead of the vivante EXA driver;
we can reproduce the issue on a SabreSD (quad variant) with a mini PCIe to USB3.0 converter containing a uPD720202 when running our own software;
we can reproduce the issue on a SabreSD (quad variant) with a mini PCIe to USB3.0 converter containing a uPD720202 when running the FSL kernel, U-Boot and our own filesystem;
we can reproduce the issue on a SabreSD (quad variant) with a mini PCIe to USB3.0 converter containing a uPD720202 when running the FSL kernel, U-Boot and the latest official release Yocto image with just some of our stuff added on top of it (the web browser and the page itself);
we tried reapplying the patch "ENGR00302036-3 gpu:gpu2d may cause bus hang in some corner case", since it was reverted in one of the last commits before 3.10.17 was released; it does not fix the issue;
we have observed the problem only on this one web page; we tried to reproduce it on different sites and with different tools (e.g. glxgears) but without any luck, so our assumption is that it is related to the way that particular page is displayed;
we observed that interrupt 42 fires quite a lot while the problematic web page is displayed, which we assume is OK as it is assigned to the 2D GPU;
we tried changing the amount of memory assigned to the GPU; it does not solve the issue.
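
The interrupt 42 observation above could be put into numbers by sampling /proc/interrupts twice and dividing the difference by the interval. The following is only a minimal sketch in Python: the helper names parse_irq_counts and irq_rate are made up for this example, and it assumes the usual /proc/interrupts layout (IRQ number, colon, one count per CPU, then the controller and handler name).

```python
import time


def parse_irq_counts(text, irq):
    """Sum the per-CPU counters of one IRQ line; return None if the IRQ is absent."""
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # first non-numeric field starts the controller/name columns
            total += int(field)
        return total
    return None


def irq_rate(irq, interval=1.0, path="/proc/interrupts"):
    """Return interrupts per second for `irq`, measured over `interval` seconds."""
    with open(path) as f:
        before = parse_irq_counts(f.read(), irq)
    time.sleep(interval)
    with open(path) as f:
        after = parse_irq_counts(f.read(), irq)
    if before is None or after is None:
        return None
    return (after - before) / interval
```

Calling irq_rate(42) once while the problematic page is displayed and once with another page would show whether the 2D GPU interrupt load really differs between the two cases.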

Summary:

Based on those facts, we concluded that displaying said website (using a FF-based web browser) creates some specific series of events in the EXA driver which could cause the buggy behavior. We do realize this might not be 100% accurate, but it is what we have concluded so far. We know that this could be caused by some bug in our software, but we would not expect userspace tooling to be able to crash the device in such a way. Reproducing it on SabreSD should also indicate that this is not really a HW issue. We are currently busy creating a web page similar to the one on which we see the issue, and we will post it here once it's ready so that other people can reproduce it as well (we cannot share parts of our customer's SW here).

Maybe Freescale has seen such behavior before and could point us in some direction to look in, or suggest some debugging steps we should take?

Thanks in advance.

PrabhuSundarara · ‎06-25-2014

Is it possible to reproduce without PCIe?

Are you able to provide the rootfs so that we can debug the issue? Maybe the following environment will help:

we can reproduce the issue on SabreSD (quad variant) and mini PCIe to USB3.0 converter containing uPD720202 when running FSL Kernel, U-Boot, latest official release yocto image with just some of our stuff added on top of it (webrowser and page itself);

grzegorzprajsne · ‎06-25-2014

Hi Prabhu,

Thanks for looking at this issue. It is kind of possible to reproduce without PCIex usage: we saw a similar crash on some devices which had been running for 48+ hours (3 out of 56 devices crashed in the same way), so we assume PCIex usage greatly decreases the time needed to reproduce the issue. I am not 100% sure whether I can provide FSL with our software at the moment; I need to check with my colleagues. As I wrote in my previous post, we are currently working on reproducing the issue with a more generic website which does not contain our customer's software.

zhenyong_chen · ‎06-26-2014

Hi,

Let's check whether the 2D GPU causes this hang:

Edit /etc/X11/xorg.conf

In Section "Device"

Add

Option "NoAccel" "true"

If this avoids the HW failure, it will be easier to identify which 2D API causes the issue.
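
When the test has to be repeated on many devices, the xorg.conf edit described above could be scripted. This is only a rough sketch in Python, not an official tool: the function name set_noaccel is made up for this example, and it assumes a simple xorg.conf where each Section "Device" and its EndSection sit on their own lines.

```python
def set_noaccel(conf_text, enabled=True):
    """Return conf_text with Option "NoAccel" set inside every Section "Device"."""
    value = "true" if enabled else "false"
    out, in_device = [], False
    for line in conf_text.splitlines():
        words = [w.lower() for w in line.split()]
        if words[:2] == ["section", '"device"']:
            in_device = True
        elif in_device and words[:2] == ["option", '"noaccel"']:
            continue  # drop any existing setting; a fresh one is added below
        elif in_device and words[:1] == ["endsection"]:
            # Place the option just before the Device section closes.
            out.append('    Option "NoAccel" "%s"' % value)
            in_device = False
        out.append(line)
    return "\n".join(out) + "\n"
```

The function only transforms text; reading /etc/X11/xorg.conf, keeping a backup and writing the result back are left to the caller, and X has to be restarted for the change to take effect.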

grzegorzprajsne · ‎06-27-2014

Hi Zhenyong,

Thanks for looking at our issue. With NoAccel enabled, the issue no longer seems to occur (we have tested it for only an hour so far, but that is already much more uptime than we see without it). Additionally, we saw that the issue does not seem to occur when X is set to 16-bit mode instead of 24-bit mode. Maybe this points to something as well.

Graphics & Display

i.MX6Quad

Linux

Suspected Software Defect