IMX6ULL GPU initialization problem - pan flip timeout

matkattanek · ‎02-26-2018

I am using an ILI9341 320x240 LCD in 8-bit RGB interface mode with an i.MX6ULL. The display is initialized once over SPI at U-Boot startup, where the NXP logo is displayed successfully. On kernel boot (4.9.11) all that remains to be done is setting up the GPU (mxsfb.c) again. This also works nicely: the penguin is displayed and shortly afterwards I see the console prompt.

Here is my problem: occasionally I see a white LCD screen, meaning neither the penguin nor the console prompt shows up. The initial U-Boot stage was successful, though, since the NXP logo was displayed. The only way out at this point is to reboot.

Looking at the driver initialization I noticed that one control register (lcdif_ctrl1n) is set differently in the 'failure' case:

Bit 8 and bit 9 show IRQ pending for VSYNC_EDGE_IRQ and CUR_FRAMEDONE_IRQ.

In this case something seems to go wrong (my guess is that the GPU hangs at this point): the driver initialization runs into a problem and I see the following message twice:

mxsfb 21c8000.lcdif: mxs wait for pan flip timeout

and I end up with a 'white' non working display.
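
A minimal sketch, in C, of how the two status bits mentioned above could be decoded from a raw LCDIF_CTRL1 value when comparing 'good' and 'failure' boots. The bit positions (8 and 9) are taken from this post; the macro and function names are illustrative only and are not taken from mxsfb.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as described in the post; the names are illustrative. */
#define CTRL1_VSYNC_EDGE_IRQ      (1u << 8)
#define CTRL1_CUR_FRAME_DONE_IRQ  (1u << 9)

/* Returns only the two status bits of interest that are pending. */
uint32_t lcdif_pending_irqs(uint32_t ctrl1)
{
    return ctrl1 & (CTRL1_VSYNC_EDGE_IRQ | CTRL1_CUR_FRAME_DONE_IRQ);
}

/* Prints a one-line summary so that two boots can be compared in a log. */
void lcdif_report_ctrl1(uint32_t ctrl1)
{
    uint32_t pending = lcdif_pending_irqs(ctrl1);

    printf("CTRL1=0x%08x vsync_edge=%d cur_frame_done=%d\n",
           (unsigned int)ctrl1,
           (pending & CTRL1_VSYNC_EDGE_IRQ) != 0,
           (pending & CTRL1_CUR_FRAME_DONE_IRQ) != 0);
}
```

Calling such a helper at the start of each mxsfb_set_par() run would show whether the pending bits are already set before the driver touches the controller.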

Another thing I noticed is that mxsfb.c::mxsfb_set_par() runs twice at initialization. This always happens, in both the 'good' and the 'failure' case. In the 'failure' case, though, the second mxsfb_set_par() run shows the IRQs pending.

Is there a possible race condition? Why is mxsfb_set_par() run twice? Is that needed?

Has anybody experienced a similar problem and can shed some more light on this?

Mat

Logesh · ‎10-02-2021

I have seen this issue even on the official NXP U-Boot (2019.04), depending on which LCDIF clock divider values are needed to match the LCD frequency. I found the issue only with odd divider values; there were no glitches with even dividers.

I am sharing the details of a proper fix, as it may help someone. By default, the LCDIF root clock pre-multiplexer (LCDIF1_PRE_CLK_SEL) is configured to PLL3 PFD1. When we switch to PLL5, the LCDIF sometimes locks up due to clock glitches. This can be fixed by switching the LCDIF clock source properly: in the U-Boot source, disable the old and new clock sources, configure the pre-divider and post-divider settings, and then re-enable the old and new clock sources.

matkattanek · ‎12-02-2019

Sorry, I should have mentioned this earlier, but I totally forgot about this thread.

FYI, the initial prototype display we used did not show the white screen problem at all.

The issue started when we switched to the ILI9341 display. The U-Boot image shows perfectly, but then a white screen appears when the kernel starts. It seems the display itself crashed, since the RGB interface was still clocking data out properly but nothing showed on the screen (this happened about 5 times out of 100).

We eliminated the problem by changing the RGB interface timing and by no longer displaying an image in U-Boot, so the RGB interface clock no longer starts, stops (the RGB clock stops when the kernel starts!) and restarts.

Since then: no more LCD hang-ups, no more white screens.

Hope that helps.

Mat

henrideveer · ‎07-23-2018

First: Can someone at NXP reopen this issue?

I found the root cause of this "intermittent" issue. There appears to be a bug in the eLCDIF controller related to resetting it with the SFTRST bit. The register involved is LCDIF_CTRLn (0x021c8000, reference manual paragraph 35.6.1).

The reset sequence should be as follows:

1- Make sure the clock is not gated, i.e. make sure bit 30 (CLKGATE) is cleared.

2- Assert the SFTRST bit (bit 31).

3- The controller now starts its reset sequence and automatically sets the CLKGATE bit.

4- Wait for the CLKGATE bit to be set.

Although this is poorly documented for the eLCDIF, you can find the relevant parts in the documentation of all control blocks; they all work similarly.

Now the interesting part: the CLKGATE bit is not always asserted! When this happens the controller really is in a defective state, and trying to set the bits manually to recover the controller fails.
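
The four steps above, including the failure path, can be sketched as follows. This is a host-side illustration and not the U-Boot implementation: the register is replaced by a hypothetical `struct fake_ctrl` whose `healthy` flag models whether the hardware sets CLKGATE on its own, and the poll budget is arbitrary. Only the bit positions (SFTRST = bit 31, CLKGATE = bit 30) come from this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SFTRST   (1u << 31)
#define BLOCK_CLKGATE  (1u << 30)
#define RESET_POLL_MAX 1000   /* arbitrary poll budget for this sketch */

/* Hypothetical stand-in for the control register, for host-side testing. */
struct fake_ctrl {
    uint32_t value;
    bool healthy;   /* false models the deadlocked controller */
};

static void ctrl_clear(struct fake_ctrl *c, uint32_t mask)
{
    c->value &= ~mask;
}

static void ctrl_set(struct fake_ctrl *c, uint32_t mask)
{
    c->value |= mask;
    /* A healthy block gates its own clock once SFTRST is asserted. */
    if ((mask & BLOCK_SFTRST) && c->healthy)
        c->value |= BLOCK_CLKGATE;
}

/* Polls until all bits in mask are set; returns false on timeout. */
static bool wait_mask_set(const struct fake_ctrl *c, uint32_t mask)
{
    for (int i = 0; i < RESET_POLL_MAX; i++)
        if ((c->value & mask) == mask)
            return true;
    return false;
}

/* Steps 1-4 from the post. Returns 0 on success, 1 if CLKGATE never rises. */
int lcdif_soft_reset(struct fake_ctrl *c)
{
    ctrl_clear(c, BLOCK_CLKGATE);           /* 1: clock must not be gated */
    ctrl_set(c, BLOCK_SFTRST);              /* 2: assert the soft reset   */
    /* 3: the hardware runs its reset sequence and sets CLKGATE itself    */
    if (!wait_mask_set(c, BLOCK_CLKGATE))   /* 4: wait for CLKGATE        */
        return 1;
    return 0;
}
```

The sketch stops where the list in the post stops. The point is that the caller gets a non-zero return value in the deadlock case and has to act on it.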

The issue already shows up in U-Boot. When the controller fails there, Linux fails as well: the controller is deadlocked.

The easiest way to show the sequence and to reproduce it is modifying the U-Boot sources.

The above reset sequence is implemented in the function "mxs_reset_block" and is located in "arch/arm/imx-common/misc.c"

This part times out:

    if (mxs_wait_mask_set(reg, MXS_BLOCK_CLKGATE, RESET_MAX_TIMEOUT)) {
        return 1;
    }

The return code is never checked by the calling function, so you won't notice the failure (apart from the splash screen not working) until Linux boots and shows nothing or garbage.

How to reproduce (In U-Boot):

In "drivers/video/mxsfb.c" the function mxs_lcd_init() calls mxs_reset_block(). Check the return code of that call, print a failure message, and patch the bootcmd environment variable with something non-existent so you can be sure Linux does not boot:

    ret = mxs_reset_block(&regs->hw_lcdif_ctrl_reg);
    if (ret != 0) {
        printf("Video controller reset failed: %d!\n", ret);
        setenv("bootcmd",  "reset-failed");  // Just something that never boots.
        return;
    }

Now put the image on your sdcard (make sure you have some boot delay) and stop the boot in the console.

Set the bootcmd variable to "reset" and do a save of the environment:

setenv bootcmd reset

saveenv

Now reset the board so it starts looping until the failure is shown. This may take a few minutes, but it can also take half an hour.

Additional info: In my setup I use a splash screen of 1024x600 pixels with the following timings:

    {
        .bus = MX6UL_LCDIF1_BASE_ADDR,
        .addr = 0,
        .pixfmt = 24,
        .detect = NULL,
        .enable = do_enable_parallel_lcd,
        .mode = {
            .name           = "T700A04X00",
            .refresh        = 60,
            .xres           = 1024,
            .yres           = 600,
            .pixclock       = 20460,
            .left_margin    = 144,
            .right_margin   = 40,
            .upper_margin   = 18,
            .lower_margin   = 1,
            .hsync_len      = 104,
            .vsync_len      = 3,
            .sync           = 0,
            .vmode          = FB_VMODE_NONINTERLACED
        }
    }

I checked all relevant registers to see whether anything is wrong during the reset: PLL5 dividers, num/den, post dividers, clock gating, etc. All are OK as far as I can see, so it really smells like a hardware bug in the chip.

P.S. There is also a bug in the PLL5 calculation in "mxs_set_lcdclk()" ("arch/arm/cpu/armv7/mx6/clock.c"): an integer overflow occurs there that results in a wrong dot clock (it is off by a few hundred kHz). I fixed that one, but it is not the cause of this issue.
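
To illustrate the overflow, here is a small host-side sketch that evaluates the fractional numerator once with 32-bit arithmetic, mirroring the expression `(best - hck * pll_div) * pll_denom / hck` from mxs_set_lcdclk(), and once with a 64-bit intermediate. The function names are illustrative. The input 684250 kHz is an example value (14 x 48875 kHz, where 48875 kHz is what the 20460 ps pixel clock from the timing table converts to, assuming the usual integer picosecond-to-kHz conversion), and the reference clock is taken as 24000 kHz.

```c
#include <assert.h>
#include <stdint.h>

#define PLL_DENOM 1000000u

/* 32-bit arithmetic: the product remainder * PLL_DENOM can wrap around. */
uint32_t pll_num_32(uint32_t best_khz, uint32_t hck_khz)
{
    assert(hck_khz != 0);
    uint32_t pll_div = best_khz / hck_khz;

    return (best_khz - hck_khz * pll_div) * PLL_DENOM / hck_khz;
}

/* Same formula with the product widened to 64 bits before dividing. */
uint32_t pll_num_64(uint32_t best_khz, uint32_t hck_khz)
{
    assert(hck_khz != 0);
    uint32_t pll_div = best_khz / hck_khz;
    uint64_t fraction = best_khz - hck_khz * pll_div;

    return (uint32_t)((fraction * PLL_DENOM) / hck_khz);
}
```

For 684250 kHz the integer part is 28 and the remainder is 12250 kHz. The product 12250 x 1000000 does not fit in 32 bits, so the 32-bit version returns 152502 where the 64-bit version returns 510416. With a 24000 kHz reference, any remainder above 4294 kHz wraps in the same way.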

So far I have no decent workaround for this issue.

The only workaround I can think of is just resetting the board with the watchdog when I detect the clock failure, and hopefully it will recover. But this is an extremely dirty one.

Anonymous · ‎11-28-2019

Hi, it sounds like we are facing the same problem here on an Engicam microgea module powered by an i.MX6ULL.

After a lot of investigation, I started by routing pix_clock to clkout1 so that I could see what happens. No surprise: when the lcdif does not work, there is no pix_clock either, which even makes it impossible to reset the lcdif peripheral (because the reset logic requires pix_clock).

Apparently, pix_clock stops from time to time after the LCDIF1_PODF divider (CBCMR register) is changed. This is what happens when the mxsfb Linux driver tries to set the pix_clock rate.

I reproduced the problem from U-Boot using mw.l commands to program the CCM registers, making sure that the lcdif was not initialized and that the pix_clock root to the lcdif was gated. I can see the pix_clock issue without involving the lcdif.

I ran a lot of tests under various conditions, with different PLLs as source (LCDIF1_PRE_CLK_SEL) and different divider combinations. Not all PLL sources show the same stability: some (those derived from PLL3) crash pix_clock with any divider combination, while PLL2 crashes pix_clock only when the parity of the LCDIF1_PODF divider changes. Even the clock muxing for PERIPH seems to have an impact on stability, so it is not easy to figure out how to work around this issue.

Even worse, all attempts to recover from a broken clock have been unsuccessful.

Is there any guide that explains the appropriate way to choose and manage the clock tree configuration? The reference manual talks about clock gating options, but there seems to be no register associated with them.

Additionally, I stumbled on an Engineering Bulletin (EB821) from NXP that describes a similar clock issue on the i.MX6DQ for the LDB. The proposed solution, turning off the PLL before changing divider values, looks like a nightmare to manage (turning the PLL off might impact other devices). Moreover, the clock tree configuration done by Linux would have to be seriously reworked to handle such a workaround.

For our project, we decided to avoid odd LCDIF1_PODF divider values (it is /2 at boot) and to have user space reboot when "pan flip timeout" appears during boot.
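
As a rough illustration of the "no odd LCDIF1_PODF" rule, the sketch below takes the divider search loop from U-Boot's mxs_set_lcdclk() (see the patch in this thread) and adds an option to skip odd post-dividers. The function name, the struct and the `even_postd_only` flag are hypothetical. It also assumes that `postd` is the actual division factor programmed into LCDIF1_PODF, and it says nothing about how the Linux clock framework would have to be configured to obtain the same result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PRED  8
#define MAX_POSTD 8

struct lcd_div {
    uint32_t best_khz;  /* chosen PLL output frequency */
    uint32_t pred;      /* pre-divider factor (LCDIFx_PRED) */
    uint32_t postd;     /* post-divider factor (LCDIFx_PODF) */
};

/*
 * Finds the lowest PLL frequency in [pll_min_khz, pll_max_khz] that can be
 * written as freq_khz * pred * postd. With even_postd_only set, odd
 * post-dividers are skipped. Returns false when no combination fits.
 */
bool pick_lcd_dividers(uint32_t freq_khz, uint32_t pll_min_khz,
                       uint32_t pll_max_khz, bool even_postd_only,
                       struct lcd_div *out)
{
    out->best_khz = 0;
    out->pred = 1;
    out->postd = 1;

    for (uint32_t i = 1; i <= MAX_PRED; i++) {
        for (uint32_t j = 1; j <= MAX_POSTD; j++) {
            if (even_postd_only && (j & 1u))
                continue;
            uint64_t temp = (uint64_t)freq_khz * i * j;
            if (temp > pll_max_khz || temp < pll_min_khz)
                continue;
            if (out->best_khz == 0 || temp < out->best_khz) {
                out->best_khz = (uint32_t)temp;
                out->pred = i;
                out->postd = j;
            }
        }
    }
    return out->best_khz != 0;
}
```

For a 48875 kHz pixel clock and a PLL range of 648000 to 1296000 kHz, the unrestricted search picks pred = 2, postd = 7, while the even-only search reaches the same PLL frequency with pred = 7, postd = 2.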

Hope it helps all of us to cope with that issue.

henrideveer · ‎07-24-2018

Here is the fix I made in the clock calculation (for U-Boot) to avoid the integer overflow, which fixes the wrong LCD clock.

I also made a lot of changes to clean up the code with respect to variable scoping; these are not strictly required. The actual change needed is the "u64 fraction" calculation at the bottom.

--- a/arch/arm/cpu/armv7/mx6/clock.c
+++ b/arch/arm/cpu/armv7/mx6/clock.c
@@ -19,7 +19,7 @@ enum pll_clocks {
     PLL_USBOTG,    /* OTG USB PLL */
     PLL_ENET,    /* ENET PLL */
     PLL_AUDIO,    /* AUDIO PLL */
-    PLL_VIDEO,    /* AUDIO PLL */
+    PLL_VIDEO,    /* VIDEO PLL */
 };
 
 struct mxc_ccm_reg *imx_ccm = (struct mxc_ccm_reg *)CCM_BASE_ADDR;
@@ -624,47 +624,58 @@ static int enable_pll_video(u32 pll_div,
 }
 
 /*
- * 24M--> PLL_VIDEO -> LCDIFx_PRED -> LCDIFx_PODF -> LCD
+ * 24M--> PLL_VIDEO -> POST_DIV_SELECT-> VIDEO_DIV
+ *                  -> LCDIFx_PRED -> LCDIFx_PODF -> LCD
  *
  * 'freq' using KHz as unit, see driver/video/mxsfb.c.
  */
+#define MAX_PRED 8
+#define MAX_POSTD 8
+
 void mxs_set_lcdclk(u32 base_addr, u32 freq)
 {
-    u32 reg = 0;
+    u32 reg;
+    u32 temp, i, j;
+    u32 best, pred, postd;
+    u32 pll_div, pll_num, pll_denom, post_div;
+    u64 fraction;
     u32 hck = MXC_HCLK / 1000;
     /* DIV_SELECT ranges from 27 to 54 */
-    u32 min = hck * 27;
-    u32 max = hck * 54;
-    u32 temp, best = 0;
-    u32 i, j, max_pred = 8, max_postd = 8, pred = 1, postd = 1;
-    u32 pll_div, pll_num, pll_denom, post_div = 1;
+    u32 pll_min = hck * 27;
+    u32 pll_max = hck * 54;
 
     debug("mxs_set_lcdclk, freq = %dKHz\n", freq);
 
     if (!is_mx6sx() && !is_mx6ul() && !is_mx6ull() && !is_mx6sl() &&
         !is_mx6sll()) {
-        debug("This chip not support lcd!\n");
+        debug("This chip does not support the lcd interface!\n");
         return;
     }
 
     if (!is_mx6sl()) {
         if (base_addr == LCDIF1_BASE_ADDR) {
             reg = readl(&imx_ccm->cscdr2);
-            /* Can't change clocks when clock not from pre-mux */
-            if ((reg & MXC_CCM_CSCDR2_LCDIF1_CLK_SEL_MASK) != 0)
+            if ((reg & MXC_CCM_CSCDR2_LCDIF1_CLK_SEL_MASK) != 0) {
+                debug("Can't change clocks when clock not from pre-mux!\n");
                 return;
+            }
         }
     }
 
     if (is_mx6sx()) {
         reg = readl(&imx_ccm->cscdr2);
-        /* Can't change clocks when clock not from pre-mux */
-        if ((reg & MXC_CCM_CSCDR2_LCDIF2_CLK_SEL_MASK) != 0)
+        if ((reg & MXC_CCM_CSCDR2_LCDIF2_CLK_SEL_MASK) != 0) {
+            debug("Can't change clocks when clock not from pre-mux!\n");
             return;
+        }
     }
 
-    temp = freq * max_pred * max_postd;
-    if (temp < min) {
+    /* Find a PLL frequency that has at least the minimum
+        specified operating value.
+       If too low: use a post divider and choose a higher frequency. */
+    post_div = 1;
+    temp = freq * MAX_PRED * MAX_POSTD;
+    if (temp < pll_min) {
         /*
          * Register: PLL_VIDEO
          * Bit Field: POST_DIV_SELECT
@@ -675,23 +686,26 @@ void mxs_set_lcdclk(u32 base_addr, u32 f
          * No need to check post_div(1)
          */
         for (post_div = 2; post_div <= 4; post_div <<= 1) {
-            if ((temp * post_div) > min) {
+            if ((temp * post_div) > pll_min) {
                 freq *= post_div;
                 break;
             }
         }
 
         if (post_div > 4) {
-            printf("Fail to set rate to %dkhz", freq);
+            printf("Fail to set rate to %dkhz desired frequency is way too low", freq);
             return;
         }
     }
 
     /* Choose the best pred and postd to match freq for lcd */
-    for (i = 1; i <= max_pred; i++) {
-        for (j = 1; j <= max_postd; j++) {
+    best = 0;
+    pred = 1;
+    postd = 1;
+    for (i = 1; i <= MAX_PRED; i++) {
+        for (j = 1; j <= MAX_POSTD; j++) {
             temp = freq * i * j;
-            if (temp > max || temp < min)
+            if (temp > pll_max || temp < pll_min)
                 continue;
             if (best == 0 || temp < best) {
                 best = temp;
@@ -706,11 +720,13 @@ void mxs_set_lcdclk(u32 base_addr, u32 f
         return;
     }
 
-    debug("best %d, pred = %d, postd = %d\n", best, pred, postd);
+    debug("best = %d, pred = %d, postd = %d, post_div = %d\n", best, pred, postd, post_div);
 
     pll_div = best / hck;
     pll_denom = 1000000;
-    pll_num = (best - hck * pll_div) * pll_denom / hck;
+    /* Avoid integer overflow by using a 64 bit wide numbers */
+    fraction = (best - hck * pll_div);
+    pll_num = (u32)( (fraction * pll_denom) / hck);
 
     /*
      *                                  pll_num

244143298 · ‎08-24-2020

Hi,

May I ask what the root cause of this problem is?

henrideveer · ‎07-24-2018

I ran a test last night with a reboot about every 30 seconds. The result: in approximately 1.4% of the cases the controller ends up in the deadlock state.
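
For anyone planning a similar soak test, the following sketch estimates how many reboots are needed before a failure rate like this is likely to show up at least once. It assumes that every boot fails independently with the same probability, which is a simplification, and the function names are illustrative.

```c
#include <assert.h>

/* Probability of seeing at least one failure in n independent boots. */
double p_at_least_one(double p_fail, unsigned int n)
{
    double p_all_ok = 1.0;

    for (unsigned int i = 0; i < n; i++)
        p_all_ok *= 1.0 - p_fail;
    return 1.0 - p_all_ok;
}

/* Smallest number of boots that reaches the requested confidence. */
unsigned int boots_for_confidence(double p_fail, double confidence)
{
    unsigned int n = 0;
    double p_all_ok = 1.0;

    assert(p_fail > 0.0 && p_fail < 1.0);
    assert(confidence > 0.0 && confidence < 1.0);
    while (1.0 - p_all_ok < confidence) {
        p_all_ok *= 1.0 - p_fail;
        n++;
    }
    return n;
}
```

With a failure rate of 1.4%, `boots_for_confidence(0.014, 0.95)` gives 213 boots, which is a little under two hours at 30 seconds per reboot. The average wait for the first failure is about 1/0.014, or roughly 71 boots (about 36 minutes).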

The final patch I made does a brute-force recovery that takes less than a second. Here it is (for U-Boot):

--- a/drivers/video/mxsfb.c
+++ b/drivers/video/mxsfb.c
@@ -93,12 +93,27 @@ static void mxs_lcd_init(GraphicDevice *
     struct mxs_lcdif_regs *regs = (struct mxs_lcdif_regs *)(panel->isaBase);
     uint32_t word_len = 0, bus_width = 0;
     uint8_t valid_data = 0;
+    int ret;
 
     /* Kick in the LCDIF clock */
     mxs_set_lcdclk(panel->isaBase, PS2KHZ(mode->pixclock));
 
     /* Restart the LCDIF block */
-    mxs_reset_block(&regs->hw_lcdif_ctrl_reg);
+    ret = mxs_reset_block(&regs->hw_lcdif_ctrl_reg);
+    /* BRUTE FORCE BUGFIX:
+        Sometimes the lcd controller gets in a deadlock on startup.
+        The reset does not assert the CLKGATE any more and the
+         controller is dead. There is no way to get it out of this
+         situation except a hard reset.
+        So when the controller restart fails, just reset the board
+         and hope for the best.
+    */
+    if (ret != 0) {
+        puts("Video controller restart failed!\n");
+        do_reset(NULL, 0, 0, NULL);
+        /* Never returns here !!! */
+        return;
+    }
 
     switch (bpp) {
     case 24:
--- a/arch/arm/imx-common/misc.c
+++ b/arch/arm/imx-common/misc.c
@@ -42,19 +42,24 @@ int mxs_wait_mask_clr(struct mxs_registe
 
 int mxs_reset_block(struct mxs_register_32 *reg)
 {
+
+    /* Clear CLKGATE */
+    writel(MXS_BLOCK_CLKGATE, &reg->reg_clr);
+
+    /* Confirm the clock gate has been disabled. */
+    if (mxs_wait_mask_clr(reg, MXS_BLOCK_CLKGATE, RESET_MAX_TIMEOUT))
+        return 1;
+
     /* Clear SFTRST */
     writel(MXS_BLOCK_SFTRST, &reg->reg_clr);
 
     if (mxs_wait_mask_clr(reg, MXS_BLOCK_SFTRST, RESET_MAX_TIMEOUT))
         return 1;
 
-    /* Clear CLKGATE */
-    writel(MXS_BLOCK_CLKGATE, &reg->reg_clr);
-
     /* Set SFTRST */
     writel(MXS_BLOCK_SFTRST, &reg->reg_set);
 
-    /* Wait for CLKGATE being set */
+    /* Wait for CLKGATE being set, is done explicitly by the RESET sequencer */
     if (mxs_wait_mask_set(reg, MXS_BLOCK_CLKGATE, RESET_MAX_TIMEOUT))
         return 1;

igorpadykov · ‎02-27-2018

Hi Mat

In general, if the 'failure' happens during the second mxsfb_set_par(), one can check whether the previous operations have finished. See the description of the RUN bit in the LCDIF_CTRL register:

"This bit must remain set until the operation is complete."

One can also check whether this is caused by the SPI, by bypassing its data.

It may be useful to test with the demo images:

i.MX Software|NXP

Best regards
igor

henrideveer · ‎07-18-2018

Did you find out what the issue is?

I have a similar problem, a "mxs wait for pan flip timeout" and the controller is dead. This sometimes happens (only after a sw reboot) and is hard to reproduce.

Also the description of the RUN bit is unclear: Do you need to reset it yourself in the code, or is it automatically cleared if you reset the DOTCLK_MODE bit or whatever?

If you need to reset it yourself, what should the order be? Clear RUN, then DOTCLK_MODE (as is done in the driver now), or first clear DOTCLK_MODE and then RUN?

When debugging the mxsfb.c driver code, the check loop that waits for CTRL_RUN always breaks in the first iteration, so the code looks broken to me (the bit is always zero). This code comes from the Yocto linux-imx port.

In the mainline kernel driver this part of the code is different and looks "better" (function mxsfb_disable_controller()):

Linux-imx fork:....
    writel(CTRL_RUN, host->base + LCDC_CTRL + REG_CLR);

    if (host->dispdrv && host->dispdrv->drv->disable)
        host->dispdrv->drv->disable(host->dispdrv, fb_info);

    /*
     * Even if we disable the controller here, it will still continue
     * until its FIFOs are running out of data
     */
    writel(CTRL_DOTCLK_MODE, host->base + LCDC_CTRL + REG_CLR);

    loop = 1000;
    while (loop) {
        reg = readl(host->base + LCDC_CTRL);
        if (!(reg & CTRL_RUN))
            break;
        loop--;
    }

    writel(CTRL_MASTER, host->base + LCDC_CTRL + REG_CLR);

    reg = readl(host->base + LCDC_VDCTRL4);
    writel(reg & ~VDCTRL4_SYNC_SIGNALS_ON, host->base + LCDC_VDCTRL4);
....

Mainline (4.18-rc1):

....
    /*
     * Even if we disable the controller here, it will still continue
     * until its FIFOs are running out of data
     */
    writel(CTRL_DOTCLK_MODE, host->base + LCDC_CTRL + REG_CLR);
    loop = 1000;
    while (loop) {
        reg = readl(host->base + LCDC_CTRL);
        if (!(reg & CTRL_RUN))
            break;
        loop--;
    }

    reg = readl(host->base + LCDC_VDCTRL4);
    writel(reg & ~VDCTRL4_SYNC_SIGNALS_ON, host->base + LCDC_VDCTRL4);

    mxsfb_disable_axi_clk(host);
    clk_disable_unprepare(host->clk);
    if (host->clk_disp_axi)
        clk_disable_unprepare(host->clk_disp_axi);

The problem is that it is almost impossible to figure out which code is "the best": there are numerous forks and branches in all kinds of repos, and the contents of the drivers differ considerably between them.

(Sorry about the formatting, something messes up the layout of this post)

Regards,

Henri

igorpadykov · ‎07-18-2018

Please test with the official NXP demo images:

i.MX Software|NXP

The mainline kernel is not supported by NXP; questions about it may be posted on the kernel mailing list.

Best regards
igor

henrideveer · ‎07-18-2018

I used the initial kernel: Linux 4.9.11_1.0.0

So can somebody tell me: if mainline is not supported, why is there a driver in there (mxsfb.c)? And who put it there?

More precisely: the driver I got is 2468 lines long, straight from the Yocto project.

The driver here (mxsfb.c\fbdev\video\drivers - linux-imx - i.MX Linux kernel ) is "only" 1020 lines long. Which one is supposed to be the "official" supported version?

So can anybody shed some light on which repo or branch is really leading?

(And preferably an exact link or copy of the code here).