i.MX8QXP throws a kernel panic on reboot powering off the mmc

arturobuzarra · ‎08-08-2023

Dear NXP,

We are using the i.MX8QXP processor on a custom board. This platform already works fine in a release based on your Linux kernel imx-5.4.70_2.3.0, where we have never seen this error. However, after moving the platform to a release based on lf-5.15.71-2.2.0, we started running into reboot issues.

The i.MX8QXP platform throws the following unexpected kernel panic during the reboot sequence:

[   24.478472] EXT4-fs (mmcblk0p3): re-mounted. Opts: (null). Quota mode: none.
[   24.491917] systemd-shutdown[1]: All filesystems unmounted.
[   24.497605] systemd-shutdown[1]: Deactivating swaps.
[   24.502790] systemd-shutdown[1]: All swaps deactivated.
[   24.508068] systemd-shutdown[1]: Detaching loop devices.
[   24.518080] systemd-shutdown[1]: All loop devices detached.
[   24.523705] systemd-shutdown[1]: Stopping MD devices.
[   24.529172] systemd-shutdown[1]: All MD devices stopped.
[   24.534522] systemd-shutdown[1]: Detaching DM devices.
[   24.540005] systemd-shutdown[1]: All DM devices detached.
[   24.545445] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[   24.663301] systemd-shutdown[1]: Syncing filesystems and block devices.
[   24.670193] systemd-shutdown[1]: Rebooting.
[   24.674413] kvm: exiting hardware virtualization
[   24.700220] ci_hdrc ci_hdrc.0: remove, state 4
[   24.704699] usb usb1: USB disconnect, device number 1
[   24.709769] usb 1-1: USB disconnect, device number 2
[   24.716052] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
[   29.920896] imx-scu scu: RPC send msg timeout
[   35.040856] imx-scu scu: RPC send msg timeout
[   35.045242] read temp sensor 355 failed, could be SS powered off, ret -110
[   35.808882] 1v8_adc_vref: disabling
[   40.160855] imx-scu scu: RPC send msg timeout
[   45.280858] imx-scu scu: RPC send msg timeout
[   45.285239]  sdhc0: failed to power off resource 248 ret -110
[   45.304882] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
[   45.312372] Modules linked in:
[   45.315432] CPU: 2 PID: 1 Comm: systemd-shutdow Not tainted 5.15.71-00026-gfc817b9c105a-dirty #146
[   45.331284] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   45.338251] pc : esdhc_readl_le+0x10/0x190
[   45.342359] lr : sdhci_send_command+0x500/0xe4c
[   45.346903] sp : ffff800009adb810
[   45.350220] x29: ffff800009adb810 x28: ffff000001515000 x27: 0000000000000020
[   45.357369] x26: 0000000000200001 x25: 0000000000000000 x24: ffff000001515810
[   45.364521] x23: 000000000000000b x22: ffff000000488000 x21: 0000000000000001
[   45.371669] x20: ffff800009adbb00 x19: ffff000001515580 x18: 0000000000000030
[   45.378819] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000
[   45.385969] x14: 0000000000000371 x13: 0000000000000001 x12: 0000000000000001
[   45.393118] x11: 0000000000000000 x10: 00000000000009e0 x9 : ffff800009adb9a0
[   45.400268] x8 : ffff000000488a40 x7 : 0000000000000000 x6 : 0000000000000001
[   45.407418] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000001515810
[   45.414568] x2 : ffff000001515580 x1 : 0000000000000024 x0 : ffff80000b420024
[   45.421721] Call trace:
[   45.424171]  esdhc_readl_le+0x10/0x190
[   45.427930]  sdhci_send_command_retry+0x40/0x130
[   45.432551]  sdhci_request+0x70/0xc4
[   45.436130]  __mmc_start_request+0x68/0x140
[   45.440326]  mmc_start_request+0x84/0xb0
[   45.444253]  mmc_wait_for_req+0x70/0x100
[   45.448180]  mmc_wait_for_cmd+0x68/0xa0
[   45.452020]  __mmc_switch+0x1f0/0x23c
[   45.455686]  mmc_switch+0x28/0x40
[   45.459005]  _mmc_flush_cache+0x54/0x80
[   45.462844]  _mmc_suspend+0x58/0x2ec
[   45.466424]  mmc_shutdown+0x30/0x60
[   45.469916]  mmc_bus_shutdown+0x40/0x80
[   45.473765]  device_shutdown+0x158/0x330
[   45.477700]  __do_sys_reboot+0x1f0/0x294
[   45.481638]  __arm64_sys_reboot+0x24/0x30
[   45.485649]  invoke_syscall+0x48/0x114
[   45.489402]  el0_svc_common.constprop.0+0x44/0xec
[   45.494121]  do_el0_svc+0x24/0x90
[   45.497438]  el0_svc+0x20/0x60
[   45.500498]  el0t_64_sync_handler+0xb0/0xb4
[   45.504683]  el0t_64_sync+0x1a0/0x1a4
[   45.508358] Code: aa0003e2 d503233f f9400c00 8b21c000 (b9400000) 
[   45.514466] ---[ end trace f85a9a543cdbcdcc ]---
[   45.519082] note: systemd-shutdow[1] exited with preempt_count 1
[   45.525101] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   45.532772] Kernel Offset: disabled
[   45.536254] CPU features: 0x00000001,20000846
[   45.540617] Memory Limit: none
[   45.543670] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
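
As a reading aid for the trace above, the value 96000210 on the "Internal error" line is the AArch64 exception syndrome (ESR). The sketch below splits it into its architectural fields; the struct and function names are illustrative and not taken from the kernel.

```c
#include <stdint.h>

/* Selected fields of the AArch64 Exception Syndrome Register (ESR_ELx).
 * The names here are illustrative, not kernel identifiers. */
struct esr_fields {
    unsigned ec;   /* bits [31:26]: exception class */
    unsigned il;   /* bit  [25]:    1 = 32-bit instruction */
    unsigned wnr;  /* bit  [6]:     data aborts only, 1 = write, 0 = read */
    unsigned dfsc; /* bits [5:0]:   data fault status code */
};

/* Split a raw ESR value into the fields above. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.wnr  = (esr >> 6) & 0x1;
    f.dfsc = esr & 0x3f;
    return f;
}
```

For esr_decode(0x96000210) this gives EC = 0x25 (data abort taken without a change in exception level), WnR = 0 (a read) and DFSC = 0x10 (synchronous external abort, not on a translation table walk). That fits a register read in esdhc_readl_le being rejected by the bus. The value 0x24 in x1 matches the offset of SDHCI_PRESENT_STATE in the generic SDHCI register map, although the trace alone does not prove which resource was unavailable at that moment.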

While debugging the issue, we found that after disabling the SET_RUNTIME_PM_OPS functions (sdhci_esdhc_runtime_suspend, sdhci_esdhc_runtime_resume) in the sdhci-esdhc-imx.c driver, the kernel panic no longer occurs. Instead, we get a different issue related to a timeout in the SCU:

[   28.830478] EXT4-fs (mmcblk0p3): re-mounted. Opts: (null). Quota mode: none.
[   28.843592] systemd-shutdown[1]: All filesystems unmounted.
[   28.849997] systemd-shutdown[1]: Deactivating swaps.
[   28.855255] systemd-shutdown[1]: All swaps deactivated.
[   28.861312] systemd-shutdown[1]: Detaching loop devices.
[   28.871291] systemd-shutdown[1]: All loop devices detached.
[   28.876919] systemd-shutdown[1]: Stopping MD devices.
[   28.882396] systemd-shutdown[1]: All MD devices stopped.
[   28.887735] systemd-shutdown[1]: Detaching DM devices.
[   28.893216] systemd-shutdown[1]: All DM devices detached.
[   28.898649] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[   28.914598] systemd-shutdown[1]: Syncing filesystems and block devices.
[   28.921514] systemd-shutdown[1]: Rebooting.
[   28.925778] kvm: exiting hardware virtualization
[   28.956844] ci_hdrc ci_hdrc.0: remove, state 4
[   28.961331] usb usb1: USB disconnect, device number 1
[   28.966399] usb 1-1: USB disconnect, device number 2
[   28.972711] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
[   34.273499] imx-scu scu: RPC send msg timeout
[   34.277906] imx8qxp-pinctrl scu:pinctrl: pin_config_set op failed for pin 9
[   34.284975] sdhci-esdhc-imx 5b010000.mmc: Error applying setting, reverse things back
[   35.809505] 1v8_adc_vref: disabling
[   39.393470] imx-scu scu: RPC send msg timeout
[   39.397849] read temp sensor 497 failed, could be SS powered off, ret -110
[   44.513472] imx-scu scu: RPC send msg timeout
[   49.633479] imx-scu scu: RPC send msg timeout
[   49.637869] imx8qxp-pinctrl scu:pinctrl: pin_config_set op failed for pin 9
[   49.644858] sdhci-esdhc-imx 5b010000.mmc: Error applying setting, reverse things back
[   49.652713] sdhci-esdhc-imx 5b010000.mmc: failed to activate pinctrl state default
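
The "ret -110" values in both logs are negative errno codes. A minimal sketch for decoding them, assuming a Linux toolchain where ETIMEDOUT is 110 (the helper names are hypothetical):

```c
#include <errno.h>
#include <string.h>

/* Kernel code returns negative errno values; this checks for a timeout.
 * Assumption: Linux errno numbering, where ETIMEDOUT == 110. */
static int ret_is_timeout(int ret)
{
    return ret == -ETIMEDOUT;
}

/* Human-readable text for a kernel-style negative return code. */
static const char *ret_to_text(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```

Under that assumption ret_is_timeout(-110) is true, which suggests that the temperature sensor read and the sdhc0 power-off fail for the same reason as the "RPC send msg timeout" lines printed just before them.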

After several timeouts that delay the reboot process, the device eventually reboots, but the reset is triggered by the watchdog.
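
To put a number on that delay, the sketch below takes the timestamps of the four "RPC send msg timeout" lines in the second log and computes the gaps between them. The array and function names are only for illustration.

```c
#include <stddef.h>

/* Timestamps (seconds) of the "RPC send msg timeout" lines in the second log. */
static const double rpc_timeout_ts[] = {
    34.273499, 39.393470, 44.513472, 49.633479
};

#define RPC_TIMEOUT_COUNT (sizeof(rpc_timeout_ts) / sizeof(rpc_timeout_ts[0]))

/* Gap in seconds between timeout i and timeout i + 1.
 * The caller must keep i + 1 below RPC_TIMEOUT_COUNT. */
static double rpc_timeout_gap(size_t i)
{
    return rpc_timeout_ts[i + 1] - rpc_timeout_ts[i];
}

/* Seconds from the last message before the stall to the last timeout. */
static double total_stall(double last_activity_ts)
{
    return rpc_timeout_ts[RPC_TIMEOUT_COUNT - 1] - last_activity_ts;
}
```

Each gap comes out at roughly 5.12 s, and total_stall(28.972711), measured from the "USB bus 1 deregistered" line, is about 20.7 s. This looks like every SCU call blocking for a fixed timeout, but the log does not show where that timeout is configured.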

We tested SCU firmware v1.11.0 and v1.15.0, but we get the same issue with both.

Q1: Could you help us to fix this issue?

According to our findings in the sdhci-esdhc-imx.c driver, which point to the RUNTIME_PM functions...

Q2: Is there a race condition in the shutdown process between the mmc clocks managed by the sdhci-esdhc-imx.c driver and the RUNTIME_PM functions?

Note that we do not initialize the M4 core, so we don't know why the SCU throws the error: "imx8qxp-pinctrl scu:pinctrl: pin_config_set op failed for pin 9"

Q3: Do you know why the SCU fails setting that pinctrl?

Thanks in advance,

Arturo

brian14 · ‎08-11-2023

Hi @arturobuzarra,

Thank you for contacting NXP Support.

Usually, this error is related to an incorrect control of the pin.

Here I will attach a reference that could help you with this problem:

Solved: [iMX8QM-MEK] pin_config_set op faild for pin 9 - NXP Community

Unfortunately, it is difficult to help you with the debugging of your custom board, but I will try to do my best to help with any question.

arturobuzarra · ‎08-14-2023

Hi Brian,

Thanks for your reply, but unfortunately we are not using this pin for anything in the SCU.

This is our pinctrl configuration for usdhc1:

	/* eMMC */
	pinctrl_usdhc1: usdhc1grp {
		fsl,pins = <
			IMX8QXP_EMMC0_CLK_CONN_EMMC0_CLK		0x06000041
			IMX8QXP_EMMC0_CMD_CONN_EMMC0_CMD		0x00000021
			IMX8QXP_EMMC0_DATA0_CONN_EMMC0_DATA0	0x00000021
			IMX8QXP_EMMC0_DATA1_CONN_EMMC0_DATA1	0x00000021
			IMX8QXP_EMMC0_DATA2_CONN_EMMC0_DATA2	0x00000021
			IMX8QXP_EMMC0_DATA3_CONN_EMMC0_DATA3	0x00000021
			IMX8QXP_EMMC0_DATA4_CONN_EMMC0_DATA4	0x00000021
			IMX8QXP_EMMC0_DATA5_CONN_EMMC0_DATA5	0x00000021
			IMX8QXP_EMMC0_DATA6_CONN_EMMC0_DATA6	0x00000021
			IMX8QXP_EMMC0_DATA7_CONN_EMMC0_DATA7	0x00000021
			IMX8QXP_EMMC0_STROBE_CONN_EMMC0_STROBE	0x00000041
		>;
	};

If I change the order of the pins, I get the same error but with a different pin ID, so it is not related to a specific pin (see src/scfw_export_mx8qx/platform/config/mx8qx/pads.h):

#define SC_P_EMMC0_CLK                           9U    /*!< CONN.EMMC0.CLK, CONN.NAND.READY_B, LSIO.GPIO4.IO07 */
#define SC_P_EMMC0_CMD                           10U   /*!< CONN.EMMC0.CMD, CONN.NAND.DQS, LSIO.GPIO4.IO08 */
#define SC_P_EMMC0_DATA0                         11U   /*!< CONN.EMMC0.DATA0, CONN.NAND.DATA00, LSIO.GPIO4.IO09 */
#define SC_P_EMMC0_DATA1                         12U   /*!< CONN.EMMC0.DATA1, CONN.NAND.DATA01, LSIO.GPIO4.IO10 */
#define SC_P_EMMC0_DATA2                         13U   /*!< CONN.EMMC0.DATA2, CONN.NAND.DATA02, LSIO.GPIO4.IO11 */
#define SC_P_EMMC0_DATA3                         14U   /*!< CONN.EMMC0.DATA3, CONN.NAND.DATA03, LSIO.GPIO4.IO12 */
#define SC_P_COMP_CTL_GPIO_1V8_3V3_SD1FIX0       15U   /*!<  */
#define SC_P_EMMC0_DATA4                         16U   /*!< CONN.EMMC0.DATA4, CONN.NAND.DATA04, CONN.EMMC0.WP, LSIO.GPIO4.IO13 */
#define SC_P_EMMC0_DATA5                         17U   /*!< CONN.EMMC0.DATA5, CONN.NAND.DATA05, CONN.EMMC0.VSELECT, LSIO.GPIO4.IO14 */
#define SC_P_EMMC0_DATA6                         18U   /*!< CONN.EMMC0.DATA6, CONN.NAND.DATA06, CONN.MLB.CLK, LSIO.GPIO4.IO15 */
#define SC_P_EMMC0_DATA7                         19U   /*!< CONN.EMMC0.DATA7, CONN.NAND.DATA07, CONN.MLB.SIG, LSIO.GPIO4.IO16 */
#define SC_P_EMMC0_STROBE                        20U   /*!< CONN.EMMC0.STROBE, CONN.NAND.CLE, CONN.MLB.DATA, LSIO.GPIO4.IO17 */
#define SC_P_EMMC0_RESET_B                       21U   /*!< CONN.EMMC0.RESET_B, CONN.NAND.WP_B, LSIO.GPIO4.IO18 */
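
The pin number in the pinctrl error can be mapped back to a pad name with a small lookup over the values quoted above. This is only a sketch: the table copies the IDs from the pads.h excerpt, and the struct and function names are hypothetical.

```c
#include <stddef.h>

struct pad_entry {
    unsigned id;      /* pad index as reported in the pinctrl error */
    const char *name; /* matching define from pads.h */
};

/* IDs copied from the pads.h excerpt above; not a complete pad list. */
static const struct pad_entry emmc0_pads[] = {
    {  9, "SC_P_EMMC0_CLK" },
    { 10, "SC_P_EMMC0_CMD" },
    { 11, "SC_P_EMMC0_DATA0" },
    { 12, "SC_P_EMMC0_DATA1" },
    { 13, "SC_P_EMMC0_DATA2" },
    { 14, "SC_P_EMMC0_DATA3" },
    { 15, "SC_P_COMP_CTL_GPIO_1V8_3V3_SD1FIX0" },
    { 16, "SC_P_EMMC0_DATA4" },
    { 17, "SC_P_EMMC0_DATA5" },
    { 18, "SC_P_EMMC0_DATA6" },
    { 19, "SC_P_EMMC0_DATA7" },
    { 20, "SC_P_EMMC0_STROBE" },
    { 21, "SC_P_EMMC0_RESET_B" },
};

/* Return the pad name for an ID, or NULL if it is not in the excerpt. */
static const char *pad_name(unsigned id)
{
    size_t i;

    for (i = 0; i < sizeof(emmc0_pads) / sizeof(emmc0_pads[0]); i++) {
        if (emmc0_pads[i].id == id)
            return emmc0_pads[i].name;
    }
    return NULL;
}
```

pad_name(9) resolves to SC_P_EMMC0_CLK, the first entry of usdhc1grp. Together with the reordering test, this would be consistent with the first pad write of the group failing, whichever pad that happens to be.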

In SCU FW we have the following board definition, where we don't reserve any pads for M4:

/*--------------------------------------------------------------------------*/
/* Configure the system (inc. additional resource partitions)               */
/*--------------------------------------------------------------------------*/
void board_system_config(sc_bool_t early, sc_rm_pt_t pt_boot)
{
    sc_err_t err = SC_ERR_NONE;

    /* This function configures the system. It usually partitions
       resources according to the system design. It must be modified by
       customers. Partitions should then be specified using the mkimage
       -p option. */

    /* Note the configuration here is for NXP test purposes */

    sc_bool_t alt_config = SC_FALSE;
    sc_bool_t no_ap = SC_FALSE;
    sc_bool_t ddrtest = SC_FALSE;

    /* Get boot parameters. See the Boot Flags section for definition
       of these flags.*/
    (void) boot_get_data(NULL, NULL, NULL, NULL, NULL, NULL, &alt_config,
        NULL, &ddrtest, &no_ap, NULL);

    board_print(3, "board_system_config(%d, %d)\n", early, alt_config);

    #if !defined(EMUL)
        if (ddrtest == SC_FALSE)
        {
            uint64_t ram = hwid_get_ramsize(cc8x_ramid);
            if (ram == 0) {
                /* if RAM size was not coded, use variant to obtain RAM size */
                if (cc8x_variant < ARRAY_SIZE(ccimx8x_variants_ram))
                        ram = ccimx8x_variants_ram[cc8x_variant];
            }

            sc_rm_mr_t mr_temp;

            if (ram < SC_2GB) {
                /* Board has less than 2GB so fragment lower region and delete */
                BRD_ERR(rm_memreg_frag(pt_boot, &mr_temp, DDR_BASE0 + ram,
                    DDR_BASE0_END));
                BRD_ERR(rm_memreg_free(pt_boot, mr_temp));
            }
            if (ram <= SC_2GB) {
                /* Board has 2GB memory or less so delete upper memory region */
                BRD_ERR(rm_find_memreg(pt_boot, &mr_temp, DDR_BASE1, DDR_BASE1));
                BRD_ERR(rm_memreg_free(pt_boot, mr_temp));
            }
            else {
                /* Fragment upper region and delete */
                BRD_ERR(rm_memreg_frag(pt_boot, &mr_temp, DDR_BASE1 + ram
                    - SC_2GB, DDR_BASE1_END));
                BRD_ERR(rm_memreg_free(pt_boot, mr_temp));
            }
        }
    #endif

    /* Name default partitions */
    PARTITION_NAME(SC_PT, "SCU");
    PARTITION_NAME(SECO_PT, "SECO");
    PARTITION_NAME(pt_boot, "BOOT");


}

So I don't have these pins defined on any node other than usdhc1.

Could you provide us with any guidance?

Thanks,

Arturo.
