i.MX 8QuadPlus Suspend/Resume Issue with NXP Linux 5.4.70_2.3.7 Patch

1,797 Views
marcelziswiler
Senior Contributor I

We noticed suspend/resume to be broken on i.MX 8QuadPlus using SCFW from Linux 5.4.70_2.3.7 Patch. It used to work fine on the i.MX 8QuadPlus using the previous SCFW and it still works fine on the i.MX 8QuadMax. Therefore my question: Did NXP ever validate any of this on the i.MX 8QuadPlus? What exactly could be the issue? Thanks!

While resuming Apalis iMX8 QuadPlus the process gets stuck and does not proceed:

root@apalis-imx8-06602842:~# echo +10 > /sys/class/rtc/rtc1/wakealarm && echo enabled > /sys/class/tty/ttyLP1/power/wakeup && echo mem > /sys/power/state
[   13.453866] PM: suspend entry (deep)
[   13.462965] Filesystems sync: 0.004 seconds
[   13.775599] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   13.784960] OOM killer disabled.
[   13.788255] Freezing remaining freezable tasks ... (elapsed 0.074 seconds) done.
[   13.873266] mwifiex_pcie 0000:01:00.0: None of the WOWLAN triggers enabled
[   13.886494] pcieport 0001:02:01.0: pciehp: Timeout on hotplug command 0x1038 (issued 9628 msec ago)
[   15.914487] pcieport 0001:02:01.0: pciehp: Timeout on hotplug command 0x0008 (issued 2020 msec ago)
[   16.731332] fec 5b040000.ethernet eth0: Link is Down
[   16.738094] usb3503 3-0008: switched to STANDBY mode
[   16.919295] PM: suspend devices took 3.052 seconds
[   16.975181] Disabling non-boot CPUs ...
[   16.979980] CPU1: shutdown
[   16.982795] psci: CPU1 killed (polled 0 ms)
[   16.990797] CPU2: shutdown
[   16.993600] psci: CPU2 killed (polled 0 ms)
[   17.001593] CPU3: shutdown
[   17.004412] psci: CPU3 killed (polled 0 ms)
[   17.011968] CPU4: shutdown
[   17.014796] psci: CPU4 killed (polled 0 ms)
[   17.022091] Enabling non-boot CPUs ...
[   17.026662] Detected VIPT I-cache on CPU1
[   17.026692] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[   17.026740] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[   17.027716] CPU1 is up
[   17.048469] Detected VIPT I-cache on CPU2
[   17.048484] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[   17.048506] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[   17.048970] CPU2 is up
[   17.069645] Detected VIPT I-cache on CPU3

 

0 Kudos
11 Replies

1,631 Views
jimmychan
NXP TechSupport

Reply from the expert team:

=======================

 I tried on a QP part (socketed MEK) and it works fine for me.

Some questions for the customer reporting the issue:

  1. Can they please list all the different components (linux, scfw, uboot, atf etc) ?
  2. Is the failure random or fails every time?
  3. Which earlier version of SCFW was working?
  4. Any board changes between the two tests?

=======================

0 Kudos

1,610 Views
marcelziswiler
Senior Contributor I

> I tried on a QP part (socketed MEK) and it works fine for me.

And what exact versions of things did you use for that test?

> 1. Can they please list all the different components (linux, scfw, uboot, atf etc)?

As mentioned before, it is all based on the downstream NXP Linux BSP 5.4.70_2.3.0 at the level of the Linux 5.4.70_2.3.7 Patch:

Linux https://git.toradex.com/cgit/linux-toradex.git/log/?h=toradex_5.4-2.3.x-imx

SCFW https://github.com/toradex/i.MX-System-Controller-Firmware

U-Boot https://git.toradex.com/cgit/u-boot-toradex.git/log/?h=toradex_imx_v2020.04_5.4.70_2.3.0

ATF https://git.toradex.com/cgit/imx-atf.git/log/?h=toradex_imx_5.4.70_2.3.0

Basically, everything is built from the OpenEmbedded/Yocto Project repo manifest here:

https://git.toradex.com/cgit/toradex-manifest.git/log/?h=dunfell-5.x.y

> 2. Is the failure random or fails every time?

Fails every time.

> 3. Which earlier version of SCFW was working?

Linux 5.4.70_2.3.5 Patch (as we skipped the later Linux 5.4.70_2.3.6 Patch)

> 4. Any board changes between the two tests?

No, not really. And as mentioned before, it works just fine on the QuadMax, just not on the QuadPlus.

Thanks!

0 Kudos

1,547 Views
ranjani_vaidyan
NXP Employee

As I cannot reproduce the issue on our side, more debugging needs to be done on the board.

I would suggest the following to narrow down the issue:

1. Replace one at a time (uboot/atf/linux/scfw) from the 2.3.5 to 2.3.7. This may help identify which module is causing the issue.

2. Build SCFW with debug monitor (M=1) and enable SCFW uart. Type power.r at the point of failure to see the status of A72. It should be up since the voltage rail is at 1.1V. I would also like to know whether any error is reported on the SCFW console.

3. Connect with a JTAG debugger to see where A72 is hung.

4. Offline all A53 cores before suspend and see if the issue still persists. 
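Step 4 can be scripted roughly as follows. This is only a sketch: it assumes the A53 cores are cpu0 to cpu3 (check the CPU part numbers in /proc/cpuinfo on your board), and SYSFS_ROOT is a hypothetical override added here so the function can be dry-run against a scratch directory instead of the real /sys.

```shell
#!/bin/sh
# Sketch: take a set of CPUs offline through sysfs before a suspend test.
# SYSFS_ROOT is a hypothetical knob for dry runs; on the target it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

offline_cpus() {
    for n in "$@"; do
        f="$SYSFS_ROOT/devices/system/cpu/cpu$n/online"
        if [ -w "$f" ]; then
            # Writing 0 asks the kernel to hot-unplug this CPU.
            echo 0 > "$f"
        else
            echo "cpu$n: $f not writable, skipping" >&2
        fi
    done
}

# On the target, as root (assumption: cpu0..cpu3 are the A53 cores):
#   offline_cpus 0 1 2 3
#   echo +10 > /sys/class/rtc/rtc1/wakealarm; echo mem > /sys/power/state
```

If resume works with the A53 cores offline but hangs with them online, that narrows the problem to how those cores are brought back up.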

 

0 Kudos

1,428 Views
marcelziswiler
Senior Contributor I

> 1. Replace one at a time (uboot/atf/linux/scfw) from the 2.3.5 to 2.3.7. This may help identify which module is causing the issue.

Turns out it is indeed not the boot container (U-Boot/ATF/SCFW) but rather Linux itself! I also tried with the exact i.MX 8QuadMax device tree (instead of the QuadPlus one) but that did not help.

> 2. Build SCFW with debug monitor (M=1) and enable SCFW uart. Type power.r at the point of failure to see the status of A72. It should be up since the voltage rail is at 1.1V. Also would like to know if there is any error reported on the SCFW console.

Please find it attached:

apalis-imx8qp_scfw_bsp-5.6_booted-suspended-resumed.log: This is based on 2.3.5
apalis-imx8qp_scfw_bsp-5.7_booted-suspended-resume_failed.log: This is based on 2.3.7

> 3. Connect with a JTAG debugger to see where A72 is hung.

Unfortunately, I currently do not have any such environment available.

> 4. Offline all A53 cores before suspend and see if the issue still persists.

Indeed, that helps! What exactly does that mean now?

root@apalis-imx8-06677517:~# echo 0 > /sys/devices/system/cpu/cpu0/online
[ 415.415305] CPU0: shutdown
[ 415.418077] psci: CPU0 killed (polled 0 ms)
root@apalis-imx8-06677517:~# echo 0 > /sys/devices/system/cpu/cpu1/online
[ 419.167260] CPU1: shutdown
[ 419.169990] psci: CPU1 killed (polled 0 ms)
root@apalis-imx8-06677517:~# echo 0 > /sys/devices/system/cpu/cpu2/online
[ 422.375296] CPU2: shutdown
[ 422.378021] psci: CPU2 killed (polled 0 ms)
root@apalis-imx8-06677517:~# echo 0 > /sys/devices/system/cpu/cpu3/online
[ 425.428610] CPU3: shutdown
[ 425.431405] psci: CPU3 killed (polled 0 ms)

[ 427.298482] read temp sensor 0 failed, could be SS NOT powered up, return 0 for this thermal zone, ret -1

root@apalis-imx8-06677517:~# echo +10 > /sys/class/rtc/rtc1/wakealarm; echo mem > /sys/power/state
[ 440.696760] PM: suspend entry (deep)
[ 440.706600] Filesystems sync: 0.006 seconds
[ 440.808392] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 440.817354] OOM killer disabled.
[ 440.820613] Freezing remaining freezable tasks ... (elapsed 0.102 seconds) done.
[ 440.930796] printk: Suspending console(s) (use no_console_suspend to debug)
[ 440.950219] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 440.951417] sd 0:0:0:0: [sda] Stopping disk
[ 440.994201] pcieport 0000:02:01.0: pciehp: Timeout on hotplug command 0x1038 (issued 436772 msec ago)
[ 443.014216] pcieport 0000:02:01.0: pciehp: Timeout on hotplug command 0x0008 (issued 2020 msec ago)
[ 443.807018] fec 5b040000.ethernet eth0: Link is Down
[ 443.808691] usb3503 3-0008: switched to STANDBY mode
[ 443.933242] PM: suspend devices took 2.996 seconds
[ 443.978926] Disabling non-boot CPUs ...
[ 444.120230] imx6q-pcie 5f000000.pcie: PCIe PLL locked after 0 us.
[ 444.434230] imx6q-pcie 5f000000.pcie: Link up
[ 444.434234] imx6q-pcie 5f000000.pcie: Link: Gen2 disabled
[ 444.434238] imx6q-pcie 5f000000.pcie: Link up, Gen1
[ 444.488704] [drm] Started firmware!
[ 444.489817] [drm] HDP FW Version - ver 34219 verlib 20560
[ 444.489825] [drm] Pixel clock: 0 KHz, character clock: 0, bpc is 0-bit.
[ 444.489829] [drm] Pixel clk (0 KHz) not supported, color depth (0-bit)
[ 444.489845] [drm:cdns_hdmi_phy_set_imx8qm [cdns_mhdp_imx]] *ERROR* failed to set phy pclock
[ 444.491567] caam 31400000.crypto: registering rng-caam
[ 444.503369] usb3503 3-0008: switched to HUB mode
[ 444.768496] usb usb4: root hub lost power or was reset
[ 444.768501] usb usb5: root hub lost power or was reset
[ 444.962231] usb 3-1: reset high-speed USB device number 2 using ci_hdrc
[ 445.309767] configfs-gadget gadget: high-speed config #1: c
[ 445.434227] usb 3-1.2: reset high-speed USB device number 3 using ci_hdrc
[ 445.858229] usb 3-1.2.3: reset full-speed USB device number 4 using ci_hdrc
[ 445.981972] ahci-imx 5f020000.sata: external osc is used.
[ 445.984857] sd 0:0:0:0: [sda] Starting disk
[ 446.106221] pcieport 0000:02:01.0: pciehp: Timeout on hotplug command 0x0008 (issued 1672 msec ago)
[ 446.458220] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 446.480726] ata1.00: configured for UDMA/133
[ 447.304393] fec 5b040000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 448.126236] pcieport 0000:02:01.0: pciehp: Timeout on hotplug command 0x1028 (issued 2020 msec ago)
[ 448.127621] PM: resume devices took 3.640 seconds
[ 448.322846] OOM killer enabled.
[ 448.325984] Restarting tasks ... done.
[ 448.386447] PM: suspend exit
root@apalis-imx8-06677517:~#

 

0 Kudos

1,412 Views
jimmychan
NXP TechSupport

Hello,

 

I got the reply from the expert

==========================

As the customer mentioned, "Turns out it is indeed not the boot container (U-Boot/ATF/SCFW) but rather Linux itself!", so the question now is what changed in the Linux kernel code between 2.3.5 and 2.3.7. Since the customer is using their own kernel repository https://git.toradex.com/cgit/linux-toradex.git/tree/?h=toradex_5.4-2.3.x-imx, they need to check that.

In our Linux kernel repository https://source.codeaurora.org/external/imx/linux-imx/refs/tags, I checked the commits between tags rel_imx_5.4.70_2.3.2 and rel_imx_5.4.70_2.3.7, and found that only the following commit is related to 8QM.

https://source.codeaurora.org/external/imx/linux-imx/commit/?h=rel_imx_5.4.70_2.3.7&id=ab634b63cfa96...

If the customer does not find anything suspicious in their own changes, they can try reverting this commit to check whether it is related.

===========================
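The comparison between the two tags described above could be reproduced along these lines. This is a sketch, not the exact procedure the expert used: list_commits is a hypothetical helper, and the path filter in the usage comment is an assumption about where 8QM-related changes would live.

```shell
#!/bin/sh
# Sketch: list non-merge commits between two tags, optionally limited to paths.
# Usage: list_commits FROM_TAG TO_TAG [PATH...]
list_commits() {
    from="$1"
    to="$2"
    shift 2
    # "$from..$to" selects commits reachable from TO but not from FROM.
    git log --oneline --no-merges "$from..$to" -- "$@"
}

# Inside a linux-imx checkout (tags taken from this thread; the paths are
# an assumption and should be widened if nothing shows up):
#   list_commits rel_imx_5.4.70_2.3.2 rel_imx_5.4.70_2.3.7 \
#       arch/arm64/boot/dts/freescale drivers/firmware/imx
#
# A suspect commit can then be tested with: git revert <commit-id>
```

The same helper works in the customer's own kernel tree by substituting the two revisions that correspond to the working and failing builds.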

 

Best regards,

Jimmy

0 Kudos

1,570 Views
ranjani_vaidyan
NXP Employee

> And what exact versions of things did you use for that test?

RV: I used the version that was released, but from an NXP internal build.

A few more questions:

1. The board design between QuadMax and QuadPlus is the same?

2. Does every QP board fail? 

3. Can you measure the PMIC voltage for A72 at the point of failure? I am not sure if you are using a PMIC or not.

 

0 Kudos

1,565 Views
marcelziswiler
Senior Contributor I

> 1. The board design between QuadMax and QuadPlus is the same?

Yes, exactly the same.

> 2. Does every QP board fail?

Yes.

> 3. Can you measure the PMIC voltage for A72 at the point of failure?

I measured both VDD_A53 and VDD_A72 on the QuadMax as well as on the QuadPlus.

QM

before: both 1V

echo +30 > /sys/class/rtc/rtc1/wakealarm; echo mem > /sys/power/state

during: both off

after: both 1V

=> resumes just fine

QP

before: both 1V

echo +30 > /sys/class/rtc/rtc1/wakealarm; echo mem > /sys/power/state

during: both off

after: both 1.1V

=> just hangs

> I am not sure if you are using a PMIC or not.

Yes, as for the PMICs we do have a dual PF8100 design.

Latest logfiles attached.

0 Kudos

1,568 Views
ranjani_vaidyan
NXP Employee

Just to clarify, I tested on the 5.4.70_2.3.7 release.

1,725 Views
marcelziswiler
Senior Contributor I

Any statement from NXP? Thanks!

0 Kudos

1,714 Views
fmonte
Contributor IV

Does it work on the MEK board?

(I don't think NXP validates the BSP on Apalis)

0 Kudos

1,712 Views
marcelziswiler
Senior Contributor I

Well, I am not aware of any MEK board existing for the i.MX 8QuadPlus.

0 Kudos