How to debug dysfunctional SERDES (PCIe, SATA, SGMII) on LS1028A?

mntmn
Contributor II

Hi,

We are developing an open LS1028A SOM as a joint venture with a partner company (they are doing the hardware engineering). The SOM successfully boots U-Boot from SD card and Linux from eMMC, with 8 GB of DDR4. I have run many tests, and the CPU cores, memory, eMMC and USB3 appear stable.

However, we are having serious trouble getting any of the SerDes functions to work.

The lane setup is B8BE, PLL setup 1112:

SRDS_PRTCL_S1_L0=11 /* PCIe */
SRDS_PRTCL_S1_L1=8 /* SGMII */
SRDS_PRTCL_S1_L2=11 /* PCIe */
SRDS_PRTCL_S1_L3=14 /* SATA */
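
As a cross-check of the lane setup, here is a minimal sketch that rebuilds the "B8BE" string from the four per-lane RCW fields. The protocol labels are only the ones given in the RCW comments here; the function names are made up for illustration and are not part of any NXP tool.

```python
# Protocol codes and labels as given in the RCW comments of this post.
PROTOCOL_LABELS = {8: "SGMII", 11: "PCIe", 14: "SATA"}

rcw_fields = {
    "SRDS_PRTCL_S1_L0": 11,
    "SRDS_PRTCL_S1_L1": 8,
    "SRDS_PRTCL_S1_L2": 11,
    "SRDS_PRTCL_S1_L3": 14,
}


def lane_codes(fields):
    """Return the protocol code of lanes 0..3 in lane order."""
    return [fields[f"SRDS_PRTCL_S1_L{lane}"] for lane in range(4)]


def protocol_string(fields):
    """Join the per-lane codes into one hex digit per lane."""
    return "".join(f"{code:X}" for code in lane_codes(fields))


def lane_summary(fields):
    """Describe each lane using the labels known from this post."""
    return [
        f"lane {lane}: {PROTOCOL_LABELS.get(code, 'unknown')}"
        for lane, code in enumerate(lane_codes(fields))
    ]


print(protocol_string(rcw_fields))
for line in lane_summary(rcw_fields):
    print(line)
```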

The refclk generator for SERDES PLL1+PLL2 is an SI52146-A01AGM with 100 MHz differential clocks. The same chip also provides separate refclk outputs for both PCIe lanes.

The problems:

1. The SGMII Ethernet PHY (88E1512-XX-NNP2I000) is alive and talking fine over MDIO. It can even report a 1 Gbit link-up via ethtool. But no packets are ever received or sent, so it appears that the SGMII communication does not work at all.

2. PCIe reports "layerscape-pcie 3400000.pcie: Phy link never came up". Tested with several M.2 cards. The same carrier board works fine with any PCIe card when an i.MX8MQ module is fitted. I tried desoldering the 49.9 Ohm termination resistors on the clock lines of the carrier, but this didn't change anything. My only remaining doubt is the reset line for PCIe. It is connected to GPIO 28; I can toggle it in U-Boot with `gpio clear/set MPC@0232000028`, and it is logic high (not asserted) by default. I'm not sure how to correctly integrate the reset line with the qoriq PCIe controller node in the DTS. On i.MX8MQ, there is a reset-gpio attribute in the DTS.

3. SATA (qoriq-ahci) does not link. I've tested with three different SSDs. Sometimes I get "ata1: SATA link down (SStatus 0 SControl 300)" every 2 seconds. If I hot-unplug the SSD, the messages go away, so at least some kind of link detection is happening. We later found that the SATA RX P/N pair was swapped and fixed it on the board, but this did not make any difference.
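
For reference, the two values in the SATA message are printed in hex and use the standard SATA SStatus/SControl field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). A minimal decoding sketch follows; the helper names are made up for illustration, and the DET/SPD descriptions apply to SStatus only.

```python
# SStatus DET and SPD meanings per the SATA/AHCI field definitions.
SSTATUS_DET = {
    0: "no device detected, PHY communication not established",
    1: "device detected, PHY communication not established",
    3: "device detected, PHY communication established",
    4: "PHY offline",
}
SSTATUS_SPD = {
    0: "no speed negotiated",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3 Gbps)",
    3: "Gen3 (6 Gbps)",
}


def split_fields(value):
    """Split an SStatus or SControl value into its DET/SPD/IPM nibbles."""
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


def describe_sstatus(value):
    """Return a human-readable summary of an SStatus value."""
    f = split_fields(value)
    det = SSTATUS_DET.get(f["det"], "reserved")
    spd = SSTATUS_SPD.get(f["spd"], "reserved")
    return f"DET={f['det']} ({det}), SPD={f['spd']} ({spd}), IPM={f['ipm']}"


# Values from the kernel message quoted above (hex).
print(describe_sstatus(0x0))
print(split_fields(0x300))
```

With the values above, SStatus 0 decodes to DET=0, meaning the PHY reports no device at all. A healthy 6 Gbps link would typically show SStatus 133.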

Here is a register dump of relevant SERDES registers with cards plugged in:

=> md 0x1ea0000
01ea0000: 426745e8 00800008 08004100 008c0000 .EgB.....A......
01ea0010: 00000000 e8000000 00000000 00000000 ................
01ea0020: 426745e8 008a0008 08004100 01ad0000 .EgB.....A......
01ea0030: 00000000 e8000000 00000000 00000000 ................
01ea0040: 00000000 00000000 00000000 00000000 ................
01ea0050: 00000000 00000000 00000000 00000000 ................
01ea0060: 00000000 00000000 00000000 00000000 ................
01ea0070: 00000000 00000000 00000000 00000000 ................
01ea0080: 00000000 00000000 00000000 00000000 ................
01ea0090: 08000000 00000000 00000000 00000000 ................
01ea00a0: 48008000 00000000 00000000 00000000 ...H............
01ea00b0: 02804000 00000000 00000000 00000000 .@..............
01ea00c0: 00000000 00000000 00000000 00000000 ................
01ea00d0: 99000000 00000000 00000000 00000000 ................
01ea00e0: 00000000 00000000 00000000 00000000 ................
01ea00f0: 00000000 00000000 00000000 00000000 ...............
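
To make dumps like this easier to compare, here is a minimal sketch that turns U-Boot `md` output into an address-to-word mapping and lists the non-zero words. It assumes the default 32-bit `md` format shown above (address, colon, up to four words, ASCII column) and takes the words exactly as printed, with no byte swapping. The function name is illustrative.

```python
import re

# Address, colon, then one to four 8-digit hex words; the ASCII column is ignored.
LINE_RE = re.compile(r"^([0-9a-fA-F]{8}):((?:\s+[0-9a-fA-F]{8}){1,4})")


def parse_md_dump(text):
    """Return {address: word} for every 32-bit word in U-Boot `md` output."""
    words = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip prompts such as "=> md 0x1ea0000" and blank lines
        base = int(match.group(1), 16)
        for index, token in enumerate(match.group(2).split()):
            words[base + 4 * index] = int(token, 16)
    return words


sample = """\
=> md 0x1ea0000
01ea0000: 426745e8 00800008 08004100 008c0000 .EgB.....A......
01ea0010: 00000000 e8000000 00000000 00000000 ................
01ea0020: 426745e8 008a0008 08004100 01ad0000 .EgB.....A......
"""

dump = parse_md_dump(sample)
nonzero = {addr: word for addr, word in dump.items() if word}
for addr, word in sorted(nonzero.items()):
    print(f"0x{addr:08x}: 0x{word:08x}")
```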

Full schematics for the SOM and the carrier board are attached. These are open source and can be freely reproduced.

Any help or hints on how to debug these problems further are greatly appreciated.

Here is the RCW config:

SYS_PLL_RAT=4
MEM_PLL_CFG=0
MEM_PLL_RAT=16
CGA_PLL1_RAT=15 /* 15:1 */
CGA_PLL2_RAT=12 /* 12:1 */

/*HWA_CGA_M1_CLK_SEL=1
HWA_CGA_M2_CLK_SEL=7
HWA_CGA_M3_CLK_SEL=6
HWA_CGA_M4_CLK_SEL=3*/
HWA_CGA_M1_CLK_SEL=7
HWA_CGA_M2_CLK_SEL=1
HWA_CGA_M3_CLK_SEL=6
HWA_CGA_M4_CLK_SEL=3

DDR_REFCLK_SEL=2
DRAM_LAT=1
DDR_RATE=0
BOOT_LOC=21
FLASH_CFG1=3
SYSCLK_FREQ=600 /* 100 MHz */
GPIO_LED_NUM=25
GPIO_LED_EN=1
IIC2_PMUX=6
IIC3_PMUX=3
IIC4_PMUX=1
IIC5_PMUX=1
IIC6_PMUX=0
XSPI1_A_DATA74_PMUX=1
XSPI1_A_DATA30_PMUX=0
XSPI1_A_BASE_PMUX=0
SDHC1_BASE_PMUX=0
SDHC2_DAT74_PMUX=0
SDHC2_BASE_PMUX=0
UART1_SOUTSIN_PMUX=0
UART2_SOUTSIN_PMUX=0
CLK_OUT_PMUX=2
ASLEEP_PMUX=1
IIC1_PMUX=0
EC1_SAI4_5_PMUX=2
EC1_SAI3_6_PMUX=2
USB_DRVVBUS_PMUX=1
USB_PWRFAULT_PMUX=1
USB3_CLK_FSEL=39
RESET_REQ_PMUX=1

/* bit 17+16 in IP_INT (enable ENETC pcie function 5+6) */
ENETC_RCW=3

SPI3_PMUX=0
GTX_CLK125_PMUX=1 /* GPIO */
SRDS_PRTCL_S1_L0=11 /* PCIe */
SRDS_PRTCL_S1_L1=8 /* SGMII */
SRDS_PRTCL_S1_L2=11 /* PCIe */
SRDS_PRTCL_S1_L3=14 /* SATA */

SRDS_PLL_PD_PLL1=0 /* no powerdown */
SRDS_PLL_PD_PLL2=0 /* no powerdown */
SRDS_PLL_REF_CLK_SEL_S1=0 /* 0: 100MHz. 1: 125MHz. */
SRDS_S1_REFCLK_SRC_SEL=0 /* SD1_REF_CLK1_P/N + SD1_REF_CLK2_P/N */
SRDS_DIV_PEX_S1=1 /* 0: maximum: 8G. 1: 5G, 2: 2.5G */
/* see page 320 */

/* Errata for PCIe controller */

#include <../ls1028asi/a008851.rcw>
#include <../ls1028asi/a010477.rcw>
#include <../ls1028asi/a009531.rcw>

/* Errata for SATA controller */
#include <../ls1028asi/a010554.rcw>

2 Replies

mntmn
Contributor II

This issue was finally resolved. It was caused by a problem with the soldering oven: the parts were exposed to high temperature for too long. We remanufactured a trial run of the boards at a board house, and the hardware started working correctly.

yipingwang
NXP TechSupport

Please probe the SerDes PLL supply lines with an oscilloscope and check them for glitches and for the correct voltage.
Please provide a complete dump of the SerDes registers and the PCS registers.
Please provide the LTSSM value for PCIe.
Please provide a SATA register dump.
For SGMII, run a ping test, set LaneBTCSR3[LPBK_EN]=01, and check the CDR_LCK bit in the same register.


We need more information on the failure. Does it always happen, or does SerDes only fail some of the time? Is there a pattern to the failures? Is there a specific lane that fails? Can you provide SerDes register dumps for both a passing and a failing case?
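
Comparing two register dumps word by word is easy to script. The sketch below assumes each dump has already been reduced to a mapping from offset to 32-bit word, and reports which bits differ. No passing dump exists in this thread, so the example input is only a stand-in: it compares the two 0x20-byte blocks at 0x1ea0000 and 0x1ea0020 from the dump in the question. The function name is illustrative, and no meaning is assigned to the bits.

```python
def diff_words(a, b):
    """Return (offset, word_a, word_b, differing_bits) for offsets present in both maps."""
    result = []
    for offset in sorted(set(a) & set(b)):
        xor = a[offset] ^ b[offset]
        if xor:
            bits = [bit for bit in range(32) if (xor >> bit) & 1]
            result.append((offset, a[offset], b[offset], bits))
    return result


# Stand-in input: non-zero words of the blocks at 0x1ea0000 and 0x1ea0020,
# keyed by offset within each block (values copied from the question).
block_a = {0x00: 0x426745E8, 0x04: 0x00800008, 0x08: 0x08004100,
           0x0C: 0x008C0000, 0x14: 0xE8000000}
block_b = {0x00: 0x426745E8, 0x04: 0x008A0008, 0x08: 0x08004100,
           0x0C: 0x01AD0000, 0x14: 0xE8000000}

for offset, wa, wb, bits in diff_words(block_a, block_b):
    print(f"+0x{offset:02x}: 0x{wa:08x} vs 0x{wb:08x}, bits differing: {bits}")
```

The same function can be pointed at a passing and a failing dump of the same address range once both are available.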

We have checked the PLL supply filters, the connection of the SerDes lanes, the RCW values, and the clock generator. So far we have found nothing wrong.
