Connecting P1011 and T1040 via PCIe - EP Locks up on 13th Write

1,786 Views
stevebelvin
Contributor I

We have two processors connected directly by a PCIe x1 Gen 1.1 link.  The Root Complex (RC) is a P1011 and the EndPoint (EP) is a T1040.  Our software configures the SoC, including the PCIe bus, and tries to talk over the PCIe bus with limited success.  The software is simple: it holds a pointer into the other processor’s RAM, and we try to read and write the first address in the outbound window.  For configuration, we set up an entry in TLB1 to map the effective address to a physical address, a LAW register to route the physical address to one of the PEX controllers, and then the PEX controller with one inbound window and one outbound window.  These windows are for memory transactions.

 

The RC is able to read and write the EP’s RAM, but the EP can neither read nor write the RC’s RAM.  In the current version we have removed the code in the EP that tries to read from the RC’s RAM, so it is only trying to write.  The software on the EP runs an infinite loop that prints a dot to the serial port after every write and then waits for 1 second.  The symptom we see is that the EP executes this loop twelve times and then, on the thirteenth write, the CPU locks up.  When it locks up, the JTAG unit we have can’t even halt it.  What are the likely causes of this behavior?  Where should we start looking for misconfiguration?  Could this be some type of connection problem (the links do train and we can complete writes and reads from the RC to the EP)?  I noticed that the T1040RM states that the PEX can support 12 outstanding platform writes.  Is there any connection to the number of writes before the lockup?

NOTE: We have a similar system with similar configuration with similar results except that the EP does not lock up unless we power off the RC.

0 Kudos
11 Replies

1,176 Views
stevebelvin
Contributor I

The T1042 throws a machine check when there is a 1-to-1 mapping (BAR = TAR).  Not sure why this is an error.  This may be a bug in the silicon.

0 Kudos

1,176 Views
ufedor
NXP Employee

> (BAR = TAR)

Please provide corresponding registers settings.

0 Kudos

1,176 Views
stevebelvin
Contributor I

The only T1040 registers that were changed are the following:

    PEX1_PEXOTAR1 : 0x8000_0000 to 0xA000_0000

    PEX1_PEXOBAR1 : 0x8000_0000 to 0x8000_0000

There shouldn't be anything magical about these particular values but a machine check occurs because they match.  When they are set to the values shown to the right, the system works!

Can you explain this behavior?
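
For readers following the BAR/TAR discussion, here is a minimal sketch of what an outbound window does to an address, assuming the usual base-plus-offset translation. The function name and the 1 MiB window size are illustrative assumptions, not values confirmed for this board.

```python
def outbound_translate(local_addr, obar, otar, size):
    """Map a local physical address that hits an outbound window to a PCIe bus address."""
    if not (obar <= local_addr < obar + size):
        raise ValueError("address is outside the outbound window")
    return otar + (local_addr - obar)


WINDOW_SIZE = 1 << 20  # assumed 1 MiB window, for illustration only

# Failing case from the post: BAR == TAR, so the address goes out on the link unchanged.
print(hex(outbound_translate(0x8000_0000, 0x8000_0000, 0x8000_0000, WINDOW_SIZE)))

# Working case: same local address, but the PCIe side sees an address based at 0xA000_0000.
print(hex(outbound_translate(0x8000_0000, 0x8000_0000, 0xA000_0000, WINDOW_SIZE)))
```

The sketch only shows that the two settings put different addresses on the link; whether the RC accepts a given address depends on its own inbound window setup, which is not shown in this thread.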

0 Kudos

1,176 Views
ufedor
NXP Employee

Please consider the following information from the T1040 RM, 2.5.3.1 Illegal Interaction Between Inbound ATMUs and LAWs:
"Since both local access windows and inbound ATMUs map transactions to a target interface, it is essential that they not contradict one another.
For example, it is considered a programming error to have an inbound ATMU map a transaction target to the local memory space if the resulting translated local address is mapped to an external peripheral interface by a local access window. Such programming errors may result in unpredictable system deadlocks."

Please provide raw memory dumps of LAW and the PCIe registers areas of the T1040.

0 Kudos

1,176 Views
stevebelvin
Contributor I

The only difference between the working and failing register sets is PEXOTAR1, which is 0x00080000 in the failing case.  Here is the register dump with this register set to the failing value.  This seems like overkill, but it is worth a try.  By the way, is there a guide or wizard for setting this interface up?  That might be useful.

PCIe#1 memory mapped registers
PEX_CONFIG_ADDR: ffffffff840000b0
PEX_OTB_CPL_TOR: 0010ffff
PEX_CONF_RTY_TOR: 0400ffff
PEX_CONFIG: 00000000
PEX_PME_MES_DR: 00000080
PEX_PME_MES_DISR: 00000000
PEX_PME_MES_IER: 00000000
PEX_PMCR: 00000000
PEX_IP_BLK_REV1: 02080204
PEX_IP_BLK_REV2: 00000000
PEXOTAR0: 00000000
PEXOTEAR0: 00000000
PEXOWAR0: ffffffff80044027
PEXOTAR1: 00080000
PEXOTEAR1: 00000000
PEXOWBAR1: 00080000
PEXOWAR1: ffffffff80044013
PEXOTAR2: 00000000
PEXOTEAR2: 00000000
PEXOWBAR2: 00000000
PEXOWAR2: 00044027
PEXOTAR3: 00000000
PEXOTEAR3: 00000000
PEXOWBAR3: 00000000
PEXOWAR3: 00000000
PEXOTAR4: 00000000
PEXOTEAR4: 00000000
PEXOWBAR4: 00000000
PEXOWAR4: 00044027
PEXITAR3: 00000000
PEXIWBAR3: 00000000
PEXIWBEAR3: 00000000
PEXIWAR3: 20f44027
PEXITAR2: 00000000
PEXIWBAR2: 00000000
PEXIWBEAR2: 00000000
PEXIWAR2: 20f44027
PEXITAR1: 00030000
PEXIWBAR1: 00000000
PEXIWBEAR1: 00000000
PEXIWAR1: ffffffff80f55013
PEXITAR0: 000df000
PEXIWBAR0: 00000000
PEXIWBEAR0: 00000000
PEXIWAR0: ffffffff80e44017
PEX_ERR_DR: 00000000
PEX_ERR_EN: 00000000
PEX_ERR_DISR: 00000000
PEX_ERR_CAP_STAT: 00000000
PEX_ERR_CAP_R0: 00000000
PEX_ERR_CAP_R1: 00000000
PEX_ERR_CAP_R2: 00000000
PEX_ERR_CAP_R3: 00000000
PCIe#1 configuration:
(0000): 08241957
(0004): 00100006
(0008): 0b200111
(000c): 00000008
(0010): 71000000
(0014): ffffffff90000000
(0018): 00000000
(001c): 00000000
(0020): 00000000
(0024): 00000000
(0028): 00000000
(002c): 32304745
(0030): 00000000
(0034): 00000044
(0038): 00000000
(003c): 00000100
(0044): 7e034c01
(0048): 00000000
(004c): 00028810
(0050): 003c8001
(0054): 00002810
(0058): 0003d442
(005c): 00110000
(0060): 00000000
(0064): 00000000
(0068): 00000000
(006c): 00000000
(0070): 00000017
(0074): 00000000
(0078): 00000000
(007c): 00000000
(0100): 00010001
(0104): 00000000
(0108): 00000000
(010c): 00062010
(0110): 00000000
(0114): 00002000
(0118): 000000a0
(011c): 00000000
(0120): 00000000
(0124): 00000000
(0128): 00000000
(012c): 00000000
(0130): 00000000
(0134): 00000000
(0404): 00000016
(0440): 00000010
(0450): 0014d7ce
(0454): 01fc1e20
(0478): 32304745
(04b0): 00000001
(04b8): 00010428
(05a0): 00000000


bstrh : 0x00000000 0
bstrl : 0x00000000 0
bstrar : 0x01f0000b 32505867
lawbarh0 : 0x00000000 0
lawbarl0 : 0xdef00000 -554696704
lawar0 : 0x81000011 -2130706415
lawbarh1 : 0x00000000 0
lawbarl1 : 0xf8000000 -134217728
lawar1 : 0x81f0001a -2114977766
lawbarh2 : 0x00000000 0
lawbarl2 : 0x00000000 0
lawar2 : 0x00000000 0
lawbarh3 : 0x00000000 0
lawbarl3 : 0x00000000 0
lawar3 : 0x00000000 0
lawbarh4 : 0x00000000 0
lawbarl4 : 0x00000000 0
lawar4 : 0x00000000 0
lawbarh5 : 0x00000000 0
lawbarl5 : 0x00000000 0
lawar5 : 0x00000000 0
lawbarh6 : 0x00000000 0
lawbarl6 : 0x00000000 0
lawar6 : 0x00000000 0
lawbarh7 : 0x00000000 0
lawbarl7 : 0x00000000 0
lawar7 : 0x00000000 0
lawbarh8 : 0x00000000 0
lawbarl8 : 0x00000000 0
lawar8 : 0x00000000 0
lawbarh9 : 0x00000000 0
lawbarl9 : 0x00000000 0
lawar9 : 0x00000000 0
lawbarh10 : 0x00000000 0
lawbarl10 : 0x80000000 -2147483648
lawar10 : 0x80000015 -2147483627
lawbarh11 : 0x00000000 0
lawbarl11 : 0x00000000 0
lawar11 : 0x00000000 0
lawbarh12 : 0x00000000 0
lawbarl12 : 0x00000000 0
lawar12 : 0x00000000 0
lawbarh13 : 0x00000000 0
lawbarl13 : 0x00000000 0
lawar13 : 0x00000000 0
lawbarh14 : 0x00000000 0
lawbarl14 : 0x00000000 0
lawar14 : 0x00000000 0
lawbarh15 : 0x00000000 0
lawbarl15 : 0x00000000 0
lawar15 : 0x8100001e -2130706402
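
The window registers in a dump like this can be decoded with a few shifts. This is a sketch under assumptions: that the BAR/TAR registers hold the address shifted right by 12 bits, that bit 31 of the attribute register is the enable bit, and that the low six bits encode the size as 2^(n+1) bytes, which is the layout the Linux and U-Boot Freescale PCIe/LAW code appears to use when programming these registers. The helper names are hypothetical, and the leading ffffffff on some values is treated as sign extension from the print routine.

```python
def u32(value):
    """Keep the low 32 bits, dropping any sign extension added when printing."""
    return value & 0xFFFF_FFFF


def decode_atmu(bar_reg, tar_reg, war_reg):
    """Decode one ATMU window (assumed layout: address >> 12, EN = bit 31, size = 2^(n+1))."""
    war = u32(war_reg)
    return {
        "enabled": bool(war >> 31),
        "base": u32(bar_reg) << 12,
        "target": u32(tar_reg) << 12,
        "size": 1 << ((war & 0x3F) + 1),
    }


def decode_law(lawbarl, lawar):
    """Decode one local access window (assumed layout: EN = bit 31, target ID = bits 27:20)."""
    lawar = u32(lawar)
    return {
        "enabled": bool(lawar >> 31),
        "base": u32(lawbarl),
        "target_id": (lawar >> 20) & 0xFF,
        "size": 1 << ((lawar & 0x3F) + 1),
    }


# Values copied from the dump above: outbound window 1 and LAW 10.
ow1 = decode_atmu(0x00080000, 0x00080000, 0xFFFFFFFF80044013)
law10 = decode_law(0x80000000, 0x80000015)
print(hex(ow1["base"]), hex(ow1["target"]), hex(ow1["size"]), ow1["enabled"])
print(hex(law10["base"]), hex(law10["size"]), law10["target_id"], law10["enabled"])
```

Under these assumptions, outbound window 1 is an enabled 1 MiB window at local address 0x8000_0000 that translates to PCIe address 0x8000_0000, and it sits inside the 4 MiB region routed by LAW 10.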

0 Kudos

1,176 Views
ufedor
NXP Employee

You wrote:

> The only difference in a working set and a failing set is the PEXOTAR1 which is 0x00080000

How exactly is the P1011 (RC) configured? Please provide a similar set of registers.

0 Kudos

1,176 Views
stevebelvin
Contributor I

This bit was set in the RC and still no writes are accepted by the RC and the EP backs up.  Any other suggestions?

0 Kudos

1,176 Views
ufedor
NXP Employee

Please check the RC setup, referring to the attached document.

0 Kudos

1,176 Views
ufedor
NXP Employee

> I noticed that the T1040RM states that the PEX can support 12 outstanding platform writes.

>  Any connection to the number of writes before the lock up?

Yes, these facts are connected.

There is a misconfiguration between the RC and the EP, so the EP is unable to complete write transfers because of endless retries.

After 12 stalled write transactions the EP's PCIe controller can't accept any more, and the attempt to initiate one more stalls the system.

Please check state of the P1011 PCIe Configuration Ready Register.
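
The "twelve dots, then a hang" symptom can be illustrated with a toy model. This is only a sketch of the reasoning in the reply above, not a model of the actual hardware: the class and function names are hypothetical, and the only number taken from the thread is the limit of 12 outstanding writes quoted from the T1040RM.

```python
MAX_OUTSTANDING_WRITES = 12  # limit quoted from the T1040RM earlier in this thread


class ToyPexController:
    """Toy model: writes queue up when the far end never accepts them."""

    def __init__(self, link_accepts_writes):
        self.link_accepts_writes = link_accepts_writes
        self.pending = 0

    def post_write(self):
        """Return True if the core's store can retire, False if the core would stall."""
        if self.link_accepts_writes:
            return True  # the write drains immediately, nothing accumulates
        if self.pending < MAX_OUTSTANDING_WRITES:
            self.pending += 1  # the write is held while the link keeps retrying
            return True
        return False  # no room left: the store cannot complete


def writes_before_stall(link_accepts_writes, attempts=100):
    """Return the 1-based index of the first write that stalls, or None if none does."""
    ctrl = ToyPexController(link_accepts_writes)
    for n in range(1, attempts + 1):
        if not ctrl.post_write():
            return n
    return None


print(writes_before_stall(link_accepts_writes=False))
print(writes_before_stall(link_accepts_writes=True))
```

In this model the first stalled write is the thirteenth when the link never accepts writes, which matches the count reported in the original post.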

0 Kudos

1,176 Views
stevebelvin
Contributor I

We are looking into the state of the P1011 PCIe Configuration Ready Register.  As stated above, the configuration is complete and the EP (T1040) is trying to send a 4 DW write packet to the P1011.  This does seem like a possible cause of retries.

Configuration Ready Register

The PCI Express configuration ready register is used to indicate configuration complete
status to the transaction layer. The transaction layer handles configuration requests from
external hosts only after the CFG_READY bit is set. All the configuration requests
received from external hosts before the CFG_READY bit is set are completed with
configuration request retry status (CRS). The CFG_READY bit in this register should be
set after all relevant configuration registers have been programmed. This makes sure the
external host reads the correct capability advertisements during enumeration.

Bit 0: Configuration ready
Note that the reset state of this bit is determined during POR.
    1 - The transaction layer accepts inbound configuration requests.
    0 - The transaction layer responds to all inbound configuration requests with retry (CRS)

0 Kudos

1,176 Views
stevebelvin
Contributor I

Here are the differences we see in the configuration registers while attempting to write on the link (values on the left) compared with those while not writing (values on the right).

           (0054):  000a281e                         (0054):  0000281e

PCI Express Device Status Register (Device_Status_Register), 16 bits

    0x000a :: URD=1: Unsupported request detected, NFED=1: Non-fatal error detected (0xa sets bits 3 and 1; CED is bit 0 and is clear)    -- errors when writing, the write request appears to be unsupported
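
A quick way to double-check a Device Status decode is to test each bit. The sketch below assumes the standard PCI Express layout, with Device Control in the low 16 bits and Device Status in the high 16 bits of the dword at offset 0x54 in this dump; the function name is hypothetical.

```python
# Device Status bit positions from the PCI Express capability structure.
DEVICE_STATUS_BITS = {
    0: "Correctable Error Detected",
    1: "Non-Fatal Error Detected",
    2: "Fatal Error Detected",
    3: "Unsupported Request Detected",
    4: "AUX Power Detected",
    5: "Transactions Pending",
}


def decode_device_status(dword):
    """Return the names of the status bits set in the upper half of the dword."""
    status = (dword >> 16) & 0xFFFF
    return [name for bit, name in sorted(DEVICE_STATUS_BITS.items()) if status & (1 << bit)]


print(decode_device_status(0x000A281E))  # value while writing
print(decode_device_status(0x0000281E))  # value while idle
```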

 

           (0118):  000000b4                         (0118):  000000a0

Advanced Error Capabilities and Control Register

    0x000000B4 :: FIRST_ERROR_POINTER: The First Error Pointer is a read-only field that identifies the bit position of the first error reported in the uncorrectable error status register (see PCI Express Uncorrectable Error Status Register (Uncorrectable_Error_Status_Register)).
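
The First Error Pointer can be turned into an error name by looking up the bit position it points at. This sketch assumes the standard AER layout, where the pointer is the low five bits of this register and the bit numbers are those of the Uncorrectable Error Status register; the function name is hypothetical and only the commonly defined bits are listed.

```python
# Bit positions in the AER Uncorrectable Error Status register.
UNCORRECTABLE_ERROR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def first_error(aer_cap_ctrl):
    """Return (bit position, error name) for the First Error Pointer field."""
    fep = aer_cap_ctrl & 0x1F
    return fep, UNCORRECTABLE_ERROR_BITS.get(fep, "reserved or not listed here")


print(first_error(0x000000B4))
```

With the value captured while writing, the pointer is 0x14 (bit 20), which in the standard layout is the Unsupported Request Error bit.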

 

           (011c):  40000001                         (011c):  00000000

           (0120):  0100000f                         (0120):  00000000

           (0124):  80000000                         (0124):  00000000

           (0128):  beef0000                         (0128):  00000000

Header Log Register – Transaction Layer Packet (TLP) header associated with the error 

    DW0=0x40000001   -- Memory Write Request (3 DW header with data), no TLP digest (ECRC), Length = 1 DW

    DW1=0x0100000f   -- Requester ID = 0x0100 (bus 1, device 0, function 0), Tag = 0x00, first-DW byte enables all active (0xf), last-DW byte enables inactive (0x0) because Length = 1

    DW2=0x80000000  -- Write Address = 0x80000000

    DW3=0xbeef0000    -- Data word = 0xBEEF0000 (big endian)
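
The header log can also be decoded mechanically. The following is a minimal sketch that assumes a memory request with a 3 DW header (32-bit address) and the standard PCI Express field positions; the function name and the returned dictionary keys are hypothetical.

```python
def decode_mem_tlp_header(dw0, dw1, dw2):
    """Decode the first three header DWs of a 32-bit-address memory request TLP."""
    fmt = (dw0 >> 29) & 0x7      # 0b010 = 3 DW header with data
    tlp_type = (dw0 >> 24) & 0x1F  # 0b00000 = memory request
    length = dw0 & 0x3FF         # payload length in DWs (0 encodes 1024)
    requester_id = (dw1 >> 16) & 0xFFFF
    return {
        "kind": "MWr32" if (fmt, tlp_type) == (0b010, 0) else f"fmt={fmt} type={tlp_type}",
        "has_digest": bool((dw0 >> 15) & 1),  # TD bit: ECRC appended
        "length_dw": length or 1024,
        "requester": "%02x:%02x.%x" % (requester_id >> 8, (requester_id >> 3) & 0x1F, requester_id & 0x7),
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "address": dw2 & 0xFFFF_FFFC,  # low two bits are not address bits
    }


hdr = decode_mem_tlp_header(0x40000001, 0x0100000F, 0x80000000)
for key, value in hdr.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

The fourth logged DW (0xbeef0000) is the payload rather than part of the header, since a 32-bit memory write carries only three header DWs.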

 

           (0130):  0000002c                         (0130):  00000000

Root Error Status (RC mode only) - This register is supported only for RC mode

    0x2c ::  NFEMR=1:Non-fatal error messages received, MEFNFR=1: Multiple ERR_FATAL/NONFATAL received, EFNFR=1: ERR_FATAL/NONFATAL received

Not sure why the EP is reporting errors here!!!

 

0 Kudos