P2020 board bringup issues - ASLEEP and READY signals OK but doesn't boot

guignbt2v · ‎10-07-2021

On a custom P2020 board we have the following issues:

During HRESET assertion ASLEEP goes high; then, 200 or 300 us after HRESET negation, the ASLEEP signal returns to 0 and both cores' READY signals go to 1. But not much useful happens afterward.

Only a single core was supposed to start (cfg_cpu0_boot=1 / cfg_cpu1_boot=0), so I would expect only a single READY signal to be asserted.

Shortly after the ASLEEP falling edge a clean sdhc_clk signal appears for around 12 sdhc_clk cycles, then stops (in SDHC boot mode there is no activity on the other SDHC signals, whatever the value of SDHC CD). This behavior doesn't seem to depend on the cfg_rom_loc[0:3] value: SDHC boot or GPCM 16-bit.
In GPCM 16-bit boot mode neither CS0 nor CS1 is asserted (nor any other GPCM signal).
LSYNC_OUT/IN show a ~31MHz clock.

I tried applying very wrong PLL ratio settings (CCB and cores) during PoR to make sure the PLLs cannot lock. I expected ASLEEP to remain asserted, but it goes low and the READY signals are asserted.

Therefore I have the feeling the PoR configuration isn't taken into account, although we checked each signal (schematics, and almost every real signal with a scope). According to P2020EC, "The following pins must NOT be pulled down during power-on reset: DMA1_DACK[00], LA[17], USB_STP, TSEC2_TXD[06], HRESET_REQ, MSRCID[2:3], MDVAL, ASLEEP." These signals were double-checked. The LA[17] level seems pretty low, but the pin isn't externally pulled down. We also rechecked AN4261, "P2020 QorIQ Integrated Processor Design Checklist", but didn't find anything obvious.

We didn't manage to attach a JTAG/COP debugger (although getting a working JTAG/COP connection isn't always straightforward even on a proper board, so I didn't spend too much time on it).

Any idea of what is going wrong? Could the P2020 be in a test mode? If so on which pin/signal should we concentrate?

Thanks

guignbt2v · ‎12-09-2021

We finally solved our problem a few weeks ago.

The cause was a difference between the schematics and the actual PCB: the right components in the wrong places in the AVDD_CORE0&1 signal paths.

Ufedor was hence right when asking for voltages measured "as near as possible to the CPU balls"... we didn't check near enough (no easy access, because they are on the bottom side of a board plugged onto a main board...).

So a P2020 without proper core PLL voltages (hence core PLLs not locked) can still assert its READY_P0/P1 signals and negate its ASLEEP signal.

Anyway, at some point, before going back to analyzing the power supplies, we ran tests on the P2020RDB board to find out which clock signals are required to start the P2020 CPU: we turned off, one way or another, one clock signal at a time and checked whether the CPU still started.
It turns out our reference board could boot with any single clock signal turned off except SYSCLK.
Unexpected behavior: the 'SerDes PLL time-out enable' setting 'cfg_srds_pll_toe' was in its default configuration (disabled, hence the system is supposed to wait indefinitely for the SerDes PLL to lock), but the system managed to boot with no SerDes clock.

guignbt2v · ‎10-26-2021

Hi,

Thanks for your answers.

We managed to reproduce our problem (and miscellaneous unexpected behaviors) on our 3 identical prototypes, which is good news in a way...
I also managed to reproduce two 'unexpected' behaviors on our P2020-RDB board:
- the 'SDHC clock activity burst during 12 periods' even when starting from NOR
- and the 'both cores READY signals asserted after ASLEEP down, even if core1 is supposed not to start'
So these are likely 'normal' behaviors.

In the meantime we found why LA17 was low (or not high enough): the CPLD wasn't always using the expected pins. In the Lattice Diamond project constraint file, some of the 'SITE' lines were discarded with 'not so clear' warnings, lost in the middle of many other (valid/expected) warnings. These SITE lines were discarded because our vector signals were always numbered with 2 digits, for instance LA[00], LA[01]... LA[10]... LA[20]. All signals named LA[0x] were discarded, hence the associated pins were assigned randomly. Once all relevant signals were properly renamed (for instance LA[0], LA[1]...), the related warnings disappeared, we got a correct pinout back, and LA17 went to a correct value (although it wasn't directly impacted by the [0x] naming problem).
Conclusion: as in other programming languages, a 'no error, no warning' approach would have saved some time.
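
For anyone hitting the same zero-padded index problem, here is a minimal sketch of a pre-check that could be run on the constraint file before synthesis. It assumes the pin assignments are written as `LOCATE COMP "name" SITE "pin" ;` lines; the function name, the regular expressions, and the sample pin names are all hypothetical, so adapt them to your own file.

```python
import re

# Assumed line shape (adjust to your constraint file):
#   LOCATE COMP "LA[07]" SITE "B12" ;
LOCATE_RE = re.compile(r'LOCATE\s+COMP\s+"([^"]+)"\s+SITE\s+"([^"]+)"',
                       re.IGNORECASE)
# A bus index with a leading zero and at least one more digit: [00], [07], ...
PADDED_INDEX_RE = re.compile(r'\[0\d+\]')


def find_padded_sites(lpf_text):
    """Return (line number, signal name, site) for zero-padded bus indices."""
    hits = []
    for lineno, line in enumerate(lpf_text.splitlines(), start=1):
        match = LOCATE_RE.search(line)
        if match and PADDED_INDEX_RE.search(match.group(1)):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Hypothetical sample; the site names are made up for illustration.
SAMPLE = '''\
LOCATE COMP "LA[00]" SITE "A1" ;
LOCATE COMP "LA[10]" SITE "A2" ;
LOCATE COMP "LA[7]" SITE "A3" ;
'''

if __name__ == "__main__":
    for lineno, name, site in find_padded_sites(SAMPLE):
        print(f"line {lineno}: {name} -> {site} uses a zero-padded index")
```

On the sample above only the LA[00] line is reported; LA[10] and LA[7] pass because their indices have no leading zero.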

We also found and fixed a few points where the design rules weren't fully respected.
Anyway, all this hasn't solved our problem yet.

My new question is COP/JTAG related: we cannot attach to the P2020 with a CodeWarrior TAP yet. Is there a way to use low-level 'command line' commands from the CCS (CodeWarrior Connection Server) window to see whether anything happens on the JTAG/COP link, and whether something is partially initialized on the P2020 side?
When a system is up and running we can attach the CW TAP and access multiple things in the system: core registers, but also other SoC devices. Is there any way to access part of this if a core isn't properly initialized?

Thanks,

ufedor · ‎10-26-2021

Please configure cfg_rom_loc[0:3] for another boot ROM location and test the system behaviour.

guignbt2v · ‎11-02-2021

I ran the test again after all the miscellaneous improvements/modifications on the board.

For all boot modes except 'flash boot modes 1000 & 1010': ASLEEP goes down, READY_P0 & READY_P1 go up, no activity on LCS0_n.

For 'flash boot modes 1000 & 1010': ASLEEP stays low, READY_P0 & READY_P1 go low, interesting activity on LCS0_n & LFALE (looks like it keeps trying to access flash).

Interesting but still weird...

Update:

Tests done on all boards, same behavior.

In flash boot mode(s) the system actively tries to load the boot block. According to the documentation it seems to look for 0xFF in the page in order to load it. As we cannot program the NAND flash yet (no functional JTAG yet), I simulated it with the CPLD, which can return either 0x00 or 0xFF for all NAND read requests.

When the CPLD returns 0x00 (for all NAND read accesses) the CPU keeps accessing the NAND forever (incrementing the page number), as expected.

When the CPLD returns 0xFF (for all NAND read accesses) the CPU accesses the NAND for a short time (number of read accesses not counted yet), then ASLEEP goes down and both READY_P0/1 go high. So pretty similar to the other boot modes.

I should be able, one way or another, to fill the NAND with proper code. However, I have the feeling NAND boot hasn't solved my overall boot problem.
According to P2020 RM "4.5.1.1 soft reset": "Note that if SRESET_B is asserted before a given e500 core is configured to handle a machine check interrupt, a checkstop condition occurs for that particular core, which causes CKSTP_OUT0_B or CKSTP_OUT1_B to assert."
I assume my 0xFFFFFFFF code cannot properly configure exception handling, hence asserting SRESET_B should trigger a checkstop, shouldn't it? However, when I assert SRESET_B nothing happens: no CKSTP_OUTx reaction.

I therefore have the feeling my cores haven't started yet (in NAND boot as in all other boot modes).

We also tested different 'boot sequencer' configurations with different boot modes: everything seems consistent with the P2020RM "4.5.2 Power-on reset sequence" steps. We don't have a valid EEPROM on IIC1 for the boot sequencer, but we can see I2C accesses when the boot sequencer is enabled, and in that case ASLEEP stays high and READY_Px stay low.

In "4.5.2 Power-on reset sequence"

step 10: "the PCI Express interfaces begin training and are released to accept external requests" + "..and the boot vectors fetched by the e500 cores are allowed to proceed unless processor booting is further held off by POR configuration inputs...".
step 11: "The ASLEEP signal negates [...] indicating the ready state of the system." "The ready state for the e500 core is also indicated by the assertion of READY_P0/TRIG_OUT [...]"

It looks like we reach step 11 (ASLEEP & READY_Px) but for some reason the e500 cores were NOT allowed to proceed during step 10...

Do you see any reason for the e500 cores 'not to be allowed to proceed' during step 10 (other than the obvious 'CPU boot configuration')?
What happens at step 10 if for some reason we have a PCIe training problem (all PCIe configured as host)? Would we reach step 11 anyway?

Thanks

ufedor · ‎10-07-2021

Please provide additional information:

1) measured POR voltages of all configuration signals

2) measured frequencies of all applied clocks (digital scope traces captured at the processor's pins)

3) the processor connection schematics as searchable PDF for inspection

4) ensure that TRST_B is pulsed low during POR sequence

5) how many boards were tested?

guignbt2v · ‎10-08-2021

Thanks for your quick answer.

1) measured POR voltages of all configuration signals
Around 70 signals to measure.
Are any of them more important? The 'must be driven' ones?
What about signals that are not POR configuration signals but that, according to "P2020EC Rev. 3, 03/2016", page 28, note 7, could have "other manufacturing test functions", or the pins related to note 15?

2) measured frequencies of all applied clocks (digital scope traces captured at the processor's pins)

3) the processor connection schematics as searchable PDF for inspection
This is a multi board system with logic in a CPLD (VHDL code). Hence not so easy to follow everything.

4) ensure that TRST_B is pulsed low during POR sequence

TRST_B is currently asserted and negated. Do you have any specification for it? As a reference I used the CPLD content of the P2020RDB-PCA board, where both SRESET_B and TRST_B follow HRESET_B.

5) how many boards were tested?
So far same behavior on our 3 prototypes.
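
Regarding item 1, with around 70 pins to probe, the readings could be screened with something like the sketch below. It is only an illustration: the VIL/VIH thresholds are assumed values for a 3.3 V input and must be replaced with the figures from the data sheet for the rail that powers each pin, the function names are hypothetical, and the example voltages are made up (they are not our measurements).

```python
# Assumed thresholds for a 3.3 V input; take the real ones from the data sheet.
VIL_MAX = 0.8  # volts, highest level still read as logic low (assumption)
VIH_MIN = 2.0  # volts, lowest level already read as logic high (assumption)


def classify(voltage, vil_max=VIL_MAX, vih_min=VIH_MIN):
    """Map a measured POR voltage to 'low', 'high' or 'indeterminate'."""
    if voltage <= vil_max:
        return "low"
    if voltage >= vih_min:
        return "high"
    return "indeterminate"


def screen(measurements, must_be_high):
    """Return {signal: (voltage, level)} for pins that deserve a closer look.

    A pin is flagged when its level is indeterminate, or when it is listed
    in must_be_high (e.g. the 'must NOT be pulled down' pins) but does not
    read as high.
    """
    suspects = {}
    for name, volts in measurements.items():
        level = classify(volts)
        if level == "indeterminate" or (name in must_be_high and level != "high"):
            suspects[name] = (volts, level)
    return suspects


# Made-up example values, for illustration only.
EXAMPLE = {"LA[17]": 1.1, "USB_STP": 3.2, "MDVAL": 0.2}
MUST_BE_HIGH = {"LA[17]", "USB_STP", "MDVAL"}

if __name__ == "__main__":
    for name, (volts, level) in screen(EXAMPLE, MUST_BE_HIGH).items():
        print(f"{name}: {volts:.2f} V reads as {level}")
```

With these made-up numbers LA[17] is flagged as indeterminate and MDVAL as low, while USB_STP passes.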

By the way, I finally managed to get an 'ASLEEP not going down' case by setting cfg_sys_pll[0] to 1, i.e. to a 'reserved' / invalid value.

I perfectly understand you need more info to figure out what goes wrong, but so do I. One way is to check that everything is right until an error is pinpointed, which is why you need more info; another way is the bottom-up approach: getting a rough idea of what is happening just by analyzing the actual behavior, which then helps to focus on a precise part.

Which signals could give me a hint about the system/core states?
When do the 'READY' signals go high (in sync with ASLEEP going down)?
Can they go high if [cfg_cpu0_boot, cfg_cpu1_boot]=[0,0]? If not, how can they go high anyway?
How can I determine whether we activate some 'manufacturing test function' by mistake?
(I understand that part of this information may not be public. FYI our company has a signed and currently valid NDA with NXP.)

Thanks

ufedor · ‎10-09-2021

The processor should operate in accordance with functionality described in its Reference Manual if:

1) the processor's signals and power pins are connected in accordance with Data Sheet

2) all requirements and recommendations of the processor's Design Checklist are considered

Malfunctions that occur when 1) and/or 2) are violated are not documented.

In this specific case it is necessary to determine why the signal level at LA17 is low.

Is it low on all three boards?