i.MX537 consistently fails to start on first attempt, works on second


791 views
Slatye
Contributor I

Hi All,

We're having some very confusing problems with an i.MX537 board. Note that we've built thousands of these, most are fine, but every now and then we get a batch that doesn't seem to behave. The one I'm using now is nice because at least it fails pretty consistently; more common is that they fail about one time in twenty, which makes picking up the fault pretty challenging.

About 90% of the time (from over a thousand test runs on this unit), the behaviour is:

  • Starts U-boot correctly (from flash)
  • Copies the kernel (from SD) to RAM and verifies the checksum
  • Attempts to start the kernel
  • Fails either with a data abort (as in the log below) or silently, resulting in a CPU reset
  • Starts U-boot correctly (from flash)
  • Copies the kernel (from SD) to RAM and verifies the checksum
  • Starts the kernel and Linux perfectly and works fine from then until the next power cycle (normally several hours later)

The remaining 10% of the time, either it locks up when trying to start the kernel the first time, or it does actually boot up on the first attempt. Note that the second attempt to start the kernel has a 100% success rate; it has never taken a third U-boot attempt.
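For anyone wanting to reproduce these percentages from captured serial logs, the outcomes described above could be tallied roughly as follows. This is only a sketch: the function names and outcome labels are made up for illustration, and it assumes one text capture per power cycle containing the same marker lines as the boot log in this post ("U-Boot ...", "Starting kernel ...", "data abort", "Linux version ...").

```python
import re
from collections import Counter

# Marker lines taken from the boot log in this post.
UBOOT_BANNER = re.compile(r"^U-Boot \d{4}\.\d{2}")
KERNEL_START = "Starting kernel ..."
KERNEL_BANNER = re.compile(r"Linux version ")
DATA_ABORT = "data abort"


def classify_power_cycle(log_text):
    """Classify one power cycle's serial capture.

    Returns (outcome, attempts): attempts is the number of U-Boot banners
    seen, outcome describes how the FIRST kernel start attempt ended.
    The outcome labels are hypothetical names chosen for this sketch.
    """
    attempts = []  # one dict of flags per U-Boot banner
    for line in log_text.splitlines():
        line = line.strip()
        if UBOOT_BANNER.match(line):
            attempts.append({"started": False, "abort": False, "linux": False})
        elif not attempts:
            continue  # ignore anything before the first banner
        elif line == KERNEL_START:
            attempts[-1]["started"] = True
        elif line == DATA_ABORT:
            attempts[-1]["abort"] = True
        elif KERNEL_BANNER.search(line):
            attempts[-1]["linux"] = True

    if not attempts:
        return ("no_uboot", 0)
    first = attempts[0]
    if first["linux"]:
        outcome = "first_boot_ok"
    elif first["abort"]:
        outcome = "data_abort_then_reset" if len(attempts) > 1 else "data_abort_hang"
    elif len(attempts) > 1:
        outcome = "silent_reset"
    else:
        # Also covers captures that were cut off before the kernel banner.
        outcome = "lockup"
    return (outcome, len(attempts))


def tally(captures):
    """Count outcomes over an iterable of per-power-cycle captures."""
    return Counter(classify_power_cycle(text)[0] for text in captures)
```

Calling `tally()` on a list of captures gives a count per outcome, which makes it easier to compare a marginal board (one failure in twenty) against a consistently failing one.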

Typical boot log below.

U-Boot 2017.01-00008-g2e9b9a3-dirty (Jul 02 2021 - 14:33:52 +1000)

Board: MX53 LOCO
I2C: ready
DRAM: 1 GiB
i2c: I2C2 SDA is low, start i2c recovery...
I2C2 Recovery success
MMC: FSL_SDHC: 0
In: serial
Out: serial
Err: serial
Net: FEC
Booting in 5 sec. Type #load to abort
Booting from mmc ...
switch to partitions #0, OK
mmc0 is current device

MMC read: dev # 0, block # 2048, count 6144 ... 6144 blocks read: OK

## Booting kernel from Legacy Image at 72000000 ...
Image Name: Linux-2.6.35.3-1129-g691c08a-svn
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2884468 Bytes = 2.8 MiB
Load Address: 70008000
Entry Point: 70008000
Verifying Checksum ... OK
Loading Kernel Image ... OK

Starting kernel ...

data abort
pc : [<aff9a3a4>] lr : [<aff559d4>]
reloc pc : [<778473a4>] lr : [<778029d4>]
sp : af550b30 ip : 00000000 fp : 72000040
r10: 00000000 r9 : af550ed0 r8 : af5531ac
r7 : affa47b0 r6 : 00000000 r5 : aff9a3a7 r4 : 6964616f
r3 : 00000000 r2 : aff5f568 r1 : 72000040 r0 : 0a000023
Flags: Nzcv IRQs off FIQs off Mode SVC_32
Resetting CPU ...

resetting ...


U-Boot 2017.01-00008-g2e9b9a3-dirty (Jul 02 2021 - 14:33:52 +1000)

Board: MX53 LOCO
I2C: ready
DRAM: 1 GiB
MMC: FSL_SDHC: 0
In: serial
Out: serial
Err: serial
Net: FEC
Booting in 5 sec. Type #load to abort
Booting from mmc ...
switch to partitions #0, OK
mmc0 is current device

MMC read: dev # 0, block # 2048, count 6144 ... 6144 blocks read: OK

## Booting kernel from Legacy Image at 72000000 ...
Image Name: Linux-2.6.35.3-1129-g691c08a-svn
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2884468 Bytes = 2.8 MiB
Load Address: 70008000
Entry Point: 70008000
Verifying Checksum ... OK
Loading Kernel Image ... OK

Starting kernel ...

[ 0.000000] Linux version 2.6.35.3-1129-g691c08a-svn27886 (trusty@freescaledev) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #8 PREEMPT Fri May 10 14:08:50 AEST 2019
[ 0.000000] CPU: ARMv7 Processor [412fc085] revision 5 (ARMv7), cr=10c53c7f

... and it works fine from here.
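As an aside, the register dump from the failed attempt can be decoded a little further. The sketch below uses only the values printed in the log above; the one outside fact it relies on is that a legacy uImage header is 64 bytes long. It computes the U-Boot relocation offset from the pc/lr pairs, checks that r1 and fp point just past the image header, and reports any general-purpose register whose bytes happen to be printable ASCII. It is an observation aid, not a diagnosis.

```python
import re

# Register dump copied from the failed boot attempt in the log above.
ABORT_DUMP = """\
pc : [<aff9a3a4>] lr : [<aff559d4>]
reloc pc : [<778473a4>] lr : [<778029d4>]
sp : af550b30 ip : 00000000 fp : 72000040
r10: 00000000 r9 : af550ed0 r8 : af5531ac
r7 : affa47b0 r6 : 00000000 r5 : aff9a3a7 r4 : 6964616f
r3 : 00000000 r2 : aff5f568 r1 : 72000040 r0 : 0a000023
"""

IMAGE_ADDR = 0x72000000   # "Booting kernel from Legacy Image at 72000000"
LEGACY_HEADER_SIZE = 64   # size of a legacy uImage header in bytes


def parse_registers(dump):
    """Return {name: value}; values on the 'reloc' line get a 'reloc_' prefix."""
    regs = {}
    for line in dump.splitlines():
        prefix = "reloc_" if line.startswith("reloc") else ""
        for name, value in re.findall(r"(\w+)\s*:\s*\[?<?([0-9a-f]{8})>?\]?", line):
            regs[prefix + name] = int(value, 16)
    return regs


def as_ascii(value):
    """Decode a 32-bit value as little-endian ASCII, or None if not printable."""
    raw = value.to_bytes(4, "little")
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


regs = parse_registers(ABORT_DUMP)

# Both pairs should give the same offset if the dump is self-consistent.
offset_pc = regs["pc"] - regs["reloc_pc"]
offset_lr = regs["lr"] - regs["reloc_lr"]
print(f"relocation offset from pc: {offset_pc:#010x}")
print(f"relocation offset from lr: {offset_lr:#010x}")

# r1 and fp equal the image address plus the header size, i.e. the payload start.
payload = IMAGE_ADDR + LEGACY_HEADER_SIZE
print(f"r1 points at image payload: {regs['r1'] == payload}")
print(f"fp points at image payload: {regs['fp'] == payload}")

# Report registers r0..r10 that look like text rather than an address.
for n in range(11):
    text = as_ascii(regs[f"r{n}"])
    if text is not None:
        print(f"r{n} = {regs[f'r{n}']:#010x} decodes to ASCII {text!r}")
```

Both offsets come out as 0x38753000, so the "reloc" addresses (0x778473a4 and 0x778029d4) are the ones to look up in the symbol map of the matching U-Boot build. The only register that decodes as text is r4 (0x6964616f, "oadi" in little-endian byte order), meaning it holds string data rather than an address; whether that is significant is not something the dump alone can settle.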


Testing we've done, none of which made any difference to the behaviour:

  • Holding the CPU in reset for 10+ seconds after power-up, just in case a power supply isn't stable in time (note that we've also verified power supplies with an oscilloscope, checking correct voltage, sequencing, and stability).
  • Delaying boot by 20 - 30 seconds in U-boot (by entering #load and then waiting a while before running "boot").
  • Loading the kernel to RAM over RS232 (Ymodem) rather than from SD card. Note that this also gave delayed startup by 6 minutes (because serial is slow) but had the same result.
  • Had U-boot perform a CPU reset (so it never tried to start the kernel the first time) and then run - the second boot worked fine, so it's not that the kernel attempting to start is setting a register that then allows it to work the second time.
  • Reading back the registers set by DCD in case one was not correct - all registers apart from ESDCTL_MUR are the same on both attempts, and also the same as a known-good board that works every time. ESDCTL_MUR changes slightly each time, but we also see these changes on a known-good board. We're not seeing anything catastrophic that might indicate a complete failure to calibrate the RAM.
  • Running the system overnight with some reasonably CPU-intensive processing, to see if it's just unstable (e.g. from dodgy BGA soldering, power supply glitches too short to see, etc.) - worked fine, no problems.
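The register readback comparison in the list above (first attempt vs. second attempt vs. known-good board) can be automated along these lines. This is a minimal sketch: the function name, the dump labels, and every register value below are placeholders invented for illustration, not readings from the board. Only the register name ESDCTL_MUR comes from this post, and it is ignored by default because it is expected to vary between runs.

```python
def diff_register_dumps(dumps, ignore=("ESDCTL_MUR",)):
    """Compare several register dumps.

    dumps maps a label (e.g. "first_attempt") to a {register_name: value} dict.
    Returns {register_name: {label: value}} for every register whose value is
    not identical across all dumps, skipping the names listed in ignore.
    A register missing from a dump shows up with the value None.
    """
    names = set()
    for regs in dumps.values():
        names.update(regs)

    differences = {}
    for name in sorted(names):
        if name in ignore:
            continue
        values = {label: regs.get(name) for label, regs in dumps.items()}
        if len(set(values.values())) > 1:
            differences[name] = values
    return differences


# Placeholder data only: register names other than ESDCTL_MUR and all values
# are hypothetical and do not come from real hardware.
example_dumps = {
    "first_attempt": {"ESDCTL_MUR": 0x1, "PLACEHOLDER_REG_1": 0x1234},
    "second_attempt": {"ESDCTL_MUR": 0x2, "PLACEHOLDER_REG_1": 0x1234},
    "known_good": {"ESDCTL_MUR": 0x3, "PLACEHOLDER_REG_1": 0x1234},
}

for name, values in diff_register_dumps(example_dumps).items():
    print(name, {label: hex(v) if v is not None else None for label, v in values.items()})
```

With the placeholder data nothing is printed, which matches the result reported above: no differences apart from ESDCTL_MUR. Passing `ignore=()` would list ESDCTL_MUR as well.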

At this point I am at a loss to explain what is going on!

 

Any ideas for what might be wrong, or ideas for further useful testing, would be very much appreciated.

 

Thanks!

1 Reply

784 views
igorpadykov
NXP Employee

Hi Evan

 

For a bad board, you can update the DDR calibration coefficients using the link below:

https://community.nxp.com/t5/i-MX-Processors-Knowledge-Base/i-MX53-Memory-Calibration-Script-AN4466/...

Then run an overnight DDR test (preferably at different temperatures):

DDR Stress tester kit for the i.MX51 and i.MX53

This thread may also be useful:

https://community.nxp.com/t5/i-MX-Processors/i-MX536-DDR-Stress-test-failed-when-the-temperature-of-...

If this does not help, I would suggest (just as a test) resoldering the chip, to check whether the problem is caused by poor soldering.

 

Best regards
igor
