Background of hardware and software used.
- Freescale or other standard reference board used = custom design board
- Kernel version and/or BSP release used = based on 3.0.35
- Any additional software/application or hardware used = i.MX6 Solo, 512MB LPDDR2 RAM (Micron MT42L128M32D1LF-18 WT:A @396MHz), 1GB NAND flash (Toshiba TC58NYG3S0FBAID)
Recently, we found that a number of units fail during firmware download with MfgTool.
The failure rate is around 2% (20 units out of 1000).
The relevant part of the failure log from the download is shown below. (A detailed sample failure log is attached.)
ModuleID LevelID: ExecuteCommand--Push[WndIndex:0], Body is $ mount -t ubifs ubi2:system /mnt/system
ModuleID LevelID: ExecuteCommand--Push[WndIndex:0], Body is pipe busybox tar -xv -C /mnt
ModuleID LevelID: PortMgrDlg(0)--MxHidDevice--Command Push excute failed
ModuleID LevelID: CmdOperation, current command executed failed, so SetEvent(hDevCanDeleteEvent)
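To compare the logs of all failing units, the failing command can be pulled out automatically. The following is a minimal sketch, assuming the log layout matches the excerpt above; `find_failed_commands` is a hypothetical helper, not part of MfgTool, and it simply reports the most recent pushed command before each failure line.

```python
import re

# Layout of the "Push" lines is assumed from the excerpt above and may
# differ between MfgTool versions.
PUSH_RE = re.compile(r"ExecuteCommand--Push\[WndIndex:(\d+)\], Body is (.*)$")
FAIL_MARKER = "Command Push excute failed"  # spelled as it appears in the log


def find_failed_commands(lines):
    """Return (window index, command body) for each push followed by a failure line."""
    latest = None      # most recent pushed command seen so far
    failures = []
    for line in lines:
        match = PUSH_RE.search(line)
        if match:
            latest = (int(match.group(1)), match.group(2).strip())
        elif FAIL_MARKER in line and latest is not None:
            failures.append(latest)
    return failures


excerpt = [
    "ModuleID LevelID: ExecuteCommand--Push[WndIndex:0], Body is $ mount -t ubifs ubi2:system /mnt/system",
    "ModuleID LevelID: ExecuteCommand--Push[WndIndex:0], Body is pipe busybox tar -xv -C /mnt",
    "ModuleID LevelID: PortMgrDlg(0)--MxHidDevice--Command Push excute failed",
]
for window, command in find_failed_commands(excerpt):
    print(f"window {window}: failed after '{command}'")
```

When several ports are programmed in parallel the log lines may interleave, so the "most recent push" rule would need to be refined per port.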
We tried to locate the point at which the crash happens.
The 20 failing units have no problem with ROM read, RAM init, firmware download, or enumerating over USB as MSC.
The failing units can also run UCL commands, but they crash easily when unpacking a large image (checked on the serial console).
(The push command appears to be sent and started successfully, but the failure occurs while the i.MX6 is processing it.)
- The failure happens very frequently when unpacking large files (system.tar, ~150MB).
- For small files (recovery.tar, ~6MB), there is no crash.
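This size dependence is what one would expect if data were corrupted at a small, roughly constant rate per megabyte moved through RAM. The following back-of-the-envelope sketch assumes independent errors; `P_PER_MB` is a hypothetical value chosen only for illustration, not a measurement from our units.

```python
def failure_probability(size_mb, p_per_mb):
    """Probability that at least one of size_mb megabytes is corrupted,
    assuming independent errors with probability p_per_mb per megabyte."""
    return 1.0 - (1.0 - p_per_mb) ** size_mb


P_PER_MB = 0.02  # hypothetical per-megabyte error probability, illustration only

for name, size_mb in (("recovery.tar", 6), ("system.tar", 150)):
    chance = failure_probability(size_mb, P_PER_MB)
    print(f"{name} ({size_mb} MB): {chance:.0%} chance of at least one error")
```

Under this assumption the large archive is far more likely to hit an error than the small one, even though both go through the same code path.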
Observation and experiment:
- We lowered the DDR clock rate to half (by reducing the PLL2 clock). All failing units could then complete the whole download procedure.
However, even though they could complete the firmware download, they hang easily during normal boot or operation.
- We then ran the DDR Stress Test v1.0.2 on the failing units, using the same RAM init code.
Most of them pass at the rated clock (396MHz) and only fail at higher stress clocks.
- We also replaced the NAND flash on the failing boards with a new, known-good part, and the failure still occurs. (NAND does not seem to be the problem.)
1. Reducing the DDR clock to half lets the units complete the download procedure, which sounds like a DDR RAM clock issue.
On the other hand, the units pass the DDR stress test.
Does the DDR Stress Test have limitations that would prevent it from exposing some RAM setting or hardware problems that appear in real operation?
2. As normal practice, is it necessary for units to pass the DDR stress test at a clock rate higher than the rated one in order to have more margin?
3. I have attached the RAM init file. Could you point out the critical settings that we should focus on?
4. We have one unit (1 out of 1000) that fails the DDR stress test. It only passes at a clock rate of 365MHz and fails at 396MHz.
That unit also fails to complete the download process. (The failure happens during the RAM init stage, but sometimes the unit continues until it crashes during execution of a UCL command.)
By reducing the DDR clock rate to half, it can also complete the download.
In the DDR stress test, it completes calibration at the rated clock of 396MHz but still fails the RAM test at 396MHz.
I wonder whether the calibrated settings are not actually being applied, so that the failure shows up in both the stress test and normal operation.
It is still difficult to conclude that the RAM is at fault, since most of the failing units pass the DDR stress test,
but many observations point to a RAM problem.
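One way to check RAM under conditions closer to the download (Linux running, large sequential copies) would be a user-space copy-and-verify loop. The sketch below is only an illustration of that idea and is not the algorithm used by the DDR Stress Test; it assumes a Python 3 interpreter is available on the target, which may not be true for this BSP, and `copy_and_verify` is a hypothetical helper.

```python
import hashlib
import random


def copy_and_verify(size_mb=64, passes=4, seed=1):
    """Fill a buffer with a reproducible pseudo-random pattern, copy it
    repeatedly, and return the number of passes whose copy differed."""
    rng = random.Random(seed)
    # 1 MiB pseudo-random block, repeated to build the full source buffer.
    block = bytes(rng.getrandbits(8) for _ in range(1024 * 1024))
    source = bytearray(block * size_mb)
    expected = hashlib.sha256(source).digest()
    mismatches = 0
    for _ in range(passes):
        copy = bytearray(source)  # bulk read and write through RAM
        if hashlib.sha256(copy).digest() != expected:
            mismatches += 1
    return mismatches


if copy_and_verify(size_mb=8, passes=2) == 0:
    print("no mismatches detected")
else:
    print("copy mismatch detected")
```

The buffer should be much larger than the CPU caches so that the copies actually reach external memory. A clean run does not prove the RAM is good, but a mismatch on a failing unit would point away from NAND and the USB path.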
Original Attachment has been moved to: MfgTool-2.log.zip
Original Attachment has been moved to: solo_lpddr2_400mhz_cs0_32bit_v06.inc.zip