I have a custom board that uses a P1013 CPU with four discrete DDR3 chips. The board runs VxWorks normally, but roughly every 4 to 10 hours it crashes. When a crash happens I attach a JTAG debugger, and I found that the contents of the whole 2GB DDR memory space have changed in a periodic pattern, as shown below.
addr: 0x0 0xXXXXXXXX
addr: 0x4 0xXXXX7FFE
addr: 0x8 0xXXXXXXXX
addr: 0xc 0xXXXX7FFE
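A dump like this can be scanned mechanically to see which 16-bit slice of each 64-bit word holds a constant value. The following is only a minimal sketch: the function name `find_stuck_lane` is hypothetical, the words are assumed to be read as 64-bit values on the big-endian target (so lane 0, bits 15:0, is the last 16 bits of each 8 bytes), and which physical DDR chip a lane maps to depends on the board routing. The sample should come from a region that normally holds varied data, otherwise a lane can look constant by coincidence.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Look for a 16-bit lane that holds the same value in every 64-bit word.
 * Lane 0 is bits 15:0 of the word, lane 3 is bits 63:48.
 * Returns the lane index (0..3) and stores the constant value in
 * *stuck_value, or returns -1 if no lane is constant across the buffer.
 */
int find_stuck_lane(const uint64_t *words, size_t count, uint16_t *stuck_value)
{
    if (words == NULL || count < 2) {
        return -1; /* need at least two words to call a lane "constant" */
    }

    for (int lane = 0; lane < 4; lane++) {
        unsigned shift = 16u * (unsigned)lane;
        uint16_t first = (uint16_t)(words[0] >> shift);
        size_t i;

        for (i = 1; i < count; i++) {
            if ((uint16_t)(words[i] >> shift) != first) {
                break; /* this lane varies, so it is not stuck */
            }
        }

        if (i == count) {
            if (stuck_value != NULL) {
                *stuck_value = first;
            }
            return lane;
        }
    }
    return -1;
}
```

For the pattern in the dump, such a check would be expected to report lane 0 with the value 0x7FFE.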
In every 8 bytes, the last 16 bits have changed to a fixed value (0x7FFE). The CPU's DDR bus is 64 bits wide and each DDR chip provides 16 bits of data width, so this looks as if one DDR chip has failed and corrupted its 16-bit slice across the whole 2GB of memory. However, after I reset the board and re-initialize the DDR, it runs normally again. To debug further, I collected the CPU's DDR configuration register values (in the CCSBAR space) while the DDR was running normally and compared them with the values after the issue reproduced. There are two differences:
1. The ACE bit (auto calibration error) is set in the DDR_ERR_DETECT register when the issue reproduces. As far as I can tell, ACE should only be set during the DDR initialization phase, but here it is set while the DDR is running.
2. The value of the CCSR register at offset 0x2f04 changes from 0x2 to 0x1100 (0x1100 is the value when the issue reproduces). Offset 0x2f04 is a reserved DDR configuration register on the P1013 and has no description in the datasheet.
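The comparison of a known-good register snapshot against a snapshot taken after the failure can be sketched as below. This is an illustration only: the `reg_sample` struct and `diff_snapshots` function are hypothetical names, and how the snapshots are captured (JTAG, or reads through CCSBAR) is left out. Only the 0x2f04 offset and its two values come from the observations above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One captured register: offset from CCSBAR and the 32-bit value read. */
struct reg_sample {
    uint32_t offset;
    uint32_t value;
};

/*
 * Compare two snapshots taken over the same list of offsets and print
 * every register whose value differs. Returns the number of differences.
 * Entries whose offsets do not match are skipped rather than compared.
 */
size_t diff_snapshots(const struct reg_sample *good,
                      const struct reg_sample *bad, size_t count)
{
    size_t changed = 0;

    for (size_t i = 0; i < count; i++) {
        if (good[i].offset != bad[i].offset) {
            continue; /* snapshots are not aligned at this index */
        }
        if (good[i].value != bad[i].value) {
            printf("CCSR+0x%04x: 0x%08x -> 0x%08x\n",
                   (unsigned)good[i].offset,
                   (unsigned)good[i].value,
                   (unsigned)bad[i].value);
            changed++;
        }
    }
    return changed;
}
```

Capturing the full DDR register block once right after a successful initialization gives a baseline to diff against whenever the failure reproduces.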
I would appreciate any advice or comments on this issue.
We have resolved this issue; I forgot to close the thread.
The root cause is a PCB layout problem: the DDR signal integrity is not good.
As a workaround, we now use only two DDR chips in 32-bit mode, which solves the issue.
I hope this helps others.