DDR3 auto-calibration error and reserved register change at run time

Question asked by zhao lingjun on Feb 21, 2017
Hi all,

I have a custom board that uses a P1013 CPU with four discrete DDR3 chips. The board runs VxWorks normally, but roughly every 4 to 10 hours it crashes. When the crash happens I debug over JTAG, and I found that the contents of the entire 2 GB DDR memory space have changed in a periodic pattern, as below:

addr: 0x0  0xXXXXXXXX
addr: 0x4  0xXXXX7FFE
addr: 0x8  0xXXXXXXXX
addr: 0xC  0xXXXX7FFE

The last 16 bits of every 8 bytes change to a fixed value (0x7FFE). The CPU DDR bus is 64 bits wide and each DDR chip provides 16 bits of it, so the issue looks as if one DDR chip has failed and corrupted the whole 2 GB of memory. However, after I reset the board and re-initialize the DDR, it runs normally again. To debug further, I collected the CPU DDR configuration register values (in the CCSRBAR space) while the DDR was running normally and compared them with the values after the issue reproduced. There are two differences:

1. The ACE bit (automatic calibration error) is set in the DDR_ERR_DETECT register when the issue reproduces. As far as I can tell, ACE should only be set during the DDR initialization phase, but here it is set while the DDR is running.

2. The value of the CCSR register at offset 0x2F04 changes from 0x2 to 0x1100 (0x1100 is the value after the issue reproduces). Offset 0x2F04 is a reserved P1013 DDR configuration register with no description in the datasheet.

The board has four discrete MT41K256M16HA-125 chips. The DDR register settings are as follows:

(DDR_SDRAM_CFG), 0x47000008

(CS0_BNDS), 0x0000007F
(CS0_CONFIG), 0x80014302
(CS1_CONFIG), 0x00000000
(CS2_CONFIG), 0x00000000
(CS3_CONFIG), 0x00000000
(CS0_CONFIG_2), 0x00000000
(TIMING_CFG_0), 0xff8f0f0f
(TIMING_CFG_1), 0xf9498546
(TIMING_CFG_2), 0x0FA8ed20
(TIMING_CFG_3), 0x010f2000
(TIMING_CFG_4), 0x00220001
(TIMING_CFG_5), 0x01401400
(DDR_SDRAM_CFG_2), 0x24401850
(DDR_SDRAM_MODE_CFG), 0x00441A11
(DDR_SDRAM_MODE_CFG_2), 0x00800000
(DDR_SDRAM_MD_CNTL), 0x00000000
(DDR_SDRAM_CLK_CTRL), 0x02800000
(DDR_INIT_ADDR), 0x00000000
(DDR_INIT_EXT_ADDRESS), 0x00000000
(DDR_DDR_ZQ_CNTL), 0x89080600
(DDR_DDR_WRLVL_CNTL), 0x8675F608
(DDR_DDR_SR_CNTR), 0x000b0000
(DDR_DDRCDR_1), 0x00000000
(DDR_DDRCDR_2), 0x00000000
(ERR_INT_EN), 0x00000000
(ERR_SBE), 0x00010000
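
Finally, after all of the registers above are written, DDR_SDRAM_CFG is written again to enable the controller:

(DDR_SDRAM_CFG), 0xc7000008
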
I would appreciate any advice or comments on this issue.