Thanks Alexander.
- Please specify how many (identical) boards you have, and how many of them show such erratic behavior. If, for example, the problem is observed on only one board out of 10, then this may be a quality issue with that particular board rather than a design issue.
We have 5 boards with the MPC8270 and 5 boards with the MPC8260. We see this random crash on all of the boards. While the failure is random, it shows up sooner on the MPC8270-based boards than on the MPC8260-based boards: probably within 6 to 12 hours on an MPC8270, but close to a week on an MPC8260. Between the two boards, the only change is the processor.
- The MPC8270 is quite an old processor, so I assume the design is also quite old. Please specify what was changed in the design or in the BOM before the problem started to appear.
We had to redo the board for obsolescence reasons. The main changes are the SRAM memory and the Ethernet transceiver. We have also moved from MPC8260 to MPC8270.
We checked the timing on the memory devices, and there seems to be sufficient timing margin on the signals.
We also notice that no single-bit errors are ever reported.
The one time we were able to log the data just before a crash, the data read from the memory appeared to be correct, but an ECC error was still reported. The data that reportedly had the ECC error (on this occasion) is a value that is written to the RAM only once, early after power-up. It is read multiple times (correctly, with no ECC error) before the module crashes after a few hours.
- You said "We are running the 8 byte wide data bus at 64MHz clock". ECC requires a 9th chip for ECC data storage. I assume this chip is present in your design, since you see ECC errors only occasionally, but please verify that the 9th memory chip is present and has exactly the same part marking as the other 8 memory chips.
Yes, there is a 9th memory chip present for the ECC memory.
Each time the module fails, we see that the TESCR2 register has a value of 0x00004000.
The memory bank indicated is CS1, which connects to the SRAM. But “TESCR2[PB]” shows up as 0 for all 8 byte lanes. Do the PB bits get updated on an ECC failure?
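For reference, here is a minimal sketch of how the 0x00004000 reading can be decoded. The field positions are an assumption inferred from the reading above (BNK in PowerPC bits 16-23 with bit 16 = bank 0, PB in bits 24-31) and should be checked against the reference manual; the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * ASSUMED TESCR2 layout (PowerPC numbering, bit 0 = MSB); verify against
 * the reference manual before relying on it:
 *   bits 16-23  BNK  one flag per memory controller bank (bit 16 = bank 0)
 *   bits 24-31  PB   one flag per byte lane
 */

/* Extract the 8 BNK flags; the MSB of the result corresponds to bank 0. */
static uint8_t tescr2_bnk(uint32_t tescr2)
{
    return (uint8_t)((tescr2 >> 8) & 0xFFu);
}

/* Extract the 8 PB (per-byte-lane) flags. */
static uint8_t tescr2_pb(uint32_t tescr2)
{
    return (uint8_t)(tescr2 & 0xFFu);
}

/* Return the index (0..7) of the first flagged bank, or -1 if none is set. */
static int tescr2_first_bank(uint32_t tescr2)
{
    uint8_t bnk = tescr2_bnk(tescr2);
    for (int i = 0; i < 8; i++) {
        if (bnk & (0x80u >> i)) {
            return i;
        }
    }
    return -1;
}
```

Under these assumptions, `tescr2_first_bank(0x00004000u)` gives bank 1 (CS1) and `tescr2_pb(0x00004000u)` gives 0, which matches what we observe.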
- Using the ECC feature imposes special requirements on data pipelining and CAS latency. Please look at the MPC8280 Reference Manual, Sections 11.2.9 and 11.2.4, and confirm that these requirements are taken into account in your memory controller settings.
We are using ECC for the SRAM interface. We don’t have parity enabled for the SDRAM interface.
To answer your question:
MCP cannot be routed out to IRQ0 if the core is not disabled, but you can disable MCP propagation to the core in HID0[EMCP].
We did try this. With MCP propagation disabled, the module still fails later on, due to execution of an unknown instruction.
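For anyone repeating the HID0[EMCP] test, here is a minimal sketch of clearing that bit. It assumes a 603e/G2-class core where HID0 is SPR 1008 and EMCP is bit 0 (the MSB), a GCC-style toolchain, and supervisor mode; the function names are hypothetical, and the bit position should be confirmed in the core reference manual.

```c
#include <stdint.h>

/* ASSUMPTION: EMCP is bit 0 (MSB) of HID0 on this core. */
#define HID0_EMCP 0x80000000u

/* Pure helper: return the HID0 value with EMCP cleared, other bits unchanged. */
static uint32_t hid0_without_emcp(uint32_t hid0)
{
    return hid0 & ~HID0_EMCP;
}

#if defined(__GNUC__) && defined(__powerpc__)
/* Read-modify-write HID0 (SPR 1008, assumed); must run in supervisor mode. */
static void disable_mcp_propagation(void)
{
    uint32_t hid0;

    __asm__ volatile("mfspr %0, 1008" : "=r"(hid0));
    hid0 = hid0_without_emcp(hid0);
    __asm__ volatile("sync\n\t"
                     "mtspr 1008, %0\n\t"
                     "isync"
                     :
                     : "r"(hid0)
                     : "memory");
}
#endif
```

The bit manipulation is kept in a separate pure function so it can be checked on a host machine; only the guarded part touches the SPR.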