We have a system consisting of an MPC8347EA processor and two DDR SDRAM devices (MT46V32M16) connected in parallel to form a 32-bit data bus.
We are experiencing some strange behavior when writing/reading certain patterns to the DDR RAM.
We have a small test program that first writes the repeating pattern 0x0000FFFF 0xFFFF0000 to one area of the RAM (area 1). It then enters a loop in which it writes to another area (area 2) and reads back all values from area 1. After a while the read data becomes corrupted. The RAM controller or the RAM itself seems to enter a state in which a read from any address in the RAM returns a corrupted lower byte in the upper 16-bit half-word. The value read in this byte appears to be one of the bytes read in the previous burst. In the following example, the first four lines show the expected memory contents and the last four lines show the same addresses as read in the erroneous state:
00000000: 00000000 11111111 22222222 33333333
00000010: 44444444 55555555 66666666 77777777
00000020: 88888888 99999999 AAAAAAAA BBBBBBBB
00000030: CCCCCCCC DDDDDDDD EEEEEEEE FFFFFFFF
00000000: 00XX0000 11551111 22442222 33773333
00000010: 44664444 55115555 66006666 77337777
00000020: 88228888 99DD9999 AACCAAAA BBFFBBBB
00000030: CCEECCCC DD99DDDD EE88EEEE FFBBFFFF
The value XX depends on the previous read.
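To make the corruption signature in the dump above easier to check automatically, a comparison helper along the following lines could be used. This is only a sketch: the names SUSPECT_BYTE_MASK, matches_signature and suspect_byte are hypothetical and not part of our test program. It assumes the affected byte is bits 23..16 of each 32-bit word, as in the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 23..16: the lower byte of the upper 16-bit half-word.
 * Assumption taken from the example dump (e.g. 0x11111111 read as 0x11551111). */
#define SUSPECT_BYTE_MASK 0x00FF0000u

/* Returns true when 'actual' differs from 'expected' only inside the
 * suspect byte lane, i.e. the mismatch has the signature described above. */
static bool matches_signature(uint32_t expected, uint32_t actual)
{
    uint32_t diff = expected ^ actual;
    return diff != 0u && (diff & ~SUSPECT_BYTE_MASK) == 0u;
}

/* Extracts the byte that was read in the suspect lane, so it can be
 * compared against bytes seen in the previous burst. */
static uint8_t suspect_byte(uint32_t word)
{
    return (uint8_t)((word & SUSPECT_BYTE_MASK) >> 16);
}
```

For the second word of the example, matches_signature(0x11111111, 0x11551111) is true and suspect_byte(0x11551111) gives 0x55.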
- Instruction cache is enabled and data cache is disabled.
- 8-word bursts (32 bytes).
- Because data cache is disabled, every word in the example above is read as an 8-word burst.
- The values written to area 2 do not seem to be important, but we write 0x0.
- The pattern written to area 1 is very important. We have not found any pattern other than 0x0000FFFF 0xFFFF0000 that triggers the problem.
- The size of area 1 has to be at least 4 words (writing/reading the pattern at least twice).
- The size of area 2 can be as low as one word.
- The problem can occur after just a few loops.
- The processor has to be reset to exit the erroneous state.
- Once the erroneous state has been exited, data is read correctly from the RAM.
- Any data written to the RAM while in the erroneous state is actually stored and can be read back correctly once the erroneous state has been exited, i.e. writing seems to work correctly even in the erroneous state.
- It is not necessary to write to area 1 again after the initial write in order to enter the erroneous state.
- Increasing the ambient temperature seems to improve the behavior. The problem disappeared when the temperature was raised above 50 degrees Celsius.
- Lowering the supply voltage to the RAM and processor from 2.5 V to 2.44 V also improved the behavior.
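For reference, the test described above can be sketched roughly as follows. This is a simplified illustration, not our exact program: the function names and the max_loops parameter are hypothetical, and on the target the two pointers would refer to uncached DDR addresses, which are not given here.

```c
#include <stddef.h>
#include <stdint.h>

#define PATTERN_EVEN 0x0000FFFFu  /* written to even word indices of area 1 */
#define PATTERN_ODD  0xFFFF0000u  /* written to odd word indices of area 1  */

/* Expected content of word i in area 1. */
static uint32_t expected_word(size_t i)
{
    return (i & 1u) ? PATTERN_ODD : PATTERN_EVEN;
}

/* Initial write of the repeating pattern; area 1 is not written again. */
static void fill_area1(volatile uint32_t *area1, size_t words1)
{
    for (size_t i = 0; i < words1; i++)
        area1[i] = expected_word(i);
}

/* Runs up to max_loops iterations of "write area 2, read all of area 1".
 * Returns the number of mismatching words in the first failing pass,
 * or 0 if every pass read back the expected pattern. */
static size_t run_test(volatile uint32_t *area1, size_t words1,
                       volatile uint32_t *area2, size_t words2,
                       unsigned long max_loops)
{
    fill_area1(area1, words1);

    for (unsigned long loop = 0; loop < max_loops; loop++) {
        /* The value written to area 2 does not seem to matter; we use 0x0. */
        for (size_t j = 0; j < words2; j++)
            area2[j] = 0x0u;

        size_t errors = 0;
        for (size_t i = 0; i < words1; i++) {
            if (area1[i] != expected_word(i))
                errors++;
        }
        if (errors != 0)
            return errors;
    }
    return 0;
}
```

With the minimum sizes listed above, this would be called with words1 = 4 and words2 = 1.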
Has this problem been observed before?
Any suggestions as to what we are doing wrong?
We think the problem is related to the RAM controller, but could it be the RAM?