MCF5329 with random SDRAM 1k "bit rot" corruptions

TomE
Specialist II

I'm working on an MCF5329 board.

 

It has been working fine for months. We've got 16M Spansion SDRAM, 32M FLASH (sharing the bus), Ethernet, video, CAN, serial ports etc.

 

Every now and then, usually on overnight tests after an hour or sometimes after three DAYS, 1k blocks of SDRAM start showing "bit rot". Random bits get set, usually a single bit in a byte. The corrupted data doesn't match any data anywhere else in SDRAM or FLASH, so it doesn't look like a corrupted pointer during a copy. I can't see how any code bugs could generate these patterns. It looks like the SDRAM is somehow failing to refresh. The 1k blocks are 1k-aligned, and often share address patterns, like "0xnnnCC00" where "nnn" varies.
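A check for this kind of corruption can be sketched in C. This is a minimal sketch, assuming you keep a known-good reference copy of the region (for example, data just copied from FLASH) and that `ram` starts on a 1k boundary; `scan_blocks` and `block_report_t` are hypothetical names. It reports each 1k block that differs and counts bits that read 1 when they should be 0 separately from bits that read 0 when they should be 1, so "bits getting set" stands out from other damage.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 1024u  /* corruption was seen in 1k, 1k-aligned blocks */

/* Summary of one 1k block that differs from its reference copy. */
typedef struct {
    uintptr_t block_addr;   /* address of the first byte of the block */
    uint32_t  bytes_bad;    /* bytes that differ from the reference */
    uint32_t  bits_set;     /* bits read as 1 that should be 0 */
    uint32_t  bits_cleared; /* bits read as 0 that should be 1 */
} block_report_t;

static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Compare 'len' bytes of 'ram' with 'ref', one 1k block at a time.
 * Stores up to 'max_reports' entries in 'out' and returns the total
 * number of corrupted blocks found (which may exceed max_reports). */
size_t scan_blocks(const volatile uint8_t *ram, const uint8_t *ref, size_t len,
                   block_report_t *out, size_t max_reports)
{
    size_t found = 0;
    for (size_t base = 0; base < len; base += BLOCK_SIZE) {
        block_report_t r = { (uintptr_t)(ram + base), 0, 0, 0 };
        size_t end = (base + BLOCK_SIZE < len) ? base + BLOCK_SIZE : len;
        for (size_t i = base; i < end; i++) {
            uint8_t got = ram[i];
            uint8_t want = ref[i];
            if (got != want) {
                r.bytes_bad++;
                r.bits_set += popcount8((uint8_t)(got & ~want));
                r.bits_cleared += popcount8((uint8_t)(want & ~got));
            }
        }
        if (r.bytes_bad) {
            if (found < max_reports)
                out[found] = r;
            found++;
        }
    }
    return found;
}
```

Printing `block_addr` for each report makes repeated low-order address patterns such as 0xnnnCC00 easy to spot across runs and boards.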

 

There are multiple boards doing similar things, though different ones seem to show preferences for different address ranges to corrupt.

 

We've been over the SDRAM initialisation many times, and it is almost exactly what the "Coldfire Init" program recommends. Refresh is enabled. We're getting the same thing with our FLASH INIT and the debugger's XML-driven SDRAM init.

 

Possibly related: after initialising the SDRAM we have to perform a few "dummy reads". If we don't, we can copy code from FLASH to RAM, but the first burst read after the writes will sometimes read garbage. Nothing in the data sheets says that is needed, either.
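The dummy-read workaround amounts to something like the following sketch. The post doesn't say how many reads are needed or from where, so `count`, `stride_words` and the base address are assumptions, and `sdram_dummy_reads` is a hypothetical name. The `volatile` pointer makes the compiler emit every read even though the values are thrown away.

```c
#include <stddef.h>
#include <stdint.h>

/* Issue 'count' reads from SDRAM, 'stride_words' 32-bit words apart,
 * after the controller is initialised and before the first real burst
 * access. The XOR result is returned only to give the reads a use. */
uint32_t sdram_dummy_reads(const volatile uint32_t *sdram_base, size_t count,
                           size_t stride_words)
{
    uint32_t sink = 0;
    for (size_t i = 0; i < count; i++)
        sink ^= sdram_base[i * stride_words];
    return sink;
}
```

On the target this would be called right after the SDRAM init, e.g. `sdram_dummy_reads((const volatile uint32_t *)SDRAM_BASE, 8, 1);`, where `SDRAM_BASE` is a hypothetical macro for wherever your SDRAM chip-select window is mapped.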

Accepted solution:
TomE
Specialist II

The reply from Freescale through our distributor, detailed in the following post, gives enough information to use the "Drive Strength" configuration:

 

https://community.freescale.com/thread/66541

 

The Data and Address pins all work in "10pF Low Drive". So does the clock. The SDRAM control lines don't work at this drive strength. I'm going to have to have a good look at the signals to see which combinations look to be in spec.

 

TomE
Specialist II

I meant "16M Micron SDRAM, 32M Spansion FLASH".

 

I suspected for a while that continuous accesses to the FLASH chip on the FlexBus might be blocking the SDRAM controller from performing refresh cycles, but a CRO (oscilloscope) shows two things. Firstly, the SDRAM controller can generate SDRAM cycles during FLASH chip-select assertions (I'm assuming these are refresh cycles, as it can't use the bus for data cycles at that time). Secondly, even when reading the FLASH chip on the FlexBus as fast as possible, there are 9 FlexBus cycles between successive reads. So FLASH access is slow, and there's plenty of time for the SDRAM cycles in between.

 

I'm going to increase the refresh rate and see if it makes a difference.
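For reference, picking a refresh count from the SDRAM datasheet figures can be sketched like this. It assumes the controller's refresh interval is (RCNT + 1) x 64 bus clocks, which is how I read the SDCR[RCNT] description; check that against your reference manual before relying on it. `sdram_refresh_count` and its `speedup` parameter are hypothetical, and `speedup` is there to make "refresh twice as often" a one-argument change.

```c
#include <stdint.h>

/* Largest refresh count whose interval does not exceed tREFI.
 * ASSUMPTION: refresh interval = (rcnt + 1) * 64 bus clocks.
 *   refresh_ms - refresh period from the SDRAM datasheet (e.g. 64 ms)
 *   rows       - refresh commands per period (e.g. 4096 or 8192)
 *   bus_hz     - SDRAM clock frequency (80 MHz on this board)
 *   speedup    - 1 for nominal, 2 to refresh twice as often, etc.
 * Returns -1 for invalid input or if even rcnt = 0 is too slow. */
int32_t sdram_refresh_count(uint32_t refresh_ms, uint32_t rows,
                            uint32_t bus_hz, uint32_t speedup)
{
    if (rows == 0u || speedup == 0u)
        return -1;
    /* Bus clocks available per refresh command, rounded down. */
    uint64_t clocks = (uint64_t)bus_hz * refresh_ms / 1000u / rows / speedup;
    uint64_t units = clocks / 64u; /* whole 64-clock units that fit */
    if (units == 0u)
        return -1;
    return (int32_t)(units - 1u);
}
```

For example, a part needing 4096 refreshes per 64 ms at 80 MHz gives 18 under that assumption (19 x 64 = 1216 clocks, about 15.2 µs per refresh against 15.6 µs allowed), and `speedup = 2` gives 8.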

 

TomE
Specialist II

Changing the refresh rate had no effect. Neither does changing the SDRAM timing or the FlexBus FLASH timing.


The OE_RULE setting doesn't change anything (but the signals look weird in "Tri-State").

 

Changing the code to continuously READ from the FLASH (when not doing anything else) doesn't fail.

 

Having it WRITE to the FLASH (performing bus write cycles, but not starting a FLASH program operation) makes it corrupt RAM within a minute. At the same time, "flashes" are visible on the LCD, indicating that the LCD DMA controller is reading bad data from the SDRAM when "competing" with the FlexBus writes.
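The write test described here is essentially a tight loop of bus writes. A minimal sketch, assuming `target` points into a chip-select window (the FLASH without any unlock/program command sequence, or a chip select with nothing attached) and that the region is uncached, so every store reaches the FlexBus; `flexbus_write_storm` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Drive 'iterations' FlexBus write cycles of 'pattern' to 'target' while
 * other bus masters (LCD DMA, SDRAM refresh) keep using the SDRAM.
 * 'volatile' stops the compiler from merging the stores into one. */
void flexbus_write_storm(volatile uint32_t *target, uint32_t pattern,
                         size_t iterations)
{
    for (size_t i = 0; i < iterations; i++)
        *target = pattern;
}
```

Running this in the idle loop while a RAM checker runs in parallel is one way to turn an intermittent overnight failure into one that shows up in minutes.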

 

I can see the SDRAM controller performing refresh cycles during the writes to the FLASH, at the time when the address and data lines are switching and making a lot of edges and noise. This might be coupling into the RAM chip.

 

Performing the same write cycles to an unconnected chip-select controller (performing the same bus cycles and generating the same noise and under/overshoots on the bus) doesn't fail.

 

Any suggestions?

 

TomE
Specialist II

I wrote:

> Performing the same write cycles to an unconnected chip-select
> controller (but performing the same bus cycles and generating the same
> noise and under/overshoots on the bus) doesn't fail.

 

It is harder to set up than it looks. I was using the wrong address range. Fix that and the unit fails the same way.

 

The SDRAM gets corrupted when writing data to the FlexBus using an unconnected chip-select.

 

Changing the "AA" bits (Address to Chip Select Delay) to 2 or 3 makes it a lot worse, and I don't know why. Changing the FlexBus hold timing, the SDRAM timing, the refresh rate and the crossbar setting has no effect.

 

Writing 32 bits of zero to an all-ones address makes it fail nearly instantly. It depends on the number of zero data bits.

With 26 zero bits it doesn't fail, and it gets progressively worse with 27, 28, 29, 30, 31 and 32 zero bits. It doesn't matter which data lines are zero, just how many of them there are. The "all-ones address" is important because the FlexBus puts the address out on the data bus and then switches to driving the data. All-ones to all-zeros is the worst case for something.
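The "number of zero bits" effect is really the number of shared address/data lines that switch at the address-to-data transition. A sketch of how to count that and generate the test words, assuming all 32 AD lines carry the address in the address phase and the data in the data phase; `ad_bus_toggles` and `pattern_with_zeros` are hypothetical helpers.

```c
#include <stdint.h>

/* Number of AD lines that toggle when the FlexBus switches from driving
 * 'addr' to driving 'data' on the same pins. */
unsigned ad_bus_toggles(uint32_t addr, uint32_t data)
{
    uint32_t diff = addr ^ data;
    unsigned n = 0;
    while (diff) {
        diff &= diff - 1u; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Test word with exactly 'zeros' zero bits (0..32), packed at the bottom.
 * Which lines are zero didn't matter in the tests, only how many. */
uint32_t pattern_with_zeros(unsigned zeros)
{
    if (zeros >= 32u)
        return 0u;
    return ~((UINT32_C(1) << zeros) - 1u);
}
```

With an all-ones address value, `pattern_with_zeros(26)` toggles 26 lines (the case that passed) and `pattern_with_zeros(27)` through `pattern_with_zeros(32)` toggle 27 to 32, which makes it easy to sweep the failure threshold.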

 

Changing from a Micron to an ISSI SDRAM chip makes the problem go away. So it looks like the ISSI part is less sensitive to bus overshoot and undershoot during reads and refresh cycles. We are probably exceeding the chip specifications.

 

A big part of the problem is that the MCF5329 doesn't work well with SDR SDRAM. It is meant to be used with DDR, and there's a lot of support for that (and warnings about needing termination everywhere, VREF shields, etc.). None of this is mentioned as being required for SDR, but it looks like it is. The MCF5329 doesn't support a "Low Drive Strength" option like previous SDR-only chips (like the MCF5235) did. So there's a lot of noise on the data and address lines because it forces "high strength" drive.

 

TomE
Specialist II

I wrote:

> The MCF5329 doesn't support a "Low Drive Strength" option like
> previous SDR-only chips (like the MCF5235) did.

 

Sort-of. It does have MODE registers for FlexBus and SDRAM. These are MSCR_FLEXBUS and MSCR_SDRAM.

 

Changing them to "00 Half strength 1.8V low power/mobile DDR." (when using 3.3V SDR, not a good match there) seems to reduce the low-drive a lot and the high-drive a bit. It makes the SDRAM problem go away. I'm trying to find out what this setting really means and whether it is likely to cause other problems. I'm going through our distributor and have asked another question on this forum.
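Changing one of these settings is a read-modify-write of a 2-bit field. A minimal sketch, assuming the pin control register is 8 bits wide; the field position (`shift`) and the meaning of each 2-bit code must come from the MSCR_FLEXBUS / MSCR_SDRAM descriptions in the reference manual, and nothing here encodes them. `mscr_set_field` is a hypothetical name. It returns the value read back so you can confirm the write took.

```c
#include <stdint.h>

/* Set the 2-bit field at bit position 'shift' (0..6) of '*reg' to 'code'
 * without disturbing the other bits, then read the register back. */
uint8_t mscr_set_field(volatile uint8_t *reg, unsigned shift, uint8_t code)
{
    uint8_t mask = (uint8_t)(0x3u << shift);
    uint8_t v = *reg;
    v = (uint8_t)((v & ~mask) | ((code & 0x3u) << shift));
    *reg = v;
    return *reg;
}
```

Doing each signal group separately this way makes it practical to try different drive settings on data, address, clock and control lines one at a time.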

 

Driving 32 data lines at full strength and 3.3V does look to be pretty aggressive. We're seeing a lot of noise, and we're also seeing the 80MHz SDRAM Clock suffer from a "phase skip" exactly on the data bus transition which looks like something has gone wrong with the PLL. It seems to phase-shift by 1/3 of the 80MHz clock (the 240MHz CPU clock). Weird.
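The "1/3" figure is just the ratio of the two clock periods: one 240MHz CPU clock is about 4.17 ns and one 80MHz SD_CLK period is 12.5 ns. A small worked calculation, with hypothetical helper names:

```c
/* Clock period in nanoseconds for a frequency in MHz. */
double period_ns(double mhz)
{
    return 1000.0 / mhz;
}

/* Fraction of the SD_CLK period taken up by a phase shift of 'shift_ns'. */
double fraction_of_period(double shift_ns, double sdclk_mhz)
{
    return shift_ns / period_ns(sdclk_mhz);
}
```

`fraction_of_period(period_ns(240.0), 80.0)` comes out at 1/3, i.e. a skip of exactly one CPU clock.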

 

TomE
Specialist II

I wrote:

> and we're also seeing the 80MHz SDRAM Clock suffer from a "phase skip" exactly
> on the data bus transition which looks like something has gone wrong with the PLL.
> It seems to phase-shift by 1/3 of the 80MHz clock (the 240MHz CPU clock). Weird.

 

Freescale have found this problem and documented it in the "MCF5329 Chip Errata". It is:

 

SECF149: SD_CLK delay when using FlexBus

 

When performing FlexBus cycles, SD_CLK and SD_CLK (the true and complementary SDRAM clocks) are delayed by up
to 2.34 ns in the worst case scenario, resulting in duty cycle excursions that
may exceed DDR memory specifications for clock jitter. Though no system
problems have been found while using DDR at this time, a potential problem
may exist in some systems due to this issue.
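To put the errata figure in context for this 80MHz SDR design: if, as an assumption, the 2.34 ns delay moves only one SD_CLK edge while the other stays put, the worst-case duty cycle can be estimated like this (`worst_duty_percent` is a hypothetical helper):

```c
/* Worst-case duty cycle in percent if one edge of a nominally 50% clock
 * moves by 'delay_ns' and the other edge does not. */
double worst_duty_percent(double sdclk_mhz, double delay_ns)
{
    double period = 1000.0 / sdclk_mhz; /* ns */
    double half = period / 2.0;         /* nominal high (or low) time */
    return 100.0 * (half - delay_ns) / period;
}
```

`worst_duty_percent(80.0, 2.34)` comes out at about 31.3%, against a nominal 50%, which is worth comparing with the clock high/low time limits in the SDRAM datasheet.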
