MCF5329 (MCF5xxx) + SDR DRAM = lockup, anyone else?

TomE · ‎09-20-2010

We have a board based on the MCF5329, but this problem could apply to the MCF5373, MCF5208 and others.

We are using 32-bit SDR (non-DDR) SDRAM on a non-split bus.

Some boards intermittently lock up on power on. They are most sensitive to "long brownouts" where the power is turned off for about one second and then turned on again. Some boards never fail (or seem not to, so far); others fail occasionally, sometimes as rarely as 1 in 200 switch-ons, but that's one time too many.

Checking a wedged board shows the SDRAM is driving data onto the data bus before and after RESET, and this prevents the CPU from fetching code from the FLASH chip, so it crashes and halts.

We've been working with our distributor and Freescale Support for 5 weeks on this and are making good progress in understanding the problem and developing workarounds.

A fast power-on ramp is required to show this problem. Boards powered from a plug-pack turned on at the mains don't do this. Ones switched to existing supplies (battery, bench supply) do. So you may not see problems on the bench, but do get them in the field.

I'd be surprised if we're the only ones seeing this, but not surprised if we're the first ones to be closing in on the root cause.

Has anyone else had reports of problems like these?

TomE · ‎09-23-2010

I wrote:

> Has anyone else had reports of problems like these?

Nobody else? I'm surprised.

Here's the cause and possible/unofficial workarounds.

CAUSE

When I mention "SDRAM Outputs" I'm referring to the control signals output by the SDRAM controller. This includes SD_CLK, SD_CKE, SD_CS0, SD_WE, SD_SRAS, SD_SCAS and SD_DQMn.

On initial power up, the SDRAM Outputs are tri-state. Their levels are thus "indeterminate". They may be floating high, low, be capacitively coupled to the 3.3V plane or whatever.

If this power-on is close to the last power-off (a fast ON-OFF-ON) then the EVDD and IVDD rails may be well above zero, and the other signals may even be above that, being dragged down by the protection diodes to the rails. There are a lot of variables here.

SDRAM chips do not have RESET pins. They are expected to wake up "idle", but their data sheets don't guarantee that. Combine that with floating inputs and different levels after differently timed ON-OFF-ON cycles, and anything can happen.

The SDRAM controller in the MCF5329 (and the same controller in other models) does not seem to be directly initialised by the /RESET signal. It wakes up in a "random state" and this is reflected in all of the SDRAM Outputs. It doesn't settle down to its IDLE state until after the oscillator has started up and fed some clock edges through it. It seems more likely to wake up in a random state after an ON-OFF-ON than after a cold start. When in "non-IDLE" mode it is sending a random command to the SDRAM.

When EVDD ramps up to about 2.6V (between 2.0 and 2.8V) the SDRAM Outputs are enabled. If this happens after the crystal/oscillator has started up, the initial output state will switch from "floating" to "driven IDLE state". If the EVDD ramp is reasonably fast (500us or less) the oscillator won't have started and the SDRAM Outputs will change from "floating" to "driven RANDOM". As the oscillator starts up the outputs go through one or more transitions back to the IDLE state.
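
The race described above can be sketched as a rough model. This is only an illustration under loud assumptions: the EVDD ramp is treated as linear from 0V, the oscillator start time is counted from the beginning of the ramp, and the worst-case enable threshold of 2.0V is used. The function names and the 1ms oscillator figure are hypothetical, not taken from any Freescale document.

```python
def time_to_reach(v_target, v_final, ramp_time_s):
    """Time for an assumed linear 0 V -> v_final ramp to reach v_target."""
    return ramp_time_s * v_target / v_final


def state_at_output_enable(ramp_time_s, osc_start_s, v_enable=2.0, v_final=3.3):
    """Classify what the SDRAM Outputs do at the moment they are enabled.

    ramp_time_s: time for EVDD to go from 0 V to v_final (assumed linear)
    osc_start_s: time from the start of the ramp until the oscillator runs
                 (a simplification; hypothetical parameter)
    v_enable:    EVDD level at which the outputs leave tri-state
                 (2.0 V is the worst case quoted in the post)
    """
    t_enable = time_to_reach(v_enable, v_final, ramp_time_s)
    if osc_start_s < t_enable:
        return "driven IDLE"    # clock was already running at enable
    return "driven RANDOM"      # outputs enabled before any clock edges


# Fast 500 us ramp, oscillator assumed to need 1 ms: outputs enable first.
print(state_at_output_enable(500e-6, 1e-3))
# Slow 100 ms ramp with the same oscillator: clock is running at enable.
print(state_at_output_enable(100e-3, 1e-3))
```

The model ignores the fact that the oscillator itself needs some supply voltage before it can start, so it is optimistic for slow ramps and should not be used to pick a final ramp time.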

If you're unlucky, the uninitialised SDRAM chip will have interpreted all these changes on its pins as some sort of READ command and it will drive the data bus. If your luck comes good, it may then let go during the "Random" to "IDLE" changes, but it may not.

When the CPU comes out of reset and tries to read its code from FLASH, the bus contention between the SDRAM and FLASH usually ends up with the CPU halted very quickly. This is the SECF045 problem with a different cause.

The Watchdog doesn't help as it ignores HALT state until the CPU programs it otherwise. Even if it could work and tried to reset the CPU there'd be no changes in the signals to the SDRAM and it would still be jamming the bus.

The only recovery from this condition is power off. The same lockup might happen on the next power on though.

WORKAROUNDS

None of these are "official or recommended by Freescale". Many of them may reduce the incidence of the problem without eliminating it entirely.

Don't use SDR SDRAM, but design with DDR on a "split bus".

Add an external hardware watchdog that can fully power-cycle the CPU if it isn't patted in time. Power-cycle, not Reset.

Don't turn the board off and straight back on; wait a long time before turning it on again. You may not have control of this.

Add bleed resistors to the EVDD and IVDD rails to get them to ground fast on a power off if you can waste the power. This seems to help a bit.

Add pulldown resistors to the SDRAM Outputs, especially SD_CLK and SD_CKE. The SDRAM seems most sensitive to these floating. Do not add a PULLUP to SD_CKE as this makes it worse (SECF150 for the MCF52277 suggests this for a different problem).

Slow down the power-on ramp so the crystal or external oscillator is running before EVDD gets to 2.0V and enables the SDRAM Outputs. This stops the "random states". This can be done either:

1 - With a separately powered external oscillator instead of a crystal (and a buffer or current limiter plus all the required RCON hardware), or

2 - With a crystal on the internal oscillator.

The internal oscillator is only guaranteed to run when EVDD is over 3.0V, so for a design that meets published specifications you need to get EVDD to the Oscillator to 3.0V BEFORE EVDD gets to 2.0V for the SDRAM Outputs! That should be impossible, except the Oscillator VDD supply is on the undocumented pin K12 on the 256BGA, so it can be driven from a separate supply if need be. The three Reference Design schematics document this pin and the other 9 power pins the Reference Manual doesn't list. In practice the oscillator seems to start at 1.6V, so a VERY slow power ramp (10-100ms) seems to work OK.

TomE · ‎09-27-2010

More information.

Not surprisingly it seems to depend on the brand of the SDR SDRAM chip used.

The original brand we used could be seen driving the data bus when it got the "random" commands from the SDRAM Controller. Sometimes it would release the bus later (when the SDRAM controller started sending IDLE commands), other times it wouldn't. The latter case led to the permanent lockup condition.

The alternate brand of chip also drives the data bus, but it always (in all tests I've done) releases the bus when the SDRAM controller starts driving the IDLE commands at it.

So in our case Production started using an alternate chip and the problem seems to have been fixed by that change.

In other cases, this problem may show up in the opposite direction, if Production changes TO the "sensitive" brand.

TomE · ‎04-27-2011

Freescale have updated the "MCF5329DE.pdf" "Chip Errata" document to Revision 11. This adds:

SECF196: Pins: Undefined Pin States Caused by Power-On Reset Sequence

It describes the problem detailed in this Forum article.

It states:

Errata type: Silicon
Affects: FlexBus, SDRAMC, and BDM
Description: Some power ramp and clock startup sequences can cause FlexBus, SDRAMC, and BDM pins to drive undefined values for a period of time during the reset sequence.

Initially the pins are tri-stated while IVDD and EVDD/SDVDD are not fully powered. As the voltage rails ramp to the operating levels, the processor releases the pins from tri-state and the pin states are determined by digital control logic. However, the digital logic needs several clock cycles to initialize, so between tri-state release and clock startup the pins can be driven by uninitialized logic.

Depending on how the processor signals are used in the rest of the system, an undefined pin state could cause unexpected behavior for an external memory device. Since the undefined pin states are temporary (pins will enter a defined state well before the processor exits reset) in many cases the undefined pin states won’t cause a functional issue for the system.

The Errata doesn't explicitly state that "if this startup sequence causes the SDRAM chip to wedge then the CPU will not recover".

Freescale have also issued the accompanying "EB740: Detailed Information for Erratum SECF196", which contains the following section that details the problem I was seeing:

2.2 SDRAM impact
The SDRAMC on the MCF537x/MCF532x family is a special case. To fix an earlier erratum (SECF045),
the SD_CLK signal is driven as soon as clocks are detected and SD_CKE is driven high. Because of this
fix, there will be a short overlap between undefined pin states on the SDRAM control signals and SD_CLK
starting up. This can allow an SDRAM to latch an erroneous SDRAM command.

During initial startup, the SDRAM doesn’t allow any commands except NOP and COMMAND INHIBIT.
Undefined signal states on the SDRAMC signals could potentially be recognized as other commands and
cause undefined operation of the SDRAM. If the undefined operation of the SDRAM causes the SDRAM
to drive the data bus, it could potentially cause interference with chip configuration and code fetches.

The Workaround in the Errata doesn't mention any of those suggested previously in this post. The main workaround given in SECF045 still applies, which is "Use a split bus mode configuration with DDR SDRAM or SDR SDRAM". EB740 states that there is no issue with a system using a Split Bus.

The Workarounds suggest qualifying the chip select signals by adding an octal buffer in series with them, and forcing it to be tri-state during reset. This includes the chip-select signal from the SDRAM controller to the SDRAM chips. My reading of the specifications gives a maximum signal delay of 3.75ns on the chip select before it violates the SDRAM timing, and for safety you'd probably want a lot less. Chips that fast are available, but rare. Timing is probably less severe on the FlexBus chip selects.

TomE · ‎08-08-2011

This is the problem with the MCF532x and MCF537x chips listed in the Errata as:

SECF196: Pins: Undefined Pin States Caused by Power-On Reset Sequence

This is the problem that I worked on for six weeks last August (2010) where some boards made with the MCF5329 would lock up after a power-off-power-on cycle.

This looks to be fixed in the next version of the mask. This is documented here:

http://cache.freescale.com/files/shared/doc/pcn/PCN14798.htm

PRODUCT AND PROCESS CHANGE NOTIFICATION

ISSUE DATE:	28-Jul-2011
NOTIFICATION:	14798
TITLE:	MCF532X/MCF537X NEW MASKSET 6M29B DESIGN FIX
EFFECTIVE DATE:	26-Oct-2011

The details are:

DESCRIPTION OF CHANGE

Freescale is pleased to announce the qualification of the new maskset 6M29B for MCF532X and MCF537X. With enhancement made on this new maskset, undeterministic behaviour during POR has been fixed.

REASON FOR CHANGE

The release and implementation of this new maskset is to fix device undeterninistic [sic] behaviour during POR.

This fix seems to match "SECF196: Pins: Undefined Pin States Caused by Power-On Reset Sequence" without actually saying so clearly (or with words that exist in a dictionary). :-)

Tom

dn_engineer · ‎05-11-2011

Could you share which SDRAM part works for you and which one doesn't? We are using the 48LC8M16A2; will there be a problem with that part?

Thanks.

TomE · ‎05-11-2011

> Could you share which SDRAM part works for you and which one doesn't?

With you, yes, but not with the whole planet. Check your Freescale "private messages".

With a different SDRAM we didn't get a lockup with limited testing. But with the "sensitive brand" we had some boards that would lock up and otherwise identical ones that didn't. So the different brand of chip didn't give us any guarantee at all that no boards in a production run would fail.
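
As a rough illustration of why limited testing gives no guarantee, the sketch below estimates how many consecutive clean power cycles a board must survive before a real failure rate like the "1 in 200 switch-ons" mentioned at the start of this thread would probably have shown itself. It assumes every power cycle is independent and fails with the same probability, which a real board with varying rail discharge times may not satisfy. The function name is hypothetical.

```python
import math


def clean_cycles_needed(failure_rate, confidence=0.95):
    """Number of power cycles after which a board that truly fails with
    probability `failure_rate` per cycle would have shown at least one
    failure with probability `confidence`.

    Assumes independent cycles with a constant failure probability.
    """
    # P(no failure in n cycles) = (1 - p)^n; solve (1 - p)^n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


print(clean_cycles_needed(1 / 200))        # 95% confidence
print(clean_cycles_needed(1 / 200, 0.99))  # 99% confidence
```

For a 1-in-200 failure rate this comes to about 600 clean cycles at 95% confidence, per board and per power-ramp condition, so a handful of successful switch-ons says very little.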

The measures that minimised the problem were:

Add bleed resistors to the EVDD and IVDD rails to get them to ground fast on a power off if you can waste the power. This seems to help a bit.
Add pulldown resistors to the SDRAM Outputs, especially SD_CLK and SD_CKE. The SDRAM seems most sensitive to these floating. Do not add a PULLUP to SD_CKE as this makes it worse (SECF150 for the MCF52277 suggests this for a different problem).

Those changes fixed a few failing boards, but gave no guarantees on a production run.

The measure that gave a guaranteed fix for us was:

Slow down the power-on ramp so the crystal or external oscillator is running before EVDD gets to 2.0V and enables the SDRAM Outputs. Worst-case calculations (for crystal start time) result in a ramp of 30ms or so.

The switching regulator we were using for the 3.3V supply allowed a very simple modification (Capacitor, two resistors and a transistor) to slow down the turn-on ramp.

Tom

TomE · ‎05-12-2011

Another possible solution that might suit some designs is to configure the CPU to use an "External Reference" for its clock source, and to ensure that the external oscillator is running before the 3V3 supply to the CPU is turned on. This would need some interesting power control and buffering to make sure the clock signal didn't exceed the CPU's input 3V3 supply (level-shifting buffer driven from the CPU's 3.3V supply).

This is a fairly obvious workaround to this problem, but isn't mentioned in SECF196 or EB740.

Tom

maxhexis · ‎05-16-2011

If the oscillator is supplied from the same VDD as the CPU, there is no problem with input overdrive. Typically, the oscillator starts well before the CPU, even if at an imprecise frequency (oscillation is already present with 2V applied on the power supply). I have a working design based on this workaround.

Best regards,

TomE · ‎05-17-2011

That still needs a slow ramping power supply.

The oscillator has to be running before EVDD gets to 2.0V. Before we slowed the ramp down, our 3V3 supply could get from 0 to 2V in about 200us. The crystal typically took 500us to start (and 1ms with a CRO probe on it), and according to specifications couldn't be guaranteed to be running until 10ms had passed.
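
The figures in the paragraph above can be turned into a quick back-of-envelope check. This is a minimal sketch that assumes a linear ramp to a 3.3V rail and assumes the crystal start time is counted from the beginning of the ramp; both are simplifications, and the function names are hypothetical.

```python
# Figures quoted in the post above.
T_TO_2V_FAST_S = 200e-6       # original supply: 0 V to 2 V in about 200 us
CRYSTAL_TYPICAL_S = 500e-6    # typical crystal start time observed
CRYSTAL_GUARANTEED_S = 10e-3  # start time the specifications guarantee

V_ENABLE = 2.0  # EVDD level by which the oscillator must be running
V_FINAL = 3.3   # assumed final rail voltage


def slowdown_factor(osc_start_s, t_to_enable_s=T_TO_2V_FAST_S):
    """How many times slower the ramp to V_ENABLE must be than the original."""
    return osc_start_s / t_to_enable_s


def min_full_ramp_s(osc_start_s, v_enable=V_ENABLE, v_final=V_FINAL):
    """Shortest linear 0 V -> v_final ramp that reaches v_enable no earlier
    than osc_start_s (no margin included)."""
    return osc_start_s * v_final / v_enable


print(slowdown_factor(CRYSTAL_TYPICAL_S))     # versus the typical start time
print(slowdown_factor(CRYSTAL_GUARANTEED_S))  # versus the guaranteed start time
print(min_full_ramp_s(CRYSTAL_GUARANTEED_S))  # seconds, lower bound only
```

Under these assumptions the ramp has to be about 2.5 times slower to beat the typical crystal and about 50 times slower to beat the guaranteed start time, giving a full ramp of at least 16.5ms. That is a lower bound with no margin; the worst-case figure of 30ms or so given earlier in the thread is larger.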

> Typically, oscillator starts well before the CPU

Can you get a guarantee from the manufacturer of the minimum start time? That's the only number you can trust in a production run.

It took a while, but I eventually managed to demonstrate this SDRAM lockup problem on a standard development board. The "final trick" was to power the board up by plugging the cable from the plug-pack into the board. That gave a fast power-on-ramp that demonstrated the problem. If I tested power-on by turning the plug pack on at the wall, the very slow startup time of the plug-pack hid the problem.

I wonder how many other people test "power off recovery" by turning their plug-packs off at the wall, or turn their controlled bench supplies on and off? That's seldom the power environment of the production device. If it ends up in a car the power can turn off and on at a very different rate than on the bench.

Tom

maxhexis · ‎05-17-2011

In fact, I usually prefer to have a DC/DC with a controlled power ramp directly on the CPU board. This is the context in which I tested the external oscillator behaviour. Moreover, the oscillator has a specified maximum start time of 2ms, which is sufficient for our power slew rate.
