We have a collection of ten Vybrid-based cards with DDR3 (Micron MT41K256M16HA-125 AIT:E). Using the TWR-VF610 as a reference, we have configured the DRAM and successfully run our application from external memory. At boot time, some lightweight memory tests run from internal memory before the bootloader relocates itself to external memory.
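For readers unfamiliar with this kind of check, below is a minimal sketch of what lightweight pre-relocation DRAM tests commonly look like. It is not the original poster's code. It uses the classic walking-ones data-bus test and power-of-two address-bus test, the function names are hypothetical, and it assumes the code runs from internal SRAM with caching disabled for the DRAM region, so every access actually reaches the DDR3 device.

```c
#include <stdint.h>
#include <stddef.h>

/* Walking-ones data-bus test at a single DRAM address (hypothetical name).
   Returns 0 on success, or the first bit pattern that did not read back. */
uint32_t dram_test_data_bus(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1u; pattern != 0u; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0u;
}

/* Address-bus test over n_words 32-bit words (n_words must be a power of two).
   Touches only power-of-two word offsets. Returns NULL on success, otherwise
   the address whose address bit appears stuck or shorted. */
volatile uint32_t *dram_test_address_bus(volatile uint32_t *base, size_t n_words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;
    const size_t mask = n_words - 1u;
    size_t offset, test_offset;

    /* Write the default pattern at every power-of-two offset. */
    for (offset = 1u; (offset & mask) != 0u; offset <<= 1)
        base[offset] = pattern;

    /* Address bits stuck high: writing offset 0 must not disturb the others. */
    base[0] = antipattern;
    for (offset = 1u; (offset & mask) != 0u; offset <<= 1)
        if (base[offset] != pattern)
            return &base[offset];
    base[0] = pattern;

    /* Address bits stuck low or shorted: each write must land in one place only. */
    for (test_offset = 1u; (test_offset & mask) != 0u; test_offset <<= 1) {
        base[test_offset] = antipattern;
        if (base[0] != pattern)
            return &base[test_offset];
        for (offset = 1u; (offset & mask) != 0u; offset <<= 1)
            if (offset != test_offset && base[offset] != pattern)
                return &base[test_offset];
        base[test_offset] = pattern;
    }
    return NULL;
}
```

On the target, both functions would be called with the board-specific DRAM base address and size (in 32-bit words) before relocation. The address-bus test requires the size to be a power of two.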
Out of these ten cards, two fail to boot when power is first applied. In that case, the boot-time memory tests report failures, and the bootloader fails to relocate to DRAM and crashes instead. However, if we then reset the card through an external watchdog, the card boots successfully and the memory tests pass.
Once either of these cards has booted successfully, the top-level application runs flawlessly and every memory test we run passes.
We tried a workaround in which the card soft-resets if either of the internal memory tests fails. This seemed to help one of the affected cards, but not the other: that card entered a reset loop that could only be broken by pressing the reset button.
Does anyone have an idea what might be causing this? All the boards are seemingly identical and should, in theory, behave identically. However, that is not the case.
Cheers
/Tom
The most likely cause of the issue is a DDR bus interface design that does not follow the recommendations, which can lead to random memory errors under some conditions. Please check your design against the recommendations given in Sections 3.1 to 3.6 of the Vybrid Series Hardware Development Guide, available on the processor's Documentation web page (see the "User Guides" section).
Pay special attention to trace length matching and trace impedance matching.
Also, try experimenting with the drive strength and ODT (on-die termination) settings.
Have a great day,
Artur
Hi Artur,
We just learned that the problem disappears at low temperatures (around -20 °C) and reappears when the card is warmed back up to room temperature.
We therefore assume the issue is timing-related.
Could you list the parameters that are most sensitive to temperature changes?
/Tom
The trace length mismatch you pointed out is definitely the root cause of the issue. Under these conditions, changing the drive strength and ODT settings is not an effective way to fix it. The only reliable fix is to redesign the PCB so that the traces meet the length matching rules as closely as possible.
Best Regards,
Artur
Hi Artur,
We had our HW engineers look at your suggestion, and they concluded that some of our traces are marginally too short.
Section 3.5.2, Table 21 of the Hardware Development Guide lists the minimum reference length of the address, bank, CAS, RAS and WE lines as Clock(min) - 200 mils. Our trace is Clock(min) - 219 mils, i.e. 19 mils shorter than the minimum.
We also had the HW engineers check the power sequencing on two cards: one affected by the problem described, and one that has not shown it so far. They were unable to find any difference between the two.
Following your suggestion, we also tried adjusting the drive strength. Below are the two extremes, recorded when the board failed to boot after a cold reset:
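The length rule above reduces to simple arithmetic. The sketch below is a hypothetical helper that checks only the lower bound quoted from Table 21; any upper bound the guide specifies is not modelled, and the 2000-mil clock length in the usage note is made up for illustration.

```c
/* Lower bound as quoted from the Hardware Development Guide, Table 21:
   address/bank/CAS/RAS/WE length >= Clock(min) - 200 mils.
   Only this lower bound is checked here (hypothetical helper). */
#define ADDR_CMD_MIN_OFFSET_MILS 200.0

/* Returns how many mils the trace falls short of the minimum (0 if it is met). */
double addr_cmd_shortfall_mils(double clock_min_mils, double trace_mils)
{
    double min_len = clock_min_mils - ADDR_CMD_MIN_OFFSET_MILS;
    return (trace_mils < min_len) ? (min_len - trace_mils) : 0.0;
}

/* 1 mil = 0.001 inch = 0.0254 mm. */
double mils_to_mm(double mils)
{
    return mils * 0.0254;
}
```

With a hypothetical Clock(min) of 2000 mils, a trace of 2000 - 219 = 1781 mils gives a shortfall of 19 mils, roughly 0.48 mm.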
[GDB] Vybrid DDR3 DSE=150ohm - Pastebin.com
Vybrid DDR3 DSE=25ohm - Pastebin.com
As you can see in both logs, some columns appear to be inaccessible when writing through the u32 pointer *start. There also seems to be some "bleeding": some data ends up in the wrong area when writing through the u32 pointer *(start+1).
You also suggested experimenting with the ODT settings. Could you please elaborate on this?
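A test that separates the two symptoms described above (locations that do not hold their value, and writes that land in the neighbouring word) could look like the sketch below. It is a hypothetical illustration, not the test that produced the Pastebin logs. The patterns and the classification are assumptions, and it assumes caching is disabled for the tested DRAM region so that each read really comes from the device.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { ADJ_OK = 0, ADJ_LOST, ADJ_BLEED } adj_result;

/* For each pair of consecutive words, write distinct patterns to start[i]
   and start[i + 1], then read both back (hypothetical test).
   ADJ_BLEED: a word holds its neighbour's pattern, suggesting the other write
              landed in the wrong place.
   ADJ_LOST:  a word holds neither pattern, suggesting it was not written.
   On failure, *fail_index receives the index of the bad word. */
adj_result dram_test_adjacent(volatile uint32_t *start, size_t n_words,
                              size_t *fail_index)
{
    const uint32_t pat_a = 0xA5A5A5A5u;
    const uint32_t pat_b = 0x5A5A5A5Au;

    for (size_t i = 0; i + 1 < n_words; i++) {
        start[i] = pat_a;      /* the *start write */
        start[i + 1] = pat_b;  /* the *(start+1) write */

        uint32_t a = start[i];
        uint32_t b = start[i + 1];

        if (a != pat_a) {
            *fail_index = i;
            return (a == pat_b) ? ADJ_BLEED : ADJ_LOST;
        }
        if (b != pat_b) {
            *fail_index = i + 1;
            return (b == pat_a) ? ADJ_BLEED : ADJ_LOST;
        }
    }
    return ADJ_OK;
}
```

Because each pair overlaps the next, the test walks the region one word at a time. The failing index and classification could help tell an addressing or column problem apart from a data-integrity problem.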
/Tom
Hi Thomas, I have asked the team to review this case.