When/Where to write to SDRAM ECC


3,904 Views
tracysmith
Contributor IV

A basic ECC SDRAM double-word consists of 8 data bytes and 1 ECC byte. For normal ECC SDRAM operation, the ECC byte of each double-word must correspond to the values of its data bytes. This correspondence cannot be established when code is executed from SDRAM (in this case the DDR controller will detect ECC errors). This is why the ECC SDRAM is explicitly initialized in hardware (by means of the DDR controller) even before the U-Boot code is relocated to SDRAM.
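To illustrate what "correspond" means here, the following is a minimal Python sketch of a single-error-correcting, double-error-detecting (SECDED) check over one 64-bit double-word. It uses a textbook extended Hamming code, which is an assumption made purely for illustration; the actual ECC matrix used by the LS1043A DDR controller is defined in the Reference Manual and is not reproduced here. The names `ecc_byte` and `classify` are hypothetical.

```python
# Toy SECDED model: 64 data bits protected by 8 check bits (one ECC byte).
# Assumption: a textbook extended Hamming code, NOT the controller's real matrix.

# Codeword positions 1..71; powers of two hold check bits, the rest hold data.
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]
assert len(DATA_POSITIONS) == 64


def _parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def _position_xor(data):
    """XOR together the codeword positions of all set data bits."""
    acc = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            acc ^= pos
    return acc


def ecc_byte(data):
    """Compute the 8 check bits that correspond to a 64-bit data word."""
    hamming = _position_xor(data)                # 7 Hamming check bits
    overall = _parity(data) ^ _parity(hamming)   # makes total parity even
    return hamming | (overall << 7)


def classify(data, ecc):
    """Classify a stored (data, ecc) pair the way a SECDED checker would."""
    syndrome = _position_xor(data) ^ (ecc & 0x7F)
    odd = _parity(data) ^ _parity(ecc)
    if syndrome == 0 and not odd:
        return "ok"
    if odd:
        return "single-bit"   # correctable (3+ bit odd errors alias to this)
    return "multi-bit"        # detectable but not correctable


word = 0x0123456789ABCDEF
ecc = ecc_byte(word)
print(classify(word, ecc))               # matching data and ECC
print(classify(word ^ (1 << 17), ecc))   # one data bit flipped
print(classify(word ^ 0b11, ecc))        # two data bits flipped
```

In this model, a location that was never written through the controller holds arbitrary data and check bits, so a read would usually be flagged as an error. That is the situation the hardware initialization is meant to avoid.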

1) Does this mean one cannot write to the ECC registers using devmem for example because there is one extra ECC byte that cannot be written to even though the four ECC lines are connected on the LS1043AQDS?

2) How can a driver, like the x86 EDAC driver read/write to the 1 ECC byte in kernel mode to control scrubbing operations if it can only be written before the U-boot code is relocated to SDRAM?

13 Replies

3,457 Views
tracysmith
Contributor IV

This is still just as ambiguous because you don't explain how this does not correspond in any way with the DDR controller ECC operation.

I have been able to read the ERR_DETECT register at offset E40h, inject an ECC error by grounding the data line, and see the MME and MBE errors being set. So, reading and writing with devmem does correspond with the DDR ECC controller operation. Moreover, the description I provided is the description you provided me. So, if it does not correspond to the proper ECC controller operation, I'm not the one making a mistake here. My questions are based on validating the ECC, confirming I can read the ERR_DETECT register using devmem, and reviewing the EDAC driver.
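For reference, the kind of register read described above could be sketched in Python as follows. The ERR_DETECT offset (E40h) comes from this post; the DDR controller base address `0x01080000`, the big-endian register byte order, and the bit masks for MME, MBE, and SBE are assumptions that should be checked against the LS1043A Reference Manual. `read_reg32` and `decode_err_detect` are hypothetical helper names.

```python
import mmap
import os
import struct

DDR_BASE = 0x01080000      # assumption: LS1043A DDR controller base address
ERR_DETECT = 0xE40         # offset quoted in the post above

# Assumed bit masks; verify against the Reference Manual before relying on them.
ERR_BITS = {
    "MME": 0x80000000,     # multiple memory errors
    "MBE": 0x00000008,     # multi-bit ECC error
    "SBE": 0x00000004,     # single-bit ECC error
}


def read_reg32(offset, base=DDR_BASE, path="/dev/mem", big_endian=True):
    """Map a 4 KiB read-only window at `base` and return the 32-bit value at `offset`."""
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, 0x1000, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as window:
            raw = window[offset:offset + 4]
    finally:
        os.close(fd)
    return struct.unpack(">I" if big_endian else "<I", raw)[0]


def decode_err_detect(value):
    """Return the names of the error flags set in an ERR_DETECT value."""
    return [name for name, mask in ERR_BITS.items() if value & mask]
```

On the target, `decode_err_detect(read_reg32(ERR_DETECT))` would need root and a kernel that permits `/dev/mem` access to this range. Slicing an mmap object does not guarantee a single 32-bit bus access the way devmem does, so treat this as a sketch of the idea, not a replacement for devmem.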

NXP really needs to do a better job supporting their customers other than hand waving and essentially insulting their customers.

1) Does this mean one cannot write to the ECC registers using devmem for example because there is one extra ECC byte that cannot be written to even though the four ECC lines are connected on the LS1043AQDS? The answer is simple, yes you can read/write the ECC registers from Linux OS user space!

 

2) How can a driver, like the x86 EDAC driver read/write to the 1 ECC byte in kernel mode to control scrubbing operations if it can only be written before the U-boot code is relocated to SDRAM?  This needs further explanation and NXP should be able to explain this.


3,457 Views
ufedor
NXP Employee

1) Yes, you can read the ECC registers from the Linux user space.

Linux will hang if ECC error injection is enabled from Linux user space.

2) Excuse me, where is the "x86 EDAC driver" documented?


3,457 Views
tracysmith
Contributor IV

Now we are having a discussion. 

1) I enable ECC in U-Boot, so the board is not hanging because of enabling ECC from Linux user space. But what about ECC_FIX_EN: can it be set from user space or from the Linux kernel, or must it be set before U-Boot is relocated? See below for this field, and p. 671 of the Hardware Reference Manual, section 16.4.49.4. I already know NXP does not set this bit. My question is: can ECC_FIX_EN be set from the kernel or from a kernel module/driver without resetting the board?

2) Where is the x86 EDAC driver documented is my question to NXP.  Where can I find documentation on the EDAC driver?  But you have the latest LSDK, the x86 driver is here: drivers/edac/layerscape_edac.c.

It seems the x86 driver does help manage periodic scrubbing; amd64_edac.c, for example, manages the scrub rates for K8 hardware memory scrubbing. So, it seems to me that I should be able to enable ECC_FIX_EN from an ARM Layerscape EDAC driver, set the periodic scrub rate, and gather statistics based on the errors detected, all from an EDAC driver. Am I mistaken?

ECC_FIX_EN
ECC fixing enable.
The DDR controller supports ECC fixing in memory. In this mode, the DDR controller will automatically fix
any detected single-bit errors by issuing a new transaction to read the address with the failing bit,
correcting the bit, and writing the data back to memory. The single-bit error will still be counted in the
ERR_SBE register for this case, but the controller will automatically fix the error. Note that during the
'read back', the single-bit error will not be double counted in the ERR_SBE register. In addition, the DDR
controller will periodically issue a read to memory at the interval defined by ECC_SCRUB_INT. If a
single-bit error is detected during a periodic read, it will be fixed. In this case, the error will be reported as
an SSBE in the ERR_SBE register. If a multi-bit error is detected, then it will be reported in the
ERR_DETECT register. Also note that if a subsequent single-bit error is detected at the same address
while a first error is being fixed, then the second error will not be reported. Also, after a first SBE is
detected, no other SBEs will be fixed until the first SBE has been fixed in memory. This bit should only be
set if DDR_SDRAM_CFG[ECC_EN] is also set.
NOTE: Scrubbing cannot be enabled until after the controller has cleared
DDR_SDRAM_CFG_2[D_INIT].
0b - ECC scrubbing is disabled.
1b - ECC scrubbing is enabled.
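
The two preconditions in this description (ECC_EN set, D_INIT cleared) could be checked before enabling scrubbing, roughly as in the Python sketch below. The mask values for DDR_SDRAM_CFG[ECC_EN] and DDR_SDRAM_CFG_2[D_INIT] are assumptions to be verified against the Reference Manual, `scrubbing_blockers` is a hypothetical helper name, and the position of ECC_FIX_EN itself is deliberately not encoded here.

```python
# Assumed masks; verify against the Reference Manual.
SDRAM_CFG_ECC_EN = 0x20000000     # DDR_SDRAM_CFG[ECC_EN]
SDRAM_CFG_2_D_INIT = 0x00000010   # DDR_SDRAM_CFG_2[D_INIT]


def scrubbing_blockers(sdram_cfg, sdram_cfg_2):
    """Return the reasons ECC_FIX_EN should not be set yet (empty list if none)."""
    reasons = []
    if not sdram_cfg & SDRAM_CFG_ECC_EN:
        reasons.append("DDR_SDRAM_CFG[ECC_EN] is not set")
    if sdram_cfg_2 & SDRAM_CFG_2_D_INIT:
        reasons.append("DDR_SDRAM_CFG_2[D_INIT] has not been cleared yet")
    return reasons


# Example register values (made up for illustration).
print(scrubbing_blockers(0x20000000, 0x00000010))
print(scrubbing_blockers(0x20000000, 0x00000000))
```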


3,457 Views
ufedor
NXP Employee

1) The ECC_FIX_EN bit can be set on the fly.

In U-Boot this can be done after D_INIT bit is cleared by the DDR controller during initialization.

In Linux you need a special driver.

2) You wrote:

> the x86 driver is here: drivers/edac/layerscape_edac.c

This is not "x86 driver" - please refer to the source code where it is written:

Derived from mpc85xx_edac.c


3,457 Views
tracysmith
Contributor IV

So, the EDAC provides what we need.

linux/drivers/edac/

layerscape_edac.c

fsl_ddr_edac.c

edac_mc.c

edac_mc.c maintains the statistics. layerscape_edac.c is a wrapper around fsl_ddr_edac.c, which does much of the work. So this does error checking, stats, and scrubbing, since it is simply mpc85xx_edac.c renamed.

If you can find out if there are any kernel or uboot changes required to support this edac driver, this would be very helpful. I need to backport this driver to SDK 2.0 and need to know if there are any IFC/ECC related configuration changes I need to do in the kernel and/or u-boot to support this EDAC driver. ECC is already enabled and the IFC/DDR design already fully supports ECC.

For example, is the ECC_FIX_EN enabled or why is it not enabled in the EDAC driver? The ECC_FIX_EN cues the DDR controller that it should “fix” an ECC error by issuing a new transaction to read the address with the failing bit, the DDR controller (internal to LS1043A SoC) will then correct the bit and write the data back to memory. Also, the DDR controller will periodically issue a read to all memory at the interval defined by ECC_SCRUB_INT. So this is definitely part of the patrol scrub operation that the EDAC driver manages, but is it enabled? And why is it not enabled if not?

The ECC_FIX_EN is a feature of the IFC that does the following:

The DDR controller supports ECC fixing in memory. In this mode, the DDR controller will automatically fix any detected single-bit errors by issuing a new transaction to read the address with the failing bit, correcting the bit, and writing the data back to memory. The single-bit error will still be counted in the ERR_SBE register for this case, but the controller will automatically fix the error. Note that during the 'read back', the single-bit error will not be double counted in the ERR_SBE register. In addition, the DDR controller will periodically issue a read to memory at the interval defined by ECC_SCRUB_INT. If a single-bit error is detected during a periodic read, it will be fixed. In this case, the error will be reported as an SSBE in the ERR_SBE register. If a multi-bit error is detected, then it will be reported in the ERR_DETECT register. Also note that if a subsequent single-bit error is detected at the same address while a first error is being fixed, then the second error will not be reported. Also, after a first SBE is detected, no other SBEs will be fixed until the first SBE has been fixed in memory. This bit should only be set if DDR_SDRAM_CFG[ECC_EN] is also set.

Here are the three questions I have for you:

1. What changes are required to configure ECC in the kernel and uboot to support the EDAC driver for ARM?

2. Where do the kernel and U-Boot changes to support the EDAC driver need to be made, and in which files?

3. Why does the EDAC driver not require ECC_FIX_EN to be set?

For example, why is ECC_FIX_EN not set in U-Boot after D_INIT is cleared, or set by the EDAC driver?


3,457 Views
tracysmith
Contributor IV

> In U-Boot this can be done after D_INIT bit is cleared by the DDR controller during initialization.
>
> In Linux you need a special driver.

When you say special driver, you mean an EDAC driver that can set this bit?

> This is not "x86 driver" - please refer to the source code where it is written:
>
> Derived from mpc85xx_edac.c

Correct but this doesn't change my question, can this EDAC driver be used to set ECC_FIX_EN? And is there any documentation on the layerscape EDAC driver that NXP can provide?

Also, I updated my comment and said it seems the x86 driver does help manage periodic scrubbing; amd64_edac.c, for example, manages the scrub rates for K8 hardware memory scrubbing. So, it seems to me that I should be able to enable ECC_FIX_EN from an ARM Layerscape EDAC driver, set the periodic scrub rate, and gather statistics based on the errors detected, all from an EDAC driver. Am I mistaken? I was reviewing the amd64_edac.c version that does the scrubbing.


3,457 Views
ufedor
NXP Employee

> When you say special driver, you mean an EDAC driver that can set this bit?

A driver which is capable of setting ECC_FIX_EN. Currently there is no such driver for the LS1043A from NXP.


3,457 Views
tracysmith
Contributor IV

1) drivers/edac/layerscape_edac.c is derived from the mpc85xx_edac.c, but it has been modified to support the ls1043aqds and layerscape products that support ECC, correct?

2) Can the drivers/edac/layerscape_edac.c driver be modified to enable ECC_FIX_EN, set the periodic scrub rate and gather statistics based on the errors detected all from an EDAC driver?  Any reason why this layerscape edac driver cannot be modified to do this? 

3) The amd64_edac.c helps manage scrubbing, so any reason why the layerscape_edac.c driver cannot do the same if modified as a custom driver?


3,457 Views
ufedor
NXP Employee

Please address questions about Linux driver modification/creation to the professional paid support.


3,457 Views
tracysmith
Contributor IV

It seems little or no modification is needed. So, paid support is definitely not needed it seems. 

drivers/edac/layerscape_edac.c

drivers/edac/fsl_ddr_edac.c

drivers/edac/edac_mc.c

 

edac_mc.c maintains the statistics, and sysfs can be used to access the counters. layerscape_edac.c is a wrapper around fsl_ddr_edac.c, which does much of the work. So this does error checking, stats, and scrubbing, since it is simply mpc85xx_edac.c renamed.
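
As a rough illustration of reading those counters from sysfs, the Python sketch below collects the corrected and uncorrected error counts for each memory controller. It assumes the standard EDAC sysfs layout under `/sys/devices/system/edac/mc` with per-controller `ce_count` and `ue_count` files; whether the Layerscape driver populates them on a given kernel is not confirmed here. `read_edac_counts` is a hypothetical helper name.

```python
import glob
import os

EDAC_MC_ROOT = "/sys/devices/system/edac/mc"   # standard EDAC sysfs location


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce_count": n, "ue_count": n}} for every mcN directory found."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as f:
                    entry[name] = int(f.read().strip())
            except (OSError, ValueError):
                entry[name] = None   # attribute missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


# Prints nothing on systems where no EDAC memory controller is registered.
for mc, entry in read_edac_counts().items():
    print(mc, "corrected:", entry["ce_count"], "uncorrected:", entry["ue_count"])
```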

Standard support should be able to handle the questions, for example has this layerscape EDAC driver been validated?


3,457 Views
ufedor
NXP Employee

1) The question is incorrect.

Please note that no data is written to SDRAM when a DDR controller register is written.

2) Creating a driver is a complicated task - please consider applying for professional paid support:

NXP Professional Services|NXP 


3,457 Views
tracysmith
Contributor IV

This was no answer.  Is this the type of support customers should expect from NXP? Answers stating “the question is incorrect” with absolutely no attempt to answer the question, is not an answer.  Explain why it is incorrect and attempt to discuss why ECC addresses cannot be written or read from using devmem. The explanation given thus far is incomplete.

The problem with the community answers from NXP is most answers have gaps as wide as the Grand Canyon. Is this the type of support we should expect from the NXP community?  If so, NXP will fall behind their competitors.


3,457 Views
ufedor
NXP Employee

Sorry for the ambiguity.

The "incorrect question" means that it does not correspond in any way with the DDR controller ECC operation, thus it can't be answered. The ECC operation is described in the processor's Reference Manual.

For better understanding it is recommended to attend a dedicated training - for example from Arnewsh Inc: Arnewsh Inc. 
