Triggering ECC errors on LS1043ARDB?

tracysmith
Contributor IV

I need to trigger ECC errors on the LS1043ARDB.  What can be used to trigger ECC errors for the LS1043ARDB?

rasdaemon does not trigger ECC errors; it only detects and collects them. I need to both inject ECC errors and detect them for memory validation. I could use rasdaemon for ECC error detection and collection on Linux, but it is not included in the rootfs of any NXP BSP build, and neither are journald and journalctl. journalctl is needed for structured viewing of the rasdaemon log. The trace would be logged under /sys/kernel/debug/tracing and reported via syslog/journald, but this is not available on the BSP.

Any recommendations on what to use on the current releases of SDK 2.0?  Or, anything NXP has available for ECC validation for the LS1043ARDB?

1 Solution
andrei_skok
NXP Employee

The LS1043A device itself supports ECC on the DDR interface: it can correct single-bit errors and detect multi-bit errors, provided ECC memory is implemented in hardware on the board. In other words, processor support alone is not enough; the board must also be designed with ECC support. For example, the LS1043A-RDB does not support ECC because its on-board DDR4 memory is 32 bits wide, i.e. without ECC. Anyone designing an LS1043A-based board needs to build a 36-bit DDR interface to use the ECC feature: 32 bits for the memory itself and 4 bits for the ECC memory.

The following NXP boards provide ECC support: LS1043A-QDS, LS1046A-RDB.

9 Replies
ziv_chen
Contributor I

Hi,

Do you have a note on ECC error injection in the form of Linux commands?

And how can I check whether a single-bit ECC correction has occurred?

Thanks a lot

tracysmith
Contributor IV

My understanding is that ECC is not supported on the LS1043. What does this mean, given that the DDR section of the hardware manual specifies ECC registers that can be read and written?

Please reconfirm whether hardware memory error correction is supported on the LS1043.

Does this mean that the LS1043 does not correct DDR memory errors?

Which board in the LS family supports memory error correction and detection in hardware, and does it also support error detection from software, such as logging ECC errors and making use of the ECC DDR registers?

tracysmith
Contributor IV

The LS1043ARDB I am using has the 4 ECC lines, an ECC DDR4, and an IFC that manages the ECC, identical to the LS1043A-QDS as specified and expected by NXP. It is the same as the LS1043A-QDS in this regard.

Why does step 4 below cause a reset?

# devmem 0x1080E08 w 0x00030000

Please refer to the following procedure to trigger ECC errors.

1a. Configure DDR_SDRAM_CFG_2[D_INIT]=0b'1 in Target Initialization File.
1b. Configure DDR_SDRAM_CFG[ECC_EN]=0b'1 in Target Initialization File.
2. Set SBE threshold to 1 in the code. ERR_SBE[SBET]= 0b'1
3. Enable error detection in the code. ERR_DISABLE[MBED,SBED]= 0b'00
4. Select mirror byte function. Inject error. ERR_INJECT[EMB, EIEN]= 0b'11
5. Write one 32-bit data 0x55aa_0000 into memory location 0x5000
6. Disable error injection : ERR_INJECT[EMB, EIEN]= 0b'00.
7. Read a 32-bit data from memory location 0x5000 to trigger ECC error.

andrei_skok
NXP Employee

If you really mean the NXP LS1043ARDB board, it does not have ECC lines connected to the DDR memory.

According to the LS1043ARDB board schematics all four MECCx lines are left unconnected.

tracysmith
Contributor IV

Let me try this again. We have a custom LS1043ARDB board, and its ECC lines are connected identically to those of the ECC-capable LS1043AQDS. Why is this so hard for you guys to understand?

Please answer my question: why is there a reset when the lines are connected, ECC is enabled, and we set the bits indicated above? The same should happen on the LS1043AQDS, since our custom LS1043ARDB is connected identically to it. This is frustrating; do you understand what I'm saying?

tracysmith
Contributor IV

Hi TIC,

4. Select mirror byte function. Inject error. ERR_INJECT[EMB, EIEN]= 0b'11

Did you mean the ECC_ERR_INJECT[EMB, EIEN]=0b'11?

I am having difficulty writing to this register. The default is 0b'00. I have done all the prior steps except the first, DDR_SDRAM_CFG_2[D_INIT]=0b'1, since that bit is configured by software at boot and cleared by the controller. If software sets this bit before the memory controller is enabled, the controller automatically initializes DRAM once it is enabled, and hardware clears the bit automatically when initialization completes. This data initialization bit should only be set when the controller is idle.
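
The set-by-software, cleared-by-hardware behaviour of D_INIT can be checked from a register dump. The following is only a sketch: the bit position (27, counting from the most significant bit as bit 0) is an assumption that should be verified against the reference manual, and the function expects a value that has already been byte-swapped into the manual's bit order.

```shell
#!/bin/sh
# ASSUMPTION: D_INIT is bit 27 of DDR_SDRAM_CFG_2 in the manual's numbering,
# where bit 0 is the most significant bit. Verify against the reference manual.
D_INIT_BIT=27

# d_init_pending VALUE: succeed (exit 0) if D_INIT is still set in VALUE.
# VALUE is the 32-bit register content in the manual's byte order.
d_init_pending() {
    [ $(( ($1 >> (31 - $D_INIT_BIT)) & 1 )) -eq 1 ]
}

# Example with two hypothetical register values.
for value in 0x00000010 0x00000000; do
    if d_init_pending "$value"; then
        echo "$value: D_INIT set, initialization still pending"
    else
        echo "$value: D_INIT clear, initialization complete"
    fi
done
```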

Am I missing something? Why does writing to ECC_ERR_INJECT[EMB,EIEN]=0b'11 cause a reset?

root@ls1043ardb:~# devmem 0x1080E08 w 0x00030000

 [ 839.465065] Unhandled fault: synchronous external abort (0x96000210) at 0xff

WDT is triggered and the board resets.
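
To see which register bits the value 0x00030000 actually sets, it helps to apply the same byte swap that is used for DDR_SDRAM_CFG elsewhere in this thread and then list the set bits in the manual's numbering (bit 0 is the most significant bit). The sketch below is pure arithmetic and touches no hardware; whether the resulting bit positions are really EMB and EIEN is an assumption to confirm in the reference manual. It also assumes the shell does arithmetic in at least 64 bits.

```shell
#!/bin/sh
# swap32 VALUE: reverse the byte order of a 32-bit value.
swap32() {
    v=$(( $1 ))
    printf '0x%08X\n' $(( ((v & 0xFF) << 24) | ((v & 0xFF00) << 8) | ((v >> 8) & 0xFF00) | ((v >> 24) & 0xFF) ))
}

# set_bits VALUE: list the set bits of a 32-bit value, numbering the
# most significant bit as bit 0 (the convention used for bit 2 / bit 29
# of DDR_SDRAM_CFG in this thread).
set_bits() {
    v=$(( $1 ))
    out=""
    n=0
    while [ "$n" -le 31 ]; do
        if [ $(( (v >> (31 - n)) & 1 )) -eq 1 ]; then
            out="$out $n"
        fi
        n=$(( n + 1 ))
    done
    echo "${out# }"
}

reg=$(swap32 0x00030000)
echo "value as seen in manual bit order: $reg"
echo "bits set: $(set_bits "$reg")"
```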

tracysmith
Contributor IV

Here is an example:

1b. Configure DDR_SDRAM_CFG[ECC_EN]=0b'1 in Target Initialization File.

Comment in reference manual: If this bit is set to 1, DDR_SDRAM_CFG[ACC_ECC_EN] must be set to 1 as well.

The hardware reference manual says that if the ECC_EN is set, which it is, then ACC_ECC_EN should be set. To check this, check bit 29 in the DDR_SDRAM_CFG register, i.e., ACC_ECC_EN.

> devmem 0x1080110 8

0xE5

> devmem 0x1080111 8

0x0C

> devmem 0x1080112 8

0x00

> devmem 0x1080113 8

0x0C

Bit 29 is 1, so ACC_ECC_EN is set correctly while ECC_EN (bit 2) is 1, i.e. enabled.
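
The bit check can also be done arithmetically instead of by eye. This sketch assembles the four byte reads shown above in address order and extracts bits 2 and 29, counting the most significant bit as bit 0. The helper name be_bit is made up for illustration, and no hardware is accessed.

```shell
#!/bin/sh
# be_bit VALUE N: print bit N of a 32-bit VALUE, where bit 0 is the
# most significant bit (the numbering used in this post).
be_bit() {
    echo $(( ($1 >> (31 - $2)) & 1 ))
}

# Assemble the byte reads from 0x1080110..0x1080113 in address order.
cfg=$(( (0xE5 << 24) | (0x0C << 16) | (0x00 << 8) | 0x0C ))

printf 'DDR_SDRAM_CFG      = 0x%08X\n' "$cfg"
echo "ECC_EN (bit 2)      = $(be_bit "$cfg" 2)"
echo "ACC_ECC_EN (bit 29) = $(be_bit "$cfg" 29)"
```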

In this case, nothing needs to change:

> devmem 0x1080110

0x0C000CE5

After byte swapping this is 0xE50C000C, which is the value to compare against the bit layout in the manual. To make it easier, devmem with a width of 8 bits returns the bytes in the order used by the hardware manual, as illustrated above.
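
A minimal sketch of that byte swap in shell arithmetic, so the 32-bit devmem result can be converted without reading the register byte by byte. The function name swap32 is hypothetical, and the sketch assumes the shell does arithmetic in at least 64 bits.

```shell
#!/bin/sh
# swap32 VALUE: reverse the byte order of a 32-bit value and print it
# as an 8-digit hexadecimal number.
swap32() {
    v=$(( $1 ))
    printf '0x%08X\n' $(( ((v & 0xFF) << 24) | ((v & 0xFF00) << 8) | ((v >> 8) & 0xFF00) | ((v >> 24) & 0xFF) ))
}

# The 32-bit read from the post above.
swap32 0x0C000CE5
```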

The DDR controller register block begins at 0x1080000. To access a register through physical memory, add the register's offset to this base. The offset of DDR_SDRAM_CFG is 110h, which makes the address of the DDR_SDRAM_CFG register 0x1080110. To view the first 8 bits of the register, use devmem 0x1080110 8; for the next 8 bits, devmem 0x1080111 8; and so forth for the rest of the 32-bit register.
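
The address arithmetic can be sketched as two small helpers. The base 0x1080000 and the offset 110h come from this post; the function names are made up for illustration, and the script only prints addresses.

```shell
#!/bin/sh
# Base address used in this post for the DDR register offsets.
DDR_BASE=0x1080000

# reg_addr OFFSET: print the physical address of the register at OFFSET.
reg_addr() {
    printf '0x%X\n' $(( $DDR_BASE + $1 ))
}

# byte_addrs OFFSET: print the four byte addresses of a 32-bit register,
# in the order they are read with a devmem width of 8.
byte_addrs() {
    base=$(( $DDR_BASE + $1 ))
    for i in 0 1 2 3; do
        printf '0x%X\n' $(( base + i ))
    done
}

reg_addr 0x110      # DDR_SDRAM_CFG
byte_addrs 0x110
```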

Correct if any mistakes.

yipingwang
NXP TechSupport

Hello Tracy Smith,

Please refer to the following procedure to trigger ECC errors.

1a. Configure DDR_SDRAM_CFG_2[D_INIT]=0b'1 in Target Initialization File.
1b. Configure DDR_SDRAM_CFG[ECC_EN]=0b'1 in Target Initialization File.
2. Set SBE threshold to 1 in the code. ERR_SBE[SBET]= 0b'1
3. Enable error detection in the code. ERR_DISABLE[MBED,SBED]= 0b'00
4. Select mirror byte function. Inject error. ERR_INJECT[EMB, EIEN]= 0b'11
5. Write one 32-bit data 0x55aa_0000 into memory location 0x5000
6. Disable error injection : ERR_INJECT[EMB, EIEN]= 0b'00.
7. Read a 32-bit data from memory location 0x5000 to trigger ECC error.
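
Steps 4 to 7 above could be sketched as a dry run that only prints the commands and executes nothing. The ERR_INJECT address and the value 0x00030000 are taken from the devmem command posted earlier in this thread; the devmem argument style follows that command as well. Writing 0x00000000 in step 6 assumes the other fields of the register are zero, and the test address is a placeholder that would have to be a physical address in ECC-protected DRAM on the actual board.

```shell
#!/bin/sh
# Address of the error injection register as used earlier in this thread.
ERR_INJECT_ADDR=0x1080E08

# inject_sequence TEST_ADDR: print (do not run) the commands for steps 4-7.
inject_sequence() {
    test_addr=$1
    echo "devmem $ERR_INJECT_ADDR w 0x00030000"   # step 4: enable injection
    echo "devmem $test_addr w 0x55aa0000"         # step 5: write test data
    echo "devmem $ERR_INJECT_ADDR w 0x00000000"   # step 6: disable injection
    echo "devmem $test_addr w"                    # step 7: read back
}

# 0x5000 is the location named in the procedure; treat it as a placeholder.
inject_sequence 0x5000
```

Reviewing the printed commands before running them by hand keeps the window between steps 4 and 6 as short as possible.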


Have a great day,
TIC
