LS1046a - Linux Kernel Panic (EDAC FSL_DDR MC0: Err Detect Register: 0x80000018)


5,217 Views
james_browning
Contributor III

Hello,

 

We are encountering a kernel panic during boot-up, preceded by the following errors:

 

 

[  129.454177] EDAC FSL_DDR MC0: Err Detect Register: 0x80000018
[  129.454239] SError Interrupt on CPU2, code 0xbf000002 -- SError

 

 

It seems that when we enable ARM64_ERRATUM_843419, the kernel panic goes away. This seems odd, since erratum 843419 is specific to the Cortex-A53 core, but the LS1046a uses Cortex-A72 cores. Even though enabling 843419 stops the kernel panic, it causes other software issues (specifically the one described at http://blog.chinaunix.net/uid-13889805-id-5787750.html).

 

Here is the memory module we are using:

---=== Manufacturing Information ===---
Manufacturer                                     Fairchild
Manufacturing Location Code                      0x03
Part Number                                      NLY2G7241G071ID32Z
Revision Code                                    0x4845
Manufacturing Date                               2000-W80
Assembly Serial Number                           0xADFF0000

 

Has the above error been encountered before on the LS1046a? Also, what is the relationship between erratum 843419 and the Cortex-A72 processor? Should we have this option enabled?

 

Thank you for your help.

5 Replies

5,149 Views
yipingwang
NXP TechSupport

ARM64_ERRATUM_843419 is enabled by default in arch/arm64/Kconfig for all ARM64 platforms (including Cortex-A53, Cortex-A72, etc.).

 

Users can see that the following options are enabled in .config:

CONFIG_ARM64_ERRATUM_843419=y
CONFIG_ARM64_LD_HAS_FIX_ERRATUM_843419=y

 

User should not disable ARM64_ERRATUM_843419.

 

We have not seen this issue on our LS1046A boards.


5,096 Views
james_browning
Contributor III

Apologies for the late reply. I have enabled 843419, but the same panic is still occurring (I was mistaken when I said previously that the panic goes away).

If my interpretation of the panic is correct, it seems that the DDR controller is getting an ECC error when it tries to read from address 0x00020000. Is my understanding correct? If so, is there any suggestion for identifying where the faulty memory access is happening?

[  148.440201] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffff / 0xff
[  148.447336] EDAC FSL_DDR MC0: Err addr: 0x00020000
[  148.452123] EDAC FSL_DDR MC0: PFN: 0x00000020
[  148.456479] EDAC MC0: 1 UE fsl_mc_err on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x20 offset:0x0 grain:8)
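
For anyone following the log, here is a minimal Python sketch of how the logged values relate to each other. The bit masks are an assumption: they are believed to match the DDR_EDE_* definitions in the Linux fsl_ddr_edac driver header, and any bits outside those four masks are simply reported as undecoded (check them against the LS1046A reference manual). The 4 KiB page size is also an assumption about the kernel configuration, and the function names are hypothetical.

```python
# Masks assumed from the Linux fsl_ddr_edac driver header; verify before relying on them.
DDR_EDE_FLAGS = {
    0x80000000: "MME (multiple memory errors)",
    0x00000008: "MBE (multi-bit ECC error)",
    0x00000004: "SBE (single-bit ECC error)",
    0x00000001: "MSE (memory-select error)",
}

PAGE_SHIFT = 12  # assumes 4 KiB kernel pages


def decode_err_detect(value):
    """Return (names of known flags that are set, remaining undecoded bits)."""
    names = [name for mask, name in DDR_EDE_FLAGS.items() if value & mask]
    known = sum(mask for mask in DDR_EDE_FLAGS if value & mask)
    return names, value & ~known


def split_address(addr):
    """Split an address into (page frame number, offset within the page)."""
    return addr >> PAGE_SHIFT, addr & ((1 << PAGE_SHIFT) - 1)


if __name__ == "__main__":
    names, unknown = decode_err_detect(0x80000018)
    print(names, hex(unknown))
    pfn, offset = split_address(0x00020000)
    print(hex(pfn), hex(offset))
```

Under these assumptions, 0x80000018 decodes to a multi-bit ECC error with the "multiple errors" flag set, which is consistent with the "1 UE" (uncorrectable error) line, and the error address 0x00020000 splits into PFN 0x20 and offset 0x0, matching the EDAC output.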

5,086 Views
yipingwang
NXP TechSupport

You could use the "mtest" command provided in U-Boot to run DDR memory read and write tests.

If it fails, please use the QCVS DDRv tool provided in CodeWarrior to validate and optimize the DDR settings, then refine the DDR controller configuration parameters in ATF.


5,003 Views
james_browning
Contributor III

After further testing, we found that a region of DDR contains bad ECC values after a warm reboot. We feel confident that our issue is the same as the one described in this ticket: https://community.nxp.com/t5/Layerscape/How-to-fix-ECC-of-DDR4-training-address-after-warm-boot/td-p...

We have implemented the same warm/cold boot logic as described in the above link in both our U-Boot and ATF code. After a warm reboot (when we bypass memory initialization), we find that the range 0x80020000 - 0x8002007f contains bad ECC values. It seems the solution may be to disable ECC checking and then re-initialize the affected region.

Our question, then, is this: why is the area at 0x80020000 being used for DDR training? We have DDR_INIT_ADDR and DDR_INIT_EXT_ADDR both set to 0. Is 0x80020000 just the default calibration address?

 

Please find our DDR controller dump attached.


4,976 Views
yipingwang
NXP TechSupport

On the LS1046A processor, DDR memory starts at address 0x80000000.
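
A small Python sketch of the arithmetic that may connect the addresses mentioned in this thread. It assumes, without confirmation anywhere in the thread, that the EDAC-reported error address 0x00020000 is an offset from the start of DRAM, and that the ECC granule is the 8 bytes suggested by "grain:8" in the EDAC log. DDR_BASE is the base address stated above; the function names are hypothetical.

```python
DDR_BASE = 0x80000000  # start of DDR on the LS1046A, as stated in the reply above
ECC_GRAIN = 8          # bytes per ECC-protected word, taken from "grain:8" in the EDAC log


def to_physical(offset):
    """Translate a DRAM-relative offset to a physical address (assumed mapping)."""
    return DDR_BASE + offset


def ecc_words(start, end_inclusive):
    """Return the start addresses of every ECC word touching [start, end_inclusive]."""
    first = start - (start % ECC_GRAIN)  # align down to the ECC granule
    return range(first, end_inclusive + 1, ECC_GRAIN)


if __name__ == "__main__":
    print(hex(to_physical(0x00020000)))
    words = ecc_words(0x80020000, 0x8002007F)
    print(len(words), hex(words[0]), hex(words[-1]))
```

If the assumed mapping holds, the EDAC error address lands exactly at the start of the 0x80020000 - 0x8002007f range reported earlier, and that 128-byte range covers 16 ECC words that a re-initialization pass would presumably need to rewrite.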

I still suggest that you use the QCVS DDRv tool to optimize the DDR settings, then refine the DDR controller initialization parameters in the ATF source code.
