EDAC L1 and L2 cache error detection and correction on LS1043

Document created by Yiping Wang Employee on Oct 10, 2019
Version 1Show Document
  • View in full screen mode

ARM Cortex A57 and A53 L1/L2 cache error reporting

The attached patch adds error detection for A53 and A57 cores. Hardware error injection is supported on A53. Software error injection is supported on both.

For hardware error injection on A53 to work, proper access to L2ACTLR_EL1, CPUACTLR_EL1 needs to be granted by EL3 firmware. This is done by making an SMC call in the driver. Failure to enable access disables hardware error injection. For error interrupt to work, another SMC call enables access to L2ECTLR_EL1. Failure to enable access disables interrupt for error reporting.

 

CPU Memory Error Syndrome and L2 Memory Error Syndrome registers can be used for checking L1 and L2 memory errors. However, only A53 supports double-bit error injection to L1 and L2 memory. This driver uses the hardware error injection when available, but also provides a way to inject errors by software. Both A53 and A57 supports interrupt when multi-bit errors happen.

 

To use hardware error injection and the interrupt, proper access needs to be granted in ACTLR_EL3 (and/or ACTLR_EL2) register by EL3 firmware SMC call.

Correctable errors do not trigger such interrupt. This driver uses dynamic polling internal to check for errors. The more errors detected, the more frequently it polls. Combining with interrupt, this driver can detect correctable and uncorrectable errors. However, if the uncorrectable errors cause system abort exception, this driver is not able to report errors in time.

 

 

Building PPA Image
Please make sure the PPA source which you are using includes commit 781d7b513c2b44e7, PPA source code in LSDK later than1809, which includes this commit, so you could use PPA image provided in LSDK 1809 or the later release.
In addition, You need to enable "dbg" when building PPA.
If you use LSDK build environment, please add "dbg" in ppa build command in packages/firmware/Makefile as the following, then rebuild ppa with command "$ flex-builder -c ppa -m ls1043ardb" and get ppa image in build/firmware/ppa/soc-ls1043/ppa.itb.
socname=`echo $(MACHINE)|tr -cd [:digit:]` && \
./build rdb-fit dbg ls$$socname && cp soc-ls$$socname/build/obj/ppa.itb $(FBDIR)/build/firmware/ppa/soc-ls$$socname && \


If you want build PPA manually with the standalone Toolchain, you could build PPA image with the command "./build prod rdb-fit dbg ls1043".
Please deploy PPA image at 0x60400000 on NOR flash(the current bank).

 

L1/L2 cache error detection and correction EDAC feature verification

 

NXP LSDK 19.03 devel

localhost login: root

Password:

Last login: Fri May 17 00:11:26 UTC 2019 on ttyS0

Welcome to NXP LSDK 19.03 devel (GNU/Linux 4.14.16-dirty aarch64)

 

 * Support:        https://www.nxp.com/lsdk

 * Documentation:  https://lsdk.github.io

setting 31 rows and 111 columns

root@localhost:~# ls /sys/devices/system/edac/cpu_cache/

cpu_cache0  l1_ce_sw_inject  l1_ue_sw_inject  l2_ue_hw_inject  log_ce  panic_on_ue

device      l1_ue_hw_inject  l2_ce_sw_inject  l2_ue_sw_inject  log_ue  poll_msec

 

root@localhost:~#  echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_sw_inject

root@localhost:~# [   74.831916] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[   74.831916] '

echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_hw_inject^C

root@localhost:~# echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_hw_inject

[   80.192338] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[   80.192338] '

root@localhost:~#

root@localhost:~#

root@localhost:~# echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_sw_inject

root@localhost:~# [  109.647966] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[  109.647966] '

[  109.659012] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[  109.659012] '

 

root@localhost:~# echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_sw_inject

root@localhost:~# [  136.271980] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[  136.271980] '

[  136.283027] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[  136.283027] '

 

root@localhost:~# echo 1 > /sys/devices/system/edac/cpu_cache/l1_ue_hw_inject

[  157.352298] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 2

[  157.352298] '

root@localhost:~# echo 1 > /sys/devices/system/edac/cpu_cache/l2_ue_sw_inject

root@localhost:~# [  183.375947] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

[  183.375947] '

 

root@localhost:~#  echo 1 > /sys/devices/system/edac/cpu_cache/l2_ue_hw_inject

[  186.928679] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

[  186.928679] '

root@localhost:~# [  191.055944] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

[  191.055944] '

 

root@localhost:~# dmesg | grep EDAC

[    1.819380] EDAC MC: Ver: 3.0.0

[    3.423554] EDAC DEVICE0: Giving out device to module edac-a53 controller cortex_edac_l1_l2: DEV edac-a53 (POLLED)

[   74.831898] EDAC cortex_edac_l1_l2: CPU 0 L1-I Tag RAM error(s) detected

[   74.831910] EDAC cortex_edac_l1_l2: CPU 0 L1 fatal error(s) detected (0x8000000080000000)

[   74.831916] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[   80.192307] EDAC cortex_edac_l1_l2: CPU 1 L1-D Data RAM error(s) detected

[   80.192326] EDAC cortex_edac_l1_l2: CPU 1 L1 fatal error(s) detected (0x8000000089000002)

[   80.192338] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[  109.647919] EDAC cortex_edac_l1_l2: CPU 0 L1-I Tag RAM error(s) detected

[  109.647938] EDAC cortex_edac_l1_l2: CPU 0 L1 fatal error(s) detected (0x8000000080000000)

[  109.647942] EDAC cortex_edac_l1_l2: CPU 1 L1-I Tag RAM error(s) detected

[  109.647955] EDAC cortex_edac_l1_l2: CPU 1 L1 fatal error(s) detected (0x8000000080000000)

[  109.647966] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[  109.659012] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[  136.271933] EDAC cortex_edac_l1_l2: CPU 0 L1-I Tag RAM error(s) detected

[  136.271951] EDAC cortex_edac_l1_l2: CPU 0 L1 fatal error(s) detected (0x8000000080000000)

[  136.271956] EDAC cortex_edac_l1_l2: CPU 1 L1-I Tag RAM error(s) detected

[  136.271969] EDAC cortex_edac_l1_l2: CPU 1 L1 fatal error(s) detected (0x8000000080000000)

[  136.271980] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 0

[  136.283027] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 1

[  157.352268] EDAC cortex_edac_l1_l2: CPU 2 L1-D Data RAM error(s) detected

[  157.352287] EDAC cortex_edac_l1_l2: CPU 2 L1 fatal error(s) detected (0x8000000089180002)

[  157.352298] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L1 'Fatal error(s) on CPU 2

[  183.375931] EDAC cortex_edac_l1_l2: CPU 0 L2 fatal error(s) detected (0x8000000080000000)

[  183.375947] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

[  186.928662] EDAC cortex_edac_l1_l2: CPU 0 L2 fatal error(s) detected (0x80000000910c4058)

[  186.928679] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

[  191.055928] EDAC cortex_edac_l1_l2: CPU 0 L2 fatal error(s) detected (0x80000000910d0868)

[  191.055944] EDAC DEVICE0: UE: cortex_edac_l1_l2 instance: cpu_cache0 block: L2 'Fatal error(s) on CPU 0

root@localhost:~#

Outcomes