[LS1043A] Custom board bring-up: mtest question

mike_palmer
Contributor II

- custom LS1043A board with 2GB DDR3L arranged as two (2) 8Gb (512Mx16) devices on CS0

- U-Boot version:

=> version
U-Boot 2019.10-dirty (Dec 12 2020 - 22:30:25 -0500)

aarch64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
GNU ld (GNU Binutils for Ubuntu) 2.34

We are using the LSDK 20.04 and the LS1043ARDB reference design as a base.

@ufedor kindly assisted us with the DDR3L register configuration. Our boot sequence now terminates with:

[    2.980395] Unable to handle kernel paging request at virtual address fffffffffffffffe
[    2.988305] Mem abort info:
[    2.991092]   ESR = 0x96000004
[    2.994140]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.999444]   SET = 0, FnV = 0
[    3.002491]   EA = 0, S1PTW = 0
[    3.005624] Data abort info:
[    3.008494]   ISV = 0, ISS = 0x00000004
[    3.012322]   CM = 0, WnR = 0
[    3.015282] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000082b60000
[    3.021976] [fffffffffffffffe] pgd=0000000000000000
[    3.026850] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    3.032410] Modules linked in:
[    3.035457] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.3 #1
[    3.041278] Hardware name: LS1043A RDB Board (DT)
[    3.045971] pstate: 40000005 (nZcv daif -PAN -UAO)
[    3.050759] pc : mac_probe+0x34c/0x734
[    3.054498] lr : mac_probe+0x32c/0x734
[    3.058234] sp : ffff80001003baf0
[    3.061537] x29: ffff80001003baf0 x28: 0000000000000007
[    3.066838] x27: ffffbdf985655068 x26: ffff000061f9dc00
[    3.072139] x25: ffffbdf985b39000 x24: ffffbdf984f1a000
[    3.077441] x23: ffff00007b62c3e8 x22: ffff00007b6327e0
[    3.082742] x21: ffff000061f9dc10 x20: 0000000000000001
[    3.088042] x19: ffff00006111f080 x18: ffffbdf9845284e8
[    3.093344] x17: ffffbdf9845285b8 x16: ffffbdf98455fdf8
[    3.098644] x15: 0000000001ae5000 x14: ffffffffff000000
[    3.103945] x13: ffffbdf985454000 x12: ffff8000101e4000
[    3.109246] x11: 000000000000000b x10: 0101010101010101
[    3.114547] x9 : fffffffffffffffb x8 : 7f7f7f7f7f7f7f7f
[    3.118713] ata1: SATA link down (SStatus 0 SControl 300)
[    3.119849] x7 : fefefeff646c606d x6 : 0000000000000001
[    3.130530] x5 : 0000000000000004 x4 : 0000000000000003
[    3.135831] x3 : ffff000061f9dc48 x2 : 9c4d28f52ce4dc00
[    3.141132] x1 : 0000000000000000 x0 : fffffffffffffffe
[    3.146433] Call trace:
[    3.148870]  mac_probe+0x34c/0x734
[    3.152263]  platform_drv_probe+0x50/0xa0
[    3.156265]  really_probe+0x108/0x348
[    3.159917]  driver_probe_device+0x58/0x100
[    3.164090]  device_driver_attach+0x6c/0x90
[    3.168262]  __driver_attach+0x84/0xc8
[    3.172001]  bus_for_each_dev+0x74/0xc8
[    3.175826]  driver_attach+0x20/0x28
[    3.179391]  bus_add_driver+0x148/0x1f0
[    3.183216]  driver_register+0x60/0x110
[    3.187041]  __platform_driver_register+0x40/0x48
[    3.191737]  mac_load+0x30/0x6c
[    3.194870]  do_one_initcall+0x5c/0x1b0
[    3.198696]  kernel_init_freeable+0x1a4/0x24c
[    3.203043]  kernel_init+0x10/0x108
[    3.206522]  ret_from_fork+0x10/0x18
[    3.210090] Code: 91008021 128002b4 97f6b44a 140000e9 (b9400001)
[    3.216181] ---[ end trace b507af7b696ba967 ]---
[    3.220821] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    3.228467] SMP: stopping secondary CPUs
[    3.232382] Kernel Offset: 0x3df973a00000 from 0xffff800010000000
[    3.238462] PHYS_OFFSET: 0xfffffa38c0000000
[    3.242633] CPU features: 0x0002,20802004
[    3.246631] Memory Limit: none
[    3.249676] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

After doing some research and seeing @yipingwang recommend enabling mtest in U-Boot for similar failures:

https://community.nxp.com/t5/Layerscape/Unable-to-handle-kernel-paging-request-at-virtual-address/m-...

I did that and found that I can run the test successfully over the breadth of the DDR except for roughly the top ~134 MiB of memory (from about 0xF7A23F60 to the end of the 2 GB at 0xFFFFFFFF):

=> mtest 80000000 f7a23f67 5a5a5a5a 1
Testing 80000000 ... f7a23f67:
Pattern 5A5A5A5A  Writing...  Reading...Tested 1 iteration(s) with 0 errors.
=> mtest 80000000 f7a23f68 5a5a5a5a 1
Testing 80000000 ... f7a23f68:
Pattern 5A5A5A5A  Writing...  Reading...
Mem error @ 0xF7A23F60: found F7A23F80, expected 694EA246
Tested 1 iteration(s) with 1 errors.

The cs[0].bnds register is set to 0x7F as per @ufedor and the LS1043ARDB recommendations.

Questions:

1) Is the address at which these errors occur expected?

2) Could these errors account for the "kernel paging request" error and subsequent kernel panic?

Accepted solution:
mike_palmer
Contributor II

For closure: I wrote a simple full-range DDR test consisting of fixed patterns and walking 0s and 1s, ran it in BL2, and it passed without error. I also enabled the DDR BIST after DDR register initialization, and that also completes without error. It appears that mtest cannot cover the full DDR while U-Boot itself is running out of that memory.

7 Replies
mike_palmer
Contributor II

I ran bdinfo and got the following:

=> bdinfo
arch_number = 0x0000000000000000
boot_params = 0x0000000000000000
DRAM bank = 0x0000000000000000
-> start = 0x0000000080000000
-> size = 0x000000007be00000
eth0name = FM1@DTSEC3
ethaddr = 00:11:22:33:44:55
eth1name = FM1@DTSEC4
eth1addr = 01:11:22:33:44:55
current eth = FM1@DTSEC3
ip_addr = <NULL>
baudrate = 115200 bps
TLB addr = 0x00000000f7bf0000
relocaddr = 0x00000000f7b2b000
reloc off = 0x0000000075b2b000
irq_sp = 0x00000000f7a24290
sp start = 0x00000000f7a24290
Early malloc usage: 338 / 2000
fdt_blob = 0x00000000f7a242a0

My failures, which appear once the test range reaches 0xF7A23F68, are very close to the memory used for sp start, irq_sp and fdt_blob. Is mtest "allowed" to write to areas at the top of DDR where U-Boot has relocated itself, or will attempting to do so produce errors like these or even cause a crash?

jeremy_sauget
Contributor I

Hi mike_palmer,
I am seeing exactly the same problem on my custom LS1043A board (a data abort in the mac_probe function). However, with an earlier LSDK version (17.03), everything works fine. The problem seems to come from the Linux image: with images built from LSDK 20 but a Linux image built with LSDK 17.03, it still works.

mike_palmer
Contributor II

Thank you for your guidance, @Pavel; we will look into this. Thanks also for the tip about the CodeWarrior evaluation providing full functionality for 15 days.

Pavel
NXP Employee

The memory test in U-Boot is a simple memory test.

If this test does not pass over the full memory, the memory on your board is not working correctly.

Linux usually detects a problem if the memory is faulty.

NXP offers QCVS validation memory tools for DDR:

https://www.nxp.com/docs/en/user-guide/QCVS_DDR_User_Guide.pdf

Use this tool for testing DDR on your board.

This tool is available in CodeWarrior:

https://nxp.flexnetoperations.com/control/frse/product?entitlementId=529066037&lineNum=1&authContact...

The evaluation version of CodeWarrior 11.5.0 provides full functionality for 15 days.

See also the AN5279:

https://www.nxp.com/docs/en/application-note/AN5279.pdf

DN31415
Contributor II

@mike_palmer Did you find the root cause of the kernel panic in mac_probe? If I read this correctly, you suspected an SDRAM issue and had some trouble testing your memory, but were eventually able to test it, which left the kernel panic unexplained. We have the same issue on a custom LS1046A board. Like you, we are doing some memory testing, but so far the memory appears to be working as expected, so we're not sure what is causing the issue in mac_probe. Thanks!

mike_palmer
Contributor II

Hi @DN31415,

Going back through my notes and logs from the time, it appears that our mac_probe problem was tied to uninitialized ethXaddr environment variables. From my notes:

...if the MAC addresses are not assigned (NULL), it causes a panic. I think the kernel normally reads the MACs out of an EEPROM on the board, but ours is not initialized.

So I used U-Boot to run:

=> setenv ethaddr 00:40:42:04:99:92
=> setenv eth1addr 00:40:42:04:99:93
=> setenv eth2addr 00:40:42:04:99:94
=> setenv eth3addr 00:40:42:04:99:95
=> setenv eth4addr 00:40:42:04:99:96
=> setenv eth5addr 00:40:42:04:99:97
=> setenv eth6addr 00:40:42:04:99:98
=> saveenv

... I then reset the board and allowed the kernel to start and it breezed past the MAC problem...

I ended up having another kernel panic immediately after but it was easily identified as being caused by a rootfs problem on my SD card.

Hope this helps.