Debugging memory instability with 4.1/yocto 2.1 jethro on 6sl


jayakumar2
Contributor V

Hi,

I'm looking for advice on debugging memory instability with Linux 4.1. When running the Linux 4.1 kernel, the i.MX6SL board I am testing on is unstable and regularly crashes with kernel panics within about 30 minutes. The exact same board in the same setup is very stable when running 3.0.35.

By running memtester, I found that I consistently get AND test errors on 4.1, e.g.:
Compare AND : ok
FAILURE: 0xedb8450c != 0xedb6450c at offset 0x025987e0.
FAILURE: 0x98f7bce2 != 0x98f5bce2 at offset 0x005a47e0.

I have attached the complete boot logs for both runs.

The same memtester run on 3.0.35 passes consistently.

The memory test errors seem to be mostly single-bit errors, which makes me wonder whether the root cause is a drive strength, memory refresh, or PLL frequency setting that differs in 4.1.

Comparing the memory lines in the two boot logs shows:
working case 3.0.35:
> Memory: 1033300k/1033300k available, 15276k reserved, 0K highmem

failing case 4.1:
< Memory: 700176K/1048576K available (7178K kernel code, 396K rwdata, 2440K rodata, 412K init, 426K bss, 20720K reserved, 327680K cma-reserved, 0K highme

I didn't do any custom memory configuration for our board; I used the default 6slevk settings, i.e. the default imx6sl device tree and U-Boot settings. The 2016.07 U-Boot code does:
int dram_init(void)
{
	/* imx_ddr_size() derives the DRAM size from the MMDC controller registers */
	gd->ram_size = imx_ddr_size();
	return 0;
}

As the log shows, this seems to correctly detect the 1 GB of memory.

I'm now trying to figure out whether there is a DDR PLL speed issue, but I haven't yet found where that is configured.

Any advice would be welcome!

Thanks!

Attachments:
memtest_pass_rhu_goodbrd_3.0.35.txt.zip
memtest_fail_rhu_goodbrd.txt.zip

igorpadykov
NXP Employee

Hi Jaya

One can try running the DDR test and rebuilding the image with a DDR configuration optimized for that specific board, as the default 6slevk settings may not suit it (or may leave only small stability margins). Also, the 3.0.35 and 4.1 6slevk BSP settings may differ.

https://community.freescale.com/docs/DOC-105652 

Best regards
igor

jayakumar2
Contributor V

Hi Igor,

Thanks for your reply. However, I found that it was not related to the DDR settings. Both 3.0.35 and 4.1 rely on the DDR settings set up by U-Boot. In my case, I verified that flash_header.S in the older U-Boot and imximage.cfg in U-Boot 2016.07 are identical.

I found that the problem is related to CONFIG_CPU_FREQ in 4.1. In 3.0.35, cpufreq is disabled. In 4.1, disabling CONFIG_CPU_FREQ causes a kernel build failure because of a dependency on busfreq, which is built unconditionally:

arch/arm/mach-imx/Makefile
...
+obj-y += busfreq_lpddr2.o busfreq-imx.o busfreq_ddr3.o

If I manually remove that line and disable cpufreq, the memory test passes and the board runs without problems. If I enable CPU_FREQ, the board fails the memory test and crashes consistently, typically within 10 minutes.

Is CPU_FREQ known to cause problems? I would be fine with disabling cpufreq, but I found that doing so also seems to disable the USB OTG interface.

Thanks!

igorpadykov
NXP Employee
NXP Employee

Hi Jaya

This may still be related to the DDR settings, as both 3.0.35 and 4.1 rely on the 6slevk board DDR setup, which may not work properly on a custom board.

Finding board-specific DDR settings by running the DDR test is the standard and recommended approach.

Best regards
igor

jayakumar2
Contributor V

Hi Igor,

Thanks for your reply. I understand what you mean about running the DDR test. However, as I mentioned, I already have fully working, stable DDR settings, which I was using with the U-Boot for 3.0.35. Since those settings are stable, I have no intention of changing them; they are not the root cause of the memory corruption. As I mentioned, I found that the root cause here is the use of CPU_FREQ in 4.1. When I disable CPU_FREQ in 4.1, the board is stable and passes memory tests.

By the way, based on that result, I had asked whether CPU_FREQ is known to cause problems on other boards. If you are confident that it does not, please let me know.

Thanks!

igorpadykov
NXP Employee
NXP Employee

Hi Jaya

If the board is stable and passes memory tests with CPU_FREQ disabled in 4.1, then this just confirms the assumption that the board's DDR settings are not optimized.

The CPU_FREQ driver just adds more stress to the memory subsystem.

Another thing to check is power supply ripple, which should be below 5%.

Best regards
igor
