T4240rdb kernel hangs after loading device tree

Contributor III

Hi,

I am facing an issue where a kernel built with gcc 5.2.0 hangs the board after the device tree is loaded.

I used the current poky and meta-fsl-ppc layers to build the image.

Boot Log:

WARNING: adjusting available memory to 30000000
## Booting kernel from Legacy Image at 01000000 ...
   Image Name:   Linux-3.12.37-rt51
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    4789208 Bytes = 4.6 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 00e00000
   Booting using the fdt blob at 0xe00000
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 03fde000, end 03fffc40 ... OK
<hang>

Note that this is observed only with the new gcc v5.2.0; a kernel built with gcc v4.9.1 boots just fine.

It appears that a similar issue was reported and fixed for e500v2 targets a year ago.

Regards,
Abdur Rehman

1 Solution
Contributor III

A colleague pointed out a fix that is already available in the upstream kernel. Backporting it fixed the issue.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5e95235

NXP Employee

This is the point at which the kernel receives control. One possibility is that the uncompressed kernel is larger than 14 MiB and the fdt is overwriting it; I suggest using a higher address for the device tree (but it must be under 64 MiB). If that doesn't fix it, then use a debugger to extract the log buffer (look up the __log_buf symbol and dump 16 KiB at that address) and/or see where the CPU is hung.
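
The overlap concern can be checked with simple arithmetic. What follows is a minimal sketch and not something posted in the thread: the load address (0x00000000) and the fdt blob address (0x00e00000, i.e. 14 MiB) come from the boot log, while the function name and the uncompressed sizes are hypothetical, because the thread never states the uncompressed kernel size.

```python
MIB = 1024 * 1024


def kernel_overlaps_fdt(load_addr, uncompressed_size, fdt_addr):
    """Return True if the uncompressed kernel image reaches the fdt blob address."""
    kernel_end = load_addr + uncompressed_size  # first byte past the image
    return load_addr <= fdt_addr < kernel_end


LOAD_ADDR = 0x00000000  # "Load Address" from the boot log
FDT_ADDR = 0x00E00000   # "Flattened Device Tree blob at 00e00000"

# Hypothetical uncompressed sizes, for illustration only.
print(kernel_overlaps_fdt(LOAD_ADDR, 12 * MIB, FDT_ADDR))  # False
print(kernel_overlaps_fdt(LOAD_ADDR, 15 * MIB, FDT_ADDR))  # True
```

If the check reports an overlap for the real uncompressed size, moving the device tree blob to a higher address (still under 64 MiB, as noted above) would be the thing to try.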

Contributor III

The CPU appears to be hung inside the release_cache_debugcheck() function.

I also found a bug in the early kernel code.

The following is an excerpt (lines 728 to 748) from kernel-source/arch/powerpc/kernel/head_64.S:

_INIT_STATIC(start_here_multiplatform)
        /* set up the TOC */
        bl      .relative_toc
        tovirt(r2,r2)

        /* Clear out the BSS. It may have been done in prom_init,
         * already but that's irrelevant since prom_init will soon
         * be detached from the kernel completely. Besides, we need
         * to clear it now for kexec-style entry.
         */
        LOAD_REG_ADDR(r11,__bss_stop)
        LOAD_REG_ADDR(r8,__bss_start)
        sub     r11,r11,r8      /* bss size */
        addi    r11,r11,7       /* round up to an even double word */
        srdi.   r11,r11,3       /* shift right by 3 */
        beq     4f
        addi    r8,r8,-8
        li      r0,0
        mtctr   r11             /* zero this many doublewords */
3:      stdu    r0,8(r8)
        bdnz    3b

For the kernel compiled with gcc 4.9.1, the addresses loaded into r11 and r8 when LOAD_REG_ADDR executes are the same as the __bss_stop and __bss_start addresses in System.map.

For the kernel compiled with gcc 5.2.0, the addresses being loaded differ from those in System.map. This results in a very large value in the CTR register (0x1fffffffffe5d357, compared with 0x1c9d0 for the other kernel) when the "mtctr r11" instruction executes.
Also, in this case, a breakpoint placed after the last instruction in the code above never gets hit, and when I suspend execution I see the processor stuck in the release_cache_debugcheck() function.
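
To illustrate how wrong addresses produce such a counter, here is a minimal sketch that mimics the sub / addi / srdi sequence with 64-bit unsigned arithmetic. The function name and the addresses are hypothetical, since the actual register contents are not given in this thread; only the differences between the addresses were picked so that the results match the two CTR values reported above.

```python
MASK64 = (1 << 64) - 1  # general-purpose registers are 64 bits wide here


def bss_doublewords(bss_start, bss_stop):
    """Mimic the excerpt's size computation with wrap-around 64-bit arithmetic."""
    size = (bss_stop - bss_start) & MASK64  # sub   r11,r11,r8
    size = (size + 7) & MASK64              # addi  r11,r11,7
    return size >> 3                        # srdi. r11,r11,3 (logical shift)


# Hypothetical base address, for illustration only.
START = 0xC000000001000000

# __bss_stop above __bss_start: a sane doubleword count.
print(hex(bss_doublewords(START, START + 0xE4E80)))   # 0x1c9d0

# __bss_stop below __bss_start: the subtraction wraps around.
print(hex(bss_doublewords(START, START - 0xD1654F)))  # 0x1fffffffffe5d357
```

A count of that magnitude is what one would expect whenever the value loaded for __bss_stop ends up below the value loaded for __bss_start, so the subtraction wraps around.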

NXP Employee

Do you see this problem if you build the latest upstream kernel with GCC 5.2?  If yes, would it be possible for you to bisect GCC to find out when it broke?

Contributor III

There is an architecture-specific function in the powerpc tree that the kernel executes at this point. Are you sure the kernel sources, including the configuration, are the same?

Contributor III

Positive.

I am using the same kernel source, and there is no difference in the .config file in the build directory. The bootargs are the same too.
The only difference is the version of gcc used to compile the kernel.

Contributor III

Why do you want to use this untested gcc 5.2.0? I don't think the Yocto SDK uses this gcc. I have used 5.2 from ELDK to compile for single-core 85xx targets. Perhaps give it a try with ELDK!

Contributor III

Using a higher address for the device tree was the first thing I tried, without luck.

__log_buf is filled with 0xdeadbeef, the magic word U-Boot uses to initialize memory. I am now working on finding out where the CPU is hung.
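
A quick way to confirm that a dumped log buffer holds nothing but the fill pattern is sketched below. This is only an illustration: the function name and the sample buffers are hypothetical, and it assumes the dump was saved as raw bytes in big-endian order.

```python
import struct

FILL_WORD = 0xDEADBEEF  # pattern reported in this thread


def is_only_fill_pattern(dump: bytes, fill: int = FILL_WORD) -> bool:
    """Return True if every complete big-endian 32-bit word equals the fill word."""
    word_count = len(dump) // 4
    if word_count == 0:
        return False  # nothing to inspect
    words = struct.unpack(f">{word_count}I", dump[: word_count * 4])
    return all(word == fill for word in words)


# Hypothetical 16 KiB dumps, for illustration only.
untouched = struct.pack(">I", FILL_WORD) * 4096
written = b"Linux version" + untouched[13:]

print(is_only_fill_pattern(untouched))  # True
print(is_only_fill_pattern(written))    # False
```

A buffer that is still entirely fill pattern suggests the kernel never got as far as writing a log message.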

Thanks for the pointers.

NXP Employee

Another option is to bisect GCC as described in the "similar issue".