T4240rdb kernel hangs after loading device tree

abdurrehman · 10-06-2015

Hi,

I am facing an issue where a kernel built with gcc 5.2.0 hangs the board right after U-Boot loads the device tree.

I used the current poky and meta-fsl-ppc layers to build the image.

Boot log:

WARNING: adjusting available memory to 30000000
## Booting kernel from Legacy Image at 01000000 ...
   Image Name:   Linux-3.12.37-rt51
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    4789208 Bytes = 4.6 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 00e00000
   Booting using the fdt blob at 0xe00000
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 03fde000, end 03fffc40 ... OK
<hang>

Note that this happens only with the new gcc 5.2.0; the same kernel built with gcc 4.9.1 boots just fine.

It appears that a similar issue was reported and fixed for e500v2 targets a year ago.

Regards,
Abdur Rehman

abdurrehman · 10-19-2015

A colleague pointed me to a fix that is already available in the upstream kernel. Backporting it fixed the issue:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5e95235

scottwood · 10-06-2015

This is the point at which the kernel receives control. One possibility is that the uncompressed kernel is larger than 14 MiB and the fdt is overwriting it. I suggest using a higher address for the device tree (but it must be under 64 MiB). If that doesn't fix it, use a debugger to extract the log buffer (look up the __log_buf symbol and dump 16 KiB at that address) and/or see where the CPU is hung.

abdurrehman · 10-11-2015

The CPU appears to be hung inside the release_cache_debugcheck() function.

I also found a bug in the early kernel code.

The following is an excerpt (lines 728 to 748) from kernel-source/arch/powerpc/kernel/head_64.S:

_INIT_STATIC(start_here_multiplatform)
        /* set up the TOC */
        bl      .relative_toc
        tovirt(r2,r2)

        /* Clear out the BSS. It may have been done in prom_init,
         * already but that's irrelevant since prom_init will soon
         * be detached from the kernel completely. Besides, we need
         * to clear it now for kexec-style entry.
         */
        LOAD_REG_ADDR(r11,__bss_stop)
        LOAD_REG_ADDR(r8,__bss_start)
        sub     r11,r11,r8      /* bss size */
        addi    r11,r11,7       /* round up to an even double word */
        srdi.   r11,r11,3       /* shift right by 3 */
        beq     4f
        addi    r8,r8,-8
        li      r0,0
        mtctr   r11             /* zero this many doublewords */
3:      stdu    r0,8(r8)
        bdnz    3b

For the kernel compiled with gcc 4.9.1, the values loaded into r11 and r8 when LOAD_REG_ADDR executes are the same as the addresses of __bss_stop and __bss_start in System.map.

For the kernel compiled with gcc 5.2.0, the loaded addresses differ from those in System.map. This results in a very large value in the CTR register when the "mtctr r11" instruction executes (0x1fffffffffe5d357, compared with 0x1c9d0 for the gcc 4.9.1 kernel).
Also, in this case, if I place a breakpoint after the last instruction of the code above, it is never hit, and when I suspend execution I see the processor stuck in the release_cache_debugcheck() function.
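The two CTR values can be turned back into BSS sizes. The derivation below relies only on the sub/addi/srdi. sequence in the excerpt, with all arithmetic taken modulo 2^64; s denotes __bss_stop - __bss_start as computed in r11, and n the count moved into CTR.

```latex
\begin{aligned}
s &\equiv \texttt{\_\_bss\_stop} - \texttt{\_\_bss\_start} \pmod{2^{64}},
\qquad n = \left\lfloor \frac{s + 7}{8} \right\rfloor
\;\Longrightarrow\; 8n - 7 \le s \le 8n \\[4pt]
\text{gcc 4.9.1:}\quad n &= \mathtt{0x1c9d0}
\;\Rightarrow\; 8n = \mathtt{0xe4e80} = 937600~\text{bytes} \approx 916~\text{KiB} \\[4pt]
\text{gcc 5.2.0:}\quad n &= \mathtt{0x1fffffffffe5d357}
\;\Rightarrow\; 8n = \mathtt{0xffffffffff2e9ab8} = 2^{64} - \mathtt{0xd16548} \\
&\Rightarrow\; -\mathtt{0xd1654f} \le s_{\text{signed}} \le -\mathtt{0xd16548}
\quad (\approx -13.1~\text{MiB})
\end{aligned}
```

In other words, with gcc 5.2.0 the loaded __bss_stop lies roughly 13 MiB below the loaded __bss_start, so the loop would try to zero about 2^64 bytes upward from __bss_start, far past the end of RAM. That is consistent with the breakpoint after bdnz never being hit.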

scottwood · 10-12-2015

Do you see this problem if you build the latest upstream kernel with GCC 5.2? If yes, would it be possible for you to bisect GCC to find out when it broke?

adeel · 10-12-2015

At this point the kernel executes an architecture-specific function in the powerpc tree. Are you sure the kernel sources, including the configuration, are the same?

abdurrehman · 10-12-2015

Positive.

I am using the same kernel source, and there is no difference in the .config file in the build directory. The bootargs are the same too.
The only difference is the version of gcc used to compile the kernel.

adeel · 10-12-2015

Why do you want to use the untested gcc 5.2.0? I don't think the Yocto SDK uses this gcc version. I have used gcc 5.2 from ELDK to compile for single-core 85xx targets. Perhaps give ELDK a try!

abdurrehman · 10-07-2015

Using a higher address for the device tree was the first thing I tried, without luck.

__log_buf is filled with 0xdeadbeef, the magic word U-Boot uses to initialize memory. I am now working on finding out where the CPU is hung.

Thanks for the pointers.

scottwood · 10-06-2015

Another option is to bisect GCC as described in the "similar issue".
