The cpu appears to be hung inside release_cache_debugcheck() function.
Also I found a bug in the early kernel code.
Following is an excerpt(lines 728 to 748) from kernel-source/arch/powerpc/kernel/head_64.S:
_INIT_STATIC(start_here_multiplatform)
/* set up the TOC */
bl .relative_toc
tovirt(r2,r2)
/* Clear out the BSS. It may have been done in prom_init,
* already but that's irrelevant since prom_init will soon
* be detached from the kernel completely. Besides, we need
* to clear it now for kexec-style entry.
*/
LOAD_REG_ADDR(r11,__bss_stop)
LOAD_REG_ADDR(r8,__bss_start)
sub r11,r11,r8 /* bss size */
addi r11,r11,7 /* round up to an even double word */
srdi. r11,r11,3 /* shift right by 3 */
beq 4f
addi r8,r8,-8
li r0,0
mtctr r11 /* zero this many doublewords */
3: stdu r0,8(r8)
bdnz 3b
For kernel compiled with gcc 4.9.1 I see the same addresses for __bss_stop and __bss_start as in System.map being loaded into r11 and r8 when LOAD_REG_ADDR executes.
For kernel compiled with gcc 5.2.0 the addresses being loaded are different from those in System.map. This results in a very large value(0x1fffffffffe5d357 compared with 0x1c9d0 for the other kernel) in the CTR register when "mtctr r11" instruction executes.
Also in this case if I place a breakpoint after the last instruction in above code, it never gets hit and upon suspending the execution I see the processor stuck in release_cache_debugcheck() function.