I am having stability problems on a custom board (based on the SabreSD reference design) with 2GB of DDR3 SDRAM: four 4Gb-density parts on a single chip select (CS0). The part number is Micron MT41K256M16HA-125 IT:E (1600 speed grade), if anyone is interested.
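As a sanity check on the capacity figures, here is a minimal sketch of the arithmetic. The helper names are made up for illustration; the only inputs are the part organization (256M x 16, i.e. 4Gb per chip) and the chip count.

```c
#include <stdint.h>

/* Total capacity in MiB for `chips` devices of `gbit_per_chip` gigabits each.
 * 1 Gbit = 1024 Mbit / 8 = 128 MiB. */
static uint32_t total_mib(uint32_t chips, uint32_t gbit_per_chip)
{
    return chips * gbit_per_chip * 128u;
}

/* Data bus width in bits when all chips sit side by side on one chip select. */
static uint32_t bus_width_bits(uint32_t chips, uint32_t bits_per_chip)
{
    return chips * bits_per_chip;
}
```

With four x16 chips of 4Gb each this gives 2048 MiB on a 64-bit bus; a single chip in 16-bit mode gives 512 MiB.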
Notable board specs:
Memory : 2GB
This issue is present on both Android JB4.3 (based on jb4.3_1.1.0-ga) and Android KK4.4 (based on kk4.4.3_2.0.0-beta). I have not been able to reproduce it on our Yocto BSP, which uses a kernel based on the kk4.4.3_2.0.0-beta tag (3.10.31, the same as the Android KK4.4 BSP). I have also been unable to reproduce it in 16-bit memory mode (a single DDR3 SDRAM chip) with the same memory part and density under the Android BSPs, or with the 128M16 parts (1GB of DDR3 SDRAM in total) on the same processor.
The instability problem is as follows:
On the JB4.3 BSP, the GUI appears fluid, but as soon as I enter the "Settings" app I get a NULL pointer dereference and the kernel crashes (see the null_pointer.log attachment). I can also crash the kernel by quickly opening and closing any app a few times (up to about 5 iterations). I have verified that fb0base does not overlap the GPU memory, and I have tried setting fb0base=0x27b00000, as suggested in several other threads about memory problems in the 3.0.35 Android kernel.
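For reference, the overlap check I mean is the usual half-open interval test, sketched below. The function name is hypothetical, and the region sizes have to come from the actual framebuffer and GPU reservations on the board; none are quoted in this post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [a_base, a_base + a_size) and [b_base, b_base + b_size)
 * share at least one byte. Regions that merely touch do not overlap. */
static bool ranges_overlap(uint64_t a_base, uint64_t a_size,
                           uint64_t b_base, uint64_t b_size)
{
    return a_base < b_base + b_size && b_base < a_base + a_size;
}
```

Calling it with the framebuffer base and size as the first pair and the GPU base and size as the second should return false for a sane layout.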
My virtual memory table looks like the following:
Memory policy: ECC disabled, Data cache writealloc
CPU identified as i.MX6Q, silicon rev 1.2
On node 0 totalpages: 474880
free_area_init_node: node 0, pgdat 80970460, node_mem_map 80b0e000
Normal zone: 2848 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 361440 pages, LIFO batch:31
HighMem zone: 1248 pages used for memmap
HighMem zone: 109344 pages, LIFO batch:31
PERCPU: Embedded 7 pages/cpu @81b1d000 s6592 r8192 d13888 u32768
pcpu-alloc: s6592 r8192 d13888 u32768 alloc=8*4096
pcpu-alloc:  0  1  2  3
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 470784
Kernel command line: enable_wait_mode=off console=ttymxc1,115200 vmalloc=400M consoleblank=0 video=mxcfb0:dev=hdmi,bpp=32,1280x720M@60,if=RGB24 video=mxcfb1:off video=mxcfb2:off video=mxcfb3:off androidboot.hardware=freescale androidboot.bootdev=sdhci-esdhc-imx.2 debug
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 131072 (order: 7, 524288 bytes)
Memory: 767MB 848MB 240MB = 1855MB total
Memory: 1869836k/1869836k available, 227316k reserved, 442368K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
DMA : 0xfbe00000 - 0xffe00000 ( 64 MB)
vmalloc : 0xd9800000 - 0xf2000000 ( 392 MB)
lowmem : 0x80000000 - 0xd9000000 (1424 MB)
pkmap : 0x7fe00000 - 0x80000000 ( 2 MB)
modules : 0x7f000000 - 0x7fe00000 ( 14 MB)
.init : 0x80008000 - 0x80048000 ( 256 kB)
.text : 0x80048000 - 0x808e8318 (8833 kB)
.data : 0x808ea000 - 0x80985640 ( 622 kB)
.bss : 0x80985664 - 0x80b0d568 (1568 kB)
Preemptible hierarchical RCU implementation.
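The figures in the log above can be cross-checked with a little arithmetic. This is only a sketch with made-up helper names, assuming 4096-byte pages (which matches the alloc=8*4096 line).

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define MIB (1024u * 1024u)

/* Convert a page count to whole MiB. */
static uint32_t pages_to_mib(uint32_t pages)
{
    return (uint32_t)(((uint64_t)pages * PAGE_SIZE_BYTES) / MIB);
}

/* Size in MiB of the virtual address span [start, end). */
static uint32_t span_mib(uint32_t start, uint32_t end)
{
    return (end - start) / MIB;
}
```

The 474880 total pages come to 1855 MiB, matching the "1855MB total" line; the lowmem span 0x80000000 - 0xd9000000 is 1424 MiB and the vmalloc span 0xd9800000 - 0xf2000000 is 392 MiB, as printed. The two zone page counts (361440 + 109344) add up to the 470784 pages in the zonelist line, and adding the 2848 + 1248 memmap pages gives 474880 again. That leaves 193 MiB of the 2048 MiB that does not appear in this excerpt.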
Besides changing the fb0base address, I have tried changing the GPU reservation address in code via
phys = memblock_alloc_base(imx6q_gpu_pdata.reserved_mem_size, SZ_4K, SZ_2G);
I have also tried changing SZ_2G to 0x90000000, in which case phys ends up at 0x85000000.
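That address is what I would expect if the allocator works top-down from the limit. The sketch below is not kernel code; it assumes the block is placed directly below max_addr, rounded down to the alignment, and that nothing else already occupies that range. The value of reserved_mem_size is not quoted here, so the size is only what the arithmetic implies.

```c
#include <stdint.h>

/* Base address of a block of `size` bytes placed as high as possible
 * below `max_addr`, rounded down to a multiple of `align`. */
static uint64_t top_down_base(uint64_t max_addr, uint64_t size, uint64_t align)
{
    uint64_t base = max_addr - size;
    return base - (base % align);
}

/* Size implied by an observed base, assuming the block ends at `max_addr`. */
static uint64_t implied_size(uint64_t max_addr, uint64_t base)
{
    return max_addr - base;
}
```

With a limit of 0x90000000 and an observed base of 0x85000000, the implied size is 0x0B000000 (176 MiB).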
I've tried running this memory at lower speeds (800 and 1066), verified CS0_END (tried 0x47 and others), and am now at a loss. KK4.4 is even worse on this board: it never reaches the GUI and crashes earlier. I have been focusing on JB4.3, since the root cause is likely the same.
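For anyone checking the CS0_END value, this is how I arrive at 0x47. It is a sketch based on my reading of the MMDC address-space register: CS0_END is the absolute end address of CS0 in 32 MB (256 Mbit) units, minus one, with DDR starting at physical 0x10000000 on the i.MX6. Please treat the formula and the function name as assumptions and check them against the reference manual.

```c
#include <stdint.h>

#define CS_GRANULE_BYTES (32ull * 1024ull * 1024ull) /* 32 MB units */

/* CS0_END field value for a CS0 region of `cs0_size` bytes
 * starting at physical address `ddr_base`. */
static uint32_t cs0_end(uint64_t ddr_base, uint64_t cs0_size)
{
    return (uint32_t)((ddr_base + cs0_size) / CS_GRANULE_BYTES - 1u);
}
```

For 2GB on CS0 this gives 0x47; for the 1GB configuration with 128M16 parts it gives 0x27.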
One more thing I forgot to mention: I'm using a 2g/2g kernel split. I recently found this post about an older version of Android on the i.MX53 processor, where changing from a 3g/1g split required new gpulibs binaries compiled for the 2g/2g split (though I don't think this matters for this newer kernel/user space).
I have tried the default split of 3g/1g and found that the galcore daemon hangs and the GUI becomes unresponsive. When changing fb0base (in JB4.3) with this split to 0x27b00000, the system has the same null pointer issue as seen in the logs.
One question about fb0base: its value shouldn't matter if the splash screen isn't enabled in U-Boot, is that correct?
If anyone has any extra questions, please ask me and I will get back to you as soon as I am able. Thank you!
Original Attachment has been moved to: null_pointer.log.zip