Samuel Kvasnica

stack corruption in jffs2 using SLUB (2.6.25 kernel) on cf5475

Discussion created by Samuel Kvasnica on Oct 3, 2008
Latest reply on Oct 10, 2008 by Tom Thompson
While trying to get the new 2.6.25 LTIB working, we ran into very strange behavior when mounting larger
(>20MB) jffs2 partitions on NOR flash. Our configuration is very similar to the EVB board; the main differences
are 128MB DDR and 32MB NOR flash. We got u-boot and the kernel running, and everything works fine with an NFS root.

When trying to mount a jffs2 partition we get a "bad page" error and a crash deep in
jffs2/malloc.c, in jffs2_alloc_inode_cache(), where kmem_cache_alloc() is used.
More detailed investigation shows that this is only a follow-up symptom: things already go wrong at the kmalloc() call in do_mount_fs() in jffs2/build.c.

Interestingly, the crash occurs only if all of the following hold:
- the SLUB allocator is used
- the partition is larger than ~20MB (with 128k erase blocks), which results in a kmalloc() of more than 8k (more than a single page)
- "compile kernel with frame pointers" is off in the kernel config
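
To make the size threshold concrete, here is a rough sketch of the arithmetic. It assumes the allocation in question is a table with one descriptor per erase block. The 52-byte descriptor size is a hypothetical figure picked only because it puts the crossover near 20MB; the real size depends on the kernel version and config. The helper names are made up for illustration.

```c
#include <stddef.h>

/* Number of erase blocks in a partition of the given size. */
static size_t nr_erase_blocks(size_t partition_bytes, size_t erase_block_bytes)
{
    return partition_bytes / erase_block_bytes;
}

/*
 * Bytes needed for a table holding one descriptor per erase block.
 * per_block_bytes is an ASSUMPTION supplied by the caller, not a value
 * taken from the kernel sources.
 */
static size_t block_table_bytes(size_t nr_blocks, size_t per_block_bytes)
{
    return nr_blocks * per_block_bytes;
}

/* Nonzero if the request no longer fits into a single page. */
static int exceeds_one_page(size_t request_bytes, size_t page_bytes)
{
    return request_bytes > page_bytes;
}
```

With 128k erase blocks, 8k pages and the assumed 52 bytes per block, a 16MB partition has 128 blocks and needs 6656 bytes (fits in one page), while a 20MB partition has 160 blocks and needs 8320 bytes (more than one page).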

Tracing the issue down, we found that kmalloc() in do_mount_fs() returns a totally invalid pointer. Deeper in the kmalloc code everything looks OK, down to __get_free_pages() in mm/page_alloc.c; the returned pointer only gets corrupted somewhere on its way back up. Looking at the stack contents at various points shows that the stack pointer does not return to where it was before the kmalloc() call: it is off by -0x10.
I have attached a sample stack log with some comments to document this.
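
For anyone who wants to repeat the check, the comparison itself boils down to something like the sketch below. It only covers the arithmetic on two stack pointer samples; how the samples are taken is architecture specific and not shown here. The function names are made up for illustration.

```c
/*
 * Signed difference between two stack pointer samples, one taken right
 * before a call and one right after it returns. In a balanced call the
 * result is 0. The subtraction is done on unsigned values and then
 * converted, which assumes the usual two's complement behaviour.
 */
static long sp_delta(unsigned long sp_before, unsigned long sp_after)
{
    return (long)(sp_after - sp_before);
}

/* Nonzero if the stack pointer came back to where it started. */
static int stack_balanced(unsigned long sp_before, unsigned long sp_after)
{
    return sp_before == sp_after;
}
```

A result of -0x10 from sp_delta() corresponds to what we see around the kmalloc() call: the stack pointer ends up 16 bytes below where it started.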

We are now really concerned about using the new kernel and the SLUB allocator.

Could someone look deeper into this?
