I am debugging a long-standing application instability, and I believe I have narrowed it down to a kernel fault.
On a SABRELite board running the 3.0.35 kernel, I am running the attached stress test. It creates four threads (one per CPU core), each allocating and freeing blocks of memory at a high rate in random order. The test eventually fails with a glibc memory corruption error, after anywhere from a few seconds to a few hours.
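As a rough illustration of the kind of loop described above, here is a simplified sketch. It is not the attached test_heap.cpp: the names ThreadStats, worker and run_stress, the 1024-block cap and the block size range are all assumptions made for this example. The three counters are meant to mirror the allocs, frees and allocated-block figures a test like this would log.

```cpp
// Hypothetical sketch of a multi-threaded malloc/free stress loop.
// NOT the attached test_heap.cpp; all names and limits are illustrative.
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <random>
#include <thread>
#include <vector>

struct ThreadStats {
    unsigned long allocs = 0;  // successful malloc calls
    unsigned long frees = 0;   // free calls made while the test was running
    std::size_t live = 0;      // blocks still allocated when the worker stopped
};

// Each worker repeatedly either allocates a block of random size or frees a
// randomly chosen live block, so frees happen in random order.
static void worker(unsigned seed, const std::atomic<bool>& stop, ThreadStats& out) {
    std::mt19937 rng(seed);
    std::vector<void*> blocks;
    ThreadStats s;
    while (!stop.load(std::memory_order_relaxed)) {
        // Assumed policy: cap live blocks at 1024, otherwise flip a coin.
        bool do_alloc = blocks.empty() || (blocks.size() < 1024 && (rng() & 1u));
        if (do_alloc) {
            std::size_t size = 16 + rng() % 4096;  // assumed size range
            void* p = std::malloc(size);
            if (!p) continue;
            std::memset(p, 0xA5, size);  // touch the memory so pages are really used
            blocks.push_back(p);
            ++s.allocs;
        } else {
            assert(!blocks.empty());
            std::size_t i = rng() % blocks.size();
            std::free(blocks[i]);
            blocks[i] = blocks.back();  // swap-remove keeps the vector compact
            blocks.pop_back();
            ++s.frees;
        }
    }
    s.live = blocks.size();
    for (void* p : blocks) std::free(p);  // cleanup, not counted in s.frees
    out = s;
}

// Runs num_threads workers for the given duration and returns their counters.
std::vector<ThreadStats> run_stress(unsigned num_threads, std::chrono::milliseconds duration) {
    std::atomic<bool> stop(false);
    std::vector<ThreadStats> stats(num_threads);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t)
        threads.emplace_back(worker, t + 1, std::cref(stop), std::ref(stats[t]));
    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& th : threads) th.join();
    return stats;
}
```

Calling run_stress(4, std::chrono::seconds(10)) from main would approximate one 10-second reporting interval on a four-core board. On a healthy system, allocs minus frees should equal live for every thread.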
To reproduce, download the attached source file. Build using a GCC cross compiler (version is unimportant) with this command:
<cross-compiler-prefix>g++ -O3 test_heap.cpp -lpthread -o test_heap
After 10 seconds the output should look like this, with another set of lines printed every 10 seconds thereafter:
Memory allocation stress test running on 4 threads.
Thread 3  - 247184 allocs 246427 frees. Alloced blocks 757.
Thread 0  - 247973 allocs 247281 frees. Alloced blocks 692.
Thread 1  - 248405 allocs 247546 frees. Alloced blocks 859.
Thread 2  - 248217 allocs 247343 frees. Alloced blocks 874.
The problem does not occur on Intel desktops (otherwise I would be discussing this on a more general forum), and it does not happen under the newer 3.10.17 kernel. Unfortunately, at this point in our product's life cycle it would be very difficult to port our product to the newer kernel, and I believe many others are in the same position. I am therefore seeking help, from Freescale or the community, in finding the cause of the bug and, ideally, a patch to fix it in 3.0.35. I have searched the kernel changelog between 3.0.35 and 3.10.17 but found nothing relevant.
Thanks for any ideas,
Original Attachment has been moved to: test_heap.cpp.zip