
Specific Android 3D operations seem to corrupt the memory

Question asked by Frank Burgdorf on Mar 12, 2015
Latest reply on Apr 24, 2015 by Pushpal Sidhu

We are experiencing memory corruption on an i.MX6 Dual (not Lite) with Android 4.3 when using the 3D GPU. Random memory locations are overwritten, mostly with zeroes. The affected locations range from user memory (resulting in application crashes) and kernel data, including page tables (resulting in Oopses in various locations), to kernel code (resulting in illegal instruction traps in internal kernel functions). Occasionally framebuffer memory is hit as well, which is visible as black pixels.

 

The hardware is our custom board based on the i.MX6 Dual, silicon rev 1.3 (internal revision 5). The memory layout is 1 GB of DDR3_x64 memory. A 24-bit LCD with a resolution of 800 x 480 pixels is attached to the parallel LCD interface (no HDMI or LVDS). We have already run the DDR stress test tool[1] from Freescale and applied the resulting timing parameters, with no change in behavior.

 

The issue can be reproduced with our custom Android distribution based on 4.3 with kernel version 3.16 from kernel.org and also with kernel version 3.10.53 from Freescale (git tag kk4.4.3_2.0.0-ga).

 

Running the same software on an i.MX6Quad SABRE-SD board (i.MX6 Quad with silicon rev. 1.1) does NOT produce the issue. Running it on a Wandboard Quad with silicon rev. 1.2 (www.wandboard.org), however, does produce the issue. Even the original Wandboard image "android-4.4.2-wandboard-2014 0815" shows the problem.

 

How to trigger the problem:

The issue can be triggered by repeatedly starting the built-in Android web browser and rendering the Google homepage[2]. Other 3D rendering operations can also trigger the issue.
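The start/render/stop cycle described above could be automated from a host PC roughly as follows. This is only a sketch under assumptions, not the script we actually used: it assumes `adb` is on the host's PATH, that the built-in browser has the stock AOSP package name `com.android.browser`, and that 10 seconds is enough for the page to render. The helper names (`build_start_cmd`, `build_stop_cmd`, `run_cycles`) are made up for illustration.

```python
import subprocess
import time

# Assumption: package name of the stock AOSP browser; adjust for your image.
BROWSER_PACKAGE = "com.android.browser"
TEST_URL = "https://google.com"


def build_start_cmd(url):
    """adb command that opens `url` through a VIEW intent."""
    return ["adb", "shell", "am", "start",
            "-a", "android.intent.action.VIEW", "-d", url]


def build_stop_cmd(package):
    """adb command that force-stops `package` so the next start is a fresh one."""
    return ["adb", "shell", "am", "force-stop", package]


def run_cycles(count, render_seconds=10.0,
               runner=subprocess.run, sleep=time.sleep):
    """Start the browser, let the page render, stop it; repeat `count` times."""
    for _ in range(count):
        runner(build_start_cmd(TEST_URL), check=True)
        sleep(render_seconds)  # give the accelerated UI time to draw the page
        runner(build_stop_cmd(BROWSER_PACKAGE), check=True)
    return count
```

Calling `run_cycles(100)` with a device attached would repeat the cycle 100 times; the `runner` and `sleep` parameters exist only so the loop can be exercised without a device.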

 

Disabling the hardware-accelerated UI rendering in Android (build variable USE_OPENGL_RENDERER) prevents the issue. However, this is not a viable solution: it makes the UI sluggish, and the issue might still be triggered by other (as yet unknown) operations.

As a test, we applied a kernel patch that marks the kernel code section read-only in the MMU, so that write accesses by the CPU are trapped and result in a kernel error. The issue still persists (we see illegal instruction traps), so these writes do not seem to come from the CPU but from another SoC engine capable of writing to memory, such as the 3D GPU.

 

We have already spent a lot of time on this issue, reviewing the hardware, interfaces and power supply. As the problem can also be reproduced on the Wandboard, it does not seem to depend on our specific hardware design.

The Wandboard also uses the parallel LCD interface, whereas the SABRE uses HDMI. We therefore disabled the physical display interface and ran tests that computed memory checksums after 3D operations. The problem was still there.
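To illustrate the kind of checksum comparison meant here, a minimal userspace sketch follows. It is not our actual test code: it assumes a buffer filled with a known non-zero pattern, a block size of 4096 bytes (page granularity), and CRC32 as the checksum. The function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # assumption: compare at page granularity


def block_checksums(buf, block_size=BLOCK_SIZE):
    """Return one CRC32 per block of `buf`."""
    view = memoryview(buf)
    return [zlib.crc32(view[off:off + block_size])
            for off in range(0, len(view), block_size)]


def changed_blocks(before, after):
    """Indices of blocks whose checksum differs between two snapshots."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def zero_runs(buf, min_len=16):
    """Find (offset, length) runs of zero bytes at least `min_len` long."""
    runs, start = [], None
    for i, byte in enumerate(bytes(buf)):
        if byte == 0:
            if start is None:
                start = i  # a new run of zeroes begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(buf) - start >= min_len:
        runs.append((start, len(buf) - start))
    return runs
```

Taking `block_checksums` of the buffer before and after a 3D operation and passing both lists to `changed_blocks` narrows the damage down to individual blocks; `zero_runs` then shows whether those blocks were overwritten with zeroes, as we mostly observe.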

The i.MX silicon revision may also play a role. We currently do not have a rev 1.1 chip at hand, so we cannot check that on our hardware.

Any suggestions would be appreciated.

 

[1] https://community.freescale.com/docs/DOC-96412

[2] https://google.com
