We are still evaluating the performance and optimizations for the GPU driver. Yes, it is OpenGL ES 2.0 capable, but that does not mean you will be playing Crysis or FEAR2 on it any time soon as you might expect from a high-end PC gaming chipset :)

Since OpenGL ES is - by name as well as by purpose - an OpenGL for embedded systems, it is not so easy to use on a Linux system where you may have as many as three display systems running at once (framebuffer, X.org, maybe something else..). Each one would require its own EGL library, and there is no system in place to detect or intelligently switch between them at runtime, or to make sure EGL is targeting your X.org session rather than your framebuffer (i.e. the one you are actually looking at). In fact, the target display environment is baked in at compile time through the NativeDisplayType definition, so you can compile for one or the other, and if your installed EGL library doesn't like it, it simply fails. This makes it less applicable to a full desktop environment.

So, we are looking into ways of using it in more restricted scenarios. One of these is a driver which runs on the framebuffer and accelerates the Qt application framework, which has its own window manager; Michael Grunditz has posted some videos and screenshots of this work. There are also things like ChromiumOS. We are also looking into how tightly we can integrate it into the Aura firmware, so that displays are managed in a way where only one EGL library would be required for any supported display system (an Aura-aware X.org driver, an Aura-aware framebuffer and an Aura-aware QWS would all call the same EGL and display framework, so getting a "screen", making a "window" on that "screen" and putting an OpenGL context onto it would be the same for every single one - there is a rough sketch of that EGL sequence below). Until then you have to tread very carefully and be very mindful of the lack of integration and how much it may affect performance.

The AMD Z430 OpenGL GPU and Z160 OpenVG GPU were intended for mobile phones with roughly 800x480 displays, so to get adequate performance you need to do special tiling of the display to keep the data flow manageable. Michael Grunditz's Qt driver is starting to do this now.

We also think that the current trend of running Linux with the "softfp" ABI is really hurting potential GPU performance wherever operations are CPU-bound, either before or after they are processed by the GPU. This includes things like GLSL shader compilation, matrix math and other menial calculations which may be faster on the Cortex-A8 in NEON - in fact AMD's and nVidia's x86 desktop drivers will often perform heavy math on the CPU via SSE2 and above, in order to keep the GPU pipeline from waiting on other units and to increase parallelism. Konstantinos has already demonstrated, in very simple benchmarks, a 20% increase in FPU performance from common code just by compiling the system for "hardfp" instead of "softfp". You can find the posts on PowerDeveloper and linked on this site too.

For the uninitiated, the difference is in the way the compiler generates function prologs and epilogs to pass and return floating-point arguments. In "softfp" mode they are stuffed into integer registers for the function call, which MAY require them to be copied from FPU registers to integer ones.. and copied back out into FPU registers when they need to be used on the FPU. This could be a ~20 cycle stall on a function call into an FPU function and another ~20 cycle stall once inside the function, for each FPU argument passed.
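Here is the rough EGL sequence I mean, just to make the "screen"/"window" point concrete. This is a minimal sketch, not our driver code: the function name and config attributes are made up, and the two native handles are whatever your particular EGL library was compiled to expect (older headers call the display type NativeDisplayType, which is exactly the compile-time problem described above).

#include <EGL/egl.h>

/* Hypothetical setup: get the "screen" (EGLDisplay), pick a config, make
   the "window" (EGLSurface) and bind an OpenGL ES 2.0 context to it. */
int setup_gles2(EGLNativeDisplayType native_dpy, EGLNativeWindowType native_win)
{
    static const EGLint cfg_attribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_RED_SIZE, 5, EGL_GREEN_SIZE, 6, EGL_BLUE_SIZE, 5,
        EGL_NONE
    };
    static const EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
    EGLDisplay dpy;
    EGLConfig  cfg;
    EGLSurface surf;
    EGLContext ctx;
    EGLint     num_cfg;

    dpy = eglGetDisplay(native_dpy);                    /* the "screen" */
    if (dpy == EGL_NO_DISPLAY || !eglInitialize(dpy, NULL, NULL))
        return -1;
    if (!eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &num_cfg) || num_cfg == 0)
        return -1;
    surf = eglCreateWindowSurface(dpy, cfg, native_win, NULL);   /* the "window" */
    ctx  = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);
    if (surf == EGL_NO_SURFACE || ctx == EGL_NO_CONTEXT)
        return -1;
    return eglMakeCurrent(dpy, surf, surf, ctx) ? 0 : -1;
}

The calls are identical no matter what sits underneath - only the two native handles change - which is why hiding them behind one Aura-aware display framework is attractive.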
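To put the softfp/hardfp difference in concrete terms, here is a trivial, made-up function - nothing from our code, just an illustration of what the two -mfloat-abi settings do to the very same C source:

/* Same C source, same FPU, different calling convention:

     gcc -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp  ->  the three float
     arguments below are passed in the integer registers r0-r2, so they may
     have to be copied out of VFP/NEON registers by the caller and copied
     back into VFP registers by the callee before it can do the multiply-add.

     gcc -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard    ->  the arguments
     arrive directly in the VFP registers s0-s2 and nothing gets shuffled
     around at all. */
float scale_and_offset(float x, float scale, float offset)
{
    return x * scale + offset;
}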
There are ways around it: pass pointers to FPU data into functions instead of the values themselves (there is a small sketch of this at the end of this post). You would have to do it this way on PPC anyway, where you cannot copy data directly between the integer and FPU register files, but on ARM you really can do a register-to-register copy. The problem is that you have to make an educated guess about which will be faster (a reg->reg copy, loading the data through a pointer and eating a memory read, or trying to get it into the cache by preloading it), you have no idea which it is until you run it, and the next time you try it may be different :) "hardfp" removes all this uncertainty completely and ensures your data is not played with by the compiler or subject to sunspots, zodiac signs, cache behavior or your most recent goat sacrifice (take your pick).

Anyway, I digress; all of this testing, benchmarking and evaluation takes a lot of time, and we could be getting much more out of the hardware than we are today. For now, browsing the web - like 99.99% of Google Apps, Google Talk Voice & Video being the exception - does not need the GPU all that much, so there is not much point advertising it in this particular case. Now you know the story :)
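P.S. since I mentioned passing pointers to FPU data as a workaround, here is that sketch - a trivial, invented example (the function names are made up; this is not from any real driver code):

/* By value: under softfp these four floats travel through r0-r3 on every
   call, with the copying described above. */
float dot2_by_value(float ax, float ay, float bx, float by)
{
    return ax * bx + ay * by;
}

/* By pointer: one integer register carries each address and the floats are
   loaded from memory (hopefully from cache) straight into VFP registers.
   Whether this actually wins depends on where the data happens to be at the
   time - which is exactly the "no idea until you run it" problem. */
float dot2_by_pointer(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1];
}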
I already had the opportunity to say it on the blog, but I'll say it again - good job, and kudos!

Looking at the product specs page, though, I have a question: why aren't you more vocal about the GPU part in the OpenClient/smartbook? The iMX515's GPU is a very interesting part, and with sufficient support it could be an (extra) draw for a certain developer audience. As it is now, the product spec page does not even mention that the GPU is ES2-capable. Why so shy? :)