To improve i.MX6Q OpenGL rendering performance.

satoshishimoda · ‎05-18-2014

Hi community,

I have a question about i.MX6DQ OpenGL rendering with Yocto BSP (L3.10.17_1.0.0-ga).

Rendering speed is currently insufficient at 1920x1080, although it is fine at XGA (1024x768).

Could you give me some advice on improving OpenGL rendering speed, or on how to investigate what is wrong?

Best Regards,

Satoshi Shimoda

Yuri · ‎05-28-2014

I found the following OpenGL ES tips and tricks, plus some general CPU optimization considerations:

1. General Tips for i.MX6 :

Minimize state changes

Use uncompressed textures

Batch your calls as much as possible

Avoid glFinish.

Use Triangle Strip

Consider glDrawElements instead of glDrawArrays

Use VBOs instead of re-submitting vertices

Prevent uploads (VBOs, TexImage2D, etc.)

Optimize your shaders

reduce branching (if-else)

keep the code simple

avoid using functions

some math calls are costly
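
As an illustration of "Minimize state changes" and "Batch your calls", one common approach is to sort the frame's draw list by GPU state before submitting it, so that draws sharing a shader program and texture end up adjacent. The sketch below is plain C with no GL calls. DrawItem, sort_by_state and count_state_changes are hypothetical names made up for this example, and it assumes program and texture handles are plain unsigned integers.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical draw-list entry: the shader program and texture a mesh needs. */
typedef struct {
    unsigned program;
    unsigned texture;
    unsigned mesh_id;
} DrawItem;

/* Order by program first, then texture, so equal states become adjacent. */
static int compare_state(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    if (x->program != y->program) return x->program < y->program ? -1 : 1;
    if (x->texture != y->texture) return x->texture < y->texture ? -1 : 1;
    return 0;
}

void sort_by_state(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, compare_state);
}

/* Count how many program/texture binds a given submission order would need. */
size_t count_state_changes(const DrawItem *items, size_t n)
{
    size_t changes = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || items[i].program != items[i - 1].program) ++changes;
        if (i == 0 || items[i].texture != items[i - 1].texture) ++changes;
    }
    return changes;
}
```

In a real renderer the sorted list would then be walked once, binding a program or texture only when it differs from the previous item.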

2. Other Graphics Tips and Tricks :

Keep an eye on your CPU load

In Linux, use ‘top’

Keep an eye on the Memory Bandwidth

In Linux, use ‘mmdc’ profiling tool

High CPU/BW load can be indicative of bad API usage

You should try to avoid data ‘uploads’ (either texture or vertex) on a frame-by-frame basis

Use VBOs instead of arrays in your draw calls

Use DirectVIV / PBOs / EGL Images instead of teximage data

3. CPU Optimizations.

3.1.
Before measuring any performance on ARM CPU(s), it is recommended to disable the dynamic frequency scaling:

# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Otherwise, whenever your computing threads are sleeping or waiting for an interrupt,
power management may switch to a lower CPU frequency, degrading your performance.
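
A benchmark can also check the governor itself before it starts timing. The following is a minimal sketch, assuming the standard cpufreq sysfs file /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is present on the target. read_governor and governor_is_performance are hypothetical helper names, not part of any BSP.

```c
#include <stdio.h>
#include <string.h>

/* Read the governor name from `path` into `out`, dropping the trailing newline.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int read_governor(const char *path, char *out, size_t out_size)
{
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    if (!fgets(out, (int)out_size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the governor at `path` is "performance", 0 otherwise
   (including when the file is missing). */
int governor_is_performance(const char *path)
{
    char name[64];
    return read_governor(path, name, sizeof name) == 0 &&
           strcmp(name, "performance") == 0;
}
```

A test program could call governor_is_performance() with the sysfs path above and print a warning if it returns 0.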

3.2.

The ARM cores in i.MX6 have Neon units that can run SIMD instructions (Single Instruction Multiple Data).

Basic steps to enable Neon in gcc.

In compile flags:

CFLAGS = -mfpu=neon -O3
Lets the compiler optimize the code using NEON.

More flags: -ffast-math, -funsafe-math-optimizations, -funsafe-loop-optimizations

Best practices to help the compiler vectorize your loops:

Have: countable loops, independent and contiguous data accesses

Ex: gather data in a struct of arrays, rather than array of structs.

Avoid: break/continue, if-else, manually unrolling loops.
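
To make the "struct of arrays" advice concrete, here is a small sketch of a loop shaped the way an auto-vectorizer usually wants: a countable loop, unit-stride accesses, no branches. The type and function names (PointAoS, PointsSoA, scale_add, soa_scale_add) are hypothetical. Whether gcc actually vectorizes it depends on the flags and compiler version, so treat it as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Array of structs: x, y, z are interleaved, so reading only x is strided. */
typedef struct { float x, y, z; } PointAoS;

/* Struct of arrays: each component is contiguous, which suits SIMD loads. */
typedef struct {
    float *x;
    float *y;
    float *z;
} PointsSoA;

/* Countable loop, unit stride, no branches; `restrict` (C99) promises the
   compiler that the three arrays do not overlap. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}

/* out[i] = x[i] * k + y[i] for all points, using the SoA layout. */
void soa_scale_add(float *restrict out, const PointsSoA *p, float k, size_t n)
{
    scale_add(out, p->x, p->y, k, n);
}
```

With the PointAoS layout the same computation would read pts[i].x and pts[i].y, which are 12 bytes apart from one element to the next and are harder to load as vectors.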

Use C intrinsics

C function call interface to NEON operations

Supports all data types and operations supported by NEON

Full list http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

More info at http://armneon.blogspot.com/
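
For reference, a minimal sketch of what the C intrinsics look like in practice: adding two float arrays four elements at a time. add_f32 is a hypothetical function name. The intrinsics used (vld1q_f32, vaddq_f32, vst1q_f32) come from arm_neon.h, and the code falls back to a plain scalar loop when the compiler does not define __ARM_NEON, so it also builds on a host PC.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* out[i] = a[i] + b[i]; four floats per iteration when NEON is available. */
void add_f32(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add lanes, store 4 floats */
    }
#endif
    /* Scalar tail (or the whole loop when NEON is not available). */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```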

3.3.

The ARM cores in i.MX6 Dual and Quad can run your algorithm in parallel using OpenMP compiler directives.
Basic steps to enable OpenMP with gcc.

In compile and link flags:

CFLAGS = -fopenmp
LDFLAGS = -fopenmp

Install the libgomp.so library on the target

In source file:

#include <omp.h>

Put your code to parallelize into {}

Just before the {}, add:
#pragma omp parallel

If your code is a loop, just before the loop add:
#pragma omp parallel for

Disambiguate variable visibility across threads:
shared(var1,var2,…) private(var3,var4,…)

More info at http://openmp.org/wp/resources/#Tutorials
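
Putting the OpenMP steps above together, a minimal sketch might look like the following. The function names (scale, sum_squares, worker_threads) are hypothetical. The reduction clause is not mentioned in the steps above but is a standard OpenMP clause for accumulating into one variable. If the file is built without -fopenmp the pragmas are ignored and the code simply runs on one core.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Scale an array in place. Iterations are independent, so they can be split
   across cores; v, n and k are shared, the loop counter is private. */
void scale(float *v, int n, float k)
{
    int i;
    #pragma omp parallel for default(none) shared(v, n, k) private(i)
    for (i = 0; i < n; ++i)
        v[i] *= k;
}

/* Sum of squares. Each thread accumulates its own partial total and
   OpenMP combines them at the end of the loop. */
double sum_squares(const float *v, int n)
{
    double total = 0.0;
    int i;
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; ++i)
        total += (double)v[i] * (double)v[i];
    return total;
}

/* Number of threads a parallel region may use (1 without OpenMP). */
int worker_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

On an i.MX6 Quad one would expect worker_threads() to report up to 4 by default, but the actual value depends on the runtime settings such as OMP_NUM_THREADS.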

Have a great day,
Yuri


Bio_TICFSL · ‎05-26-2014

Performance depends on many factors. One thing I would recommend is to use the eglImage extension.

With conventional images and textures, a copy operation is involved, which reduces performance. I suggest you try the eglImage extension. Note that the recipes in meta-browser now contain packageconfigs to enable EGL support, so you don't need to pass this parameter.

Have a great day,


