P4080DS: High CPU Usage by C++ application

sharathchandra · ‎06-03-2015

Hi,

I am running a C++ application on a P4080DS board. The application makes heavy use of malloc and free.

The OProfile report shows that memory allocation and deallocation, along with other memory-related operations such as memcpy and memset, consume a large share of CPU time.

This drastically reduces the performance of the application.

I tried optimizing the GCC toolchain, including building the SDK with -O3 and certain other flags, and then used it to compile my application.

This did not noticeably improve the application's performance either.

Could anyone suggest how to reduce the CPU load?

Are there any other ways to improve performance, for example libraries that need to be optimized or configuration changes to make?

Thanks,

Sharath Chandra

LPP · ‎06-05-2015

Compiler optimization cannot reduce the number of memory allocations and other memory-related operations your code performs. In your case, it makes sense to study the profiling logs and your code in order to reduce the use of costly operations. These C/C++ optimization rules apply to any computing platform, and you can find many articles and discussions on the topic:

http://stackoverflow.com/questions/470683/memory-allocation-deallocation-bottleneck

http://stackoverflow.com/questions/2555402/c-performance-memory-optimization-guidelines

http://www.tantalon.com/pete/cppopt/final.htm

http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Writing_efficient_code/Allocations_and_deallocations
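One common technique from guidelines like these is to size containers once and reuse them instead of letting them grow and shrink repeatedly. The sketch below is a minimal illustration, assuming a std::vector<int> workload; the function names count_reallocations and process_batches are hypothetical and not from this thread. It counts how often the vector's buffer moves (each move is an allocate, copy, free cycle) with and without reserve(), and shows one buffer being reused across batches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times a vector's buffer changes while n ints are appended.
// Each change is a reallocation: allocate a new block, copy the old elements
// (memcpy/memmove-style work), then free the old block. The very first
// allocation of an empty vector is counted as well.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front, sized for all n elements
    }
    std::size_t reallocs = 0;
    const int* prev = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.data() != prev) {  // buffer moved, so an allocation happened
            ++reallocs;
            prev = v.data();
        }
    }
    return reallocs;
}

// Reuses one buffer for many batches: clear() destroys the elements but
// leaves the capacity unchanged, so the loop performs no allocations
// beyond the initial reserve().
std::size_t process_batches(std::size_t batches, std::size_t batch_size) {
    std::vector<int> buffer;
    buffer.reserve(batch_size);
    std::size_t total = 0;
    for (std::size_t b = 0; b < batches; ++b) {
        buffer.clear();
        for (std::size_t i = 0; i < batch_size; ++i) {
            buffer.push_back(static_cast<int>(i));
        }
        total += buffer.size();  // stand-in for real per-batch work
    }
    return total;
}
```

With the geometric growth used by common standard libraries, appending 1000 ints without reserve() triggers several reallocations, while the reserved version triggers none.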

The last resort is to optimize the memory-related operations themselves (heap manager, memcpy, string handling, etc.).

CodeWarrior runtime library:

"

A hand-optimized version of memcpy is available in

PowerPC_EABI_Support/Runtime/Src/__mem.c

It is turned OFF by default. Users can manually turn it ON by setting

#define USE_FAST_MEMCPY 1

and rebuilding the runtime libraries. Fast memcpy is then enabled only if the target processor supports:

- lfd/stfd instructions, with floating-point support ON

OR

- evldd/evstd instructions

*** Please note that there is a code size tradeoff when enabling fast memcpy compared to the original memcpy.

"
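The fast path quoted above relies on instruction pairs (lfd/stfd, evldd/evstd) that each move 64 bits per load or store, instead of smaller units. The sketch below is a simplified, portable illustration of that word-at-a-time idea, which is also the general approach behind glibc's _wordcopy_fwd_aligned. It is not the CodeWarrior or glibc source; copy_words is a hypothetical name, and real implementations additionally handle alignment, loop unrolling, and processor-specific instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Word-at-a-time copy: move 8 bytes per iteration, then finish the tail
// byte by byte. Illustration only; use the library memcpy in real code.
void* copy_words(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    // Bulk phase: 64-bit chunks. A fixed-size std::memcpy is the portable,
    // aliasing-safe way to load and store one word; compilers typically
    // lower it to a single load and a single store.
    while (n >= sizeof(std::uint64_t)) {
        std::uint64_t word;
        std::memcpy(&word, s, sizeof word);
        std::memcpy(d, &word, sizeof word);
        s += sizeof word;
        d += sizeof word;
        n -= sizeof word;
    }

    // Tail phase: the remaining 0 to 7 bytes.
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dst;
}
```

Faster copy routines reduce the cost per byte, but they do not reduce the number of bytes copied, which is why reducing copies in the application usually matters more.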

Have a great day,
Pavel


sharathchandra · ‎06-05-2015

Hi Pavel,

I profiled our application with OProfile and observed that most of the CPU time is spent in _wordcopy_fwd_aligned:

CPU: e500mc, speed 1499.98 MHz (estimated)
Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 750000

samples  %        linenr info    image name    symbol name
33455    90.7279  wordcopy.c:38  libc-2.15.so  _wordcopy_fwd_aligned
332      0.9004   memcpy.os:0    libc-2.15.so  L5

We also tried using lib_e500mc.so to improve performance, but it did not help.

We use a lot of vectors and maps, so wordcopy is probably being called internally by the STL containers.
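If the STL containers are the source of the copying, a frequent culprit is containers being copied implicitly, for example when they are passed or returned by value. For trivially copyable elements such as int, libstdc++ typically performs such copies in bulk with memmove/memcpy, which would appear under the libc copy routines in a profile; copying a std::map copies and allocates every node. The sketch below uses a hypothetical instrumented element type, Tracked, to count element copies, and compares passing a vector by value with passing it by const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Tracked {
    static std::size_t copies;
    int value;

    explicit Tracked(int v = 0) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
std::size_t Tracked::copies = 0;

// Passing by value copies the whole container, element by element,
// on every call.
long sum_by_value(std::vector<Tracked> v) {
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i].value;
    return s;
}

// Passing by const reference reads the caller's container without copying.
long sum_by_ref(const std::vector<Tracked>& v) {
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i].value;
    return s;
}
```

Both functions return the same result, but for a 100-element vector sum_by_value performs 100 element copies per call (plus an allocation for the copied buffer), while sum_by_ref performs none.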

Can you suggest how to speed this up?

Also, is there a scalable dynamic memory allocator available for this board, like those available on other platforms such as Intel?
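For workloads that allocate and free many objects of the same size, such as std::map nodes, one widely used alternative to calling malloc/free for each object is a fixed-size pool that recycles blocks through a free list. The sketch below is a hypothetical FixedPool class, not an existing library or SDK API. It assumes single-threaded use and only hands out raw blocks; plugging it into std::map would additionally require writing a custom allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. Blocks are carved out of large chunks
// and recycled through an intrusive free list, so allocate() and
// deallocate() are O(1) and only call the global allocator when a new chunk
// is needed. Not thread-safe.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t blocks_per_chunk)
        : block_size_(round_up(block_size)),
          blocks_per_chunk_(blocks_per_chunk),
          free_list_(nullptr) {
        assert(blocks_per_chunk_ > 0);
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool() {
        for (std::size_t i = 0; i < chunks_.size(); ++i) {
            ::operator delete(chunks_[i]);
        }
    }

    // Returns a block of at least block_size bytes.
    void* allocate() {
        if (free_list_ == nullptr) {
            grow();
        }
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    // Puts a block previously returned by allocate() back on the free list.
    void deallocate(void* p) {
        Node* n = new (p) Node;  // reuse the block's storage as a list node
        n->next = free_list_;
        free_list_ = n;
    }

private:
    struct Node {
        Node* next;
    };

    // Every block must hold a Node and stay suitably aligned for any type.
    static std::size_t round_up(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        if (size < sizeof(Node)) {
            size = sizeof(Node);
        }
        return (size + align - 1) / align * align;
    }

    // Allocates one chunk and threads all of its blocks onto the free list.
    void grow() {
        chunks_.reserve(chunks_.size() + 1);  // so push_back below cannot throw
        char* chunk = static_cast<char*>(
            ::operator new(block_size_ * blocks_per_chunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
            Node* n = new (chunk + i * block_size_) Node;
            n->next = free_list_;
            free_list_ = n;
        }
    }

    std::size_t block_size_;
    std::size_t blocks_per_chunk_;
    Node* free_list_;
    std::vector<char*> chunks_;
};
```

Because node-based containers such as std::map and std::list allocate one fixed-size node per element, they are natural candidates for a pool like this. A multithreaded application would need per-thread pools or locking around allocate() and deallocate().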

I am going through the other links you mentioned and will try those suggestions as well.

Thanks,

Sharath Chandra
