Hi,
I am using a P4080DS board for my C++ application. The application makes heavy use of malloc and free.
From the OProfile report, we found that memory allocation and other memory-related operations such as memcpy and memset are taking a lot of CPU time.
This drastically reduces the performance of the application.
I tried rebuilding the SDK with GCC optimizations enabled (-O3 and certain other flags) and used it to compile my application.
This did not improve the application's performance much either.
Could anyone suggest how to decrease the CPU load?
Are there any other ways to improve performance, such as libraries that need to be optimized or configuration changes to make?
Thanks,
Sharath Chandra
Compiler optimization cannot reduce the number of memory allocations or other memory-related operations. In your case, it makes sense to study the profiling logs and your code in order to reduce the use of costly operations. These C/C++ optimization rules apply to any computing platform, and you can find many articles and discussions on the topic:
http://stackoverflow.com/questions/470683/memory-allocation-deallocation-bottleneck
http://stackoverflow.com/questions/2555402/c-performance-memory-optimization-guidelines
http://www.tantalon.com/pete/cppopt/final.htm
http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Writing_efficient_code/Allocations_and_deallocations
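As a small illustration of the kind of source-level change those articles describe, here is a minimal sketch (not taken from your application; the function names are hypothetical) that counts how often a std::vector has to grow its buffer with and without a call to reserve(). Each growth after the first means a new allocation, a bulk copy of the existing elements, and a free of the old buffer, which is work the compiler cannot optimize away. The second function shows passing a container by const reference so that no copy is made at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while appending n
// elements. Every change after the first involves a new allocation, a copy
// of the existing elements, and a free of the old buffer.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // single allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}

// Takes the vector by const reference, so the caller's buffer is not copied.
long long sum_by_ref(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}
```

With reserve() the loop triggers no capacity changes at all; without it, the exact number depends on the standard library's growth policy.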
The last resort is to improve the memory-related operations themselves (heap manager, memcpy, string handling, and so on).
CodeWarrior runtime library:
"
A hand optimized version of memcpy is available in
PowerPC_EABI_Support/Runtime/Src/__mem.c
It is turned OFF by default. User could manually turn it ON by
#define USE_FAST_MEMCPY 1
and rebuilding the runtime libraries. Fast memcpy would then be enabled only if
the target processor support:
- lfd/stfd instructions and floating point support is ON
OR
- evldd/evstd instructions
*** Please note that there is a code size tradeoff when enabling fast memcpy as
compared to original memcpy
"
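On the heap manager side, one common approach when an application allocates many objects of the same size is a fixed-size block pool. The class below is only a sketch under that assumption: FixedPool and its members are hypothetical names, it is not part of any SDK or runtime library, and it is not thread-safe. It makes one allocation up front and then hands out and takes back blocks through a free list, without calling malloc or free in the hot path.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool: one upfront allocation, then O(1)
// allocate/release through an intrusive free list. Not thread-safe.
class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count),
          free_list_(nullptr),
          free_count_(0) {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < block_count; ++i) {
            release(storage_.data() + i * block_size_);
        }
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) {
            return nullptr;
        }
        Node* n = free_list_;
        free_list_ = n->next;
        --free_count_;
        return n;
    }

    // Returns a block previously obtained from allocate() to the pool.
    void release(void* p) {
        assert(p != nullptr);
        Node* n = new (p) Node{free_list_};  // reuse the block as a list node
        free_list_ = n;
        ++free_count_;
    }

    std::size_t free_count() const { return free_count_; }

private:
    struct Node {
        Node* next;
    };

    // Rounds the block size up so each block can hold a Node and stays
    // aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) {
            n = sizeof(Node);
        }
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_list_;
    std::size_t free_count_;
};
```

A caller would construct the pool once, for example FixedPool pool(sizeof(MyObject), 1024) with MyObject standing in for your own type, and then use placement new on the pointer returned by allocate(). Whether this helps depends on your allocation pattern, so it is worth profiling again before and after.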
Have a great day,
Pavel
Hi Pavel,
I profiled our application with OProfile. We observed that most of our CPU time is being taken by _wordcopy_fwd_aligned.
CPU: e500mc, speed 1499.98 MHz (estimated)
Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 750000
samples  %        linenr info    image name    symbol name
33455    90.7279  wordcopy.c:38  libc-2.15.so  _wordcopy_fwd_aligned
332       0.9004  memcpy.os:0    libc-2.15.so  L5
We even tried using lib_e500mc.so to increase performance, but it did not help.
We use a lot of vectors and maps, so _wordcopy_fwd_aligned is probably being called internally by the STL containers.
Can you suggest how to speed this up?
Also, is there a scalable dynamic memory allocator for this board, like those available on other platforms such as Intel?
I am going through the other links you mentioned and will try them as well.
Thanks,
Sharath Chandra