P4080DS: High CPU Usage by C++ application



Contributor II


I am using a P4080DS board for my C++ application. The application uses a lot of malloc and free.

From the OProfile report, we found that these memory allocations, along with other memory operations such as memcpy and memset, take a lot of CPU time.

This drastically reduces the performance of the application.

I tried building the SDK with GCC optimizations, including -O3 and certain other flags, and used the resulting toolchain to compile my application.

This did not improve the application's performance much either.

Could anyone suggest how to reduce the CPU load?

Are there any other ways to improve performance, for example libraries that should be optimized or configuration changes that should be made?


Sharath Chandra


NXP TechSupport

Compiler optimization cannot reduce the number of memory allocations and other memory operations your code performs. In your case, it makes sense to study the profiling logs together with your code and reduce how often the costly operations are called. These C/C++ optimization rules apply to any computing platform, and you can find many articles and discussions on the topic.
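As one generic illustration of calling the costly operations less often (a standard C++ technique, not something specific to the P4080DS): a `std::vector` that grows by repeated `push_back` reallocates several times, and each reallocation is a heap allocation (operator new, which in GCC's libstdc++ calls malloc), a copy of the existing elements, and a free of the old block. The sketch below counts how often the storage moves; `count_buffer_moves` is a hypothetical helper name. Calling `reserve()` once, when the final size is known in advance, removes those intermediate copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n ints and returns how many times the vector's storage moved.
// Each move means: allocate a bigger block, copy the old elements into it
// (for trivially copyable types such as int, libstdc++ typically uses
// memmove), then free the old block.
std::size_t count_buffer_moves(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation sized for the final element count
        assert(v.capacity() >= n);
    }
    std::size_t moves = 0;
    const int* storage = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.data() != storage) {  // push_back had to reallocate
            ++moves;
            storage = v.data();
        }
    }
    return moves;
}
```

With n = 1000, the reserved version never moves its buffer, while the unreserved loop moves it several times; the exact count depends on the library's growth factor.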





Only as a last step should you try to improve the memory operations themselves (heap manager, memcpy, string functions, and so on).

CodeWarrior runtime library:


A hand-optimized version of memcpy is available in

  It is turned OFF by default. To turn it ON manually, define

     #define USE_FAST_MEMCPY 1

  and rebuild the runtime libraries. Fast memcpy is then enabled only if
  the target processor supports:

     - lfd/stfd instructions, with floating-point support turned ON

     - evldd/evstd instructions

  *** Please note that enabling fast memcpy comes with a code-size tradeoff
      compared with the original memcpy.
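If you rebuild the runtime libraries with USE_FAST_MEMCPY, a small timing harness can confirm whether the change pays off on your target. The sketch below is a generic measurement, assumed for illustration rather than taken from the CodeWarrior documentation; `memcpy_throughput_mbps` is a hypothetical name, and it times whichever memcpy the program is linked against, so build it once per library variant and compare the results.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Times `reps` memcpy calls of `bytes` bytes each and returns MB/s
// (1 MB = 1e6 bytes). Measures the memcpy the binary is linked against.
double memcpy_throughput_mbps(std::size_t bytes, int reps) {
    assert(bytes > 0 && reps > 0);
    std::vector<unsigned char> src(bytes, 0xAB);
    std::vector<unsigned char> dst(bytes, 0);
    volatile unsigned char sink = 0;  // reading dst keeps the copies from being optimized away

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        src[0] = static_cast<unsigned char>(r);  // make each copy's input distinct
        std::memcpy(dst.data(), src.data(), bytes);
        sink = static_cast<unsigned char>(sink ^ dst[0] ^ dst[bytes - 1]);
    }
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double total_mb = static_cast<double>(bytes) * reps / 1e6;
    return seconds > 0.0 ? total_mb / seconds : 0.0;
}
```

Comparing a small copy size (for example 64 bytes) with a large one (several MB) helps separate per-call overhead from raw bandwidth.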


Have a great day,


Contributor II

Hi Pavel,

I profiled our application with OProfile. Most of our CPU time is spent in _wordcopy_fwd_aligned:

CPU: e500mc, speed 1499.98 MHz (estimated)
Counted CPU_CLK events (Cycles) with a unit mask of 0x00 (No unit mask) count 750000

samples  %        linenr info    image name     symbol name
33455    90.7279  wordcopy.c:38  libc-2.15.so   _wordcopy_fwd_aligned
332       0.9004  memcpy.os:0    libc-2.15.so   L5

We also tried using lib_e500mc.so to improve performance, but it did not help.

We use a lot of vectors and maps, so wordcopy is probably being called internally by the STL containers.
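In glibc, `_wordcopy_fwd_aligned` is a helper used by the generic memcpy/memmove code for word-aligned copies, so this profile suggests that large blocks of memory are being copied. With vectors and maps, a common source of such copies is passing or storing containers by value. The sketch below is a generic C++ illustration, not a diagnosis of this application; `Buffer`, `sum_by_value`, `sum_by_ref`, and `store` are hypothetical names. It contrasts a by-value parameter (a full copy per call) with a const reference, and shows `std::move` handing an existing heap block to a map instead of duplicating it.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;

// By value: every call copy-constructs a new Buffer, i.e. one heap
// allocation plus a copy of every byte, and a free when it goes out of scope.
std::size_t sum_by_value(Buffer data) {
    std::size_t total = 0;
    for (unsigned char b : data) total += b;
    return total;
}

// By const reference: reads the caller's buffer in place, with no copy.
std::size_t sum_by_ref(const Buffer& data) {
    std::size_t total = 0;
    for (unsigned char b : data) total += b;
    return total;
}

// Storing into a map: move-assignment takes over the existing heap block
// instead of allocating and copying a new one. The caller's Buffer is left
// valid but unspecified (in practice, empty).
void store(std::map<int, Buffer>& table, int key, Buffer&& data) {
    table[key] = std::move(data);
}
```

Both sum functions return the same value; the difference is that `sum_by_value` pays for an allocation and a full copy on every call, which shows up in a profile as memcpy-family time.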

Can you suggest how to speed this up?

Also, is there a scalable dynamic memory allocator for this board, like those available on other platforms such as Intel?
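When most allocations come in a few fixed sizes (std::map nodes, for instance), one generic option is a fixed-size pool. This is a swapped-in, well-known technique shown for illustration, not an NXP-provided allocator. The sketch carves one up-front allocation into equal slots and recycles freed slots through an intrusive free list, so allocate and release never call malloc/free after construction. `FixedPool` and its members are hypothetical names, and the class assumes single-threaded use (one pool per thread, or an external lock).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

// Hypothetical fixed-size pool. Not thread-safe: one pool per thread,
// or guard it with a lock.
class FixedPool {
public:
    FixedPool(std::size_t slot_size, std::size_t slot_count)
        : slot_size_(round_up(slot_size)), storage_(slot_size_ * slot_count) {
        for (std::size_t i = 0; i < slot_count; ++i) {
            push(storage_.data() + i * slot_size_);  // thread every slot onto the free list
        }
    }

    FixedPool(const FixedPool&) = delete;  // the free list points into storage_
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a slot of at least slot_size bytes, or nullptr when the pool
    // is exhausted (the caller can then fall back to ::operator new).
    void* allocate() {
        if (head_ == nullptr) return nullptr;
        Node* node = head_;
        head_ = node->next;
        return node;
    }

    // Puts a slot obtained from allocate() back on the free list.
    void release(void* p) {
        assert(owns(p));
        push(static_cast<unsigned char*>(p));
    }

    // True if p is the start of one of this pool's slots.
    bool owns(const void* p) const {
        const unsigned char* base = storage_.data();
        const unsigned char* c = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;
        if (before(c, base) || !before(c, base + storage_.size())) return false;
        return static_cast<std::size_t>(c - base) % slot_size_ == 0;
    }

private:
    struct Node { Node* next; };

    // Each slot must fit a Node and keep the next slot suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + align - 1) / align * align;
    }

    // Reuses a free slot's bytes as a link in the free list.
    void push(unsigned char* slot) { head_ = new (slot) Node{head_}; }

    std::size_t slot_size_;
    // Obtained via ::operator new, which returns memory aligned for any
    // fundamental type, so every slot (a multiple of that alignment) is aligned.
    std::vector<unsigned char> storage_;
    Node* head_ = nullptr;
};
```

A class can route its own allocations through such a pool by overloading its `operator new` and `operator delete`, falling back to `::operator new` when `allocate()` returns nullptr.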

I am going through the other links you mentioned and will try them as well.


Sharath Chandra
