Hi, I am working on the i.MX515 (Cortex-A8) processor. We have ported an image processing algorithm to it, but it runs very slowly, so I would appreciate some basic ideas about optimization.
1. I have read that the Cortex-A8 has a 13-stage pipeline, but I would like more detailed pipeline information.
The GCC compiler does not give any information about this.
2. I believe it has SDMA. How do I use it? I would like to transfer data from external memory to internal memory.
3. I tried NEON, but because my code does not have any serial execution, it did not give good performance.
If anyone has any ideas regarding the above questions, please reply at your convenience.
1. You can find information about the Cortex-A8 pipeline on ARM's website. However, I think the pipeline should be transparent to software.
2. Yes, it has SDMA. In your case, a memory-to-memory transfer should be fine. Note that the SDMA core runs at about 60 MHz, so the data rate will not be very high.
3. Even if your code has no serial execution, NEON should still give a performance improvement. Please refer to http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344i/Chddgcfe.html to see how to take advantage of NEON.
4. Make sure your code has a good cache hit rate. This is important.
5. The critical functions should be written in assembly and optimized carefully.
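
To illustrate points 3 and 4, here is a minimal sketch in C. It is not taken from your algorithm: the function name brighten_u8 and the operation itself (adding a constant to every 8-bit pixel, saturating at 255) are only assumptions chosen for the example. It walks the buffer linearly, which tends to be friendly to the cache, and processes 16 pixels per iteration with NEON intrinsics when the compiler enables NEON. On GCC for the Cortex-A8 that typically means building with something like -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp, but please check the flags for your own toolchain. Without NEON it falls back to plain C.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example kernel: add `delta` to each 8-bit pixel,
 * clamping the result at 255. The buffer is traversed in memory
 * order so that consecutive accesses fall in the same cache lines. */
void brighten_u8(uint8_t *pixels, size_t count, uint8_t delta)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: 16 pixels per iteration. */
    const uint8x16_t vdelta = vdupq_n_u8(delta);
    for (; i + 16 <= count; i += 16) {
        uint8x16_t v = vld1q_u8(pixels + i);  /* load 16 bytes       */
        v = vqaddq_u8(v, vdelta);             /* saturating add      */
        vst1q_u8(pixels + i, v);              /* store 16 bytes back */
    }
#endif

    /* Scalar path: handles the remaining pixels, or the whole
     * buffer when NEON is not available. */
    for (; i < count; ++i) {
        unsigned sum = (unsigned)pixels[i] + delta;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

You would call it as brighten_u8(image, width * height, 50) on a contiguous 8-bit image. If your image is stored row by row, keep the inner loop running along a row rather than down a column, otherwise each access may touch a different cache line.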