With different link addresses, the execution time of the same program changes. What is the relationship between the two, and how can I ensure the program executes optimally?
Hi, Senlent,
Thank you for your suggestion!
The relevant part of the map file is shown below. I place the function in section(".func_mem_area"); when I change the start address of this area, the execution time changes.
Even when the same function is placed at different addresses within 0x00001410~0x0000141F, the measured times differ:
0x00001410 [54us]
0x00001412 [75us]
0x00001414 [85us]
0x00001418 [54us]
Can you help explain the impact of alignment or address on program execution time?
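One way to read the four measurements is to compute each start address's offset inside a fixed-size fetch window. The sketch below assumes an 8-byte (64-bit) window; that width is an assumption about the instruction fetch path, not something confirmed in this thread, and fetch_offset is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr inside a fetch window of `width` bytes.
 * `width` must be a non-zero power of two (8 is an assumed value here). */
uint32_t fetch_offset(uint32_t addr, uint32_t width)
{
    assert(width != 0u && (width & (width - 1u)) == 0u);
    return addr & (width - 1u);
}
```

For the addresses above, 0x00001410 and 0x00001418 give offset 0, 0x00001412 gives offset 2, and 0x00001414 gives offset 4. The two offset-0 placements are the ones measured at 54us.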
The following is the map file and test function:
*(.func_mem_area)
.func_mem_area
0x00001410 0x1c ./Sources/main.o
0x00001410 DelayTest
void CODE_AREA DelayTest(uint32_t cycles)
{
    uint16_t index = 0u, indey = 0u;
    for(indey = 0u; indey < cycles; indey++)
    {
        for(index = 0u; index < 255u; index++)
        {
            /* do nothing. */
        }
    }
}
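If the goal is to keep the placement from drifting between builds, one option is to request a fixed alignment on the function itself. The following is a minimal sketch using the GCC/Clang aligned attribute; CODE_AREA_ALIGNED, DelayTestAligned and CheckDelayTestAlignment are hypothetical names, and the 8-byte value is an assumption, not a documented requirement. It also widens the outer counter to uint32_t so the comparison with cycles cannot wrap, and makes the inner counter volatile so the empty loop is not optimized away (which adds a memory access per iteration and changes the absolute timing).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: same section as in the map file, plus 8-byte alignment.
 * The section attribute is only applied on ELF toolchains. */
#if defined(__ELF__)
#define CODE_AREA_ALIGNED __attribute__((section(".func_mem_area"), aligned(8), noinline))
#else
#define CODE_AREA_ALIGNED __attribute__((aligned(8), noinline))
#endif

void CODE_AREA_ALIGNED DelayTestAligned(uint32_t cycles)
{
    volatile uint32_t index; /* volatile: keeps the empty loop in the binary */
    uint32_t indey;          /* same width as cycles, so no wrap-around */

    for (indey = 0u; indey < cycles; indey++)
    {
        for (index = 0u; index < 255u; index++)
        {
            /* do nothing. */
        }
    }
}

/* Sanity check: the entry address honors the requested alignment.
 * Bit 0 is masked off because Thumb function pointers carry it as a mode flag. */
void CheckDelayTestAlignment(void)
{
    uintptr_t entry = (uintptr_t)&DelayTestAligned & ~(uintptr_t)1u;
    assert((entry & 7u) == 0u);
}
```

The linker script still has to place .func_mem_area at a start address that is itself a multiple of the requested alignment; the attribute only constrains the function within its section.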
If I use SysTick for delay, it is less affected by address changes.
Looking forward to your reply.
1. I don't know how you measured the execution time of your delay function. Could the measurement itself introduce an error?
2. The function's execution time depends on the system clock. Clock jitter can make the execution time vary even for the same function.
3. The function may be preempted by system interrupts, which would also change the measured execution time.
4. The function's storage address is not aligned:
The Cortex-M7 core handles unaligned accesses in hardware (I don't know the details of how). Variables should usually be naturally aligned, because aligned accesses are slightly faster than unaligned ones.
Those are my thoughts so far; I can't think of anything else.
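To rule out points 1 to 3, the measurement can be repeated several times and reported as a minimum and maximum cycle count. The following is only a sketch under stated assumptions: measure_cycles, cycle_stats_t, fake_read_cycles and fake_work are hypothetical names. On target, the reader callback would typically return the CMSIS DWT->CYCCNT register and the call would run with interrupts masked; the fake counter here only makes the sketch runnable on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*cycle_reader_t)(void);

typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;
} cycle_stats_t;

/* Run fn(arg) `runs` times and record the smallest and largest elapsed count. */
cycle_stats_t measure_cycles(void (*fn)(uint32_t), uint32_t arg,
                             cycle_reader_t read_cycles, uint32_t runs)
{
    cycle_stats_t stats = { UINT32_MAX, 0u };

    assert(fn != NULL && read_cycles != NULL && runs > 0u);

    for (uint32_t i = 0u; i < runs; i++)
    {
        /* On target: mask interrupts here, e.g. CMSIS __disable_irq(). */
        uint32_t start = read_cycles();
        fn(arg);
        /* Unsigned subtraction stays correct across one counter wrap. */
        uint32_t elapsed = read_cycles() - start;
        /* On target: restore interrupts here, e.g. CMSIS __enable_irq(). */

        if (elapsed < stats.min_cycles) { stats.min_cycles = elapsed; }
        if (elapsed > stats.max_cycles) { stats.max_cycles = elapsed; }
    }
    return stats;
}

/* Host stand-ins (hypothetical): a software counter and a "function under test"
 * that simply advances it by `cost` counts. */
uint32_t fake_counter = 0u;
uint32_t fake_read_cycles(void) { return fake_counter; }
void fake_work(uint32_t cost) { fake_counter += cost; }
```

If the minimum and maximum stay close together at one link address but the minimum changes between link addresses, the difference comes from placement rather than from jitter or interrupts.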
OK, thanks.
Changing the link address changes where the code is stored in flash, which may lead to some differences in execution speed. I can't go into your question in more depth.
The following are some optimization suggestions for improving performance on S32K3. I hope they are helpful to you.
1. Allocate most user code to P-Flash and enable the I-Cache.
2. Allocate the system stack to D-TCM and enable the D-Cache.
3. Allocate frequently executed code (e.g., ISRs) to I-TCM.
4. Allocate the OS task stacks to D-TCM.
5. Allocate the vector table to D-TCM.
Please note:
1. With the D-Cache enabled, accesses by other masters (e.g., DMA, HSE, other application cores) to cacheable areas will be affected. Areas shared with those masters therefore need to be allocated to a non-cacheable region.
2. If another master (e.g., DMA, HSE, or another application core) accesses the D-TCM, it must go through the backdoor. For example, core1/DMA/HSE accessing core0's DTCM must use the backdoor.
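Suggestion 3 above can be sketched with GCC/Clang section attributes. The section names .itcm_text and .dtcm_data, the macros, and the handler below are all hypothetical; the real names depend on the project's linker script, and the startup code must copy these sections into TCM before they are used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placement macros; the section names must match the linker script. */
#if defined(__ELF__)
#define ITCM_CODE __attribute__((section(".itcm_text"), noinline))
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define ITCM_CODE __attribute__((noinline))
#define DTCM_DATA
#endif

/* Counter touched by the handler, placed in the assumed D-TCM data section. */
DTCM_DATA volatile uint32_t g_tick_count = 0u;

/* Frequently executed handler, placed in the assumed I-TCM code section. */
ITCM_CODE void HotTickHandler(void)
{
    g_tick_count++;
}

/* Sanity check: one handler call advances the counter by one. */
void CheckHotTickHandler(void)
{
    uint32_t before = g_tick_count;
    HotTickHandler();
    assert(g_tick_count == before + 1u);
}
```

Per note 2, data placed in D-TCM this way is not directly visible to DMA or another core, so buffers shared with other masters should not use the DTCM_DATA macro.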