Memory copy performance differs depending on the distance between the source and destination addresses:
Say, (pageid(src) - pageid(dst)) % 16 = M
If M == 0, the memory copy performance is the worst.
If M == 1 or M == 15, it is better.
If 2 <= M <= 14, it is best.
1) Both src and dst are backed by physically contiguous pages.
2) Both src and dst are page aligned.
3) I am using an i.MX6Q sabrelite board (the i.MX6Q sabresd seems to behave similarly).
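For reference, the value M above could be computed roughly as follows. This is only a sketch: it assumes that pageid means the physical address divided by a 4 KiB page size, which the post does not actually state, and the names page_id and page_distance_mod16 are made up for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/* Assumed definition: page id = physical address >> PAGE_SHIFT. */
static uint32_t page_id(uint32_t phys_addr)
{
    return phys_addr >> PAGE_SHIFT;
}

/*
 * M = (pageid(src) - pageid(dst)) % 16.
 * The subtraction is done in unsigned arithmetic, so a dst above src
 * wraps around and still yields a value in 0..15.
 */
static unsigned page_distance_mod16(uint32_t src, uint32_t dst)
{
    return (unsigned)((page_id(src) - page_id(dst)) & 15u);
}
```

With these assumptions, src = 0x10000000 and dst = 0x20000000 give M == 0, while moving dst up by one page (0x20001000) gives M == 15.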
Could someone tell me why?
Thanks,
Perhaps the ARM caches have an influence here. Strictly speaking, your approach is not optimal;
please refer to the following:
"What is the fastest way to copy memory on a Cortex-A8?"
Have a great day,
Yuri
Why not use the special instructions (recommended by ARM) for memcpy?
Your approach ( *dst++ = *src++;) is of course very simple, but definitely
non-optimal.
Have a great day,
Yuri
But this cannot explain why the same (non-optimized) copy shows different performance.
Here is a more detailed result from copying 640M of memory (source memory cached):
If M == 0, it finished in about 2.4 seconds.
If M == 1 or M == 15, it finished in about 2.0 seconds.
If 2 <= M <= 14, it finished in about 1.6 seconds.
I am just confused by the performance difference between different src/dst addresses.
Any idea?
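A measurement like the one above could be sketched as follows. This is not the code actually used for the numbers in this thread; copy_words and time_copy are hypothetical names, and clock() reports CPU time at a fairly coarse resolution. Obtaining buffers with a specific physical page distance is platform specific and is not shown here. Note also that an optimizing compiler may replace the simple loop with a call to memcpy, so the generated code is worth checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The simple word-by-word loop discussed in this thread. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    while (n_words--)
        *dst++ = *src++;
}

/*
 * Time a single copy of n_words 32-bit words.
 * Returns the elapsed CPU time in seconds as reported by clock().
 */
static double time_copy(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    clock_t t0 = clock();
    copy_words(dst, src, n_words);
    clock_t t1 = clock();
    return (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
}
```

Calling time_copy once for each src/dst pair of interest (M == 0, M == 1, and so on) and comparing the returned times would reproduce the kind of comparison reported above.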
1. It makes sense to look at the assembler code generated for (*dst++ = *src++;) and the code around it.
2. You wrote about pageid(src). How is pageid defined?
3. Which DRAM (part number, datasheet) is used in this case?