Memory copy performance varies with the distance between the source and destination addresses:
Say, (pageid(src) - pageid(dst)) % 16 = M
If M == 0, the memory copy performance is the worst.
If M == 1 or M == 15, it is better.
If 2 <= M <= 14, it is best.
Test conditions:
1) Both the src and dst buffers occupy contiguous physical pages.
2) Both src and dst are page aligned.
3) I am using an i.MX6Q sabrelite board (the i.MX6Q sabresd seems to behave similarly).
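To make the definition of M concrete, here is a minimal sketch of how it could be computed. The thread does not define pageid, so this assumes 4 KiB pages and takes pageid to be the physical page frame number; the names PAGE_SHIFT, pageid and page_distance_mod16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page id is the address shifted by 12. */
#define PAGE_SHIFT 12u
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Hypothetical helper: physical page frame number of an address. */
static uint32_t pageid(uint32_t phys_addr)
{
    return phys_addr >> PAGE_SHIFT;
}

/* M = (pageid(src) - pageid(dst)) mod 16, always in the range 0..15. */
static unsigned page_distance_mod16(uint32_t src_phys, uint32_t dst_phys)
{
    /* Condition 2 in the post: both buffers are page aligned. */
    assert((src_phys & PAGE_MASK) == 0 && (dst_phys & PAGE_MASK) == 0);

    /* Unsigned subtraction wraps modulo 2^32. Because 16 divides 2^32,
     * keeping the low 4 bits gives a non-negative result even when
     * dst lies above src, unlike the signed % operator in C. */
    return (pageid(src_phys) - pageid(dst_phys)) & 0xFu;
}
```

With this definition, two buffers that are a multiple of 16 pages (64 KiB) apart give M == 0, and swapping src and dst turns M == 1 into M == 15.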
Could someone tell me why?
Thanks,
Which memcpy function is used in this case?
Are the caches enabled or disabled?
Thanks for your reply.
It is a simple *dst++ = *src++; loop.
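For reference, a minimal sketch of such a loop with a timing wrapper. The element type (uint32_t) and the use of clock_gettime are assumptions: the thread shows only the statement itself, and a bare-metal test would need a hardware timer instead of the POSIX clock. The names simple_copy and time_copy are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Word-by-word copy. The 32-bit element type is an assumption. */
static void simple_copy(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    while (n_words--) {
        *dst++ = *src++;
    }
}

/* Returns the elapsed wall-clock time of one copy, in seconds. */
static double time_copy(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    struct timespec t0, t1;

    assert(dst != NULL && src != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    simple_copy(dst, src, n_words);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

An optimizing compiler may transform this loop, so the instructions that actually run are best confirmed from the generated assembly.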
Perhaps the ARM caches have an influence here. Strictly speaking, your approach is not optimal;
please refer to the following:
"What is the fastest way to copy memory on a Cortex-A8?"
Have a great day,
Yuri
-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------
I got a similar result with the MMU disabled, so this is not related to the cache.
Why not use the special instructions (recommended by ARM) for memcpy?
Your approach (*dst++ = *src++;) is of course very simple, but definitely
non-optimal.
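The post does not say which instructions are meant. One common reading is NEON loads and stores combined with preload hints, so the following is only a sketch under that assumption, not the routine ARM recommends. The name block_copy and the prefetch distance are hypothetical, and on compilers without NEON support the code falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Copies n bytes, where n must be a multiple of 64.
 * The source and destination regions must not overlap. */
static void block_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 64 == 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (size_t i = 0; i < n; i += 64) {
        /* Preload hint; the 192-byte look-ahead is a guess to be tuned. */
        if (i + 192 < n) {
            __builtin_prefetch(src + i + 192);
        }
        /* Load 64 bytes into four 128-bit registers, then store them. */
        uint8x16_t a = vld1q_u8(src + i);
        uint8x16_t b = vld1q_u8(src + i + 16);
        uint8x16_t c = vld1q_u8(src + i + 32);
        uint8x16_t d = vld1q_u8(src + i + 48);
        vst1q_u8(dst + i, a);
        vst1q_u8(dst + i + 16, b);
        vst1q_u8(dst + i + 32, c);
        vst1q_u8(dst + i + 48, d);
    }
#else
    /* Fallback for targets without NEON. */
    memcpy(dst, src, n);
#endif
}
```

Whether this is faster than the simple loop on a given board has to be measured.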
Have a great day,
Yuri
But this cannot explain why the same (unoptimized) copy performs differently.
Here is a more detailed result from copying 640 MB of memory (source memory cached):
If M == 0, it finished in about 2.4 seconds.
If M == 1 or M == 15, it finished in about 2.0 seconds.
If 2 <= M <= 14, it finished in about 1.6 seconds.
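Expressed as throughput, and assuming "640M" means 640 MB, these times come to roughly 267 MB/s, 320 MB/s and 400 MB/s, so the worst case takes 1.5 times as long as the best. A small sketch of the arithmetic (the function name is hypothetical):

```c
#include <assert.h>

/* Copy throughput in MB/s for size_mb megabytes copied in `seconds`. */
static double throughput_mb_per_s(double size_mb, double seconds)
{
    assert(seconds > 0.0);
    return size_mb / seconds;
}

/* Ratio of the slowest time to the fastest time. */
static double slowdown(double worst_seconds, double best_seconds)
{
    assert(best_seconds > 0.0);
    return worst_seconds / best_seconds;
}
```

For example, throughput_mb_per_s(640.0, 2.4) gives about 266.7 and slowdown(2.4, 1.6) gives 1.5.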
I am just confused by the different performance with different src/dst addresses.
Any idea?
1. It makes sense to look at the assembler code generated for (*dst++ = *src++;) and the code around it.
2. You wrote about pageid(src). How is pageid defined?
3. Which DRAM (part number, datasheet) is used in this case?