Memory copy performance differs depending on the distance between the source and destination addresses:
Let M = (pageid(src) - pageid(dst)) % 16.
If M == 0, the memory copy performance is the worst.
If M == 1 or M == 15, it is better.
If 2 <= M <= 14, it is best.
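To make the definition of M unambiguous, here is a minimal sketch of how it can be computed. It assumes 4 KiB pages, that pageid means the physical address shifted right by 12 bits, and that M is taken as a non-negative remainder (so a difference of -1 page gives M == 15). The names `page_distance_mod16` and `classify_distance` are made up for illustration, and the classification only restates the observations above; it does not explain them.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/* Observed performance classes, as described in the post. */
enum copy_class { COPY_WORST, COPY_BETTER, COPY_BEST };

/*
 * M = (pageid(src) - pageid(dst)) mod 16, always in 0..15.
 * Unsigned subtraction wraps modulo 2^32, and 16 divides 2^32,
 * so masking with 15 gives the non-negative remainder even when
 * dst lies above src.
 */
static inline unsigned page_distance_mod16(uint32_t src_phys, uint32_t dst_phys)
{
    uint32_t src_page = src_phys >> PAGE_SHIFT;
    uint32_t dst_page = dst_phys >> PAGE_SHIFT;
    return (unsigned)((src_page - dst_page) & 15u);
}

/* Map M to the behavior observed on the board (not a model of the cause). */
static inline enum copy_class classify_distance(unsigned m)
{
    assert(m < 16u);
    if (m == 0u)
        return COPY_WORST;
    if (m == 1u || m == 15u)
        return COPY_BETTER;
    return COPY_BEST; /* 2 <= M <= 14 */
}
```

For example, with src at physical address 0x10000000 and dst at 0x10010000 (16 pages apart), `page_distance_mod16` returns 0, which is the slowest case I see.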
Test conditions:
1) Both the src and dst buffers are backed by physically contiguous pages.
2) Both src and dst are page aligned.
3) I am using an i.MX6Q sabrelite board (the i.MX6Q sabresd seems to behave similarly).
Could someone tell me why?