Memory copy with different distance between source and destination address have different performance


fchai
Contributor I

Memory copy performance differs depending on the distance between the source and destination addresses.

Say (pageid(src) - pageid(dst)) % 16 = M:

If M == 0, the memory copy performance is the worst.

If M == 1 or M == 15, it is better.

If 2 <= M <= 14, it is best.

1) Both the src and dst buffers are backed by physically contiguous pages.

2) Both src and dst are page aligned.

3) I am using an i.MX6Q sabrelite board (the i.MX6Q sabresd seems to behave similarly).
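
For reference, the quantity M can be computed as in the minimal sketch below. It assumes 4 KiB pages and that pageid means the address shifted right by 12 bits; the thread does not define pageid, so both are assumptions, and the helper names are hypothetical. Note that C's `%` operator can return a negative value for a negative difference, so the sketch uses unsigned arithmetic and a mask to keep M in the range 0..15.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page id is the address shifted right by 12. */
#define PAGE_SHIFT 12u

/* Hypothetical helper: page number of an address. */
uint32_t page_id(uintptr_t addr)
{
    return (uint32_t)(addr >> PAGE_SHIFT);
}

/* M = (pageid(src) - pageid(dst)) mod 16, always in the range 0..15. */
unsigned page_distance_mod16(uintptr_t src, uintptr_t dst)
{
    /* Both addresses are expected to be page aligned, as stated in the post. */
    assert((src & ((1u << PAGE_SHIFT) - 1u)) == 0);
    assert((dst & ((1u << PAGE_SHIFT) - 1u)) == 0);

    /* Unsigned subtraction wraps modulo 2^32, which is a multiple of 16,
       so masking with 15 stays non-negative even when src < dst. */
    return (page_id(src) - page_id(dst)) & 15u;
}
```

With this definition, a source buffer one page below the destination gives M == 15, which matches the symmetric "M == 1 or M == 15" case described above.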

Could someone tell me why?

Thanks,

7 Replies

Yuri
NXP Employee

Which memcpy function is used in this case?
Are the caches enabled or disabled?

fchai
Contributor I

Thanks for your reply.

It is a simple *dst++ = *src++; loop.
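
A minimal sketch of such a loop, with a small timing wrapper, might look like the following. The element type (uint32_t) and the function names are assumptions, since the post only gives the single statement. The wrapper uses the standard `clock()` function, which reports CPU time; that is adequate for a single-threaded copy but is only one possible way to measure it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Copy n 32-bit words from src to dst, one word per iteration.
   Assumption: the element type is uint32_t; the post does not say. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n--) {
        *dst++ = *src++;
    }
}

/* Hypothetical helper: CPU time in seconds spent in one copy. */
double time_copy_seconds(uint32_t *dst, const uint32_t *src, size_t n)
{
    clock_t start = clock();
    copy_words(dst, src, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To reproduce the comparison, one would call `time_copy_seconds` on page-aligned buffers whose page distance modulo 16 is varied while the size is kept fixed.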

Yuri
NXP Employee

Perhaps the ARM caches have an influence here. Strictly speaking, your approach is not optimal;
please refer to the following:


"What is the fastest way to copy memory on a Cortex-A8?"

ARM Information Center

Have a great day,
Yuri

-----------------------------------------------------------------------------------------------------------------------
Note: If this post answers your question, please click the Correct Answer button. Thank you!
-----------------------------------------------------------------------------------------------------------------------

fchai
Contributor I

I got a similar result with the MMU disabled, so this is not related to the cache.

Yuri
NXP Employee

Why not use the special instructions (recommended by ARM) for memcpy?

Your approach (*dst++ = *src++;) is of course very simple, but definitely non-optimal.

Have a great day,
Yuri

fchai
Contributor I

But this cannot explain why the same (unoptimized) copy performs differently.

Here is a more detailed result from copying 640M of memory (source memory cached):

If M == 0, it finished in about 2.4 seconds.

If M == 1 or M == 15, it finished in about 2.0 seconds.

If 2 <= M <= 14, it finished in about 1.6 seconds.
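
Expressed as throughput, and assuming "640M" means 640 MB, these times correspond to roughly 267 MB/s, 320 MB/s, and 400 MB/s, so the worst case takes about 1.5 times as long as the best case. The small sketch below shows the arithmetic; the function name is hypothetical.

```c
#include <assert.h>

/* Throughput in MB/s for copying `megabytes` MB in `seconds` seconds. */
double throughput_mb_per_s(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}
```

For example, `throughput_mb_per_s(640.0, 2.0)` evaluates to 320.0.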

I am just confused by the different performance with different src/dst addresses.

Any idea?

Yuri
NXP Employee

1. It makes sense to look at the assembly code generated for (*dst++ = *src++;) and the surrounding code.

2. You wrote about pageid(src): how is pageid defined?

3. Which DRAM (part number, datasheet) is used in this case?
