iMX6Q PCIe speed performance with memcpy()

wooosaiiii
Contributor III

Hello all, 

I have two i.MX6Q-based boards connected via a PCIe link.

The first one acts as the Root Complex (RC) and runs Linux. The second one acts as the Endpoint (EP) and runs bare-metal code based on the Freescale SDK.

I can transfer data from the EP to the RC's DDR memory via iATU outbound translation, but the throughput is very poor for a PCIe link.
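For readers unfamiliar with the iATU: an outbound region forwards CPU accesses that fall inside a fixed window to a PCIe bus address, keeping the offset within the window. The sketch below models only that address arithmetic. The struct, its fields and iatu_outbound_translate() are hypothetical names, not SDK or register-map names, and the window values in the usage note are examples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one iATU outbound region (illustrative names). */
typedef struct {
    uint32_t cpu_base;   /* first CPU (AXI) address of the window      */
    uint32_t cpu_limit;  /* last CPU address of the window, inclusive  */
    uint64_t pci_target; /* PCIe bus address that cpu_base maps to     */
} outbound_region_t;

/* If cpu_addr lies inside the window, write the PCIe bus address it is
 * forwarded to (same offset within the window) and return true.
 * Addresses outside the window are not translated. */
bool iatu_outbound_translate(const outbound_region_t *r,
                             uint32_t cpu_addr, uint64_t *pci_addr)
{
    assert(r != NULL && pci_addr != NULL);
    if (cpu_addr < r->cpu_base || cpu_addr > r->cpu_limit)
        return false;
    *pci_addr = r->pci_target + (uint64_t)(cpu_addr - r->cpu_base);
    return true;
}
```

For example, with a window spanning 0x01000000 to 0x01FFFFFF (assumed here to be the i.MX6Q PCIe region) that targets bus address 0x10000000, a store to 0x01000040 is forwarded to 0x10000040, while 0x02000000 falls outside the window.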

I use memcpy() on the EP side to copy data into buffers allocated in the RC's memory and then calculate the speed from the elapsed time.

Example code on the EP side:

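#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

uint8_t *buffer;

/* Source buffer in the EP's local DDR (its contents don't matter for timing) */
buffer = malloc(DATASEND_SIZE);
if (buffer == NULL) {
     printf("\nbuffer malloc() failed\n");
     return;
}

wrStartTime = time_get_microseconds();

/* Copy data to the RC buffer through the iATU outbound window */
memcpy(dev->send.buffer, buffer, DATASEND_SIZE);

wrCurTime = time_get_microseconds();

wrTimeDiff = (uint32_t) (wrCurTime - wrStartTime);

/* Bytes per microsecond == MB/s (1 MB = 10^6 bytes). The casts make both
 * arguments match %lu regardless of how uint32_t and DATASEND_SIZE are typed. */
printf("copied to buffer in %lu usec [%lu MB/s]\n",
       (unsigned long) wrTimeDiff,
       (unsigned long) (DATASEND_SIZE / wrTimeDiff));

free(buffer);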

The result I get:

copied to buffer in 1806 usec [18 MB/s]

Then I spent some time trying to optimize the memcpy() function that comes with the Freescale SDK.

I tried the Linux and Android (Bionic) versions, as well as a NEON version, all written in assembly.

These are the results:

- NEON version of memcpy():

copied to buffer in 885 usec [37 MB/s]

- Linux version of memcpy():

copied to buffer in 908 usec [36 MB/s]
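The printed MB/s figure is bytes per microsecond truncated by integer division, so 1 MB here means 10^6 bytes. The post does not give DATASEND_SIZE; as an assumption, any value from 32745 to 33595 bytes reproduces all three bare-metal results, and 32 KiB (32768 bytes) is the obvious candidate. The sketch below repeats the calculation; ASSUMED_DATASEND_SIZE and mb_per_s() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the real DATASEND_SIZE is not stated in the post; 32 KiB is
 * a guess that is consistent with the printed figures. */
#define ASSUMED_DATASEND_SIZE 32768u

/* Same formula as the EP code: bytes per microsecond, truncated by integer
 * division. 1 byte/us = 10^6 bytes/s, i.e. decimal MB/s. */
uint32_t mb_per_s(uint32_t bytes, uint32_t usec)
{
    assert(usec != 0); /* a zero-length interval would divide by zero */
    return bytes / usec;
}
```

With 32768 bytes, mb_per_s() returns 18, 37 and 36 for 1806, 885 and 908 usec respectively, matching the three lines above.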

Again, this is still far from the advertised speeds!

Then I replaced the bare-metal code on the EP with Linux (from the NXP RC/EP validation example) and tested again.

The Linux-based EP now uses memcpy() to copy into the same buffer allocated on the RC!

Now the result is far better:

[ 2.063660] pcie ep: Data write speed:108MB/s.

So I ask myself: what is different between the Linux and bare-metal SDK implementations of the EP?

The reported link speed is the same, we use essentially the same memcpy() implementation, and we copy to the same DDR buffers on the RC.

Could it be because Linux uses the MMU and the bare-metal code doesn't?

Maybe Linux enables some caching that the bare-metal code doesn't?

Thanks for any answers & suggestions in advance! 

igorpadykov
NXP Employee

Hi Primoz

You are right, the original SDK does not use the MMU. You can try the patch in

Enabling MMU and Caches on i.MX6 Series Platform SDK 

Best regards
igor

wooosaiiii
Contributor III

igorpadykov, thanks for pointing me in the right direction!

I added a 1:1 MMU mapping for the PCIe address space in apps/common/platform_init.c.

Now the bare-metal SDK memcpy() speeds match those of the Linux counterpart!
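The thread does not show the actual change to platform_init.c or which memory attributes were chosen, so the following is only a sketch under stated assumptions. On ARMv7-A cores such as the Cortex-A9, data accesses are treated as Strongly-ordered while the MMU is off; a 1:1 mapping lets the PCIe window be given a less restrictive memory type instead, which is a likely (but unconfirmed here) reason for the speedup. The code fills ARMv7-A short-descriptor first-level entries (1 MB sections, TEX remap disabled) so that virtual equals physical over a range. map_identity_sections(), mem_type_t and the SECT_* macros are hypothetical names; enabling the MMU itself (TTBR0, DACR, SCTLR) is not shown, and the 16 MB PCIe window at 0x01000000 used in the usage note is an assumption to check against the i.MX6Q reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level "section" entry (maps 1 MB),
 * assuming SCTLR.TRE = 0 so TEX/C/B select the memory type directly. */
#define SECT_TYPE      0x2u                             /* bits[1:0] = 0b10 */
#define SECT_B         (1u << 2)
#define SECT_C         (1u << 3)
#define SECT_XN        (1u << 4)                        /* execute-never    */
#define SECT_DOMAIN(d) (((uint32_t)(d) & 0xFu) << 5)    /* DACR must allow  */
#define SECT_AP_RW     (3u << 10)                       /* AP[1:0]=0b11, AP[2]=0 */
#define SECT_TEX(t)    (((uint32_t)(t) & 0x7u) << 12)

/* A few TEX/C/B encodings (TRE = 0). */
typedef enum {
    MEM_STRONGLY_ORDERED, /* TEX=000 C=0 B=0 */
    MEM_SHARED_DEVICE,    /* TEX=000 C=0 B=1 */
    MEM_NORMAL_NONCACHED, /* TEX=001 C=0 B=0 */
    MEM_NORMAL_WB_WA      /* TEX=001 C=1 B=1, write-back write-allocate */
} mem_type_t;

uint32_t mem_attr_bits(mem_type_t t)
{
    switch (t) {
    case MEM_STRONGLY_ORDERED: return SECT_TEX(0);
    case MEM_SHARED_DEVICE:    return SECT_TEX(0) | SECT_B;
    case MEM_NORMAL_NONCACHED: return SECT_TEX(1);
    case MEM_NORMAL_WB_WA:     return SECT_TEX(1) | SECT_C | SECT_B;
    }
    return SECT_TEX(0); /* not reached for valid enum values */
}

/* Map [base, base + size) 1:1 in a 4096-entry first-level table (which must
 * be 16 KB aligned when handed to TTBR0). base and size: multiples of 1 MB.
 * Entries are full access (AP=0b11), domain 0, execute-never. */
void map_identity_sections(uint32_t *ttb, uint32_t base, uint32_t size,
                           mem_type_t t)
{
    assert((base & 0xFFFFFu) == 0 && (size & 0xFFFFFu) == 0);
    for (uint32_t off = 0; off < size; off += 0x100000u) {
        uint32_t pa = base + off;
        ttb[pa >> 20] = (pa & 0xFFF00000u) | mem_attr_bits(t) |
                        SECT_AP_RW | SECT_DOMAIN(0) | SECT_XN | SECT_TYPE;
    }
}
```

Calling map_identity_sections(ttb, 0x01000000u, 0x01000000u, MEM_SHARED_DEVICE) fills entries 0x010 to 0x01F and leaves the rest of the table untouched. Whether Device or Normal non-cacheable gives better throughput for outbound PCIe writes is worth measuring rather than assuming.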

Regards,

Primoz
