iMX6Q PCIe speed performance with memcpy()

Question asked by Primoz Fiser on Sep 15, 2017
Latest reply on Sep 15, 2017 by Primoz Fiser

Hello all, 

I have two i.MX6Q-based boards connected via a PCIe link.

The first one acts as the RC (Root Complex) and runs Linux. The second one acts as the EP (Endpoint) and runs bare-metal code based on the Freescale SDK.

I can transfer data from the EP to the RC's DDR memory via iATU outbound translation, but the transfer performance is very poor compared to what a PCIe link should deliver.
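
Conceptually, an outbound iATU region is a fixed address translation: EP CPU accesses that fall inside a local window are forwarded over the link to an offset in the RC's bus address space. The sketch below models only that arithmetic. The struct and function names are hypothetical, it does not program the real iATU registers, and the addresses in the usage note (a window at 0x01000000, assumed to be the start of the i.MX6Q PCIe region, targeting an assumed RC buffer at bus address 0x30000000) are illustrative.

```c
#include <stdint.h>

/* Hypothetical model of one outbound iATU region (names are illustrative):
 * local CPU addresses in [cpu_base, cpu_limit] are forwarded as PCIe
 * accesses to pci_target + (cpu_addr - cpu_base). */
struct outbound_region {
    uint32_t cpu_base;   /* start of the window in the EP's local address map */
    uint32_t cpu_limit;  /* last byte of the window (inclusive) */
    uint64_t pci_target; /* PCIe bus address that cpu_base maps to */
};

/* Returns 0 and stores the translated PCIe address in *pci_addr when
 * cpu_addr falls inside the window, or -1 when it does not. */
static int outbound_translate(const struct outbound_region *r,
                              uint32_t cpu_addr, uint64_t *pci_addr)
{
    if (cpu_addr < r->cpu_base || cpu_addr > r->cpu_limit)
        return -1;
    *pci_addr = r->pci_target + (uint64_t)(cpu_addr - r->cpu_base);
    return 0;
}
```

For example, with cpu_base = 0x01000000, cpu_limit = 0x010FFFFF and pci_target = 0x30000000, the local address 0x01000100 translates to 0x30000100, while 0x01100000 falls outside the window.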

I use memcpy() on the EP side to copy data into buffers allocated in the RC's memory, and then calculate the speed from the elapsed CPU cycles.

Example code on the EP side:

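#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Inside the EP test function; dev, DATASEND_SIZE and time_get_microseconds()
   are defined elsewhere in the EP project. */
uint8_t *buffer;

/* Local source buffer in EP DDR; its contents do not matter for a throughput test */
buffer = malloc(DATASEND_SIZE);
if (buffer == NULL) {
    printf("\nbuffer malloc failed()\n");
    return;
}

wrStartTime = time_get_microseconds();

/* Copy data to the RC buffer, accessed through the iATU outbound window */
memcpy(dev->send.buffer, buffer, DATASEND_SIZE);

wrCurTime = time_get_microseconds();

wrTimeDiff = (uint32_t) (wrCurTime - wrStartTime);

/* Bytes per microsecond equals MB/s (1 MB = 10^6 bytes); guard against a zero interval */
printf("copied to buffer in %lu usec [%lu MB/s]\n",
       (unsigned long) wrTimeDiff,
       (unsigned long) (wrTimeDiff ? DATASEND_SIZE / wrTimeDiff : 0));

free(buffer);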

The result I get:

copied to buffer in 1806 usec [18 MB/s]
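
The printed rate is bytes per microsecond, i.e. MB/s with 1 MB = 10^6 bytes, truncated by integer division. As an arithmetic check, and purely as an assumption since DATASEND_SIZE is not given in the post, a 32 KiB transfer (32768 bytes) reproduces all three bare-metal figures in this thread: 32768/1806, 32768/885 and 32768/908 give 18, 37 and 36. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Throughput as printed by the EP test: bytes per microsecond, which equals
 * MB/s with MB = 10^6 bytes. Integer division truncates, like
 * DATASEND_SIZE/wrTimeDiff in the original printf. Returns 0 instead of
 * dividing by zero when no time elapsed. */
static uint32_t mb_per_s(uint32_t bytes, uint32_t usec)
{
    if (usec == 0)
        return 0;
    return bytes / usec;
}

/* Same measurement in MiB/s (2^20 bytes), rounded down. 64-bit
 * intermediates keep bytes * 1000000 from overflowing. */
static uint32_t mib_per_s(uint32_t bytes, uint32_t usec)
{
    if (usec == 0)
        return 0;
    return (uint32_t)(((uint64_t)bytes * 1000000u) / ((uint64_t)usec << 20));
}
```

Reporting both units avoids confusion when comparing against figures quoted in MiB/s; under the 32 KiB assumption the 1806 usec run is about 17 MiB/s.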

Then I spent some time trying to optimize the memcpy() function that comes with the Freescale SDK.

I tried the Linux and Android (Bionic) versions as well as a NEON version, all written in assembly.

These are the results:

- NEON version of memcpy():

copied to buffer in 885 usec [37 MB/s]

- Linux version of memcpy():

copied to buffer in 908 usec [36 MB/s]

Again, this is still far from the advertised speeds!
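
Both assembly variants roughly doubled the throughput of the SDK's memcpy(), and much of what such optimized routines gain comes from issuing fewer, wider accesses. A simplified C sketch of that idea, assuming the destination is an uncached window and both buffers are 4-byte aligned, follows. It is not the Bionic or NEON code, which typically also use multi-register loads/stores and prefetch hints; the function name copy_to_window_u32 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy nwords 32-bit words into a destination that may be an uncached
 * PCIe window. The volatile qualifier makes the compiler emit every store,
 * in program order, without merging or eliding them. Each iteration writes
 * one 32-bit word, i.e. one quarter of the stores that a byte-by-byte loop
 * would issue for the same data. Both pointers must be 4-byte aligned,
 * which the uint32_t types express. */
static void copy_to_window_u32(volatile uint32_t *dst, const uint32_t *src,
                               size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```

Calling it with nwords = DATASEND_SIZE / 4 would move the same data as the memcpy() call above with a quarter of the store instructions of a naive byte loop.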

Then I replaced the bare-metal OS on the EP with Linux (from the NXP RC/EP validation example).

The Linux-based EP now uses memcpy() to copy into the same buffer allocated in the RC's memory.

Now the result is far better:

[ 2.063660] pcie ep: Data write speed:108MB/s.

So I am asking myself: what is different between the EP implementations in Linux and in the bare-metal SDK?

The reported link speed is the same, we use essentially the same memcpy() implementation, we copy to the same DDR buffers on the RC, etc.

Could it be because Linux uses the MMU and the bare-metal code doesn't?

Maybe Linux enables some caching that the bare-metal code doesn't?
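
One way to probe the MMU hypothesis: on ARMv7-A (the Cortex-A9 in the i.MX6Q), data accesses made with the MMU disabled are treated as Strongly-ordered, so writes to the PCIe window cannot be buffered and the source buffer in the EP's own DDR is also read uncached. Linux, by contrast, typically maps such a window as Device memory via ioremap() and keeps ordinary RAM cacheable. The sketch below builds 1 MiB section descriptors with different memory types under the assumptions listed in its comments (TEX remap off, domain 0, full access); the enum and function names are hypothetical, and installing the table, setting TTBR0/DACR and enabling the MMU are not shown.

```c
#include <stdint.h>

/* ARMv7-A short-descriptor 1 MiB section entry, assuming TEX remap is
 * disabled (SCTLR.TRE = 0). Bit positions: [1:0] = 0b10 (section), B = 2,
 * C = 3, XN = 4, domain = [8:5], AP[1:0] = [11:10], TEX[2:0] = [14:12],
 * AP[2] = 15, S = 16. This sketch leaves domain 0, AP[2] = 0, S = 0, XN = 0. */
enum mem_type {
    MT_STRONGLY_ORDERED, /* TEX=000 C=0 B=0: data accesses with the MMU off */
    MT_SHARED_DEVICE,    /* TEX=000 C=0 B=1: buffered (posted) writes allowed */
    MT_NORMAL_UNCACHED,  /* TEX=001 C=0 B=0: Normal memory, non-cacheable */
    MT_NORMAL_WBWA       /* TEX=001 C=1 B=1: write-back, write-allocate */
};

static uint32_t section_desc(uint32_t phys, enum mem_type t)
{
    uint32_t d = (phys & 0xFFF00000u) /* 1 MiB-aligned physical base */
               | (3u << 10)           /* AP[1:0] = 11: full read/write access */
               | 0x2u;                /* descriptor type: section */

    switch (t) {
    case MT_STRONGLY_ORDERED:
        break;                                   /* TEX, C, B all zero */
    case MT_SHARED_DEVICE:
        d |= 1u << 2;                            /* B */
        break;
    case MT_NORMAL_UNCACHED:
        d |= 1u << 12;                           /* TEX = 001 */
        break;
    case MT_NORMAL_WBWA:
        d |= (1u << 12) | (1u << 3) | (1u << 2); /* TEX = 001, C, B */
        break;
    }
    return d;
}
```

For instance, section_desc(0x01000000, MT_SHARED_DEVICE) yields 0x01000C06, and the MT_NORMAL_WBWA entry for the same base yields 0x01001C0E. Mapping the PCIe window as Device and the EP's DDR as Normal write-back, then re-running the bare-metal test, would show how much of the gap comes from memory attributes.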

Thanks for any answers & suggestions in advance! 