iMX6Q PCIe speed performance with memcpy()

Solved

wooosaiiii
Contributor IV

Hello all, 

I have two i.MX6Q-based boards connected via a PCIe link.

The first one acts as the Root Complex (RC) and runs Linux. The second one acts as the Endpoint (EP) and runs bare-metal code based on the Freescale SDK.

I can transfer data from the EP to the RC's DDR memory via iATU outbound translation, but the transfer performance is very poor for a PCIe link.
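Outbound iATU translation is essentially a base-plus-offset remap: an EP CPU access that falls inside an outbound window is emitted on the link at the same offset from the window's PCIe target address (here, the RC's DDR buffer). A minimal sketch of that arithmetic, assuming a single window; the struct and its field names (`cpu_base`, `size`, `pci_target`) are hypothetical and do not reflect the actual i.MX6 iATU register layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one outbound iATU window (illustrative only,
 * not the real register layout). */
struct outbound_window {
    uint32_t cpu_base;   /* window start in the EP's CPU address space */
    uint32_t size;       /* window size in bytes */
    uint64_t pci_target; /* PCIe bus address the window start maps to */
};

/* Translate an EP CPU address into the PCIe address it is forwarded to.
 * Returns false if cpu_addr lies outside the window. */
static bool outbound_translate(const struct outbound_window *w,
                               uint32_t cpu_addr, uint64_t *pci_addr)
{
    assert(w != NULL && pci_addr != NULL && w->size != 0);
    if (cpu_addr < w->cpu_base || cpu_addr - w->cpu_base >= w->size)
        return false;
    *pci_addr = w->pci_target + (cpu_addr - w->cpu_base);
    return true;
}
```

For example, with a hypothetical 1 MB window at CPU address 0x01000000 targeting bus address 0x18000000, a store to 0x01000400 reaches the RC at 0x18000400.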

I use memcpy() on the EP side to copy data to buffers allocated in the RC's memory and then calculate the speed from the elapsed time (in microseconds).

Example code on the EP side:

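uint8_t *buffer;

/* Source data; its contents do not matter for a throughput test */
buffer = malloc(DATASEND_SIZE);
if (buffer == NULL) {
     printf("\nbuffer malloc() failed\n");
     return;
}

wrStartTime = time_get_microseconds();

/* Copy data into the RC's buffer (dev->send.buffer, reached via the iATU outbound window) */
memcpy(dev->send.buffer, buffer, DATASEND_SIZE);

wrCurTime = time_get_microseconds();

wrTimeDiff = (uint32_t) (wrCurTime - wrStartTime);

/* Bytes per microsecond is numerically MB/s (1 MB = 10^6 bytes); avoid dividing by zero */
if (wrTimeDiff > 0)
     printf("copied to buffer in %lu usec [%lu MB/s]\n",
            (unsigned long) wrTimeDiff,
            (unsigned long) (DATASEND_SIZE / wrTimeDiff));

free(buffer);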
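The speed printed by this code is `DATASEND_SIZE / wrTimeDiff`, i.e. bytes per microsecond, which is numerically MB/s with 1 MB = 10^6 bytes, truncated by integer division. A small sketch that makes the units explicit and also reports binary MiB/s for comparison; the function names are hypothetical, and callers are assumed never to pass a zero elapsed time:

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in decimal MB/s (10^6 bytes per second), truncated like the
 * printf above: bytes per microsecond is numerically MB/s. */
static uint32_t throughput_mb_s(uint64_t bytes, uint64_t usec)
{
    assert(usec != 0); /* caller must not pass a zero-length interval */
    return (uint32_t)(bytes / usec);
}

/* Same measurement in binary MiB/s (2^20 bytes per second), rounded to nearest:
 * bytes * 10^6 / (usec * 2^20). */
static uint32_t throughput_mib_s(uint64_t bytes, uint64_t usec)
{
    assert(usec != 0);
    uint64_t num = bytes * 1000000u;
    uint64_t den = usec * 1048576u;
    return (uint32_t)((num + den / 2) / den);
}
```

The post does not state DATASEND_SIZE; for a hypothetical 32768-byte transfer taking 1806 µs, this gives 18 MB/s (about 17 MiB/s).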

The result I get:

copied to buffer in 1806 usec [18 MB/s]

Then I spent some time trying to optimize the memcpy() function that comes with the Freescale SDK.

I tried the Linux, Android (Bionic) and NEON versions, all written in assembly.

These are the results:

- NEON version of memcpy():

copied to buffer in 885 usec [37 MB/s]

- Linux version of memcpy():

copied to buffer in 908 usec [36 MB/s]

Again, this is still far from the advertised speeds!

Then I replaced the bare-metal OS on the EP with Linux (from the NXP RC/EP validation example) and tested again.

The Linux-based EP now uses memcpy() to copy to the same buffer allocated in the RC's memory.

Now the result is far better:

[ 2.063660] pcie ep: Data write speed:108MB/s.

So I ask myself: what is different between the EP implementations in Linux and in the bare-metal SDK?

The reported link speed is the same, we use essentially the same memcpy() implementation, we copy to the same DDR buffers on the RC, etc.

Could it be because Linux uses the MMU and the bare-metal SDK doesn't?

Maybe Linux enables some caching that the bare-metal SDK doesn't?

Thanks for any answers & suggestions in advance! 

1 Solution
igorpadykov
NXP Employee

Hi Primoz

You are right, the original SDK does not use the MMU. You can try the patch in:

Enabling MMU and Caches on i.MX6 Series Platform SDK 

Best regards
igor


2 Replies
wooosaiiii
Contributor IV

igorpadykov, thanks for pointing me in the right direction!

I added a 1:1 MMU mapping for the PCIe address space in apps/common/platform_init.c.

Now the bare-metal SDK memcpy() speeds match those of the Linux counterpart!
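The thread does not show the actual change to apps/common/platform_init.c. As a rough sketch of what a 1:1 (identity) mapping looks like on the Cortex-A9, the code below fills ARMv7-A short-descriptor first-level section entries (1 MB each) so that virtual address equals physical address, with a selectable memory type (assuming TEX remap disabled, domain 0, full read/write access, execute-never). The helper and enum names are hypothetical; installing the table (TTBR0, DACR) and enabling the MMU and caches is what the linked patch covers and is not shown. With the MMU off, ARMv7 treats data accesses as Strongly-ordered; mapping the PCIe window as Device or Normal non-cacheable memory lets the core buffer writes instead of waiting on each one, which is one plausible reason for the speedup. The i.MX6Q PCIe region is assumed here to be 0x01000000 to 0x01FFFFFF (16 MB); check it against the reference manual memory map.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor first-level "section" entry fields (TEX remap off). */
#define SECTION_TYPE   0x2u              /* bits[1:0] = 0b10: 1 MB section */
#define SECTION_B      (1u << 2)
#define SECTION_C      (1u << 3)
#define SECTION_XN     (1u << 4)         /* execute-never */
#define SECTION_AP_RW  (3u << 10)        /* AP[1:0] = 0b11, AP[2] = 0: full access */
#define SECTION_TEX(x) ((uint32_t)(x) << 12)

enum mem_type { STRONGLY_ORDERED, DEVICE_SHARED, NORMAL_NONCACHED, NORMAL_WBWA };

/* TEX/C/B encoding for each memory type. */
static uint32_t type_bits(enum mem_type t)
{
    switch (t) {
    case STRONGLY_ORDERED: return SECTION_TEX(0);                         /* 000, C=0, B=0 */
    case DEVICE_SHARED:    return SECTION_TEX(0) | SECTION_B;             /* 000, C=0, B=1 */
    case NORMAL_NONCACHED: return SECTION_TEX(1);                         /* 001, C=0, B=0 */
    case NORMAL_WBWA:      return SECTION_TEX(1) | SECTION_C | SECTION_B; /* 001, C=1, B=1 */
    }
    return 0;
}

/* Fill a 4096-entry first-level table with identity (VA == PA) sections
 * covering [base, base + size). base and size must be multiples of 1 MB.
 * Domain field is left as 0. */
static void map_identity(uint32_t *l1_table, uint32_t base, uint32_t size,
                         enum mem_type t)
{
    assert((base & 0xFFFFFu) == 0 && (size & 0xFFFFFu) == 0);
    for (uint32_t off = 0; off < size; off += 0x100000u) {
        uint32_t addr = base + off;
        l1_table[addr >> 20] = addr | type_bits(t) | SECTION_AP_RW |
                               SECTION_XN | SECTION_TYPE;
    }
}
```

For example, `map_identity(l1, 0x01000000u, 0x01000000u, NORMAL_NONCACHED)` writes entries 0x010 through 0x01F, the first being 0x01001C12. Which memory type the Linux EP example or the author actually used is not stated in the thread.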

Regards,

Primoz
