Hi,
Our bandwidth tests cover 8-bit, 16-bit, 32-bit, and 64-bit memory read, memory write, and memory copy bandwidth, plus a memcpy (C library function) bandwidth test. We ran them under VxWorks 6.9 and 7.0 and under the U-Boot firmware. The hardware configuration is as follows:
U-Boot 2015.01+SDKv1.9+geb3d4fc (Oct 14 2017 - 15:31:01)
CPU0: T4240, Version: 2.0, (0x82400020)
Core: e6500, Version: 2.0, (0x80400120)
Clock Configuration:
CPU0:1000 MHz, CPU1:1000 MHz, CPU2:1000 MHz, CPU3:1000 MHz,
CPU4:1000 MHz, CPU5:1000 MHz, CPU6:1000 MHz, CPU7:1000 MHz,
CPU8:1000 MHz, CPU9:1000 MHz, CPU10:1000 MHz, CPU11:1000 MHz,
CCB:700 MHz,
DDR:800 MHz (1600 MT/s data rate) (Asynchronous), IFC:43.750 MHz
FMAN1: 700 MHz
FMAN2: 700 MHz
QMAN: 350 MHz
We use C pointers for the read, write, and copy operations. The read and write bandwidth tests use malloc to allocate one 128 MB buffer, and the memory copy test uses malloc to allocate two 128 MB buffers. We measure the time taken to read, write, or copy the 128 MB of memory and convert it into bandwidth. The following is one of our test functions:
UINT MemcpyBandwidthSt()
{
    unsigned int i;
    unsigned int gvTimeBegin, gvTimeEnd;
    double spTime;
    UINT64 *testMemAddr;
    UINT64 *testMemAddr2;
    unsigned int dataSzie = 0x8000000;    /* 128 MB */

    testMemAddr = malloc(dataSzie);
    testMemAddr2 = malloc(dataSzie);
    if (testMemAddr == NULL || testMemAddr2 == NULL)
    {
        /* free(NULL) is a no-op, so both calls are safe */
        printf("%s: malloc failed\n", __FUNCTION__);
        free(testMemAddr);
        free(testMemAddr2);
        return 0;
    }

    /* time a single memcpy of the whole buffer */
    gvTimeBegin = sysTimestamp();
    memcpy(testMemAddr, testMemAddr2, dataSzie);
    gvTimeEnd = sysTimestamp();
    spTime = (double)(gvTimeEnd - gvTimeBegin) / (vxbTimestampFreq());    /* convert ticks to time: sec */

    taskDelay((gvTimeBegin & 0xf) * 2 + vxCpuIndexGet());
    printf("%s: Memory Test bandwidth is %d MB/sec, run at cpu[%d] \n", __FUNCTION__, (int)((double)dataSzie / 1024.0 / 1024.0 / spTime), vxCpuIndexGet());

    /* verify that the copy matches the source */
    for (i = 0; i < dataSzie / 8; i++)
    {
        if (testMemAddr[i] != testMemAddr2[i])
        {
            printf("%s: data Compare error at %p and %p, data: 0x%llx != 0x%llx !!!!!!!!!!\n", __FUNCTION__, (void *)&testMemAddr[i], (void *)&testMemAddr2[i], (unsigned long long)testMemAddr[i], (unsigned long long)testMemAddr2[i]);
            break;
        }
    }

    free(testMemAddr);
    free(testMemAddr2);
    return 1;
}
The result is as follows:
MemcpyBandwidthSt: Memory Test bandwidth is 364 MB/sec, run at cpu[0]
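The pointer-based read tests are described but not shown in this post. A minimal, portable sketch of what a 64-bit read test might look like is given here. The names `read_words64`, `to_mb_per_s`, and `read_bandwidth64` are hypothetical, and the standard C `clock()` stands in for `sysTimestamp()` / `vxbTimestampFreq()` so that the sketch builds outside VxWorks. The words are summed through a `volatile` pointer so the compiler cannot discard the reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read every 64-bit word once; the volatile qualifier and the running
 * sum keep the compiler from optimizing the loads away. */
static uint64_t read_words64(const volatile uint64_t *buf, size_t bytes)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < bytes / sizeof(uint64_t); i++)
        sum += buf[i];
    return sum;
}

/* Convert a byte count and an elapsed time into MB/s, using the same
 * 1024*1024 divisor as the printf in MemcpyBandwidthSt. */
static double to_mb_per_s(size_t bytes, double seconds)
{
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Time one read pass. Returns MB/s, or 0.0 when the pass was too short
 * for the clock resolution. clock() is an assumption of this sketch;
 * the original tests use the VxWorks timestamp driver instead. */
static double read_bandwidth64(const uint64_t *buf, size_t bytes, uint64_t *sum_out)
{
    clock_t t0 = clock();
    *sum_out = read_words64(buf, bytes);
    clock_t t1 = clock();
    double seconds = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? to_mb_per_s(bytes, seconds) : 0.0;
}
```

Filling the buffer before timing, and timing a second pass as well as the first, would help separate cold-cache effects from sustained bandwidth.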
The memory read and memory copy bandwidths are far below the theoretical value. The measured 64-bit memory read bandwidth is 510 to 550 MB/s, and the memory copy bandwidth is about 350 to 410 MB/s. The theoretical bandwidth is 800 MHz * 2 * 64 bits / (8 bits per byte) = 12.8 GB/s; even at one quarter of that, we would expect 3.2 GB/s. The low read bandwidth also results in a low copy bandwidth, which affects the overall performance of the system.
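The arithmetic above can be checked with a short sketch. The function names `ddr_peak_mb_per_s` and `fraction_of_peak` are hypothetical, and the calculation assumes a single 64-bit DDR interface, as the post does. The test output uses MB of 1024*1024 bytes while the peak figure is decimal, so the resulting percentages are approximate.

```c
/* Peak DDR bandwidth in decimal MB/s:
 * clock (MHz) * 2 transfers per clock * bus width in bytes. */
static double ddr_peak_mb_per_s(double clock_mhz, unsigned bus_bits)
{
    return clock_mhz * 2.0 * (double)(bus_bits / 8u);
}

/* Fraction of the peak that a measurement reaches (same units for both). */
static double fraction_of_peak(double measured_mb_per_s, double peak_mb_per_s)
{
    return measured_mb_per_s / peak_mb_per_s;
}
```

With these, `ddr_peak_mb_per_s(800.0, 64)` gives 12800 MB/s, and a 64-bit read result of 513 MB/s works out to roughly 4% of that peak.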
We therefore tested another manufacturer's board, a T4240 VPX card, and found the same situation: memory read bandwidth and copy bandwidth are low. We also tested NXP's T2080QDS board, and its memory read and copy bandwidth are low too. These results are in the previous attachment.
Test (result in MB/s) | T4240 card 1 | T4240 | T2080QDS |
Configuration | CPU: 1.6 GHz, CCB: 700 MHz, DDR: 800 MHz | CPU: 1.5 GHz, CCB: 700 MHz, DDR: 800 MHz |
MemWriteBandwidth8bitSt | 371 | 348 | 419 |
MemWriteBandwidth16bitSt | 723 | 667 | 821 |
MemWriteBandwidth32bitSt | 1375 | 1291 | 1572 |
MemWriteBandwidth64bitSt | 2499 | 2343 | 2001 |
MemReadBandwidth8bitSt | 233 | 230 | 368 |
MemReadBandwidth16bitSt | 338 | 277 | 383 |
MemReadBandwidth32bitSt | 438 | 455 | 491 |
MemReadBandwidth64bitSt | 513 | 542 | 559 |
MemCopyBandwidth8bitSt | 225 | 208 | 242 |
MemCopyBandwidth16bitSt | 316 | 295 | 321 |
MemCopyBandwidth32bitSt | 344 | 347 | 410 |
MemCopyBandwidth64bitSt | 429 | 384 | 418 |
MemcpyBandwidthSt(memcpy Bandwidth) | 410 | 358 | 406 |
We also found that the memcpy and bcopy functions are used extensively in the operating system kernel (for example, in the TCP/IP protocol stack). If memory copy performance is low, the performance of the whole system suffers.
The T4240 is one of NXP's highest-performance QorIQ processors, so these memory read and memory copy bandwidth figures confuse us. How can I solve this?
Actually, most of the cycles are consumed by cache misses, which is why the 8-bit and 16-bit results are so similar. If you add read prefetching and zero each destination cache line before writing it, performance should improve considerably.
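One way to read this suggestion is sketched here; it is an illustration under stated assumptions, not a tuned implementation. The helper `copy_prefetch` is hypothetical. The read prefetch uses the GCC/Clang builtin `__builtin_prefetch`. The cache-zero step uses the Power ISA `dcbz` instruction and is compiled only when the hypothetical opt-in macro `COPY_USE_DCBZ` is defined on a PowerPC target, because `dcbz` needs a cacheable, cache-line-aligned destination. The 64-byte line size and the prefetch distance of 4 lines are assumptions to verify and tune on the actual hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE     64u  /* assumed cache block size in bytes */
#define PREFETCH_AHEAD 4u   /* assumed tuning value, in cache lines */

/* Hypothetical helper: copy one cache line at a time, prefetching the
 * source ahead and optionally zeroing each destination line first. */
static void copy_prefetch(void *dst, const void *src, size_t n)
{
    /* Fall back to memcpy when the fast path's preconditions fail. */
    if (((uintptr_t)dst % CACHE_LINE) != 0 ||
        ((uintptr_t)src % sizeof(uint64_t)) != 0 ||
        (n % CACHE_LINE) != 0) {
        memcpy(dst, src, n);
        return;
    }

    uint64_t *d = dst;
    const uint64_t *s = src;
    const size_t words_per_line = CACHE_LINE / sizeof(uint64_t);
    const size_t lines = n / CACHE_LINE;

    for (size_t l = 0; l < lines; l++) {
#if defined(__GNUC__)
        /* Read prefetch: request a source line a few lines ahead. */
        if (l + PREFETCH_AHEAD < lines)
            __builtin_prefetch((const char *)s + PREFETCH_AHEAD * CACHE_LINE, 0, 0);
#endif
#if defined(__powerpc__) && defined(COPY_USE_DCBZ)
        /* Cache zero: establish the destination line in the cache
         * without first fetching its old contents from memory. */
        __asm__ volatile ("dcbz 0,%0" : : "r"(d) : "memory");
#endif
        for (size_t w = 0; w < words_per_line; w++)
            d[w] = s[w];
        d += words_per_line;
        s += words_per_line;
    }
}
```

Whether this beats the library memcpy depends on the memcpy implementation in use, so it is worth measuring both with the same test harness.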
Hello,
Thanks for your reply.
2019-08-02
tsinglin.ye
From: shuangjunzhu-r65879
Sent: 2019-08-01 15:31:06
To: Qinglin Ye
Cc:
Subject: Re: - Re: T4240 memory reading bandwidth and memory copy bandwidth are very low, Why?
Reply from Jefferry Zhu in T-Series.