T4240 memory read bandwidth and memory copy bandwidth are very low. Why?


36,328 Views
qinglinye
Contributor I

Dear all,

We developed a board based on NXP's QorIQ T4240 CPU. It runs the VxWorks 7.0 operating system, booted from U-Boot. We tested the board's memory read and write performance and found that memory read and copy bandwidth are very low, as follows:

Test (MB/sec)               T4240 card 1   T4240 card 2   T2080QDS
MemWriteBandwidth8bitSt          371            348          419
MemWriteBandwidth16bitSt         723            667          821
MemWriteBandwidth32bitSt        1375           1291         1572
MemWriteBandwidth64bitSt        2499           2343         2001
MemReadBandwidth8bitSt           233            230          368   (!)
MemReadBandwidth16bitSt          338            277          383   (!)
MemReadBandwidth32bitSt          438            455          491   (!)
MemReadBandwidth64bitSt          513            542          559   (!)
MemCopyBandwidth8bitSt           225            208          242   (!)
MemCopyBandwidth16bitSt          316            295          321   (!)
MemCopyBandwidth32bitSt          344            347          410   (!)
MemCopyBandwidth64bitSt          429            384          418   (!)
MemcpyBandwidthSt                410            358          406   (!)


We expected that if the 8-bit memory read bandwidth is 230 MB/s, then the 16-bit read bandwidth would be about 460 MB/s, the 32-bit read bandwidth about 920 MB/s, and the 64-bit read bandwidth about 1840 MB/s.

Moreover, low memory read bandwidth directly leads to low memory copy (memcpy and bcopy) bandwidth, which seriously affects the performance of the system.

My questions are:
1) Memory write bandwidth doubles each time the access width doubles, but memory read and copy bandwidth do not. Why? Could you tell me why the T4240 CPU has such low read and copy bandwidth?
2) Is there any solution?

Please see the attached files for more information.

Remarks:
1. We tested T4240 processor boards from two different manufacturers and found that memory read and copy bandwidth were very low on both.
2. We also ran the same test on the T2080QDS board and found that memory read and copy bandwidth were very low there too.
3. Hardware configuration:
U-Boot 2015.01+SDKv1.9+geb3d4fc (Oct 14 2017 - 15:31:01)
CPU0: T4240, Version: 2.0, (0x82400020)
Core: e6500, Version: 2.0, (0x80400120)
Clock Configuration:
CPU0:1000 MHz, CPU1:1000 MHz, CPU2:1000 MHz, CPU3:1000 MHz,
CPU4:1000 MHz, CPU5:1000 MHz, CPU6:1000 MHz, CPU7:1000 MHz,
CPU8:1000 MHz, CPU9:1000 MHz, CPU10:1000 MHz, CPU11:1000 MHz,
CCB:700 MHz,
DDR:800 MHz (1600 MT/s data rate) (Asynchronous), IFC:43.750 MHz
FMAN1: 700 MHz
FMAN2: 700 MHz
QMAN: 350 MHz
PME: 333.333 MHz
L1: D-cache 32 KiB enabled
I-cache 32 KiB enabled
Reset Configuration Word (RCW):
00000000: 0e08000a 0a0a0a0a 00000000 00000000
00000010: 04362828 3f55bc00 ec027000 f5000000
00000020: 00100000 00000000 00000000 0001fffc
00000030: 00000100 52800009 00000000 00000028


Thank you very much!


Ye

3 Replies

35,812 Views
qinglinye
Contributor I

Hi,

Our bandwidth tests include 8-bit / 16-bit / 32-bit / 64-bit memory read tests, 8-bit / 16-bit / 32-bit / 64-bit memory write tests, 8-bit / 16-bit / 32-bit / 64-bit memory copy tests, and a memcpy (C library function) test. We ran them under VxWorks 6.9 and 7.0 and under the U-Boot firmware. The hardware configuration is as follows:

U-Boot 2015.01+SDKv1.9+geb3d4fc (Oct 14 2017 - 15:31:01)
CPU0:  T4240, Version: 2.0, (0x82400020)
Core:  e6500, Version: 2.0, (0x80400120)
Clock Configuration:
       CPU0:1000 MHz, CPU1:1000 MHz, CPU2:1000 MHz, CPU3:1000 MHz,
       CPU4:1000 MHz, CPU5:1000 MHz, CPU6:1000 MHz, CPU7:1000 MHz,
       CPU8:1000 MHz, CPU9:1000 MHz, CPU10:1000 MHz, CPU11:1000 MHz,
       CCB:700  MHz,
       DDR:800  MHz (1600 MT/s data rate) (Asynchronous), IFC:43.750 MHz
       FMAN1: 700 MHz
       FMAN2: 700 MHz
       QMAN:  350 MHz

We use C pointers for the read, write and copy operations. The read and write tests allocate one 128 MB buffer with malloc, and the copy test allocates two 128 MB buffers with malloc. We measure the time taken to read, write or copy the 128 MB and convert it into bandwidth. The following is one of our test functions:

UINT MemcpyBandwidthSt()
{
    unsigned int i;
    unsigned int gvTimeBegin, gvTimeEnd;
    double spTime;
    UINT64 *testMemAddr;     /* destination buffer */
    UINT64 *testMemAddr2;    /* source buffer */
    unsigned int dataSize = 0x8000000;    /* 128 MB */

    testMemAddr = malloc(dataSize);
    testMemAddr2 = malloc(dataSize);
    if (testMemAddr == NULL || testMemAddr2 == NULL)
    {
        free(testMemAddr);
        free(testMemAddr2);
        return 0;
    }

    /* Give the source defined contents so the comparison below is meaningful. */
    memset(testMemAddr2, 0x5a, dataSize);

    gvTimeBegin = sysTimestamp();
    memcpy(testMemAddr, testMemAddr2, dataSize);
    gvTimeEnd = sysTimestamp();

    spTime = (double)(gvTimeEnd - gvTimeBegin) / (vxbTimestampFreq());    /* convert ticks to time: sec */
    taskDelay((gvTimeBegin & 0xf) * 2 + vxCpuIndexGet());

    printf("%s: Memory Test bandwidth is %d MB/sec, run at cpu[%d]\n", __FUNCTION__,
           (int)((double)dataSize / 1024.0 / 1024.0 / spTime), vxCpuIndexGet());

    /* Verify that the destination matches the source. */
    for (i = 0; i < dataSize / 8; i++)
    {
        if (testMemAddr[i] != testMemAddr2[i])
        {
            printf("%s: data compare error at %p and %p, data: 0x%llx != 0x%llx !!!!!!!!!!\n", __FUNCTION__,
                   (void *)&testMemAddr[i], (void *)&testMemAddr2[i],
                   (unsigned long long)testMemAddr[i], (unsigned long long)testMemAddr2[i]);
            break;
        }
    }

    free(testMemAddr);
    free(testMemAddr2);
    return 1;
}

 

The result is as follows:

    MemcpyBandwidthSt: Memory Test bandwidth is 364 MB/sec, run at cpu[0]   

 

 

 

Memory read and copy bandwidth are much lower than the theoretical value. The measured 64-bit read bandwidth is 510 to 550 MB/s, and the copy bandwidth is about 350 to 410 MB/s. The theoretical bandwidth is 800 MHz * 2 * 64 bit / 8 = 12.8 GB/s, so even at one quarter of that we should see 3.2 GB/s. The low read bandwidth also causes the low copy bandwidth, which affects the overall performance of the system.

 

We therefore also tested a board from another manufacturer, a T4240 VPX card, and found the same low read and copy bandwidth. We then tested NXP's T2080QDS board, and its read and copy bandwidth were low too. These results are in the earlier attachment.

 

 

Test (MB/sec)               T4240 card 1   T4240   T2080QDS
MemWriteBandwidth8bitSt          371         348      419
MemWriteBandwidth16bitSt         723         667      821
MemWriteBandwidth32bitSt        1375        1291     1572
MemWriteBandwidth64bitSt        2499        2343     2001
MemReadBandwidth8bitSt           233         230      368
MemReadBandwidth16bitSt          338         277      383
MemReadBandwidth32bitSt          438         455      491
MemReadBandwidth64bitSt          513         542      559
MemCopyBandwidth8bitSt           225         208      242
MemCopyBandwidth16bitSt          316         295      321
MemCopyBandwidth32bitSt          344         347      410
MemCopyBandwidth64bitSt          429         384      418
MemcpyBandwidthSt (memcpy)       410         358      406

Clock settings (as listed in the table header):
  1) CPU: 1.6 GHz  2) CCB: 700 MHz  3) DDR: 800 MHz
  CPU: 1.5 GHz, CCB: 700 MHz, DDR: 800 MHz

 

 

 

We also found that memcpy and bcopy are used extensively in the operating system kernel (for example in the TCP/IP protocol stack). If memory copy performance is low, the performance of the whole system suffers.

 

 

The T4240 is one of NXP's highest-performance QorIQ CPUs, so this level of memory read and copy bandwidth confuses us. How can we solve it?


35,812 Views
shuangjunzhu
NXP Employee

Actually, most of the cycles are consumed by cache misses, which is why the 8-bit and 16-bit results are similar. If you add prefetching before reads and zero the destination cache block before writes, performance will improve considerably.


35,812 Views
qinglinye
Contributor I

Hello,

Thanks for your reply.

2019-08-02

tsinglin.ye

From: shuangjunzhu-r65879

Sent: 2019-08-01 15:31:06

To: Qinglin Ye

Cc:

Subject: Re: - Re: T4240 memory reading bandwidth and memory copy bandwidth are very low, Why?
