I have a Faraday EVB on which I am trying to determine the eDMA peak transfer rate (MB/s).
I ran a test transferring 256 KB of data (i.e. Minor Loop Count = 262144, Major Loop Count = 1) from one memory location to another:
DMA channel used = channel 0
SADDR (source address) = 0x3f000000 (IRAM)
DADDR (destination address) = 0x3f040000 (IRAM)
SOFF = 32
DOFF = 32
MinorLoopCount = 262144
MajorLoopCount = 1
LastSourceAdjustment = 0
LastDestinationAdjustment = 0
MinorLoopOffsetDest = 0
MinorLoopOffsetSrc = 0
MinorLoopOffset = 0
SSIZE = 5
DSIZE = 5
BWC = 3
The SYSCLK is selected as PLL1 PFD3 for 396MHz and ARM_DIV = 1,BUS_DIV = 3.
As per the reference manual "Vybrid_Reference_Manual_F_Series_-_Rev_3.pdf", Table 22-1636, when transferring data from SRAM to SRAM at a 133.3 MHz frequency I expected to get 533.3 MB/s. But I am only getting around 150 MB/s.
Can anyone help identify what went wrong? Thanks in advance.
Hello Satishkumar,
The reason you can't reach the theoretical performance is that the system adds some latency to each access made to the memories, so 533 MB/s (64-bit @ 133 MHz) would only be possible on a system with no latency. If the DMA used AXI, it would have better performance thanks to the ability to post outstanding transactions; but since the DMA uses AHB to meet real-time/predictability requirements, it waits until each transaction is done before posting any other read/write.
A DMA transfer consists of two transactions, a read and a write, and each is subject to some latency. There is latency from passing through the NIC, plus the latency of the slave peripheral itself (in this case, internal RAM). Generally speaking:
Latency = master NIC latency + slave NIC latency + NIC buffering latency + slave latency
The NIC eDMA maximum latencies are: read = 2, write = 4.
The NIC SRAM maximum latencies are: read = 4, write = 4.
There is some additional buffering latency due to the AHB-to-AXI conversion.
The slave latency for SRAM is zero.
Here are the measurements:
MajorLoop | MinorLoop | SOFF/DOFF | SSIZE/DSIZE | BWC | Time in us | MB/s | r/w transactions | Clk ticks @ 133MHz | Clks/trans |
1 | 16384 (16KB) | 32 | 5 (32 bytes) | 0 | 101 | 162 | 512 | 13436 | 26.2 |
1 | 16384 (16KB) | 8 | 3 (64-bit) | 0 | 218 | 75 | 2048 | 28784 | 14 |
1 | 16384 (16KB) | 4 | 2 (32-bit) | 0 | 435 | 37 | 4096 | 57448 | 14 |
1 | 16384 (16KB) | 2 | 1 (16-bit) | 0 | 869 | 19 | 8192 | 114800 | 14 |
1 | 16384 (16KB) | 1 | 0 (8-bit) | 0 | 1738 | 9 | 16384 | 229480 | 14 |
If you look at the measurements, they are in line with the latencies: each read/write transaction takes 14 cycles, which roughly matches the read (2+4) and write (4+4) latencies through the NICs.
All transactions require the same number of cycles regardless of the SSIZE/DSIZE value (except for 32 bytes).
I believe that in the 32-byte case there is a penalty in the buffering, which is why we get additional cycles per read/write.
The other possibility is that the NIC latency is a bit lower, but buffering latency is added to each transaction anyway because of the AHB-to-AXI translation.
The conclusion is that it is not possible to achieve 533 MB/s with the eDMA, and the results make sense given the expected latency of such a system. If you need more performance, you can use another DMA (eDMA1, for example) in parallel, or you can use the CPU or GPU, which have far more efficient accesses thanks to the AXI protocol.
Hi,
Would you be kind enough to share the entire project?
I will be glad to check it out and see what I can do.
Best Regards,
Alejandro
Hi Sigamani,
Can you provide the information requested previously?
Hi, I am communicating with Ioseph regarding this. Thanks for your replies.
Hi, I am just trying to confirm the calculated latency numbers with the design team. They were out for a couple of weeks; hopefully I will get an answer this week.