I have a Faraday EVB on which I am trying to determine the eDMA peak transfer rate (MB/s).
I ran a test transferring 256 KB of data (i.e. Minor Loop Count = 262144, Major Loop Count = 1) from one memory location to another:
DMA channel used = channel 0
SADDR (source address) = 0x3f000000 (IRAM)
DADDR (destination address) = 0x3f040000 (IRAM)
SOFF = 32
DOFF = 32
MinorLoopCount = 262144
MajorLoopCount = 1
LastSourceAdjustment = 0
LastDestinationAdjustment = 0
MinorLoopOffsetDest = 0
MinorLoopOffsetSrc = 0
MinorLoopOffset = 0
SSIZE = 5
DSIZE = 5
BWC = 3
The SYSCLK is selected as PLL1 PFD3 for 396MHz and ARM_DIV = 1,BUS_DIV = 3.
As per the reference manual "Vybrid_Reference_Manual_F_Series_-_Rev_3.pdf", Table 22-1636, when transferring data from SRAM to SRAM at a 133.3 MHz frequency I expected to get 533.3 MB/s. But I am only getting around 150 MB/s.
Can anyone help identify what went wrong? Thanks in advance.
Hello Satishkumar,
The reason you can't reach the theoretical performance is that the system adds some latency to each access made to the memories, so 533 MB/s (64-bit @ 133 MHz) would only be possible on a system with no latency. If the DMA used AXI, it would have better performance thanks to the ability to post outstanding transactions; but since the DMA uses AHB to meet real-time/predictability requirements, it waits until each transaction is done before posting any other read/write.
A DMA transfer consists of two transactions, a read and a write, and each is subject to some latency. There is latency from passing through the NIC, plus the latency of the slave peripheral itself (in this case, internal RAM). Generally speaking:
Latency = master NIC latency + slave NIC latency + NIC buffering latency + slave latency
The NIC eDMA maximum latencies are: read = 2, write = 4.
The NIC SRAM maximum latencies are: read = 4, write = 4.
There is some additional buffering latency due to the AHB-to-AXI conversion.
The slave latency for SRAM is zero.
Here are the measurements:
MajorLoop | MinorLoop | SOFF/DOFF | SSIZE/DSIZE | BWC | Time in us | MB/s | r/w transactions | Clk ticks @ 133MHz | Clks/trans |
1 | 16384 (16KB) | 32 | 5 (32 bytes) | 0 | 101 | 162 | 512 | 13436 | 26.2 |
1 | 16384 (16KB) | 8 | 3 (64-bit) | 0 | 218 | 75 | 2048 | 28784 | 14 |
1 | 16384 (16KB) | 4 | 2 (32-bit) | 0 | 435 | 37 | 4096 | 57448 | 14 |
1 | 16384 (16KB) | 2 | 1 (16-bit) | 0 | 869 | 19 | 8192 | 114800 | 14 |
1 | 16384 (16KB) | 1 | 0 (8-bit) | 0 | 1738 | 9 | 16384 | 229480 | 14 |
If you look at the measurements, they are in line with the latencies: each read/write transaction takes 14 cycles, which roughly matches the read (2+4) and write (4+4) latencies through the NICs.
All transactions require the same number of cycles regardless of the SSIZE/DSIZE value (except for 32 bytes).
I believe that in the 32-byte case there is a penalty in the buffering, which is why we get additional cycles per read/write.
The other possibility is that the NIC latency is a bit lower, but buffering latency is added to each transaction anyway because of the AHB-to-AXI translation.
The conclusion is that it is not possible to achieve 533 MB/s with the eDMA, and the results make sense given the expected latency of such a system. If you need more performance, you can use another DMA (eDMA1, for example) in parallel, or you can use the CPU or GPU, which have far more efficient accesses thanks to the AXI protocol.
Hi,
Would you be kind enough to share the entire project?
I will be glad to check it out and see what I can do.
Best Regards,
Alejandro
Hi Sigamani,
Can you provide the information requested previously?
Hi, I am communicating with Ioseph regarding this. Thanks for your replies.
Hi, I am just trying to confirm the calculated latency numbers with the design team. They were out for a couple of weeks; hopefully I will get an answer this week.