Hello Satishkumar,
The reason you can't reach the theoretical performance is that the system adds some latency to every memory access. 533MB/s (64-bit @ 133MHz) would only be possible on a system with no latency. If the DMA used AXI, it would perform better because AXI can post outstanding transactions. However, the DMA uses AHB to meet real-time/predictability requirements, so it waits until each transaction completes before posting the next read or write.
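As a rough check on where the 533MB/s figure comes from, the sketch below assumes (my assumption, not stated above) that every copied 64-bit beat costs one read cycle and one write cycle on the same bus, so a zero-latency copy gets half of the raw 8 bytes × 133.33MHz rate. `copy_bandwidth_mb_s` and its `cycles_per_pair` parameter are hypothetical names for illustration only.

```python
# Rough copy-bandwidth model.
# Assumption: one read + one write per beat on a 64-bit bus at 133.33MHz.

CLOCK_HZ = 133.33e6   # "133MHz" in the text (533MB/s implies 133.33MHz)
WIDTH_BYTES = 8       # 64-bit data path


def copy_bandwidth_mb_s(width_bytes: int, clock_hz: float, cycles_per_pair: float) -> float:
    """Bytes moved per read/write pair divided by the time that pair occupies the bus.

    cycles_per_pair = 2 models a zero-latency bus (1 read cycle + 1 write cycle).
    Larger values model a master that waits for each transaction to finish
    before issuing the next one, as the AHB-based eDMA does.
    """
    seconds_per_pair = cycles_per_pair / clock_hz
    return width_bytes / seconds_per_pair / 1e6


ideal = copy_bandwidth_mb_s(WIDTH_BYTES, CLOCK_HZ, cycles_per_pair=2)
print(f"zero-latency copy: {ideal:.0f} MB/s")  # about 533 MB/s
```

Any per-transaction latency simply shows up as a larger `cycles_per_pair`, which is why a blocking master cannot get close to the zero-latency figure.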
A DMA transfer consists of two transactions, a read and a write, and each is subject to latency: the latency of passing through the NIC and the latency of the slave peripheral itself (in this case, internal RAM). Generally speaking, it is:
Latency = Master NIC latency + Slave NIC latency + NIC buffering latency + Slave latency
NIC eDMA max latencies (cycles): read = 2, write = 4
NIC SRAM max latencies (cycles): read = 4, write = 4
There is some additional buffering latency due to the AHB-to-AXI conversion.
Slave latency for SRAM is zero.
And these are the measurements:
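Plugging these figures into the latency formula gives a per-transaction budget. The sketch below treats the NIC numbers as cycles and sets the AHB-to-AXI buffering term to 0 because no value is given for it; that is an assumption, and the real buffering cost is presumably nonzero. `PathLatency` is a hypothetical helper, not part of any SDK.

```python
# Per-transaction latency budget in cycles, following:
#   Latency = Master NIC + Slave NIC + NIC buffering + Slave
from dataclasses import dataclass


@dataclass
class PathLatency:
    master_nic: int   # eDMA-side NIC latency (max)
    slave_nic: int    # SRAM-side NIC latency (max)
    buffering: int    # AHB-to-AXI buffering: unknown, assumed 0 here
    slave: int        # SRAM itself (zero)

    def total(self) -> int:
        return self.master_nic + self.slave_nic + self.buffering + self.slave


read = PathLatency(master_nic=2, slave_nic=4, buffering=0, slave=0)
write = PathLatency(master_nic=4, slave_nic=4, buffering=0, slave=0)

per_pair = read.total() + write.total()
print(f"read {read.total()} + write {write.total()} = {per_pair} cycles per read/write pair")
```

Because the NIC figures are maxima, 14 cycles is the worst case for the NIC portion alone; any buffering cost would come on top of it.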
MajorLoop | MinorLoop | SOFF/DOFF | SSIZE/DSIZE | BWC | Time (us) | MB/s | R/W transactions | Clk ticks @ 133MHz | Clks/transaction |
1 | 16384 (16KB) | 32 | 5 (32 bytes) | 0 | 101 | 162 | 512 | 13436 | 26.2 |
1 | 16384 (16KB) | 8 | 3 (64-bit) | 0 | 218 | 75 | 2048 | 28784 | 14 |
1 | 16384 (16KB) | 4 | 2 (32-bit) | 0 | 435 | 37 | 4096 | 57448 | 14 |
1 | 16384 (16KB) | 2 | 1 (16-bit) | 0 | 869 | 19 | 8192 | 114800 | 14 |
1 | 16384 (16KB) | 1 | 0 (8-bit) | 0 | 1738 | 9.4 | 16384 | 229480 | 14 |
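The derived columns can be recomputed from the raw ones: MB/s is the 16KB block divided by the time, the transaction count is the block divided by the transfer size, and Clks/transaction is the tick count divided by that count. The sketch below only reuses numbers copied from the table; `derive` is a hypothetical helper name.

```python
# Recompute the derived columns of the measurement table from its raw columns.
BLOCK_BYTES = 16384  # minor loop size (16KB)


def derive(size_bytes: int, time_us: float, ticks: int) -> tuple[float, int, float]:
    """Return (MB/s, read/write transactions, clocks per transaction) for one row."""
    mb_s = BLOCK_BYTES / time_us          # bytes per microsecond == MB/s (1 MB = 1e6 bytes)
    pairs = BLOCK_BYTES // size_bytes     # one read + one write per transfer unit
    return mb_s, pairs, ticks / pairs


# (transfer size in bytes, time in us, clock ticks @ 133MHz), copied from the table
rows = [(32, 101, 13436), (8, 218, 28784), (4, 435, 57448), (2, 869, 114800), (1, 1738, 229480)]

for size, time_us, ticks in rows:
    mb_s, pairs, clks = derive(size, time_us, ticks)
    print(f"{size:>2} B: {mb_s:6.1f} MB/s, {pairs:>5} transactions, {clks:5.2f} clks/transaction")
```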
The measurements are in line with the latencies: each read/write transaction pair takes 14 cycles, which matches the NIC latencies for the read (2 + 4) plus the write (4 + 4), i.e. 14 cycles.
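Going the other way, a fixed cost of 14 cycles per read/write pair predicts the 16KB transfer times quite closely. The sketch assumes a 133.33MHz clock and leaves out the 32-byte row, which does not follow the 14-cycle pattern; `predicted_time_us` is a hypothetical name.

```python
# Predict the 16KB transfer time from a fixed 14 cycles per read/write pair.
CLOCK_HZ = 133.33e6   # assumed exact value of "133MHz"
BLOCK_BYTES = 16384
CYCLES_PER_PAIR = 14  # (2 + 4) read + (4 + 4) write, from the NIC latencies


def predicted_time_us(size_bytes: int) -> float:
    pairs = BLOCK_BYTES // size_bytes
    return pairs * CYCLES_PER_PAIR / CLOCK_HZ * 1e6


measured_us = {8: 218, 4: 435, 2: 869, 1: 1738}  # from the table (32-byte case excluded)
for size, measured in measured_us.items():
    print(f"{size} B: predicted {predicted_time_us(size):7.1f} us, measured {measured} us")
```

The predictions land within about 1.5% of the measured times, which supports the view that latency, not bus width, sets the throughput.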
All transfer sizes required the same number of cycles per transaction, regardless of the SSIZE/DSIZE value (except for 32 bytes).
I believe that in the 32-byte case there is a buffering penalty, which is why we get additional cycles per read/write.
The other possibility is that the NIC latency is somewhat lower, but buffering latency from the AHB-to-AXI translation is added to every transaction anyway.
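To put a number on the 32-byte penalty, the sketch compares each size's cycles per transaction with the 14-cycle baseline and also looks at cycles per byte. Only values from the table are used; attributing the extra cycles to buffering remains the hypothesis stated above.

```python
# Size the 32-byte penalty against the 14-cycle baseline seen for the other sizes.
BASELINE_CYCLES = 14.0

# transfer size (bytes) -> measured clocks per read/write transaction, from the table
measured = {32: 26.2, 8: 14.0, 4: 14.0, 2: 14.0, 1: 14.0}

for size, cycles in measured.items():
    extra = cycles - BASELINE_CYCLES
    per_byte = cycles / size
    print(f"{size:>2} B: {extra:4.1f} extra cycles/transaction, {per_byte:5.2f} cycles/byte")
```

Even with about 12 extra cycles per transaction, 32-byte transfers cost less than half as many cycles per byte as 64-bit transfers, consistent with 162MB/s versus 75MB/s in the table.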
In conclusion, it is not possible to achieve 533MB/s with the eDMA, and the results are consistent with the expected latency of this system. If you need more performance, you can run another DMA (eDMA1, for example) in parallel, or use the CPU or GPU, which access memory much more efficiently thanks to the AXI protocol.