Hello,
You can't just send 20,000 transactions with no gaps in the data. First of all, none of the masters would send a transaction that large. The DMA is probably the most efficient way to move the data, but the largest DSIZE option in the DMA is 32 bytes, so the DMA maxes out at sixteen 16-bit accesses back to back.
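As a rough illustration of that limit, the sketch below works out how many 16-bit accesses fit in one 32-byte DMA transfer and how many separate transfers 20,000 accesses would then need. The function name is made up for the example; only the 32-byte and 16-bit figures come from the description above.

```python
def accesses_per_dma_transfer(dsize_bytes: int, access_bits: int) -> int:
    """Number of bus accesses that fit in one DMA transfer of dsize_bytes."""
    return dsize_bytes // (access_bits // 8)


# 32-byte DSIZE, 16-bit wide bus accesses
per_transfer = accesses_per_dma_transfer(32, 16)

# Ceiling division: how many DMA transfers cover 20,000 accesses
transfers_needed = -(-20_000 // per_transfer)

print(per_transfer)      # 16
print(transfers_needed)  # 1250
```

So the 20,000 accesses get split into about 1,250 separate pieces, each one followed by its own gap.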
Then on the SEMC side, you also have to look at the burst size. For the SDRAMC the max burst length is 8 (SDRAMCR0[BL]). If you use the SRAM interface instead you can do up to a 64-beat burst (SRAMCR[BL]). But the actual burst size will also depend on the master.
If their FPGA can support the synchronous SRAM interface, then that is probably the best option. The problem is that after every sixteen 16-bit transactions there will be a gap while the DMA completes its write cycle (it has to put the data it read somewhere). Then the SEMC will resend the address for the SRAM and insert any latency count clocks before the next sixteen 16-bit data transactions, which do go out on every single clock.
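To get a feel for how much those gaps cost, here is a minimal model, assuming each burst of data clocks is followed by a fixed number of idle clocks (DMA write cycle, address phase, latency count). The gap values in the loop are placeholders only, not measured SEMC or DMA numbers.

```python
def bus_efficiency(beats_per_burst: int, gap_clocks: int) -> float:
    """Fraction of clocks that carry data when every burst of
    beats_per_burst data clocks is followed by gap_clocks idle clocks."""
    return beats_per_burst / (beats_per_burst + gap_clocks)


# Sixteen 16-bit beats per DMA transfer; gap lengths are hypothetical
for gap in (4, 8, 16):
    print(gap, round(bus_efficiency(16, gap), 2))
```

The real gap length depends on the DMA, the SEMC settings and the latency count, so it would have to be measured on hardware.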
If they go to RT1170, then the big advantage is that the SDRAM interface can be 32 bits wide instead of 16, so that doubles the throughput. The max clock also goes to 200 MHz. The SRAM interface isn't any wider on RT1170, so while you can run it faster than on RT10xx, the SDRAM interface is probably the best option on RT1170, but only if the FPGA can work with the 32-bit bus.
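The doubling is just bus width times clock. A quick sketch, assuming one data transfer per clock and ignoring all gaps (the function name is only for illustration):

```python
def peak_bytes_per_second(bus_bits: int, clock_hz: int) -> int:
    """Theoretical peak with one transfer per clock and no idle clocks."""
    return (bus_bits // 8) * clock_hz


print(peak_bytes_per_second(32, 200_000_000))  # 800000000
print(peak_bytes_per_second(16, 200_000_000))  # 400000000
```

These are peak numbers only; the sustained rate will be lower once the gaps between bursts are counted.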
So technically the peak throughput can be what you want. The problem is that we can't sustain that for anywhere near the 20k transactions (or 10k with a 32-bit bus) that they want. We are going to have lots of gaps in between the bus cycles, and those will bring the overall throughput down to nowhere near what you want. Pretty much any bus is going to have gaps between cycles and overhead that drag down the performance.
Regards,
Victor