1 -- I set bandwidth control to 'slowest' just on general principles. Even with my six channels each way, I have no need to let DMA 'hog' memory access to fill or empty a FIFO.
2 -- I don't know what your overall sample rate is, but I CERTAINLY wouldn't want to count on interrupt response time throughout a whole complex system to ensure access to the DMA address controls between the end of one sample set and the start of the next. That's why I set my major cycle to 'rewind' itself AND to interrupt at both the 'half' and 'full' points. My memory arrays are then complete double-buffer (ping-pong) sets, and I can be absolutely assured they hold aligned, contiguous data points as long as my overall interrupt/processing overhead completes in less than one sample-set time (one half of the overall buffer size, in sample counts). I don't think we need to worry about 'DMA priority' especially -- you say 'running continuously', but even if you run some exotic 128 ksamples/s, the DMA is still 'idle' in each direction 99% of the time. Once the RX FIFO is empty, that DMA channel will stall until more samples show up.
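To make the ping-pong idea in point 2 concrete, here is a minimal host-side sketch in C, not my actual driver. The buffer size, all the names (`rx_buf`, `pp_ready_half`, `pp_process`, `pp_sim_dma_write`) and the 'DMA' itself are made up for illustration: `pp_sim_dma_write` is a software stand-in for a major loop that rewinds itself and reports the 'half' and 'full' interrupts. On real hardware those two events would come from the DMA interrupt handler instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_SAMPLES 4                   /* hypothetical: samples in one sample set */
#define BUF_SAMPLES  (2 * HALF_SAMPLES)  /* whole major loop = two sample sets */

static int32_t rx_buf[BUF_SAMPLES];      /* the 'DMA' fills this, then rewinds to index 0 */

/* Which half just completed: 0 at the 'half' interrupt, 1 at the 'full' interrupt. */
typedef enum { PP_FIRST_HALF = 0, PP_SECOND_HALF = 1 } pp_half_t;

/* Return the half that was just completed. It stays stable while the
 * 'DMA' is busy filling the other half, i.e. for one sample-set time. */
static const int32_t *pp_ready_half(pp_half_t done)
{
    assert(done == PP_FIRST_HALF || done == PP_SECOND_HALF);
    return &rx_buf[(size_t)done * HALF_SAMPLES];
}

/* Placeholder for the real per-set processing: here, just sum the set. */
static int64_t pp_process(const int32_t *set)
{
    int64_t sum = 0;
    for (size_t i = 0; i < HALF_SAMPLES; i++) {
        sum += set[i];
    }
    return sum;
}

/* Software stand-in for the DMA engine (an assumption, not hardware code).
 * Stores one sample, wraps at the end of the buffer, and returns the half
 * that just completed, or -1 if no interrupt would fire on this sample. */
static int pp_sim_dma_write(int32_t sample)
{
    static size_t wr = 0;

    rx_buf[wr++] = sample;
    if (wr == HALF_SAMPLES) {
        return PP_FIRST_HALF;            /* 'half' interrupt */
    }
    if (wr == BUF_SAMPLES) {
        wr = 0;                          /* major loop rewinds itself */
        return PP_SECOND_HALF;           /* 'full' interrupt */
    }
    return -1;
}
```

The point of the design is that software never touches the DMA address registers while running. Each interrupt only says which half is ready, and the only deadline is that `pp_process` finishes before the other half fills.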
3 -- My setup is 'network' mode to allow many channels; you want only two. I think we can assume I2S (normal stereo mode) is a configurable subset in terms of data control through DMA. You mention TX0 and TX1 (and, I assume, RX0 and RX1) FIFO usage, as in 'two channel operation' (TCHEN). I don't think you can, or want to, DMA that way -- you've only got one DMA request each for TX and RX, and this would force the DMA to shuttle back and forth between the two FIFO registers for no particular advantage. With DMA you probably don't need FIFOs at all.
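If both stereo channels go through a single data register, as suggested in point 3, the samples end up interleaved in the DMA buffer, and splitting them in software is cheap. A minimal sketch in C, assuming the left sample lands first in each pair (that depends on how and when the receiver is started, so check it on your hardware). The function name `i2s_deinterleave` and the 32-bit sample type are my own choices for illustration, not anything from the part's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved L,R,L,R,... buffer into separate channel arrays.
 * 'frames' is the number of L/R pairs, so dma_buf holds 2 * frames samples.
 * Assumption: the left sample comes first in each pair. */
static void i2s_deinterleave(const int32_t *dma_buf, size_t frames,
                             int32_t *left, int32_t *right)
{
    assert(dma_buf != NULL && left != NULL && right != NULL);

    for (size_t i = 0; i < frames; i++) {
        left[i]  = dma_buf[2 * i];       /* even index: left channel */
        right[i] = dma_buf[2 * i + 1];   /* odd index: right channel */
    }
}
```

This would be called once per completed sample set, so the DMA only ever needs one source or destination register per direction.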