SPI is by nature a loop. The data shift register is like a standard 7400-series shift register, which has a bit in, a bit out, and 8 bits of parallel out, parallel in, or both. On the MCU (master) side, MOSI is the serial data output pin of that shift register and MISO is its serial data input pin. The shift registers of the MCU and the peripheral part are wired as a loop. Theoretically you shouldn't have to put data into the shift register to get data back; you just need to generate the 8 clock pulses. But the MCUs are apparently designed such that writing a byte is what generates the 8 clock pulses. There is no other way to generate the 8 clock pulses, at least with the DSPI.
Now if what I just wrote is complete nonsense, then please correct me. That is my understanding of it.
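To make the loop picture concrete, here is a minimal sketch in plain C that models the two 8-bit shift registers as a ring. It is only a software model, not DSPI code; the function names are made up for illustration, and MSB-first shifting is an assumption.

```c
#include <stdint.h>

/* One SPI clock pulse on two 8-bit shift registers wired as a ring.
   Assumes MSB-first: each side shifts its top bit out and takes the
   other side's top bit in at the bottom. */
static void spi_clock_once(uint8_t *master, uint8_t *slave)
{
    uint8_t mosi = (uint8_t)((*master >> 7) & 1u); /* bit leaving the master */
    uint8_t miso = (uint8_t)((*slave >> 7) & 1u);  /* bit leaving the slave  */

    *master = (uint8_t)((*master << 1) | miso);
    *slave  = (uint8_t)((*slave << 1) | mosi);
}

/* Eight clock pulses: the two registers end up holding each other's
   original contents. */
static void spi_exchange_byte(uint8_t *master, uint8_t *slave)
{
    for (int i = 0; i < 8; ++i) {
        spi_clock_once(master, slave);
    }
}
```

Loading the master side with 0xFF and the slave side with 0xA5, then calling spi_exchange_byte, leaves 0xA5 in the master and 0xFF in the slave. The "dummy" byte written by the master is simply whatever gets shifted out while the wanted byte is shifted in.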
It appears the answer to my original question is that it takes 2 DMA channels in order to receive data with DMA. One is constantly transmitting bytes, the other is receiving a byte for every one sent. This then raises the question: can I block transfer from a single memory location, i.e. with an increment of 0? I would sooner send a load of FFs than some random data out of memory.
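The two-channel idea can be sketched as a plain C model. Nothing here is a real DMA or DSPI API: dma_model_t, dma_model_step and spi_dma_read_model are hypothetical names, and the "registers" are ordinary variables. The point is only to show a TX channel whose source increment is 0 (it re-reads one 0xFF byte) next to an RX channel whose destination increments through a buffer. Whether a particular DMA controller accepts a zero source increment for a memory source has to be checked in its reference manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one DMA channel.
   This is a model for illustration, not a real descriptor layout. */
typedef struct {
    const uint8_t *src;   /* current source address             */
    uint8_t *dst;         /* current destination address        */
    ptrdiff_t src_inc;    /* added to src after each transfer   */
    ptrdiff_t dst_inc;    /* added to dst after each transfer   */
    size_t count;         /* transfers remaining                */
} dma_model_t;

/* Perform one transfer; returns 0 when the channel is exhausted. */
static int dma_model_step(dma_model_t *ch)
{
    if (ch->count == 0) {
        return 0;
    }
    *ch->dst = *ch->src;
    ch->src += ch->src_inc;
    ch->dst += ch->dst_inc;
    ch->count--;
    return 1;
}

/* Model of an SPI read of n bytes using two channels.
   slave_bytes stands in for what the peripheral shifts back. */
static size_t spi_dma_read_model(const uint8_t *slave_bytes, uint8_t *rx_buf,
                                 size_t n, uint8_t *last_tx)
{
    static const uint8_t dummy = 0xFF; /* the single byte sent over and over */
    uint8_t tx_reg = 0;                /* stands in for the SPI TX register  */
    uint8_t rx_reg = 0;                /* stands in for the SPI RX register  */

    /* TX: fixed source (increment 0), fixed destination. */
    dma_model_t tx = { &dummy, &tx_reg, 0, 0, n };
    /* RX: fixed source, destination walks through rx_buf. */
    dma_model_t rx = { &rx_reg, rx_buf, 0, 1, n };

    size_t done = 0;
    while (dma_model_step(&tx)) {
        *last_tx = tx_reg;             /* what went out on MOSI            */
        rx_reg = slave_bytes[done];    /* what came back on MISO           */
        dma_model_step(&rx);           /* RX channel stores the byte       */
        done++;
    }
    return done;
}
```

In this model every byte sent is 0xFF and the receive buffer ends up with the slave's bytes in order, which is the behaviour the zero-increment question is asking for.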
Also there is the question of this 32-bit PUSH register, which has control bits, and those control bits presumably change depending on whether you are starting, continuing, or ending a block transfer. That would make sense, though I haven't looked at the docs for months and don't remember the particulars. I frankly did not understand it in the first place. Now, I could gripe about imprecise documentation that only makes sense once you know the answers, but there is no point in that. Bottom line: if the words stuffed into the PUSH register change in any way during the transmission, that creates a new problem. It seems like the logical answer is that the start and stop are done "manually" (by the MCU code), while the middle part of the transfer is done by the DMA.
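As a rough illustration of why the control bits matter, here is a sketch that builds the 32-bit words for a block in a RAM buffer, with only the last word different from the rest. The bit positions (CONT at bit 31, EOQ at bit 27, chip selects from bit 16, data in the low 16 bits) are written from memory of the DSPI PUSHR description and must be treated as assumptions to verify against the reference manual for your part. The macro and function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED bit positions for the DSPI PUSH register; verify against
   the reference manual before using on hardware. */
#define PUSHR_CONT      ((uint32_t)1u << 31)          /* keep chip select asserted */
#define PUSHR_EOQ       ((uint32_t)1u << 27)          /* end of queue marker       */
#define PUSHR_PCS(n)    ((uint32_t)1u << (16 + (n)))  /* chip select n             */
#define PUSHR_TXDATA(d) ((uint32_t)(d) & 0xFFFFu)     /* data to shift out         */

/* Fill words[0..n-1] with command+data words for one block transfer.
   Every word carries the same fill byte; all but the last keep the
   chip select asserted, and the last one marks the end of the queue. */
static void build_pushr_words(uint32_t *words, size_t n,
                              unsigned cs, uint8_t fill)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t w = PUSHR_PCS(cs) | PUSHR_TXDATA(fill);
        if (i + 1 < n) {
            w |= PUSHR_CONT;   /* middle of the block */
        } else {
            w |= PUSHR_EOQ;    /* final word          */
        }
        words[i] = w;
    }
}
```

Under these assumptions only the final word differs, so there are two options: let the DMA walk a prebuilt buffer like this one (source increment of 4 bytes), or let the DMA send the identical middle words from one location and write the last word from MCU code, which matches the "manual start and stop" idea above.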
I find it ridiculous that a question like this could be left hanging for so long without input from Freescale. Once you get to know the parts and have your MCU-dependent code written, it's all fine; the parts are great. But getting there? Unbelievable. In the old days, you could phone an FAE in Austin, TX, and that guy would know EVERYTHING. What is happening today is... well... I can't say that word. You get the picture.