Devices which support DMA have DMA request bits. These are defined in "Table 14-2. DMAREQC Field Description (Continued)". The only devices that can request DMA are the four DMA Timers and the three UARTs. Nothing else can.
However, there's nothing stopping you from starting a "Dual Address DMA Transfer" to copy a block of data (the QSPI registers) to and from memory, triggered by software. But be aware that anything the DMA Controller can do, the CPU can do a lot faster. It takes longer to set up a DMA transfer (of say 16 words in the QSPI) than to have the CPU do the transfer itself, so it isn't worthwhile.
The QSPI supports a maximum of 16 bits per transfer. That's limited by the "Receive RAM" entries being 16 bits wide. You can have longer "external" transfers, like 32 bits, by having the external chip-select remain asserted across two 16-bit transfers.
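As a rough sketch of that chip-select trick, here is how the two queue entries for one 32-bit "external" transfer might be built. The bit positions (CONT in bit 15, BITSE in bit 14, the chip-select field in bits 11-8) are assumed from the usual ColdFire QSPI command RAM layout, so check them against the reference manual for your part. The function names are made up for illustration, and nothing here touches real hardware.

```c
#include <stdint.h>

/* ASSUMED command RAM bit layout; verify against your reference manual. */
#define QCR_CONT   0x8000u  /* keep chip-select asserted after this entry */
#define QCR_BITSE  0x4000u  /* take the bit count from the mode register  */
#define QCR_CS_SHIFT 8      /* chip-select field assumed in bits 11-8     */

/* Hypothetical helper: build one command word.
 * cs_bits is the raw 4-bit chip-select pattern to drive during the entry. */
uint16_t qspi_cmd(uint8_t cs_bits, int keep_cs)
{
    uint16_t cmd = (uint16_t)(QCR_BITSE |
                   ((uint16_t)(cs_bits & 0x0Fu) << QCR_CS_SHIFT));
    if (keep_cs) {
        cmd |= QCR_CONT;
    }
    return cmd;
}

/* Hypothetical helper: split one 32-bit word into two 16-bit queue entries.
 * The first entry holds chip-select asserted, the second releases it. */
void qspi_build_32bit(uint8_t cs_bits, uint32_t data,
                      uint16_t cmd[2], uint16_t tx[2])
{
    cmd[0] = qspi_cmd(cs_bits, 1);          /* first half, CS stays asserted */
    cmd[1] = qspi_cmd(cs_bits, 0);          /* second half, CS released      */
    tx[0]  = (uint16_t)(data >> 16);        /* most significant half first   */
    tx[1]  = (uint16_t)(data & 0xFFFFu);
}
```

On real hardware the two command words and two data words would then be written into consecutive Command RAM and Transmit RAM entries before starting the queue.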
Does your "external 128 byte RX and TX transfer" have to be 8 bit transfers? Does the external device need chip-select deasserted every 8 bits? If it can handle longer transfers, you can do better, but assuming it can't...
The best you can do is to load up the Command and Transmit RAM for your 6-word transfer, and then do that. After that has finished, load up all 16 with 16 transfers (command and transmit data) to your block device and have it run once. Then interrupt, read the 16 bytes of data, write a new 16 bytes and kick it off again. Repeat 8 times. Or 4 times if you can send 16-bit SPI sequences. Then load up for the 6-word transfer devices and run that.
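The load/run/read/reload loop for the block device can be sketched like this. It is a host-side simulation only: `mock_qspi_t` and `mock_qspi_run()` are hypothetical stand-ins for the real Transmit/Receive RAM and the "start the queue and wait for the finished interrupt" step, and the mock simply loops the transmit data back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QSPI_ENTRIES 16u   /* queue depth: 16 entries per burst */

/* Hypothetical stand-in for the QSPI transmit and receive RAM. */
typedef struct {
    uint16_t tx[QSPI_ENTRIES];
    uint16_t rx[QSPI_ENTRIES];
    size_t   count;        /* number of entries queued for this burst */
} mock_qspi_t;

/* Mock of "start the queue and wait for completion": loops TX back to RX.
 * On hardware this is where the burst runs and the interrupt fires. */
void mock_qspi_run(mock_qspi_t *q)
{
    for (size_t i = 0; i < q->count; i++) {
        q->rx[i] = q->tx[i];
    }
}

/* Transfer len bytes as 8-bit entries, at most 16 per burst.
 * Returns the number of bursts that were needed. */
size_t block_transfer(mock_qspi_t *q, const uint8_t *tx, uint8_t *rx,
                      size_t len)
{
    size_t done = 0;
    size_t bursts = 0;

    assert(q != NULL && tx != NULL && rx != NULL);

    while (done < len) {
        size_t n = len - done;
        if (n > QSPI_ENTRIES) {
            n = QSPI_ENTRIES;
        }
        for (size_t i = 0; i < n; i++) {     /* load transmit entries */
            q->tx[i] = tx[done + i];
        }
        q->count = n;
        mock_qspi_run(q);                    /* run the burst once */
        for (size_t i = 0; i < n; i++) {     /* read back the results */
            rx[done + i] = (uint8_t)q->rx[i];
        }
        done += n;
        bursts++;
    }
    return bursts;
}
```

With a 128-byte block and 8-bit entries this loop runs 8 bursts, matching the "repeat 8 times" above. Packing two bytes into each 16-bit entry would halve that.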
Before you say "too slow", this is a 150MHz CPU. It can perform those interrupts and reloads pretty quickly (as long as you've got the cache enabled). The slowest part of the whole thing is the unexpectedly long time it takes to read and write the individual RAM locations in the QSPI. They probably take 20 clocks each or so.
By "too slow" do you mean your "six devices" need servicing at something like 100kHz?
I've got code here that is using the QSPI to control three 8-channel ADCs, performing 96 conversions (interleaved data and a zero channel) at 4kHz. It has to reload the QSPI six times for each set of 96 conversions (six bursts of 16). So that's "load/burst/load/burst..." and so on. Here's what that looks like:

You can see from the cursors that the 16-word burst took 23us. All six of them together (from another capture I have) took 153us. In the small gap between each of the six bursts, the CPU was interrupted, read 16 results, loaded 16 new transmit commands and started the next one. In about 3us. Meaning a 13% overhead over what a "perfect DMA transfer" could do. The SPI is running at 12.5MHz.
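The arithmetic behind those figures can be checked with a couple of throwaway helpers. The function names are made up, and the model assumes the reload gaps only occur between bursts, so six bursts have five gaps.

```c
/* Total time for a run of bursts, assuming one reload gap BETWEEN
 * consecutive bursts (so n bursts have n - 1 gaps). */
double total_us(unsigned bursts, double burst_us, double gap_us)
{
    if (bursts == 0) {
        return 0.0;
    }
    return bursts * burst_us + (bursts - 1) * gap_us;
}

/* CPU reload overhead as a percentage of the burst time. */
double overhead_percent(double burst_us, double gap_us)
{
    return 100.0 * gap_us / burst_us;
}
```

Under that assumption, `total_us(6, 23.0, 3.0)` gives 6 x 23 + 5 x 3 = 153us, which agrees with the measured total, and `overhead_percent(23.0, 3.0)` gives 3/23, or about 13%.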
On another product I had the QSPI handling one ADC device plus three MCP2515 CAN controllers. Those things need a huge number of SPI transfers to handle the CAN protocol. It all worked fine. This CPU is really fast if you're programming it properly.
Tom