I now see the entry in the datasheet that indicates that SRAM 4 (0x20018000-0x2001c000) is for PowerQuad use. That brings up three issues:
1) That fact is virtually hidden among hundreds of pages of documents, and is completely unmentioned in the App Notes. Not good.
2) When you enable the PQ peripheral in an MCUXpresso project, as we did, absolutely nothing prevents the linker from putting code and/or data in that location. In fact, for the longest time on our project, the stack was placed (by default) at the end of SRAM, which is right in SRAM 4. Also not good.
3) Finally, I don't think it is actually true that the PQ uses SRAM 4 at all. I reduced some data structures in my code and moved the stack to the "end of data" so that SRAM 4 is completely unused. I filled all of SRAM 4 with 0xA5. Then I ran our code, with the tmpbase register pointed at a pre-allocated buffer, like this:
static q31_t fftInputBuf[512]; // FFT input (real)
static q31_t fftTmpBuf[512]; // point tmpbase at this!
static q31_t fftOutputBuf[257][2]; // FFT output (cmplx), need an extra?
static pq_config_t pq_fft_cfg = {kPQ_32Bit, 0, kPQ_32Bit, 0, kPQ_32Bit, 0, kPQ_32Bit, 0, kPQ_32Bit, (uint32_t *)fftTmpBuf};
PQ_Init(POWERQUAD);
PQ_SetConfig(POWERQUAD, &pq_fft_cfg);
PQ_TransformRFFT(POWERQUAD, 512, fftInputBuf, fftOutputBuf);
This code executes flawlessly, the PQ uses fftTmpBuf as its scratchpad, and SRAM 4 is *undisturbed* over dozens and dozens of PQ calls (including PQ_MatrixScale, PQ_MatrixAddition, PQ_MatrixProduct, PQ_MatrixMultiplication as well as PQ_TransformRFFT).
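For anyone who wants to repeat the check, here is a minimal sketch of the fill-and-verify step. The helper names (region_fill, region_count_disturbed) and the macros are my own and are not part of the SDK; the address and size are taken from the datasheet range quoted above, and the sketch assumes the linker has placed nothing in SRAM 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed from the datasheet range quoted above: 0x20018000-0x2001c000. */
#define SRAM4_BASE_ADDR  0x20018000u
#define SRAM4_SIZE_BYTES 0x4000u      /* 16 kB */
#define FILL_PATTERN     0xA5u

/* Fill a memory region with the marker byte. */
static void region_fill(volatile uint8_t *base, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        base[i] = FILL_PATTERN;
    }
}

/* Count bytes that no longer hold the marker; 0 means undisturbed. */
static size_t region_count_disturbed(const volatile uint8_t *base, size_t len)
{
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (base[i] != FILL_PATTERN) {
            changed++;
        }
    }
    return changed;
}

/*
 * Intended use on the target (only valid if nothing else lives in SRAM 4):
 *
 *   volatile uint8_t *sram4 = (volatile uint8_t *)SRAM4_BASE_ADDR;
 *   region_fill(sram4, SRAM4_SIZE_BYTES);
 *   ... run the PQ calls with tmpbase pointing at fftTmpBuf ...
 *   size_t changed = region_count_disturbed(sram4, SRAM4_SIZE_BYTES);
 */
```

A result of 0 from region_count_disturbed after the PQ calls corresponds to what I observed. The helpers take a pointer and a length so the same code can be pointed at any other region, or exercised on a host build against an ordinary array.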
Note that this is a good thing, because the datasheet implies that SRAM 4 (all 16 kB of it!) can't be used if the PQ is used. SRAM is a critical resource, so why should an entire 16 kB be unusable? Fortunately, it appears that the hardware itself behaves the way a programmer would like it to. Simply point tmpbase to a pre-allocated scratch buffer before calling the RFFT and everything is great.
Which brings me to reiterate that the PowerQuad documentation from NXP is severely lacking. The PQ unit is a massive competitive advantage over all other ARM-based processors, and NXP should really take advantage of that by fully, and correctly, documenting it.