With your wraparound approach, you can poll QWR[CPTQP]:
- Wait until the QWR[CPTQP] field changes;
- Set QAR to the address of the transmit word that has just completed;
- Write QDR with the next 16-bit transmit word;
- Repeat the steps above until the end of the file.
In my opinion, your algorithm is problematic, and it gives no notable improvement in communication speed.
Instead, use the standard approach without wraparound (derived from NetBurner code and debugged on an MCF5270):
- Fill next 16 transmit words;
- Enable the QSPI communication;
- Wait for completion: either poll QIR[SPIF] or use the corresponding interrupt;
- Repeat the steps above until the end of the file.
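The steps above can be sketched roughly as follows. This is not the NetBurner code, only a minimal illustration that runs on a host PC against a small software model of the QSPI. The `qspi_sim` struct, the helper names, and the bit positions and QAR base addresses in the macros are my assumptions; check them against the MCF5270 reference manual and replace the model with volatile register accesses on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH  16u
#define QDLYR_SPE    0x8000u /* assumed bit position of QDLYR[SPE] */
#define QIR_SPIF     0x0001u /* assumed bit position of QIR[SPIF] */
#define QAR_TX_BASE  0x00u   /* assumed start of transmit RAM */
#define QAR_CMD_BASE 0x20u   /* assumed start of command RAM */
#define WIRE_MAX     64u

/* Host-side stand-in for the QSPI block, so the sketch runs without hardware. */
typedef struct {
    uint16_t qdlyr, qwr, qir, qar;
    uint16_t ram[0x30];      /* 0x00 TX, 0x10 RX, 0x20 command entries */
    uint16_t wire[WIRE_MAX]; /* words the model has "shifted out" */
    size_t   wire_len;
} qspi_sim;

/* A QDR write lands in the RAM entry selected by QAR; QAR then increments. */
static void qdr_write(qspi_sim *q, uint16_t value)
{
    q->ram[q->qar++] = value;
}

/* Model of setting QDLYR[SPE]: run entries NEWQP..ENDQP, then raise SPIF. */
static void qspi_start(qspi_sim *q)
{
    unsigned first = q->qwr & 0x000Fu;
    unsigned last  = (q->qwr >> 8) & 0x000Fu;

    q->qdlyr |= QDLYR_SPE;
    for (unsigned i = first; i <= last && q->wire_len < WIRE_MAX; i++)
        q->wire[q->wire_len++] = q->ram[QAR_TX_BASE + i];
    q->qdlyr &= (uint16_t)~QDLYR_SPE; /* SPE drops when the queue finishes */
    q->qir |= QIR_SPIF;
}

/* Sends n 16-bit words in chunks of up to 16; returns the number sent.
   cmd is the command-RAM word used for every entry (caller-supplied). */
size_t qspi_send(qspi_sim *q, const uint16_t *data, size_t n, uint16_t cmd)
{
    size_t sent = 0;

    while (sent < n) {
        size_t chunk = n - sent;
        if (chunk > QUEUE_DEPTH)
            chunk = QUEUE_DEPTH;

        /* Step 1: fill the next (up to) 16 transmit words and their commands. */
        q->qar = QAR_TX_BASE;
        for (size_t i = 0; i < chunk; i++)
            qdr_write(q, data[sent + i]);
        q->qar = QAR_CMD_BASE;
        for (size_t i = 0; i < chunk; i++)
            qdr_write(q, cmd);

        /* Queue runs from entry 0 to chunk-1 with wraparound (WREN) off.
           A real driver would also keep its chosen CSIV setting here. */
        q->qwr = (uint16_t)((chunk - 1u) << 8);
        q->qir &= (uint16_t)~QIR_SPIF; /* model only; hardware clears SPIF by writing 1 */

        /* Step 2: enable the QSPI communication. */
        qspi_start(q);

        /* Step 3: wait for completion (registers must be volatile on hardware). */
        while ((q->qir & QIR_SPIF) == 0u) { }

        sent += chunk;
    }
    return sent;
}
```

Because the queue is refilled only while the QSPI is idle, there is no race between the CPU and the transfer pointer, which is the main attraction compared with polling QWR[CPTQP].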
Tune the baud rate, the 'Delay after transfer' and 'CS to SCLK' delays, and the other parameters according to your FPGA's datasheet.
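For the baud rate, a small helper like the one below may be useful. It assumes the relation QSPI_CLK = module clock / (2 * QMR[BAUD]) with a usable divider range of 2 to 255; which clock actually feeds the QSPI on your part, and the exact range, should be confirmed in the reference manual. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns a QMR[BAUD] divider for a target QSPI_CLK rate, assuming
   QSPI_CLK = module_clk_hz / (2 * BAUD) and a valid range of 2..255. */
uint8_t qspi_baud_divider(uint32_t module_clk_hz, uint32_t target_hz)
{
    if (target_hz == 0u)
        return 255u; /* no meaningful target: pick the slowest setting */

    uint64_t denom = 2ull * target_hz;
    /* Round up so the resulting clock does not exceed the target. */
    uint64_t div = (module_clk_hz + denom - 1u) / denom;

    if (div < 2u)
        div = 2u;    /* fastest setting under the assumed range */
    if (div > 255u)
        div = 255u;  /* slowest setting; the actual rate is then above target */
    return (uint8_t)div;
}

/* Returns the QSPI_CLK rate that a given divider produces (0 if invalid). */
uint32_t qspi_actual_baud(uint32_t module_clk_hz, uint8_t divider)
{
    if (divider < 2u)
        return 0u;
    return module_clk_hz / (2u * divider);
}
```

Rounding the divider up keeps the clock at or below the maximum configuration clock given in the FPGA datasheet, at the cost of a slightly slower transfer.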