With your wraparound approach, you can poll QWR[CPTQP]:
  - Wait until the QWR[CPTQP] field changes;
  - Set QAR to the address of the transmit word that has just completed;
  - Write the next 16-bit transmit word to QDR;
  - Repeat the above steps until the end of the file.
 
In my opinion, your algorithm is problematic and brings no notable improvement in communication speed.
Let's use the standard way without wraparound (derived from NetBurner code and debugged on an MCF5270):
  - Fill the next 16 transmit words;
  - Enable the QSPI communication;
  - Wait for completion: either poll QIR[SPIF] or use the corresponding interrupt;
  - Repeat the above steps until the end of the file.
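
A minimal polling sketch of the steps above. This is not the NetBurner code itself. The qspi_regs_t layout, the bit masks, the qspi_send name and the cmd parameter are my assumptions, so check them against the QSPI chapter of the reference manual before use. The command word (chip selects, CONT, and so on) depends on your board.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register view: six 16-bit registers on 4-byte spacing.
   Verify the layout and base address against the reference manual. */
typedef struct {
    volatile uint16_t qmr;   uint16_t pad0;
    volatile uint16_t qdlyr; uint16_t pad1;
    volatile uint16_t qwr;   uint16_t pad2;
    volatile uint16_t qir;   uint16_t pad3;
    volatile uint16_t qar;   uint16_t pad4;
    volatile uint16_t qdr;   uint16_t pad5;
} qspi_regs_t;

#define QSPI_QUEUE_LEN   16u
#define QAR_TRANSMIT_RAM 0x00u    /* first transmit RAM entry */
#define QAR_COMMAND_RAM  0x20u    /* first command RAM entry */
#define QDLYR_SPE        0x8000u  /* starts the queued transfer */
#define QWR_CSIV         0x1000u  /* chip select inactive level, preserved */
#define QWR_ENDQP(n)     ((uint16_t)(((n) & 0xFu) << 8))
#define QIR_SPIF         0x0001u  /* queue finished, write 1 to clear */

/* Sends len 16-bit words in bursts of up to 16 and returns the burst count.
   cmd is the command RAM word used for every entry (board specific). */
size_t qspi_send(qspi_regs_t *q, const uint16_t *words, size_t len, uint16_t cmd)
{
    size_t bursts = 0;

    while (len > 0) {
        size_t n = (len < QSPI_QUEUE_LEN) ? len : QSPI_QUEUE_LEN;

        /* Fill the transmit RAM. On hardware QAR advances after each QDR write. */
        q->qar = QAR_TRANSMIT_RAM;
        for (size_t i = 0; i < n; i++)
            q->qdr = words[i];

        /* One command word per transmit word. */
        q->qar = QAR_COMMAND_RAM;
        for (size_t i = 0; i < n; i++)
            q->qdr = cmd;

        /* Queue runs from entry 0 to entry n-1; WREN stays clear (no wraparound). */
        q->qwr = (uint16_t)((q->qwr & QWR_CSIV) | QWR_ENDQP(n - 1));

        q->qir |= QIR_SPIF;      /* clear a stale finished flag */
        q->qdlyr |= QDLYR_SPE;   /* enable the QSPI communication */

        while ((q->qir & QIR_SPIF) == 0)
            ;                    /* poll; the SPIF interrupt would also work */

        words += n;
        len -= n;
        bursts++;
    }
    return bursts;
}
```

On the target, q would point at the QSPI register block inside the peripheral space; a 20-word buffer goes out as one burst of 16 followed by one burst of 4.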
 
Tune the baud rate, 'Delay after transfer', 'CS to SCLK', and other parameters according to your FPGA chip's datasheet.