Hello and welcome to the fora.
The 4 cycles referred to here are CPU clock cycles.
The cycle-by-cycle access sequence for an extended mode STA is pwpp
and for an extended mode LDA it is prpp.
Thus if you do a STA immediately followed by a LDA, there will be 3 cycles between the w (write) and the r (read).
The NOP added between the two instructions extends this to 4 cycles, which is the time required "so the internal FLASH command sequencer can properly update the FCBEF and FCCF flags in FSTAT".
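
To make the counting explicit, here is a rough sketch that lines up the access sequences end to end and counts the cycles that fall strictly between the write and the following read. The STA and LDA patterns are the ones quoted above; the single-cycle "p" pattern for NOP and all of the names in the code are my own assumptions for illustration, so check them against the instruction set tables for your part.

```python
# Per-cycle bus access patterns (p = program fetch, w = write, r = read).
# STA and LDA are as quoted in this post; the one-cycle "p" for NOP is an
# assumption made for this sketch.
ACCESS = {
    "STA_EXT": "pwpp",
    "LDA_EXT": "prpp",
    "NOP": "p",
}


def cycles_between_write_and_read(sequence):
    """Count the cycles strictly between the first 'w' and the next 'r'."""
    # Concatenate the per-instruction patterns into one bus trace.
    trace = "".join(ACCESS[name] for name in sequence)
    w = trace.index("w")
    r = trace.index("r", w + 1)
    return r - w - 1


print(cycles_between_write_and_read(["STA_EXT", "LDA_EXT"]))         # 3
print(cycles_between_write_and_read(["STA_EXT", "NOP", "LDA_EXT"]))  # 4
```

The first call is the back-to-back case (trace pwppprpp), the second is the case with the NOP inserted (trace pwpppprpp).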
This timing is independent of the flash clock.
The details of the FLASH operation are all a bit of a secret, possibly because Freescale only licenses the technology.