QSPI, Slower Than Expected

Discussion created by TomE on Jun 2, 2013

I've just been investigating the QSPI transfer speed, and it falls short of what a naive reading of the Reference Manuals and Data Sheets suggests.


All these measurements were taken on an MCF5329 running at 240MHz.


There are a number of documented QSPI delay parameters, but most seem to control the start and end of a transfer (the delay after chip-select and the time between transfers). It is ambiguous whether "a transfer" means a single byte (or 12- or 16-bit word) or up to 16 of them sent back-to-back under one chip-select assertion.


Once a multi-byte transfer has started, I'd expect the bytes to be sent back-to-back.


At 10MHz a byte should take 800ns. I'm measuring 1100ns. That's an extra 300ns, or 3 SPI clocks (37.5%).


At 20MHz a byte should take 400ns. I'm measuring 640ns. That's an extra 240ns, or 4.8 SPI clocks (60%).


At the other end of the scale, at 10/16MHz (625kHz) 8 bits should take 12.8us, but a byte takes 13.8us to transfer. That's an extra 1us, or 7.8%.
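The arithmetic behind these three measurements can be checked with a short Python sketch. The helper name `byte_overhead` is only for illustration; the measured times are the ones quoted above, and the "ideal" time assumes a byte is exactly 8 SCK periods with no gaps.

```python
# Per-byte QSPI overhead implied by the measured times quoted above.
# Assumption: an ideal byte is exactly 8 SCK periods, back-to-back.

BITS_PER_BYTE = 8


def byte_overhead(baud_hz, measured_ns):
    """Return (ideal_ns, extra_ns, extra_sck_clocks, extra_percent) for one byte."""
    sck_period_ns = 1e9 / baud_hz
    ideal_ns = BITS_PER_BYTE * sck_period_ns
    extra_ns = measured_ns - ideal_ns
    return ideal_ns, extra_ns, extra_ns / sck_period_ns, 100.0 * extra_ns / ideal_ns


# (SCK rate in Hz, measured time per byte in ns), as reported in this post.
MEASUREMENTS = [
    (10e6, 1100.0),
    (20e6, 640.0),
    (10e6 / 16, 13800.0),
]

if __name__ == "__main__":
    for baud, measured in MEASUREMENTS:
        ideal, extra, clocks, pct = byte_overhead(baud, measured)
        print(f"{baud / 1e6:.4g} MHz: ideal {ideal:.1f} ns, measured {measured:.1f} ns, "
              f"extra {extra:.1f} ns = {clocks:.3g} SCK clocks ({pct:.3g}%)")
```

The overhead is expressed both in SCK clocks and as a percentage because the percentage alone hides how the fixed cost dominates at high SCK rates.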


That looks like overhead in the QSPI hardware as it loads and stores the bytes internally.

This is actually mentioned in the Reference Manual as:


Standard delay after transfer = 17/(fsys/3)

Adequate delay between transfers must be specified for long data streams because the QSPI module requires time to load a transmit RAM entry for transfer. Receiving devices need at least the standard delay between successive transfers.


Fsys/3 is 80MHz, a period of 12.5ns, so 17 of those periods are 212.5ns. That's close to the delays I'm measuring.
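A rough model can be compared with the 10MHz and 20MHz measurements, assuming each byte costs its 8 SCK periods plus exactly the 17-cycle standard delay at Fsys/3 = 80MHz. This is a simplification for illustration, not documented QSPI timing, and the helper names are hypothetical.

```python
# Simple model: time per byte = wire time + standard delay after transfer.
# Assumption: the delay is exactly 17 cycles of Fsys/3 and nothing else is added.

FSYS_HZ = 240e6             # MCF5329 core clock used for these measurements
QSPI_CLK_HZ = FSYS_HZ / 3   # Fsys/3 = 80 MHz
STANDARD_DT_CYCLES = 17     # "Standard delay after transfer = 17/(fsys/3)"


def standard_delay_ns():
    """Standard delay after transfer, in nanoseconds."""
    return STANDARD_DT_CYCLES * 1e9 / QSPI_CLK_HZ


def predicted_byte_ns(baud_hz, bits=8):
    """Predicted time for one queue entry: wire time plus the standard delay."""
    return bits * 1e9 / baud_hz + standard_delay_ns()


if __name__ == "__main__":
    print(f"standard delay: {standard_delay_ns():.1f} ns")
    for baud, measured in ((10e6, 1100.0), (20e6, 640.0)):
        print(f"{baud / 1e6:.0f} MHz: predicted {predicted_byte_ns(baud):.1f} ns, "
              f"measured {measured:.1f} ns")
```

Under this model the predicted times are 1012.5ns at 10MHz and 612.5ns at 20MHz, against 1100ns and 640ns measured.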


That's a lot of overhead at 20MHz: at 20MHz (the maximum allowed) the effective throughput is only 12.5Mbit/s.
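"Effective throughput" here simply means bits moved divided by the measured time per byte. A minimal sketch, with `effective_bit_rate` as a hypothetical helper and the measured times taken from this post:

```python
# Effective QSPI throughput from the measured time per byte.

def effective_bit_rate(measured_byte_ns, bits=8):
    """Effective throughput in bits per second for one byte per measured_byte_ns."""
    return bits * 1e9 / measured_byte_ns


# (nominal SCK rate in Hz, measured time per byte in ns), as reported in this post.
MEASUREMENTS = [(10e6, 1100.0), (20e6, 640.0), (10e6 / 16, 13800.0)]

if __name__ == "__main__":
    for baud, measured in MEASUREMENTS:
        rate = effective_bit_rate(measured)
        print(f"nominal {baud / 1e6:.4g} MHz -> effective {rate / 1e6:.3g} Mbit/s "
              f"({100.0 * rate / baud:.3g}% of nominal)")
```

At 20MHz this gives 8 bits per 640ns, i.e. 12.5Mbit/s, or 62.5% of the nominal SCK rate.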


There's another bottleneck in the QSPI. To start a transfer, QAR has to be written twice and QDR has to be written with the commands and data for the transfer. After the transfer, QAR has to be written again and the received data read back from QDR. That's from 6 to 51 reads and writes of the QSPI registers (for 1 to 16 queue entries), ignoring any writes to QMR and QDLYR. Since the QSPI doesn't run at the CPU clock rate, it takes a while to read and write its registers.
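Counting the accesses as described gives 3 + 3n for n queue entries. A sketch of that count follows; `qspi_register_accesses` is a hypothetical name, and like the text it leaves out QMR and QDLYR writes.

```python
# Count of QAR/QDR accesses for one queued QSPI transfer, per the breakdown above.

def qspi_register_accesses(entries):
    """Register reads/writes needed for a transfer of `entries` queue entries (1..16)."""
    if not 1 <= entries <= 16:
        raise ValueError("the QSPI queue holds 1 to 16 entries")
    setup = 2 + 2 * entries   # 2 QAR writes, plus one data and one command word per entry
    readback = 1 + entries    # 1 QAR write, plus one QDR read per received word
    return setup + readback


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(f"{n:2d} entries: {qspi_register_accesses(n)} register accesses")
```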


You'd expect register reads and writes to be pretty fast unless told otherwise. The manuals say nothing on this subject, so read this earlier thread:


Re: MCF5307, execution speed question (Discussing Motorola 68k/ColdFire microprocessors)


That thread concerns writing to the GPIO registers. How long that takes depends on which CPU you're using: on the MCF5329 a GPIO register write takes 18 CPU clocks and a "port |= bit" instruction takes 33. It's possibly 12 clocks on the MCF52xx and maybe more on the MCF54xx.


I'm measuring 15 CPU clocks per QSPI register read or write. For a 16-byte transfer that's 51 accesses at 15 clocks each, or 765 CPU clocks, which is over 3us at 240MHz.
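The register-access cost can be tabulated for any queue length, assuming the 3 + 3n access count from the QAR/QDR breakdown in this post, the 15-clock figure measured here, and a 240MHz core clock. The function names are hypothetical.

```python
# CPU time spent on QSPI register accesses for one queued transfer.
# Assumptions: 3 + 3n QAR/QDR accesses for n entries (QMR/QDLYR not counted),
# 15 CPU clocks per access (measured), 240 MHz core clock.

FSYS_HZ = 240e6
CLOCKS_PER_QSPI_ACCESS = 15


def access_count(entries):
    """QAR/QDR reads and writes for `entries` queue entries (1..16)."""
    if not 1 <= entries <= 16:
        raise ValueError("the QSPI queue holds 1 to 16 entries")
    return 3 + 3 * entries


def register_overhead(entries):
    """Return (cpu_clocks, nanoseconds) spent on register accesses."""
    clocks = CLOCKS_PER_QSPI_ACCESS * access_count(entries)
    return clocks, clocks * 1e9 / FSYS_HZ


if __name__ == "__main__":
    for n in (1, 16):
        clocks, ns = register_overhead(n)
        print(f"{n:2d} entries: {clocks} CPU clocks = {ns:.1f} ns")
```

For a full 16-entry queue this gives 765 CPU clocks, or 3187.5ns at 240MHz.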