The driver we are currently using seems to play it very safe by restricting the size of a Flash command to the size of a FIFO. Furthermore, the FIFO sizes may have been inherited from an early Vybrid implementation: TX is configured to 64 bytes, while RX is configured to 128 bytes. Consider a Page Program (PP) command on a large Flash that requires 4-byte addresses: the largest PP command we can send at a time is:
[a3][a2][a1][a0][59 bytes of data]
where a3-a0 is the address in big-endian order. Since we are using a Spansion S25FL512S, we could in principle program a full 512-byte page at a time, but the driver implementation limits us to 59 bytes. A PP command must also stay within a page boundary, so programming a full 512-byte page is split into the following sequence of nine commands: 59, 59, 59, 59, 59, 59, 59, 59, 40. That does not seem very efficient.
I read chapter 28 (QuadSPI) of the LS1021A reference manual multiple times, especially the section about Flash Programming, and it seems we could indeed do better. The TX and RX buffers are not just FIFOs, they are circular FIFOs. If I understand the Flash Programming sequence described there correctly, we could improve our driver as follows:
- Ensure the TX buffer is empty, and clear it if necessary.
- Program the address related to the command.
- If the command fits in the FIFO, write it word-by-word into the TX Buffer Data Register (TBDR).
- Otherwise, fill the FIFO with the first 32 bytes to be sent and set IDATSZ to the full size of the command.
- Trigger the command by setting the appropriate LUT Sequence ID.
- Busy-loop on the "TX full" bit: whenever the FIFO is not full and data remains to be sent, push the next word.
- Perform a sanity check on the number of words actually written to the FIFO.
- Wait for programming to complete.
Before jumping into such an implementation, I'd like to verify two things:
- Is my understanding correct?
- Does anyone have a reference implementation that uses the "full power" of the QuadSPI FIFOs? We are using VxWorks, but I guess we could adapt a Linux driver without too much trouble.