More optimisation got me to just over 300k/s, which is still a long way from my target.
I have reviewed the application notes, but they all show examples where prepared data appears to be copied to the USB IN (TX) buffer in the interrupt once the previous packet has been sent. This leads to slow performance. The S08 CPU doesn't have enough MIPS to use memcpy.
For example, the original SD card reader app works roughly like this:
1. receive SCSI28 command
2. read a 512-byte sector from the SD card into a RAM array
3. interrupt-driven copy of a 64-byte block to USB RAM, then send
4. repeat step 3 until all 512 bytes have been sent.
Steps 2 and 3 cripple the throughput. I got my performance boost by combining them so that there is no second copy: I read from the SD card over SPI directly into the USB IN buffer. The SPI-read, USB-store loop is highly time-critical, and saving a few instruction cycles there made a noticeable real-world difference in speed.
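A minimal sketch of that combined loop, written so it also runs on a PC. The names spi_read_byte, spi_read_into and sim_card_next are hypothetical, and the SPI transfer is simulated with a counter; on the real MCU spi_read_byte would clock one byte from the card and dst would point into USB RAM.

```c
#include <stdint.h>

#define PKT_SIZE 64u

/* Simulated card data: each read returns the next value 0, 1, 2, ...
   Hypothetical stand-in for a real SPI byte transfer. */
static uint8_t sim_card_next = 0;

static uint8_t spi_read_byte(void)
{
    return sim_card_next++;
}

/* Read n bytes (n must be > 0) from SPI straight into dst.
   There is no intermediate sector buffer, so no second copy. */
static void spi_read_into(uint8_t *dst, uint8_t n)
{
    do {
        *dst++ = spi_read_byte();
    } while (--n != 0);
}
```

The count-down do/while is just one way to keep the inner loop short; whether it actually saves cycles depends on what the compiler generates, so check the listing.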
However, the MCU is still spending a fair proportion of its time waiting for the next USB data request.
My next idea was to read ahead into a second USB RAM buffer and then flip the BDT address when the interrupt says the endpoint is ready to send. My code doesn't quite work yet. Monitoring the USB transactions on my PC with Wireshark, I can see it get part way through the card initialisation process and then time out.
That is, I use two buffers: 64 bytes from 0x18e0 and 64 bytes from 0x1920.
While the USB module is sending buffer 1, I write to buffer 2. When the transfer-complete interrupt arrives, I set a flag. Once my main loop code has filled the next buffer, I set the address in the BDT to point to that buffer and start the transmission by setting the OWN flag to 1.
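The scheme described above can be sketched roughly as follows, again as a PC-runnable simulation rather than real driver code. Everything here is an assumption for illustration: bdt_entry_t is a hypothetical stand-in (the real BDT layout and its buffer-address encoding must be taken from the reference manual), usb_buf stands in for the two USB RAM buffers, sim_usb_complete plays the part of the hardware finishing a packet, and the DATA0/DATA1 toggle is left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define PKT_SIZE 64u

/* Hypothetical stand-in for one BDT entry, not the real register layout. */
typedef struct {
    volatile uint8_t own;    /* 1 = USB module owns the buffer */
    volatile uint8_t count;  /* bytes to send */
    uint8_t *volatile addr;  /* buffer to send from */
} bdt_entry_t;

static uint8_t usb_buf[2][PKT_SIZE];  /* stand-ins for 0x18e0 and 0x1920 */
static bdt_entry_t ep_in;
static volatile bool tx_done = true;  /* set by the ISR, cleared by main loop */
static uint8_t fill_idx = 0;          /* buffer the main loop fills next */
static bool filled = false;           /* fill_idx buffer holds a full packet */

static uint8_t sim_card_next = 0;
static uint8_t spi_read_byte(void) { return sim_card_next++; } /* simulated */

/* Transfer-complete interrupt: only sets a flag. */
static void usb_tx_complete_isr(void) { tx_done = true; }

/* Simulated hardware: packet sent, ownership returned, interrupt raised. */
static void sim_usb_complete(void)
{
    ep_in.own = 0;
    usb_tx_complete_isr();
}

/* One pass of the main loop: read ahead, then hand over when allowed. */
static void mainloop_step(void)
{
    if (!filled) {
        for (uint8_t i = 0; i < PKT_SIZE; i++)
            usb_buf[fill_idx][i] = spi_read_byte();
        filled = true;
    }
    if (filled && tx_done) {
        tx_done = false;
        ep_in.addr = usb_buf[fill_idx];
        ep_in.count = PKT_SIZE;
        ep_in.own = 1;        /* written last: hands the buffer to USB */
        fill_idx ^= 1u;       /* next read-ahead goes to the other buffer */
        filled = false;
    }
}
```

In this sketch the main loop never writes to the buffer the USB module currently owns, and the ownership flag is written only after the address and count are in place.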
Is there anything conceptually wrong with this? If this works, what's the point of the hardware double-buffers?
I had a look at the datalogger application note (AN3582), which uses the EP5 ping/pong buffer. Unfortunately the code uses memcpy, which will kill performance.
I've tried using the ping/pong buffers in my code, but with no success yet: only one buffer is sent before a timeout.
Any ideas before I move to a different MCU with DMA?
James