Hello,
this code was written in a generic way so that it can handle any number of bytes transferred. The data has to be read sequentially from the HW buffer, and only after the status bit signals that the buffer is not empty. There were also some issues with the ESDHC status bits themselves...
So those are the reasons. If you have a specific scenario, you can write an optimized read function of your own.
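To illustrate what "generic" means here, below is a minimal sketch of a polled read loop. It is not the driver code: fake_host_t, buffer_not_empty(), read_data_port() and polled_read() are hypothetical names, and the "hardware" is simulated in memory instead of using the real ESDHC registers. It only shows the pattern of waiting for the status bit and then reading one word at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller, NOT the real ESDHC register map. */
typedef struct {
    const uint8_t *src;   /* data the simulated card will deliver */
    size_t remaining;     /* bytes still waiting in the simulated buffer */
} fake_host_t;

/* Simulated status bit: nonzero while the buffer holds data. */
static int buffer_not_empty(const fake_host_t *h)
{
    return h->remaining > 0;
}

/* Simulated 32-bit data port: each read pops up to four bytes.
   A short final word is padded with zeros. */
static uint32_t read_data_port(fake_host_t *h)
{
    uint32_t word = 0;
    for (unsigned i = 0; i < 4 && h->remaining > 0; i++) {
        word |= (uint32_t)(*h->src++) << (8 * i);
        h->remaining--;
    }
    return word;
}

/* Generic polled read: accepts any byte count and moves one word per
   status check. Returns the number of bytes copied, which is less than
   len if the status bit did not come up within max_polls checks. */
static size_t polled_read(fake_host_t *h, uint8_t *dst, size_t len,
                          unsigned max_polls)
{
    size_t done = 0;
    assert(h != NULL && dst != NULL);
    while (done < len) {
        unsigned polls = 0;
        while (!buffer_not_empty(h)) {
            if (++polls >= max_polls) {
                return done;          /* give up, report partial read */
            }
        }
        uint32_t word = read_data_port(h);
        for (unsigned i = 0; i < 4 && done < len; i++) {
            dst[done++] = (uint8_t)(word >> (8 * i));
        }
    }
    return done;
}
```

An optimized version for a fixed scenario (for example, always whole 512-byte blocks into an aligned buffer) could drop the per-byte unpacking and check the status once per burst instead of once per word, if the hardware allows that.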
Anyway, changing the baud rate doesn't help much, because of the delays at all levels of the stack.
The only way I see to improve the throughput is to use CMD18 (READ_MULTIPLE_BLOCK, which reads multiple blocks with one command) instead of CMD17 (READ_SINGLE_BLOCK).
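A rough sketch of why this helps, counting only the commands sent to the card. Everything here is simulated: fake_card_t, send_command(), receive_block() and the two read functions are hypothetical names, not the driver's API. The sketch assumes an open-ended CMD18 transfer that is ended with CMD12 (STOP_TRANSMISSION), and block addressing as on high-capacity cards.

```c
#include <assert.h>
#include <stdint.h>

/* Command indices from the SD specification. */
enum {
    CMD12_STOP_TRANSMISSION   = 12,
    CMD17_READ_SINGLE_BLOCK   = 17,
    CMD18_READ_MULTIPLE_BLOCK = 18
};

/* Hypothetical simulated card that only counts traffic. */
typedef struct {
    unsigned commands_sent;
    unsigned blocks_read;
} fake_card_t;

/* Stand-in for issuing a command; a real driver pays the full
   command/response latency of the stack on every call. */
static void send_command(fake_card_t *c, unsigned cmd, uint32_t arg)
{
    (void)cmd;
    (void)arg;
    c->commands_sent++;
}

/* Stand-in for receiving one data block. */
static void receive_block(fake_card_t *c)
{
    c->blocks_read++;
}

/* One CMD17 per block: command overhead grows with the block count. */
static void read_blocks_cmd17(fake_card_t *c, uint32_t first, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        send_command(c, CMD17_READ_SINGLE_BLOCK, first + i);
        receive_block(c);
    }
}

/* One CMD18 for the whole run, then CMD12 to stop the transfer. */
static void read_blocks_cmd18(fake_card_t *c, uint32_t first, unsigned count)
{
    if (count == 0) {
        return;
    }
    send_command(c, CMD18_READ_MULTIPLE_BLOCK, first);
    for (unsigned i = 0; i < count; i++) {
        receive_block(c);
    }
    send_command(c, CMD12_STOP_TRANSMISSION, 0);
}
```

For 8 consecutive blocks, the CMD17 loop sends 8 commands and the CMD18 version sends 2, so the per-command delay of the stack is paid far less often.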
Regards,
PetrM