The obvious answer is that your data-load code sequence can't keep up with the TX byte rate. Slow your SPI clock, and/or tighten up the code sequence. At BR=30 you've got 240 bus clocks per byte (30 per bit x 8 bits), or (presumably) 480 CPU clocks. That seems like a lot, but clocks are also easy to 'waste'; it looks like you might be taking over twice that time?
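To make that budget explicit, here is a small sketch of the arithmetic. It assumes, as the figures above imply, 30 bus clocks per SPI bit and a CPU clock running at twice the bus clock; the function and parameter names are made up for illustration and do not come from any vendor header.

```c
#include <assert.h>
#include <stdint.h>

/* Bus clocks available per SPI frame, assuming one SPI bit time lasts
 * `bus_clocks_per_bit` bus clocks (30 in the case discussed above). */
uint32_t bus_clocks_per_frame(uint32_t bus_clocks_per_bit, uint32_t frame_bits)
{
    assert(frame_bits > 0u);
    return bus_clocks_per_bit * frame_bits;
}

/* Same budget expressed in CPU clocks. The CPU-to-bus ratio is an
 * assumption (2 in the case above); check your clock setup. */
uint32_t cpu_clocks_per_frame(uint32_t bus_clocks_per_bit,
                              uint32_t frame_bits,
                              uint32_t cpu_to_bus_ratio)
{
    return bus_clocks_per_frame(bus_clocks_per_bit, frame_bits) * cpu_to_bus_ratio;
}
```

With these assumptions, `cpu_clocks_per_frame(30, 8, 2)` gives the 480 CPU clocks per byte quoted above, and a 16-bit frame doubles the time you have between loads.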
You might also consider loading 16-bits per SPI transaction, rather than 8.
You will probably also have to make sure you are fully utilizing the SPI FIFOs, so that the TX hardware always has a byte 'ready to go' on the very next SPI clock. If you wait for each 'TX complete' before even loading the next byte, you will surely skip a clock.
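A rough sketch of the 'keep the FIFO topped up' idea follows. I don't have your register setup, so the hardware access is hidden behind two hypothetical hooks (`tx_has_room` and `tx_push`); on the real part these would be the TX-FIFO status check and the push-register write. The 4-deep simulated FIFO is only there so the loop can be tried on a PC, and its depth is an assumption, not a statement about your device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks (names invented for this sketch). */
typedef struct {
    int  (*tx_has_room)(void *ctx);             /* nonzero if the TX FIFO can take a frame */
    void (*tx_push)(void *ctx, uint8_t frame);  /* queue one frame in the TX FIFO */
    void *ctx;
} spi_port_t;

/* Queue as many frames as the FIFO accepts right now; returns how many went in. */
size_t spi_fill_tx_fifo(const spi_port_t *port, const uint8_t *data, size_t len)
{
    size_t queued = 0;
    assert(port != NULL && data != NULL);
    while (queued < len && port->tx_has_room(port->ctx)) {
        port->tx_push(port->ctx, data[queued]);
        queued++;
    }
    return queued;
}

/* Send a whole message by refilling whenever there is room, instead of
 * waiting for each transfer to complete. Spins if the FIFO never drains. */
void spi_send_all(const spi_port_t *port, const uint8_t *data, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        sent += spi_fill_tx_fifo(port, data + sent, len - sent);
    }
}

/* --- Simulated FIFO, for trying the loop on a host (depth is assumed) --- */
#define SIM_DEPTH 4u

typedef struct {
    unsigned level;   /* frames currently queued */
    size_t   pushed;  /* total frames accepted so far */
    uint8_t  last;    /* most recent frame pushed */
} sim_fifo_t;

int sim_has_room(void *ctx)
{
    sim_fifo_t *f = ctx;
    if (f->level >= SIM_DEPTH) {
        f->level--;   /* pretend the shifter consumed one frame */
        return 0;     /* report 'full' for this poll */
    }
    return 1;
}

void sim_push(void *ctx, uint8_t frame)
{
    sim_fifo_t *f = ctx;
    f->level++;
    f->pushed++;
    f->last = frame;
}
```

The point of the structure is that the load loop only ever blocks when the FIFO is genuinely full, so the shifter should always find a frame waiting.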
Can you capture your full sequence of 'PUSHR' values in a little debug-array, and show us that after a full message?
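Something along these lines would do for the capture; it is only a sketch. `pushr_write_logged`, `pushr_log` and the 64-entry size are names and numbers I picked for illustration, and the register is passed in as a pointer so that the same code runs on a host against a dummy variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PUSHR_LOG_LEN 64u   /* arbitrary size; make it at least one full message */

uint32_t pushr_log[PUSHR_LOG_LEN];  /* values in the order they were written */
size_t   pushr_log_count;           /* total writes, including any that did not fit */

/* Record the value, then write it to the push register.
 * `pushr` would point at the real PUSHR register on target. */
void pushr_write_logged(volatile uint32_t *pushr, uint32_t value)
{
    assert(pushr != NULL);
    if (pushr_log_count < PUSHR_LOG_LEN) {
        pushr_log[pushr_log_count] = value;
    }
    pushr_log_count++;   /* keeps counting so an overflow is visible */
    *pushr = value;
}
```

Route every PUSHR write through the wrapper, send one full message, then halt in the debugger and dump `pushr_log` and `pushr_log_count`. Keep in mind the extra store costs a few clocks per write, which matters if timing is already marginal.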