I see that you want to send 512 kB/s, which is about 4 Mbit/s, or approximately 512 ksamples/s with 8-bit samples (256 ksamples/s with 16-bit samples).
This is at the limit of what the ADC can deliver. Such a sample rate can be achieved only without external HW triggering, with conversions requested solely in polled mode.
In that case I suppose the PDB period is very short and a huge number of interrupts is generated, which degrades the performance of other normal-priority code, for example the USB stack for KHCI.
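To get a feeling for the interrupt load, here is a rough sketch of the arithmetic in C. The function names, the sample width and the core clock are my assumptions for illustration only, nothing here is taken from your project or from the MQX/KSDK sources.

```c
#include <assert.h>
#include <stdint.h>

/* Samples per second needed to sustain a given byte rate.
 * bytes_per_sample is an assumption (1 for 8-bit, 2 for 12/16-bit results). */
static uint32_t samples_per_sec(uint32_t bytes_per_sec, uint32_t bytes_per_sample)
{
    assert(bytes_per_sample > 0u);
    return bytes_per_sec / bytes_per_sample;
}

/* Core clock cycles available between two interrupts (rounded down),
 * assuming one interrupt per ADC sample. */
static uint32_t cycles_per_interrupt(uint32_t core_hz, uint32_t irq_per_sec)
{
    assert(irq_per_sec > 0u);
    return core_hz / irq_per_sec;
}
```

For example, 512000 bytes/s with 2 bytes per sample gives 256000 samples/s. With a hypothetical 48 MHz core clock and one interrupt per sample, that leaves only about 187 core cycles per interrupt, including ISR entry and exit, so very little time remains for the USB stack.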
Can you give us some details about what was tested, some pieces of code, etc.?