Dave, as you said, USB Audio requires a way to keep the audio clocks in sync.
When the MCU is the sink (receiving audio data):
Adaptive: you recover the clock from the incoming data rate. That means you need an external PLL or clock synthesizer to speed up or slow down the audio clocks so the buffer does not underrun or overrun.
This method works well, but because you are adjusting the clocks you add some jitter, which affects audio quality.
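As a rough sketch of the adaptive idea, you can compare how many samples arrived over USB with how many the I2S side consumed during the same window, and use the difference to steer the PLL. The function name and the idea of expressing the error in ppm are my own assumptions, not something from a specific vendor library, and how you apply the correction depends entirely on your PLL or synthesizer.

```c
#include <stdint.h>

/* Hypothetical helper: relative error between the host's data rate and the
 * local audio clock, in parts per million, measured over one window.
 * Positive result: USB delivers more than I2S consumes, so the local
 * audio clock is too slow and should be sped up.
 * Negative result: the local audio clock is too fast. */
static int32_t clock_error_ppm(uint32_t usb_samples_in, uint32_t i2s_samples_out)
{
    if (i2s_samples_out == 0u) {
        return 0; /* nothing played yet, no measurement possible */
    }
    int64_t diff = (int64_t)usb_samples_in - (int64_t)i2s_samples_out;
    return (int32_t)(diff * 1000000 / (int64_t)i2s_samples_out);
}
```

For instance, receiving 48048 samples while playing 48000 gives +1000 ppm, so the PLL would need to run about 0.1% faster. In a real design you would filter this value before applying it, since every clock adjustment shows up as jitter.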
Asynchronous (with feedback): in this method the audio clocks are fixed, so there is no need to adjust them (an advantage over adaptive).
In this sync method, the MCU keeps track of the audio buffers and rate. For instance, you have a circular buffer where USB audio drives the write pointer and I2S drives the read pointer. You can measure the distance between the two pointers and evaluate whether they are too close or too far apart.
If they are too close, you request fewer samples from the USB host through the feedback endpoint; if they are too far apart, you request more samples.
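The pointer check could look something like the sketch below. I am assuming here that "too close" means the write pointer is about to catch up with the read pointer (buffer nearly full), so the code works on the fill level instead. The buffer size, thresholds, and all names are made up for illustration.

```c
#include <stdint.h>

#define BUF_SAMPLES   1024u  /* assumed ring size in samples, must be a power of two */
#define FILL_LOW      256u   /* assumed threshold: below this, ask for more samples */
#define FILL_HIGH     768u   /* assumed threshold: above this, ask for fewer samples */

typedef enum { FB_NOMINAL, FB_REQUEST_LESS, FB_REQUEST_MORE } fb_action_t;

/* Samples written by USB that I2S has not read yet (handles wrap-around). */
static uint32_t ring_fill(uint32_t wr, uint32_t rd)
{
    return (wr - rd) & (BUF_SAMPLES - 1u);
}

/* Decide what to report on the feedback endpoint. */
static fb_action_t fb_decide(uint32_t wr, uint32_t rd)
{
    uint32_t fill = ring_fill(wr, rd);

    if (fill > FILL_HIGH) {
        return FB_REQUEST_LESS;  /* USB is running ahead, risk of overrun */
    }
    if (fill < FILL_LOW) {
        return FB_REQUEST_MORE;  /* I2S is catching up, risk of underrun */
    }
    return FB_NOMINAL;
}
```

You would typically run `fb_decide()` once per feedback interval, ideally on an averaged fill level so a single late packet does not make the feedback value bounce around.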
For example, with 48 kHz audio, if you detect that the pointers are too close and you need fewer samples, you report 47.9 samples per frame to the host instead of 48; if you need more samples, you report 48.1.
In practice this translates into one sample more or one sample fewer per channel in an occasional USB audio packet.
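To make the numbers concrete, here is a sketch for a full-speed device, where the feedback value is sent as samples per 1 ms frame in 10.14 fixed-point format packed into 3 bytes (high-speed uses 16.16 in 4 bytes). The function names are my own, and `samples_this_frame()` only illustrates how a host might turn the fractional value into packet sizes; actual host drivers may do it differently.

```c
#include <stdint.h>

/* Encode a rate in Hz as samples per 1 ms frame in 10.14 fixed point,
 * rounded to nearest. 48000 Hz -> 48.0 -> 0x0C0000. */
static uint32_t fb_encode_10_14(uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)rate_hz * 16384u + 500u) / 1000u);
}

/* Pack the value into the 3-byte little-endian feedback packet. */
static void fb_pack_10_14(uint32_t fb, uint8_t out[3])
{
    out[0] = (uint8_t)(fb & 0xFFu);
    out[1] = (uint8_t)((fb >> 8) & 0xFFu);
    out[2] = (uint8_t)((fb >> 16) & 0xFFu);
}

/* Illustration only: accumulate the fractional part so that a value such
 * as 47.9 yields mostly 48-sample packets with an occasional 47. */
static uint32_t samples_this_frame(uint32_t fb_10_14, uint32_t *acc)
{
    *acc += fb_10_14;
    uint32_t n = *acc >> 14;     /* whole samples to send this frame */
    *acc &= (1u << 14) - 1u;     /* keep the fraction for the next frame */
    return n;
}
```

With a reported rate of 47.9, ten consecutive frames in this sketch add up to 479 samples per channel instead of 480, which is exactly the "one sample fewer in a packet" effect described above.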
When the MCU is the audio source (e.g. a microphone), it is the adaptive mode that implements the feedback endpoint, and you can use the same method as in asynchronous mode to have the host request less or more data.