> I guess here is where I should be polling SCI data register, right?
Yes. It is a pain that the code is in assembly, unless all of your other code is as well.
That isn't the worst assembly code I've ever seen, but it comes close. It directly writes to (FSTAT), and then it loads that register's address into a register INSIDE THE LOOP. It should do one or the other, and it shouldn't reload the address or clear "d1" inside the loop. This is a "polling loop" so it doesn't have to be fast, but it is bad practice and ugly-ugly-ugly. Why is it loading a byte, copying it to a different register, and clearing the upper 24 bits when the "AND" is only testing one bit anyway? Half of the instructions in that loop are redundant.
The function doesn't clear the error bits and it isn't testing the error bits either. If those checks aren't done anywhere else in the code then it would be very easy to have it lock up if called with a bad address, and not tell you why, or lock up and silently fail until the CPU has been reset. Not a good programming example.
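For comparison, here is roughly what I'd expect a wait loop that does check the error bits to look like in C. This is only a sketch: the function name flash_wait_done is made up, the bit masks and the write-1-to-clear behaviour are assumptions you must check against the reference manual for your part, and the register is passed in as a pointer only so the code can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions in FSTAT; verify against your reference manual. */
#define FSTAT_FCCF    0x40u  /* command complete (assumed) */
#define FSTAT_FPVIOL  0x20u  /* protection violation (assumed) */
#define FSTAT_FACCERR 0x10u  /* access error (assumed) */
#define FSTAT_ERRORS  (FSTAT_FPVIOL | FSTAT_FACCERR)

/* Hypothetical helper: wait for the flash command to finish.
 * Returns 0 on success, or the error bits that were found set. */
static uint8_t flash_wait_done(volatile uint8_t *fstat)
{
    uint8_t status;

    assert(fstat != 0);
    do {
        status = *fstat;                 /* one read per pass */
        if (status & FSTAT_ERRORS) {
            /* Assumed write-1-to-clear on real hardware;
             * on a PC this is just a plain store. */
            *fstat = FSTAT_ERRORS;
            return (uint8_t)(status & FSTAT_ERRORS);
        }
    } while ((status & FSTAT_FCCF) == 0u);

    return 0u;
}
```

The point is that the caller gets an error code back instead of a loop that spins forever.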
Another problem with trying to add your code (to read and stash SCI bytes) is that you don't know what registers the callers to that function are using and what registers that function is allowed to use. It is essential programming practice to document the register calling and preservation conventions in use. This code doesn't do that anywhere I can see (and I've downloaded the entire ZIP archive and looked), so you're going to have to reverse-engineer the lot to make sure you're not clobbering anything, or you'll have to save all registers somewhere to make sure you don't accidentally corrupt something a caller is using.
You need to poll and read bytes into a buffer. You have to keep track of buffer pointers, or indexes and counts. This is fairly simple in C but tricky in assembly. Then you have to share the common buffer and index between this code and your mainline (easy if your code is assembly, a bit trickier if you're using C).
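Something like this is what I mean by the buffer and indexes, as a minimal C sketch. All the names (sci_poll, rx_get, rx_buf and so on) are made up for illustration, the RDRF mask is an assumption to check against your manual, and the status and data registers are passed in as pointers only so you can try it on a PC first.

```c
#include <assert.h>
#include <stdint.h>

#define RX_SIZE  64u     /* power of two, so indexes wrap with a mask */
#define SCI_RDRF 0x20u   /* ASSUMED "receive data register full" bit */

static volatile uint8_t  rx_buf[RX_SIZE];
static volatile uint16_t rx_head;   /* written only by sci_poll() */
static volatile uint16_t rx_tail;   /* written only by rx_get()   */

/* Call this from inside the flash wait loop. */
static void sci_poll(volatile uint8_t *status, volatile uint8_t *data)
{
    if (*status & SCI_RDRF) {
        uint8_t  byte = *data;  /* on real hardware this read is assumed to clear RDRF */
        uint16_t next = (uint16_t)((rx_head + 1u) & (RX_SIZE - 1u));

        if (next != rx_tail) {  /* room left: store the byte */
            rx_buf[rx_head] = byte;
            rx_head = next;
        }
        /* else: buffer full, the byte is dropped */
    }
}

/* Call this from the mainline. Returns 1 and stores a byte, or 0 if empty. */
static int rx_get(uint8_t *out)
{
    assert(out != 0);
    if (rx_tail == rx_head) {
        return 0;
    }
    *out = rx_buf[rx_tail];
    rx_tail = (uint16_t)((rx_tail + 1u) & (RX_SIZE - 1u));
    return 1;
}
```

Because each index has exactly one writer, the poll side and the mainline don't need to lock each other out. One slot is always left empty so that "full" and "empty" can be told apart, which means this holds at most 63 bytes.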
If you haven't written ColdFire assembly before, watch out, there are some traps. A byte read into a data register doesn't zero-extend or sign-extend into the upper bits; they keep whatever was there before. That's why that code above is clearing d1 (but it doesn't need to, since it only tests one bit). The M68k on which it was based was more symmetric (with respect to byte/word/long ops), but ColdFire instructions mostly work on 32-bit longs. Watch out for the differences between Address and Data registers.
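If the byte-read trap isn't clear, here is a tiny C model of it. This is not ColdFire code, just arithmetic that mimics what happens to a data register, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Models a byte move into a data register: only bits 7..0 are replaced,
 * bits 31..8 keep their old contents. */
static uint32_t model_move_b(uint32_t dreg, uint8_t value)
{
    return (dreg & 0xFFFFFF00u) | value;
}

/* Models the usual workaround: clear the register, then move the byte in. */
static uint32_t model_clear_then_move_b(uint8_t value)
{
    return model_move_b(0u, value);
}
```

So if d1 held 0x12345678 and you byte-load 0x80 into it, the model gives 0x12345680, not 0x00000080. A long compare against 0x80 would then fail, while a test of bit 7 alone would still work.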
Good luck.
Tom