How much are you doing in your interrupt service routine?
Try this. Write the ISR so that it does NOTHING but set a flag, and maybe save the internal timer counter word to a memory location, and get out. Set up the memory as a buffer, with a word counter that increments to point at where the next piece of data goes. Once your buffer fills, stop and report the data. But do NOTHING IN THE ISR except SAVE THE DATA. No calcs, no responses, nothing but save the data if possible and get out.
THEN see how fast you can check.
The ISR should be down in the sub-microsecond range, but you can't check for something new if you're still looking at the something old.
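Here is a rough sketch of that kind of save-and-get-out ISR. It is not code from any real project, and every name in it (timer_count, capture_buf, CAPTURE_DEPTH, capture_isr) is made up for illustration. In particular, timer_count is just a plain variable standing in for whatever timer counter register your part has, and how you attach capture_isr to the interrupt vector depends on your compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPTURE_DEPTH 64u   /* hypothetical buffer size, pick your own */

/* Stand-in for the free-running hardware timer counter.
   ASSUMPTION: on real hardware this is a memory-mapped register. */
volatile uint16_t timer_count;

volatile uint16_t capture_buf[CAPTURE_DEPTH]; /* saved counter words      */
volatile size_t   capture_index;              /* where the next word goes */
volatile bool     capture_full;               /* flag for the foreground  */

/* ISR body: save the counter word, bump the index, get out.
   No calculations and no responses in here. */
void capture_isr(void)
{
    if (capture_index < CAPTURE_DEPTH) {
        capture_buf[capture_index++] = timer_count;
        if (capture_index == CAPTURE_DEPTH) {
            capture_full = true;   /* buffer filled: stop saving */
        }
    }
}
```

The foreground code only has to watch capture_full and dump capture_buf once it is set. The differences between neighboring entries then show how fast the interrupts are really arriving.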
I had an application where I was timing down to the 10uSec level. When I used mechanical switches on a test box to input signals, instead of the electrical interface I normally would use, I found I was getting triggers on all the 'bounces' of the switches and catching thousands of input triggers. At that time, the 'first' arrival of the signal kept getting overwritten by the later arrivals, so in essence when I ran the test box I was not timing from the button press, I was timing from the last 'bounce' before it stabilized. Once I changed my ISR to account for that, I was dead nuts on with any input...
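One possible way to account for the bounces, as a sketch only: latch the first edge and ignore everything after it until the foreground code re-arms the input. This is not necessarily what my ISR does, and the names (input_isr, rearm_input, first_edge_ticks) are hypothetical. The timestamp is passed in as a parameter here so the logic can be tried on a PC; on a real part the ISR would read the timer itself.

```c
#include <stdbool.h>
#include <stdint.h>

volatile uint32_t first_edge_ticks;    /* time of the FIRST edge only      */
volatile bool     event_armed = true;  /* true: next edge counts as first  */
volatile bool     event_pending;       /* flag for the foreground to check */

/* ISR body: keep the first arrival, drop the bounces that follow it. */
void input_isr(uint32_t now_ticks)
{
    if (event_armed) {
        first_edge_ticks = now_ticks;
        event_armed = false;   /* later bounces cannot overwrite it */
        event_pending = true;
    }
}

/* Called from the foreground after the event has been handled and
   the switch has had time to settle. */
void rearm_input(void)
{
    event_pending = false;
    event_armed = true;
}
```

With this arrangement a burst of bounce interrupts still costs a few cycles each, but the saved time stays at the button press instead of sliding to the last bounce.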
Response to the interrupt can be limited by other things going on in the system though. I tried to do too much in my interrupt service routines originally and found that I could time down to .0001sec reliably MOST of the time, but every now and then I was getting 'windows' of almost 3msec of delay. Then I redid my ISRs so that all they did was snapshot, set a flag and get out. (Remember, it's not only the ISR that you're interested in that has to get in and out fast, it's ANY ISR that may trigger during the window in which the ISR you're interested in may trigger!) I moved ALL processing of ISR info to a foreground task that was repeatedly called from a loop. Even serial com stuff. Got all my Rx and Tx data into ISRs, and made them as 'tight' as possible. As a result, I now time multiple events over 16 external interrupt pins (however, only 'any 2' can occur at any one time) and I am dead nuts on to less than .0001sec. Dead nuts enough that I can 'semi-reliably' take the internal TC count and use it to interpolate the time between .0001 ticks to give a reasonably accurate time down to .00001sec.
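To make the snapshot-and-flag split and the interpolation concrete, here is a minimal sketch. It assumes a tick counter that advances every .0001sec and a TC that counts up from 0 to tc_period - 1 within each tick; your timer may count down or use a different period, so adjust accordingly. All names (pin_isr, foreground_poll, snapshot_to_us, TICK_US) are invented for the example, and the values are passed in as parameters so it runs without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_US 100u   /* one tick = .0001 sec = 100 microseconds */

typedef struct {
    uint32_t ticks;     /* whole .0001 sec ticks at the event   */
    uint16_t tc_count;  /* raw TC count inside the current tick */
} snapshot_t;

volatile snapshot_t pin_snapshot;
volatile bool       pin_flag;

/* ISR body: snapshot, set the flag, get out. */
void pin_isr(uint32_t ticks_now, uint16_t tc_now)
{
    pin_snapshot.ticks    = ticks_now;
    pin_snapshot.tc_count = tc_now;
    pin_flag = true;
}

/* Interpolate between ticks. ASSUMES the TC counts UP from 0 to
   tc_period - 1 during each tick. Result is in microseconds. */
uint64_t snapshot_to_us(snapshot_t s, uint16_t tc_period)
{
    uint64_t frac_us = ((uint64_t)s.tc_count * TICK_US) / tc_period;
    return (uint64_t)s.ticks * TICK_US + frac_us;
}

/* Foreground task, called over and over from the main loop.
   Returns true and fills *event_us when a new event was waiting. */
bool foreground_poll(uint16_t tc_period, uint64_t *event_us)
{
    snapshot_t copy;

    if (!pin_flag) {
        return false;
    }
    /* On real hardware, briefly mask the interrupt around this copy. */
    copy.ticks    = pin_snapshot.ticks;
    copy.tc_count = pin_snapshot.tc_count;
    pin_flag = false;

    *event_us = snapshot_to_us(copy, tc_period);
    return true;
}
```

All of the arithmetic lives in the foreground, so the ISR stays a couple of stores long no matter how much processing an event needs afterwards.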
The hardware is fast. But you may have to look there to understand what is going on.
Mike