ColdFire V1 MCF51JM128 speed problems

Hi, I am running the MCF51JM128 on the Firebird32 dev board and attempting some speed benchmarking. I have the PLL set to 48 MHz, I am using CodeWarrior V10, and I am timing with an interrupt that drives a counter counting microseconds.
I am trying to time how long it takes to toggle a digital I/O pin 10000 times in a for loop, and I am getting very slow results: 28000 µs (my Arduino running at 16 MHz does it in 6200 µs). This is after declaring all the counters volatile and turning on optimisation, which did help quite a lot.
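
For reference, the measurement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual project code: `fake_port`, `micros_counter`, and `time_toggles` are hypothetical names, with `fake_port` standing in for the real GPIO data register and `micros_counter` for the counter that the timer interrupt increments.

```c
#include <stdint.h>

/* Hypothetical stand-ins: on the real board these would be a GPIO data
   register and a counter incremented by the timer interrupt. */
static volatile uint8_t  fake_port = 0;
static volatile uint32_t micros_counter = 0;

/* Toggle bit 0 of the port `count` times and return how far the
   microsecond counter advanced while doing so. */
uint32_t time_toggles(uint32_t count)
{
    uint32_t i;
    uint32_t start = micros_counter;

    for (i = 0; i < count; i++) {
        fake_port ^= 0x01u;   /* volatile access, so it is not optimised away */
    }

    /* Unsigned subtraction stays correct across one counter wrap-around. */
    return micros_counter - start;
}
```

On hardware the return value of `time_toggles(10000)` would be the elapsed time in microseconds. Built on a desktop compiler nothing increments `micros_counter`, so the function simply returns 0.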

Even an empty for loop takes 17000 µs to run.
I have stepped through the code in the debugger, and the program seems to spend a lot of time in the interrupt handler when it doesn't need to be there (and this is not my own interrupt service routine).
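
To put the timings above in perspective, they can be converted into average clock cycles per loop iteration. The helper below is a small sketch of that arithmetic. The name `cycles_per_iter_x10` is made up for illustration, and it assumes the CPU core really runs at the quoted clock figure (48 MHz here). If the core or bus clock is actually a fraction of the PLL output, the results scale accordingly.

```c
#include <stdint.h>

/* Average clock cycles per loop iteration, multiplied by 10 so that one
   decimal place survives without floating point.
   elapsed_us : measured time for the whole loop, in microseconds
   clock_mhz  : assumed CPU clock in MHz (cycles per microsecond)
   iterations : number of loop iterations that were timed */
uint32_t cycles_per_iter_x10(uint32_t elapsed_us, uint32_t clock_mhz,
                             uint32_t iterations)
{
    /* 64-bit intermediate avoids overflow for long measurements. */
    uint64_t total_cycles_x10 = (uint64_t)elapsed_us * clock_mhz * 10u;
    return (uint32_t)(total_cycles_x10 / iterations);
}
```

With the figures from this post, `cycles_per_iter_x10(28000, 48, 10000)` gives 1344 (about 134 cycles per toggle), `cycles_per_iter_x10(17000, 48, 10000)` gives 816 (about 82 cycles per empty iteration), and the Arduino figure `cycles_per_iter_x10(6200, 16, 10000)` gives 99 (about 10 cycles per toggle).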

Has anyone experienced similar problems, or does anyone have ideas about what I am doing wrong?
