Hello Tino,
The result you are getting appears to be what is expected.
A CPU cycle is determined by the bus frequency, not by MCGOUT; the bus frequency is a sub-multiple of MCGOUT. Your bus frequency is probably 16 MHz.
A tight software loop that toggles an output pin needs many bus cycles for each pass through the loop. In assembly code, the tightest loop takes 11 cycles to execute each half period, or 22 cycles for a full period. For example, to toggle bit 0 of PTA:
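; Bracketed numbers are bus cycles per instruction.
; PTA0 must already be configured as an output (PTADD bit 0 set).
LOOP1:
    LDA  PTAD     ; [3] read the Port A data register
    EOR  #$01     ; [2] invert bit 0
    STA  PTAD     ; [3] write back, toggling PTA0
    BRA  LOOP1    ; [3] repeat: 11 cycles per half period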
Regards,
Mac