Hi, if you measure 1.2 µs at a clock frequency of 200 MHz, that corresponds to 240 clock cycles, i.e., as you said, about 6.6 clock cycles per instruction on average.
Of course, on this architecture code length does not equal execution time. Such an estimate might work for some simple 8-bit MCUs, but not for an architecture as complex as Power Architecture, with its pipelining, prefetching, crossbar, caches and so on.
But if you mean that number as the average instruction execution time for a certain portion of code, then it is OK.
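
For reference, the arithmetic above can be sketched in a few lines of Python. The function names are made up for illustration, and the instruction count of 36 is only an assumption chosen so that the result lands near the 6.6 figure; the real count is whatever your measured code portion contains.

```python
def cycles_elapsed(time_ns: int, clock_mhz: int) -> float:
    """Clock cycles that elapse in time_ns nanoseconds at clock_mhz MHz."""
    # 1 MHz * 1 ns = 0.001 cycles, hence the division by 1000.
    return time_ns * clock_mhz / 1000


def avg_cycles_per_instruction(cycles: float, instruction_count: int) -> float:
    """Average over a code portion; says nothing about any single instruction."""
    return cycles / instruction_count


# Figures from the post: 1.2 us measured at 200 MHz.
cycles = cycles_elapsed(1200, 200)

# ASSUMPTION: 36 instructions in the measured portion (not stated in the thread).
ASSUMED_INSTRUCTIONS = 36
avg = avg_cycles_per_instruction(cycles, ASSUMED_INSTRUCTIONS)

print(f"{cycles:.0f} cycles, {avg:.2f} cycles per instruction on average")
```

The result is only an average for that one measurement. The same instructions can take a different number of cycles on another run, depending on cache contents, pipeline state and bus traffic.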