To calculate bus cycles:
a) When executing from internal RAM, which allows misaligned word fetches (I hope this holds on the 912B32 as well as the 912D60), just count the access detail letters: w, P, etc.
b) When executing from internal flash, a misaligned data word fetch takes 2 bus cycles, so the corresponding access letters should be adjusted.
c) When executing from or reading data in external memory, take the bus width, wait states, etc. into account.
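As a rough illustration of cases a) and b), here is a minimal C sketch. The function names and the penalty parameter are hypothetical. It assumes each access detail letter costs one bus cycle, and that a misaligned 16-bit data read (letter R) costs the extra cycles you pass in. It does not model case c), external bus width or wait states.

```c
#include <ctype.h>

/* Nominal count: one bus cycle per access detail letter (assumption). */
unsigned base_cycles(const char *detail)
{
    unsigned cycles = 0;
    for (const char *p = detail; *p != '\0'; ++p) {
        if (isalpha((unsigned char)*p)) {
            ++cycles;
        }
    }
    return cycles;
}

/* Number of 16-bit data reads, written as 'R' in the access detail. */
unsigned word_reads(const char *detail)
{
    unsigned reads = 0;
    for (const char *p = detail; *p != '\0'; ++p) {
        if (*p == 'R') {
            ++reads;
        }
    }
    return reads;
}

/* Hypothetical estimate: add 'penalty' cycles per 16-bit data read
 * when the data word is misaligned; otherwise use the nominal count. */
unsigned estimate_cycles(const char *detail, int misaligned, unsigned penalty)
{
    unsigned cycles = base_cycles(detail);
    if (misaligned) {
        cycles += penalty * word_reads(detail);
    }
    return cycles;
}
```

For an access detail string such as "RfP", estimate_cycles("RfP", 0, 1) gives the nominal 3 cycles, and estimate_cycles("RfP", 1, 1) gives 4 with a one-cycle penalty for the misaligned word read.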
It may be much simpler to use the CodeWarrior simulator, which lets you adjust the Access Details. Go to Simulator -> Configure... and, for every memory resource (RAM, ROM, etc.), click Access Details -> Set Up... . At the least you can specify the bus width and wait states, as well as the wait states for misaligned accesses.
The access detail letters are executed from left to right. The CPU has an instruction queue, so some code may be fetched after the instruction has already done what it was supposed to do.