I think what is actually wrong in your diagram is that you always start from the left side. I am not sure I can explain this in an understandable way.
Using the picture from the e200z6 manual: after the instruction buffer there is the instruction register (marked), and after that the decode stage. When there are stalls in the execution units, the instructions in the instruction buffer cannot be decoded either, because they cannot be passed on to the subsequent stage.
Ideally you would consider the exact status of every unit at each point in time. In any case, it does not matter where you put the stalls in your diagram, as the resulting performance is always the same.
Performance is limited by the bottlenecks in the pipeline, which in this case are the stalls in the execution units; they affect the system as a whole.
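
To illustrate why the placement of the stalls does not change the result, here is a minimal sketch of a toy in-order pipeline (buffer -> decode -> execute). It is not a model of the real e200z6: the `simulate` function, the three-stage layout, the one-cycle buffer and decode stages, and the example latencies are all assumptions made for illustration. The waiting cycles are charged either to the decode stage or to the instruction buffer, and the total cycle count is compared.

```python
def simulate(exec_latencies, stall_in="decode"):
    """Toy in-order pipeline: buffer -> decode -> execute (hypothetical model).

    exec_latencies: cycles each instruction spends in the execution unit.
    stall_in: stage in which waiting cycles are drawn ("buffer" or "decode").
    Returns (total_cycles, stall_cycles_per_stage).
    """
    ex_free = 0  # first cycle in which the execution unit is free again
    stalls = {"buffer": 0, "decode": 0}
    for i, latency in enumerate(exec_latencies):
        # Assumption: instruction i reaches the buffer in cycle i and needs
        # one cycle in the buffer and one in decode before it can execute.
        earliest_ex = i + 2
        # It cannot enter the execution unit before the previous one left.
        ex_start = max(earliest_ex, ex_free)
        # The waiting cycles are only *drawn* in the chosen stage; the
        # execution unit decides when the instruction actually proceeds.
        stalls[stall_in] += ex_start - earliest_ex
        ex_free = ex_start + latency
    return ex_free, stalls


if __name__ == "__main__":
    latencies = [1, 3, 1, 3, 1]  # made-up execution latencies
    for stage in ("decode", "buffer"):
        total, stalls = simulate(latencies, stall_in=stage)
        print(stage, total, stalls)
```

In this toy model both variants finish in the same number of cycles, because the total is fixed by the execution unit (pipeline fill plus the sum of the execution latencies). Only the stage in which the waiting is drawn differs.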

I am not sure if this helps. Anyway, why do you need to know such exact details?
Real code cannot be analyzed this way anyway; for that purpose there are tools such as performance monitors, debug trace and so on.