To stop execution, set a breakpoint on the BL instruction. An alternative may be to use a WFI instruction, which puts the processor to 'sleep' until an interrupt occurs.
However, I think what you are trying to do is a waste of time unless you have some very expensive measuring equipment. Assuming your core is running at a sensible speed (25 MHz, 100 MHz?), the 4 instructions you are trying to measure will execute so quickly that you will not see anything measurable (at 100 MHz, and roughly one cycle per instruction, 4 instructions will take 40 ns).
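To put numbers on that, here is a rough sketch of the arithmetic in C. The function name exec_time_ns and its parameters are made up for illustration, and it assumes every instruction takes the same fixed number of cycles, which is a simplification (a branch such as BL will typically cost more than one cycle on a real core).

```c
#include <stdint.h>

/*
 * Rough estimate of how long a short instruction sequence takes.
 * Assumption: every instruction costs the same number of clock cycles
 * (cycles_per_instr). Real cores differ, e.g. branches usually cost more.
 * Returns the time in nanoseconds.
 */
static double exec_time_ns(uint32_t instructions,
                           uint32_t cycles_per_instr,
                           uint32_t clock_hz)
{
    /* Total clock cycles for the whole sequence. */
    double cycles = (double)instructions * (double)cycles_per_instr;

    /* One cycle lasts 1e9 / clock_hz nanoseconds. */
    return cycles * 1e9 / (double)clock_hz;
}
```

With these assumptions, exec_time_ns(4, 1, 100000000u) gives 40 ns and exec_time_ns(4, 1, 25000000u) gives 160 ns. Either way the sequence is over far too quickly to observe without fast test equipment.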