For a general introduction to debugging on Arm Cortex-M chips, I suggest reading Chapter 15 ('The Debug Architecture') of "The Definitive Guide to the ARM Cortex-M3" by Joseph Yiu, but in brief:
- step-into can be implemented by placing a breakpoint on the first instruction of the branch target
- step-over by setting a breakpoint on the instruction following the branch instruction
- step-return by reading the function's return address from the stack and setting a breakpoint there.
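
As a rough illustration of the three cases above, here is a minimal sketch in Python of how a debugger might pick the address for its temporary breakpoint. The `Instr` record, the function names, and the dictionary standing in for target memory are all hypothetical, invented for this example; a real debugger would decode instructions and read memory through its debug probe.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Instr:
    """Hypothetical decoded instruction (not a real debugger API)."""
    addr: int                          # address of the instruction
    size: int                          # length in bytes (Thumb: 2 or 4)
    call_target: Optional[int] = None  # destination if this is a call, else None


def step_into_breakpoint(instr: Instr) -> int:
    # For a call, stop on the first instruction of the branch target;
    # for anything else, stop on the next sequential instruction.
    if instr.call_target is not None:
        return instr.call_target
    return instr.addr + instr.size


def step_over_breakpoint(instr: Instr) -> int:
    # Stop on the instruction that follows the branch, so the callee
    # runs to completion before the debugger regains control.
    return instr.addr + instr.size


def step_return_breakpoint(memory: Dict[int, int], return_slot: int) -> int:
    # Read the saved return address from the stack (assumed to live at
    # `return_slot`). On Thumb targets bit 0 of a return address marks the
    # instruction set rather than the location, so it is cleared here.
    return memory[return_slot] & ~1
```

For example, given `call = Instr(addr=0x08000100, size=4, call_target=0x08000200)`, step-into would place the breakpoint at `0x08000200` and step-over at `0x08000104`. Where the return address is actually saved depends on the compiler and calling convention, so `return_slot` is an assumption the caller has to supply.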
Some processors implement a single-step flag that a debugger can set to execute exactly one instruction before trapping back to the debugger (useful for step-into).
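
On Cortex-M3, to my understanding, that flag is the C_STEP bit of the Debug Halting Control and Status Register (DHCSR). The sketch below only shows how the value to write might be composed; `write_mem32` is a hypothetical callback standing in for whatever your debug probe provides for 32-bit memory writes, and the constants should be checked against the reference manual for your part.

```python
# Register address and bit positions as I understand them from the
# Armv7-M architecture documentation; verify them for your device.
DHCSR_ADDR = 0xE000EDF0
DBGKEY = 0xA05F << 16   # key required in the top half-word for the write to take effect
C_DEBUGEN = 1 << 0      # enable halting debug
C_HALT = 1 << 1         # left clear here so the core is allowed to run
C_STEP = 1 << 2         # execute one instruction, then halt again


def single_step_value() -> int:
    # Debug stays enabled, C_HALT is cleared and C_STEP is set.
    return DBGKEY | C_DEBUGEN | C_STEP


def single_step(write_mem32) -> None:
    # `write_mem32(address, value)` is a hypothetical probe callback.
    write_mem32(DHCSR_ADDR, single_step_value())
```

A debugger would typically follow the write by polling for the halted state before reading registers, which is left out here.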
The actual implementation will be far more complex than this, since (for example) you may be stepping at source level rather than at assembly-instruction level, but the principle is sound and will be similar for any debugger implementation.