If you have plenty of RAM available, you could still use gcov: replace the file I/O routines with ones that write to RAM, then stop the target and dump the RAM content to the host.
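To make the RAM idea more concrete, here is a minimal sketch in C of what such a replacement write routine could look like. All names (cov_ram, cov_ram_write, cov_ram_store_file) and the buffer size are made up for illustration. How you hook this into the gcov runtime depends on your toolchain and C library, so treat it as an assumption-laden starting point, not as the way gcov itself works.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: choose whatever your target's RAM allows. */
#define COV_RAM_SIZE 4096u

/* Buffer that the debugger dumps to the host after halting the target. */
static uint8_t cov_ram[COV_RAM_SIZE];
static size_t cov_ram_used;

/* Stand-in for a file write: append bytes to the RAM buffer.
 * Returns the number of bytes stored, or 0 if they do not fit. */
static size_t cov_ram_write(const void *data, size_t len)
{
    if (len > COV_RAM_SIZE - cov_ram_used) {
        return 0; /* never store a partial chunk */
    }
    memcpy(&cov_ram[cov_ram_used], data, len);
    cov_ram_used += len;
    return len;
}

/* Store one "file" as a record: name length, payload length, name, payload.
 * The length fields let a host-side script split the dump into files again.
 * Returns 0 on success, -1 if the record does not fit (nothing is written). */
int cov_ram_store_file(const char *name, const void *payload, uint32_t payload_len)
{
    uint32_t name_len = (uint32_t)strlen(name);
    size_t header_len = sizeof name_len + sizeof payload_len;
    size_t free_bytes = COV_RAM_SIZE - cov_ram_used;

    /* Check each part separately so the sum cannot overflow. */
    if (header_len > free_bytes ||
        name_len > free_bytes - header_len ||
        payload_len > free_bytes - header_len - name_len) {
        return -1;
    }

    cov_ram_write(&name_len, sizeof name_len);
    cov_ram_write(&payload_len, sizeof payload_len);
    cov_ram_write(name, name_len);
    cov_ram_write(payload, payload_len);
    return 0;
}
```

Once the target is halted, the buffer can be saved with the debugger. With GDB, for example, something like `dump binary memory cov.bin &cov_ram[0] &cov_ram[cov_ram_used]` should do it, and a small host script can then split cov.bin back into the individual files. The length fields are stored in the target's byte order, so the host script has to take that into account.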
Another option is a trace probe such as the P&E TraceLink (see "First Steps with the P&E Tracelink" on MCU on Eclipse). Such a probe has its own internal RAM and can collect data directly from the target.
It all depends on how intrusive your analysis can be: if you have a hard realtime application, you need to invest in hardware. If your application is not hard realtime or can tolerate long delays, then instrumenting the application (and dumping the data over a slower connection) could be feasible.