The basic idea is to copy the function in question from Flash to a fixed RAM location (usually at startup) and to have all calls to this function linked against that RAM address. This requires support from both the linker script and the startup code. As a side note, the IAR toolchain supports this directly via the __ramfunc keyword.
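A minimal sketch of the startup-side copy, assuming a GCC-style toolchain. The section name `.ramfunc`, the `RAMFUNC` macro, the example function `scale_by_three`, and the linker symbols named in the comments (`__ramfunc_load`, `__ramfunc_start`, `__ramfunc_end`) are all hypothetical; the real names come from your own linker script. For a function that lives in a prebuilt library the attribute cannot be applied, so the placement would have to be done in the linker script instead.

```c
#include <stdint.h>

/* Hypothetical section name; it must match an output section in the linker
   script whose load address is in Flash and whose run address is in RAM. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC /* no-op on other targets, so the sketch also builds on a host */
#endif

/* Example of a function tagged for execution from RAM. */
RAMFUNC uint32_t scale_by_three(uint32_t x)
{
    return 3u * x;
}

/* Copy a word-aligned section image from its load address to its run address. */
void copy_ram_section(const uint32_t *load, uint32_t *run_start, const uint32_t *run_end)
{
    uint32_t *dst = run_start;
    while (dst < run_end) {
        *dst++ = *load++;
    }
}

/* In the startup code, before main(), with hypothetical linker symbols:
 *   extern uint32_t __ramfunc_load[], __ramfunc_start[], __ramfunc_end[];
 *   copy_ram_section(__ramfunc_load, __ramfunc_start, __ramfunc_end);
 */
```

Many default startup files already copy the initialized-data region in this way, so it is worth checking whether your toolchain offers a ready-made RAM section before adding a copy loop of your own.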
> I need to make sure that a particular function (arm_sqrt_f32) present in the libcr_c.a library is mapped to RAM and not to FLASH.
That said, I would check whether you really gain anything here.
It looks like you are going for performance. While RAM latency is certainly lower, this does not necessarily translate into higher performance. The width of an instruction fetch is not bound to the 32-bit architecture size. Many vendors implement 64-bit or 128-bit wide I-bus access to Flash on their Cortex-M devices, which makes up for the greater Flash latency.
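To put rough numbers on this, here is a back-of-the-envelope model, not a measurement. It assumes straight-line Thumb code, counts 16-bit halfwords delivered per CPU cycle, treats one fetch as costing 1 + wait states cycles, and ignores branches, prefetch buffers and caches. The function name and the example figures are made up for illustration.

```c
#include <stdint.h>

/* Halfwords (16-bit Thumb units) delivered per CPU cycle on straight-line
   code, scaled by 100 to stay in integer arithmetic. Simplified model only. */
uint32_t halfwords_per_cycle_x100(uint32_t bus_width_bits, uint32_t wait_states)
{
    uint32_t halfwords_per_fetch = bus_width_bits / 16u;
    uint32_t cycles_per_fetch = 1u + wait_states;
    return (halfwords_per_fetch * 100u) / cycles_per_fetch;
}
```

Under this model a 128-bit Flash interface with 3 wait states delivers 2.00 halfwords per cycle, the same as a 32-bit RAM with 0 wait states, whereas a 32-bit Flash interface with 3 wait states would deliver only 0.50. Real devices differ, so measuring on the target is the only reliable check.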
The most common rationale for RAM functions is in-application programming (IAP) on a single-bank Flash device, where you can't erase/program Flash and fetch instructions from it at the same time.