OK, I partially solved this thanks to this topic:
https://community.nxp.com/t5/Kinetis-Design-Studio-Knowledge/Relocating-Code-and-Data-Using-the-MCUX...
I have ten years of experience with the dsPIC33 (Microchip) and I am currently studying the Cortex-M7.
I am really impressed with its performance.
By placing the functions in RAM I was able to do several floating-point operations (see code) in just 69 clock cycles (measured with SysTick).
a += ka;
b -= kb;
c *= kc;
r = a / b + c / a;
arm_sqrt_f32(a, (float32_t *)&r2);
In practice, running at 240 MHz, these calculations take about 288 ns (69 cycles / 240 MHz = 287.5 ns)!
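For anyone who wants to check the arithmetic, here is a minimal sketch of how the cycle count and the time figure relate. The helper names elapsed_cycles and cycles_to_ps are hypothetical (they are not from any NXP or CMSIS library), and the sketch assumes SysTick runs from the core clock with a reload value of 0xFFFFFF and wraps at most once between the two readings.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick is a 24-bit down-counter, so readings wrap at 2^24. */
#define SYSTICK_MASK 0x00FFFFFFu

/* Hypothetical helper: cycles elapsed between two SysTick readings.
 * Assumes reload = 0xFFFFFF and at most one wrap between the readings.
 * The counter counts DOWN, hence start - end. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return (start - end) & SYSTICK_MASK;
}

/* Hypothetical helper: convert a cycle count to picoseconds
 * for a given core clock in Hz (integer math, rounds down). */
static uint64_t cycles_to_ps(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0);
    return ((uint64_t)cycles * 1000000000000ull) / core_hz;
}
```

With two readings taken before and after the code under test, cycles_to_ps(elapsed_cycles(start, end), 240000000u) gives the elapsed time in picoseconds; 69 cycles at 240 MHz comes out as 287500 ps, i.e. 287.5 ns. The reading overhead itself is not subtracted here, so the real figure for the math alone may be slightly lower.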