I suggest that you take a look at the way that MCUXpresso does this.
The linker is used to create a table that is then read by the startup code to initialise the data in the various RAM blocks. The key is that data has a load address (effectively, where it is stored in the flash) and an execution address (where the data needs to be at runtime). It looks like this:
    __data_section_table = .;
    LONG(LOADADDR(.data));
    LONG(    ADDR(.data));
    LONG(  SIZEOF(.data));
    LONG(LOADADDR(.data_RAM2));
    LONG(    ADDR(.data_RAM2));
    LONG(  SIZEOF(.data_RAM2));
    __data_section_table_end = .;
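Each LONG() emits one 32-bit word, so the table is a flat run of three-word entries, one entry per section. A minimal sketch of how that layout could be pictured from C follows; the type name `data_section_entry_t` and the helper `data_section_count` are hypothetical and only for illustration, they are not part of MCUXpresso:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of one table entry: three 32-bit words,
   in the same order as the LONG() directives in the linker script. */
typedef struct {
    uint32_t load_addr;  /* LOADADDR(section): where the initial values sit in flash */
    uint32_t exec_addr;  /* ADDR(section): where the section lives in RAM at runtime */
    uint32_t size;       /* SIZEOF(section): length of the section in bytes */
} data_section_entry_t;

/* Number of three-word entries between the start and end of a table.
   On the target, the two pointers would be the addresses of
   __data_section_table and __data_section_table_end. */
static size_t data_section_count(const uint32_t *table_start,
                                 const uint32_t *table_end)
{
    return (size_t)(table_end - table_start) / 3u;
}
```

With the two sections shown above (.data and .data_RAM2), the table holds six words, so the count would be 2.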
The startup code then loops through the table, initialising the data in each section:
    // Symbols created by the linker script
    extern unsigned int __data_section_table;
    extern unsigned int __data_section_table_end;
    unsigned int LoadAddr, ExeAddr, SectionLen;
    unsigned int *SectionTableAddr;
    // Load base address of Global Section Table
    SectionTableAddr = &__data_section_table;
    // Copy the data sections from flash to SRAM.
    while (SectionTableAddr < &__data_section_table_end) {
        LoadAddr = *SectionTableAddr++;
        ExeAddr = *SectionTableAddr++;
        SectionLen = *SectionTableAddr++;
        data_init(LoadAddr, ExeAddr, SectionLen);
    }
(where data_init is effectively just a memcpy from the load address to the execution address).
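To make the "effectively just a memcpy" part concrete, here is a rough sketch of what such a copy routine might look like. It is written under assumptions, not copied from the vendor startup file: the name `data_init_sketch` is hypothetical, the length is assumed to be a multiple of 4, and both addresses are assumed to be word-aligned (which the linker script would normally arrange). `uintptr_t` is used so the sketch also compiles on a 64-bit host; on a 32-bit target an `unsigned int` holds an address just as well.

```c
#include <stdint.h>

/* Word-wise copy of one section's initial values from its load address
   (flash) to its execution address (RAM). Illustrative only.
   Assumes len is a multiple of 4 and both addresses are word-aligned. */
static void data_init_sketch(uintptr_t load_addr, uintptr_t exec_addr,
                             uintptr_t len)
{
    const uint32_t *src = (const uint32_t *)load_addr;
    uint32_t *dst = (uint32_t *)exec_addr;

    for (uintptr_t copied = 0; copied < len; copied += 4) {
        *dst++ = *src++;
    }
}
```

A hand-written loop like this is often preferred over calling memcpy directly, since at this point in startup the C library may not be usable yet.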