I found that the root cause is in the startup code. If I change the optimization level to Os for the Startup_Code folder, the code runs as expected.

If I change the optimization level to O3, the problem occurs after the vldr d7, [r1] instruction in the init_data_bss function in startup.c. This corresponds to the C statement ram[j] = rom[j];, which copies ROM data from __INIT_INTERRUPT_START (address 0x00400800) to __RAM_INTERRUPT_START (DTCM address 0x20000000). With the Os setting, the assembly uses different instructions: ldr.w r7, [r1, r4, lsl #2] and str.w r7, [r0, r4, lsl #2].
Do you have an explanation or a workaround for the failure with O3 optimization?
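
One workaround I could try is sketched below. It assumes the failure is tied to the compiler turning the copy loop into 64-bit FPU loads (vldr), which I have not confirmed. The idea is to force word-sized accesses through volatile pointers, so the compiler should not merge them into wider loads. The function name copy_words and its signature are hypothetical and are not the actual startup.c code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical replacement for the ram[j] = rom[j] loop in init_data_bss.
 * The volatile qualifiers require each element to be read and written as
 * its own 32-bit access, so the optimizer should not combine two words
 * into a single 64-bit load such as vldr d7, [r1].
 */
static void copy_words(volatile uint32_t *ram,
                       const volatile uint32_t *rom,
                       size_t n_words)
{
    for (size_t j = 0; j < n_words; j++) {
        ram[j] = rom[j]; /* one 32-bit load, one 32-bit store */
    }
}
```

If the disassembly at O3 then shows ldr/str instead of vldr for this loop and the fault disappears, that would at least point to the wide FPU access as the trigger.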
