Well found.
That will get more complicated when you try to program the chips.
I would suggest you don't write your final code with "magic numbers multiplied by 2" as you have done, but try to abstract the chip interface from the arrangement on the board.
According to the data sheets, each chip has 16-bit data and is addressed in 16-bit words, with specific offsets (0xaaaa, 0x5555) for specific functions. Your board changes this: one chip sits on the even 16-bit word addresses and the other on the odd ones (which is where your multiply-by-two comes from), with the ability to read (and possibly write) 32 bits at a time.
So it would make the code clearer if you wrote a function that takes the address (as per the data sheet) and the chip number, converts them to the proper board address, and does an upper- or lower-word read or write as appropriate. This could easily be a macro. It would let you access the two chips as separate devices, which is OK if you are using the chips as a file system and not trying to run code from them.
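Something along these lines is what I have in mind. It is only a sketch: the names (flash_word_index, flash_read16, flash_write16, FLASH_NUM_CHIPS) are made up, and it assumes chip 0 sits on the even 16-bit slots and chip 1 on the odd ones. Which chip ends up in the upper half of a 32-bit word depends on your board and CPU endianness, so check that against the schematic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed board layout: two 16-bit chips interleaved on a 32-bit bus. */
#define FLASH_NUM_CHIPS 2u

/* Convert a data-sheet word address on one chip into an index into the
 * flash bank viewed as an array of 16-bit words.
 * Assumption: chip 0 is on the even slots, chip 1 on the odd slots. */
static inline size_t flash_word_index(unsigned chip, uint32_t chip_addr)
{
    assert(chip < FLASH_NUM_CHIPS);
    return (size_t)chip_addr * FLASH_NUM_CHIPS + chip;
}

/* Read one 16-bit word from one chip, using the data-sheet address. */
static inline uint16_t flash_read16(volatile uint16_t *base,
                                    unsigned chip, uint32_t chip_addr)
{
    return base[flash_word_index(chip, chip_addr)];
}

/* Write one 16-bit word to one chip, using the data-sheet address. */
static inline void flash_write16(volatile uint16_t *base, unsigned chip,
                                 uint32_t chip_addr, uint16_t value)
{
    base[flash_word_index(chip, chip_addr)] = value;
}
```

With this, the code that talks to a chip can use 0x5555 exactly as printed in the data sheet, e.g. flash_write16(base, 0, 0x5555, value), and the doubling lives in one place. Under the assumed layout that address lands on 16-bit index 0xAAAA for chip 0 and 0xAAAB for chip 1.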
But you probably have to boot from the FLASH, so you want a way to read the data 32 bits wide, and probably to write it the same way.
For programming you have the option of erasing and programming each chip separately, or writing code to program them both at the same time. The second option is a bit tricky, as you have to read the status from both chips, wait until the LAST one finishes, and also handle any errors in either or both. It may be simpler to program them one at a time if you're not time-constrained, though that will take twice as long.
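To show what "wait until the LAST one finishes" looks like, here is a rough sketch of the polling loop. How you actually read and decode a chip's status comes from its data sheet, so I have hidden that behind a callback. Everything here (flash_wait_all, flash_poll_fn, the status enum) is a made-up name, and fake_poll is just a PC-side stand-in so the loop can be tried without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_NUM_CHIPS 2u

typedef enum { FLASH_BUSY, FLASH_DONE, FLASH_ERROR } flash_status_t;

/* Hypothetical hook: read one chip's status and decode it per the data sheet. */
typedef flash_status_t (*flash_poll_fn)(unsigned chip, void *ctx);

/* Poll every chip until all have finished or max_polls passes are used up.
 * Returns a bitmask of chips that reported an error or timed out
 * (bit 0 = chip 0, bit 1 = chip 1); 0 means both finished cleanly. */
static unsigned flash_wait_all(flash_poll_fn poll, void *ctx, uint32_t max_polls)
{
    unsigned pending = (1u << FLASH_NUM_CHIPS) - 1u;
    unsigned failed = 0;

    assert(poll != NULL);
    while (pending != 0 && max_polls-- > 0) {
        for (unsigned chip = 0; chip < FLASH_NUM_CHIPS; chip++) {
            if (!(pending & (1u << chip)))
                continue;                 /* this chip already finished */
            flash_status_t s = poll(chip, ctx);
            if (s == FLASH_BUSY)
                continue;                 /* keep waiting on this chip */
            pending &= ~(1u << chip);
            if (s == FLASH_ERROR)
                failed |= 1u << chip;     /* remember it, keep waiting on the other */
        }
    }
    return failed | pending;              /* chips still pending count as timed out */
}

/* Host-side stand-in for real hardware (illustration only). */
struct fake_flash {
    uint32_t busy_polls[FLASH_NUM_CHIPS]; /* polls left before the chip stops being busy */
    int      fails[FLASH_NUM_CHIPS];      /* nonzero: chip ends in an error */
};

static flash_status_t fake_poll(unsigned chip, void *ctx)
{
    struct fake_flash *f = ctx;
    if (f->busy_polls[chip] > 0) {
        f->busy_polls[chip]--;
        return FLASH_BUSY;
    }
    return f->fails[chip] ? FLASH_ERROR : FLASH_DONE;
}
```

The point of the bitmask is that an error on one chip does not stop you waiting for the other, and the caller can tell exactly which chip needs a retry. On real hardware you would replace fake_poll with a routine that reads the chip's status through your address-conversion layer.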
How much faster did your test code run with the cache on?
Tom