You should probably read the source of the memcpy() function you're using to see what it does internally. It is very common for these functions to first make small transfers (1 byte at a time) until they reach 2- or 4-byte (or greater) alignment, and then switch to the most efficient transfer mode the CPU supports. On this CPU it might be using LDM or even LDREXD instructions, which will fault if they don't have 4- and 8-byte alignment respectively, irrespective of SCTLR.A. The memcpy() code will make sure it doesn't get that wrong. So the 1-byte offset works because memcpy() aligns the transfer first, but the 2-byte offset generates different bus cycles, and those are failing on that peripheral.
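To make the "different cycles" point concrete, here is a rough sketch of how the split into byte and word transfers changes with the starting offset. It assumes a memcpy() that aligns the destination to 4 bytes with single-byte transfers and then moves 32-bit words; your library's memcpy() may well do something different (halfword steps, source alignment, wider bursts), which is why reading its source is the real answer. The names copy_plan and plan_copy and the addresses used below are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical breakdown of one copy into transfer sizes. */
typedef struct {
    size_t head_bytes; /* 1-byte transfers before the address is 4-byte aligned */
    size_t words;      /* 4-byte transfers once aligned */
    size_t tail_bytes; /* 1-byte transfers for whatever is left over */
} copy_plan;

/* Assumption: the copy aligns on the destination address 'dst' and
 * then uses plain 32-bit transfers. This is NOT taken from any
 * particular memcpy() implementation. */
static copy_plan plan_copy(uintptr_t dst, size_t n)
{
    copy_plan p = {0, 0, 0};
    size_t misalign = (size_t)(dst & 3u);
    size_t head = misalign ? 4u - misalign : 0u;

    if (head > n) {
        head = n; /* short copy: never reaches alignment */
    }
    p.head_bytes = head;
    n -= head;
    p.words = n / 4u;
    p.tail_bytes = n % 4u;
    return p;
}
```

For a 16-byte copy, plan_copy(0x60000001, 16) gives 3 head bytes, 3 words and 1 tail byte, while plan_copy(0x60000002, 16) gives 2 head bytes, 3 words and 2 tail bytes. A real memcpy() could turn those 2-byte runs into halfword accesses, which is exactly the kind of difference the peripheral might object to.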
I've read through the FlexSPI chapter and there's nothing obvious in there to say what's going wrong. That memory space might be cached, and that might be causing problems.
After you get the error, can you examine the FlexSPI error registers? These are detailed in "26.8.2 Overview of Error Flags". They might show what sort of memory cycle the FlexSPI didn't like.
I've read through the "B3.5 Protected Memory System Architecture, PMSAv7" chapter in the "ARM v7-M Architecture Reference Manual" and can't find any reference to alignment.
Technically, memcpy() is a user-space function, and should only be used from user code that doesn't go anywhere near any hardware addresses. It can also be used in drivers that are accessing "regular memory". If you have a memory space with special alignment requirements, then use a custom memory copy routine (or a wrapper for memcpy() that enforces alignment) for that driver or memory space. I've worked on systems with a custom memcpy-like function that knew about all of the different memory spaces, and called different copy functions depending on the source and destination addresses.
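As a rough sketch of that kind of wrapper, assuming the awkward memory space is the source and that it tolerates aligned 32-bit reads: the function below checks whether the source falls inside a "special" window and, if so, reads only whole aligned words and picks the wanted bytes out of them. Everything here (g_win_base, g_win_size, copy_from_word_only, region_aware_memcpy) is a hypothetical name, and the window would have to be set to the real mapped range for your part. Note that it reads up to 3 bytes either side of the requested range, which is only acceptable if reads in that space have no side effects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical window covering the memory space that needs care.
 * Set these to the real base and size for the target. */
static uintptr_t g_win_base;
static size_t    g_win_size;

/* Copy n bytes using only aligned 32-bit reads from the source. */
static void copy_from_word_only(void *dst, const void *src, size_t n)
{
    uint8_t  *d    = dst;
    uintptr_t addr = (uintptr_t)src;

    while (n > 0) {
        /* Read the aligned word that contains the next source byte. */
        const volatile uint32_t *wp =
            (const volatile uint32_t *)(addr & ~(uintptr_t)3u);
        uint32_t w    = *wp;
        size_t   off  = (size_t)(addr & 3u);
        size_t   take = 4u - off;

        if (take > n) {
            take = n;
        }
        /* Extract bytes from the local copy, so no narrow access
         * ever reaches the special memory space. */
        memcpy(d, (const uint8_t *)&w + off, take);
        d    += take;
        addr += take;
        n    -= take;
    }
}

/* memcpy()-like front end that dispatches on the source address. */
static void *region_aware_memcpy(void *dst, const void *src, size_t n)
{
    uintptr_t s = (uintptr_t)src;

    if (n != 0 && s >= g_win_base && s - g_win_base < g_win_size) {
        copy_from_word_only(dst, src, n);
        return dst;
    }
    return memcpy(dst, src, n); /* "regular memory": use the library */
}
```

A version for writes would need the mirror image (read-modify-write of whole words, or a rule that callers only pass aligned, word-multiple buffers), and the systems I mentioned dispatched on both source and destination rather than just the source.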
I'm not familiar with the evaluation board or SDK code.
Tom