Dear Yiping,
As far as I understand, the CONFIG_MTD_MAP_BANK_WIDTH_n options relate to the width of the physical external bus. The LS1043A's IFC, for instance, has a 16-bit data bus for connecting flash chips. The actual flash bank width is usually specified in the device tree as the bank-width parameter. The code in drivers/mtd/chips/cfi_cmdset_0001.c (and cfi_cmdset_0002.c) uses the bank-width information to issue load/store instructions of the appropriate width when accessing flash registers. This matters mainly when a flash bank consists of several chips connected in parallel: the status registers, for example, must then be read from all of the chips.
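To illustrate the parallel-chip case, here is a minimal sketch in C. It is not the kernel's actual code: the helper names replicate_byte and all_chips_ready are hypothetical, and the "ready" bit 0x80 is an assumption (bit 7 of the status register). The idea is only that one bank-wide access carries one byte lane group per chip, so a command or status mask has to be repeated for every chip in the bank.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): repeat one byte for every chip
 * in the bank.
 *   bank_bytes: bank width in bytes (the device tree bank-width value)
 *   chip_bytes: data bus width of a single chip in bytes
 */
static uint64_t replicate_byte(uint8_t value, unsigned bank_bytes,
                               unsigned chip_bytes)
{
    assert(chip_bytes > 0 && bank_bytes > 0 && bank_bytes <= 8);
    assert(bank_bytes % chip_bytes == 0);

    uint64_t word = 0;
    /* Place the byte in the lowest byte lane of each chip. */
    for (unsigned off = 0; off < bank_bytes; off += chip_bytes)
        word |= (uint64_t)value << (8 * off);
    return word;
}

/* Hypothetical helper: nonzero only if every chip in the bank reports
 * "ready". Assumes the ready flag is bit 7 (0x80) of each chip's status. */
static int all_chips_ready(uint64_t status, unsigned bank_bytes,
                           unsigned chip_bytes)
{
    uint64_t ready = replicate_byte(0x80, bank_bytes, chip_bytes);
    return (status & ready) == ready;
}
```

For a 16-bit bank built from two 8-bit chips, replicate_byte(0x70, 2, 1) gives 0x7070, while for a single 16-bit chip replicate_byte(0x70, 2, 2) gives 0x0070. This is why the access width has to match the bank width: a narrower read would see the status of only some of the chips.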
For further reference, here is a summary of what I have learned about the IFC's page-read access when IFC_CSORn_NOR[PGRD_EN] = 1:
A typical NOR flash (for example the MT28EW used on the LS1043A-RDB) has a page size of 32 bytes. If IFC_CSORn_NOR[PGRD_EN] = 1, a multi-beat read transaction on the AMBA bus must not be larger than 32 bytes. Two useful links to ARM documentation:
What AHB-Lite burst lengths are produced by Cortex-M3 and Cortex-M4?
ARM Cortex-A53 ACE transfers
Combining these articles, we have:
1) The length of the AMBA burst generated by the core depends on the instruction issued.
2) A Cortex-A53 cache linefill always loads 64 bytes, so the cache is unusable with the NOR flash: 64 bytes is more than the NOR's page size (32 bytes).
3) For the Cortex-A53 we must (as I understand it) configure the MMU to map the IFC region as Device memory (how?); then the cache is not used.
4) Now we can use, for example, the ldp instruction to load a pair of 64-bit general-purpose registers. This should lead to a 16-byte AMBA transaction and hence to a 16-byte page read from the NOR flash.
5) Maybe we can use the ldp instruction with 128-bit NEON registers to issue a 32-byte AMBA transaction, which would match the NOR's page size.
I am not sure whether this reasoning is correct.
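The size arithmetic behind points 2), 4) and 5) can be written down as a small C sketch. It only restates the numbers above; the names (NOR_PAGE_BYTES, ldp_bytes, fits_nor_page) are made up for illustration, and it does not show that the interconnect really issues a single transaction of that size, which I have not verified.

```c
#include <assert.h>
#include <stddef.h>

enum {
    NOR_PAGE_BYTES = 32, /* NOR page size quoted above (MT28EW) */
    A53_LINE_BYTES = 64, /* Cortex-A53 cache linefill size */
    X_REG_BYTES    = 8,  /* one 64-bit general-purpose register */
    Q_REG_BYTES    = 16  /* one 128-bit NEON register */
};

/* Bytes requested by one load-pair (ldp) of two registers of this size. */
static size_t ldp_bytes(size_t reg_bytes)
{
    assert(reg_bytes > 0);
    return 2 * reg_bytes;
}

/* Nonzero if a read of 'bytes' is no larger than one NOR page. */
static int fits_nor_page(size_t bytes)
{
    assert(bytes > 0);
    return bytes <= NOR_PAGE_BYTES;
}
```

With these numbers, ldp_bytes(X_REG_BYTES) is 16 and ldp_bytes(Q_REG_BYTES) is 32, both of which pass fits_nor_page, whereas fits_nor_page(A53_LINE_BYTES) fails because a 64-byte linefill is larger than the 32-byte page.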
Best regards,
Cyril