CAVEAT: I haven't verified all of what I'm about to write so...
OK, here goes:
From what you wrote before, U-Boot seems to have been programmed into the Intel Flash (flash 1).
SW1-3 goes to the CPLD, which simply "swaps" the connection of the MCU CS0 and CS1 lines to the two flash devices. When SW1-3 = ON (default), CS0 => Flash 0 (smaller Atmel Flash) and CS1 => Flash 1 (bigger Intel Flash). When SW1-3 = OFF (which works for you), these are swapped: CS0 => Flash 1 and CS1 => Flash 0.
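To make the swap concrete, here is a small C model of the routing as I understand it. This is only a sketch of the behaviour described above, not the actual CPLD logic; the names flash_dev and flash_for_cs are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two flash devices on the board. */
enum flash_dev { FLASH0_ATMEL = 0, FLASH1_INTEL = 1 };

/*
 * Model of the CPLD chip-select routing as described in this post
 * (an assumption, not checked against the CPLD source).
 *   cs        MCU chip select, 0 or 1
 *   sw1_3_on  true = SW1-3 ON (default), false = SW1-3 OFF (swapped)
 */
static enum flash_dev flash_for_cs(int cs, bool sw1_3_on)
{
    assert(cs == 0 || cs == 1);
    if (sw1_3_on)
        return cs == 0 ? FLASH0_ATMEL : FLASH1_INTEL;
    /* Switch OFF: the two chip selects trade places. */
    return cs == 0 ? FLASH1_INTEL : FLASH0_ATMEL;
}
```

With the switch OFF, flash_for_cs(0, false) gives FLASH1_INTEL, which would explain why the board boots in that position if U-Boot only lives in the Intel part.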
The MCU (on reset) boots from CS0, and loads its reset vector from 0x4 of whichever flash is connected to CS0 (OK, you knew that already, sorry!).
I believe that U-Boot "copies" itself into SDRAM and executes there, rather than from flash. The SDRAM is mapped to 0x4000_0000. I don't know this for sure, but this is usually what happens.
The BSP manual states that U-Boot maps CS0 to 0x0400_0000. The upper address lines don't go to the (512k) Atmel flash anyway, so this looks like 0x0 to the flash. If CS0 is connected to Flash 1 (Intel), it's the same thing: 0x0400_0000 looks like 0x0 to the flash.
But, after booting, U-Boot maps CS0 to 0x0400_0000 and CS1 to 0x0 (from the comments in 3.2 of the BSP guide). That is valid for the default SW1-3 setting. If the switch is swapped, the CS0/CS1 address mappings stay the same, but they go to the opposite flash devices, unless U-Boot is smart enough to read the CPLD switch register and detect the change. U-Boot must at least be smart enough to know what type of flash it's talking to (for the cp.b command to work), since the two devices require different command sets for programming.
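The "looks like 0x0 to the flash" part is just address masking. A rough sketch in C, assuming the device only sees the low address bits and that its size is a power of two (it ignores bus width and any address-line shifting, and flash_offset is a made-up helper):

```c
#include <assert.h>
#include <stdint.h>

#define ATMEL_FLASH_SIZE 0x00080000u  /* 512 KB, the size mentioned above */

/*
 * Offset seen by a flash device for a given CPU address, assuming only
 * the low address lines are wired to the device. flash_size must be a
 * power of two for the mask to be valid.
 */
static uint32_t flash_offset(uint32_t cpu_addr, uint32_t flash_size)
{
    assert(flash_size != 0 && (flash_size & (flash_size - 1u)) == 0);
    return cpu_addr & (flash_size - 1u);
}
```

So flash_offset(0x04000000u, ATMEL_FLASH_SIZE) comes out as 0: the chip select decides which device responds, and the device itself never sees the 0x0400_0000 base.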
Bottom line: Since you have CF Flasher working, why not set SW1-3 ON (normal CS mapping) and program the flash at 0x0400_0000 (which reads as 0xFF now) with the U-Boot image? Then BOTH flash devices should have a U-Boot image at 0x0, and hopefully U-Boot will run with the default switch settings. This should get you back to the default configuration. Then you can use the BSP instructions to program flash 1 with the kernel image.