Dear All and Timesys
First of all, sorry for the long message.
Regarding the U-Boot clock init routine: perhaps there is some MCU on Earth that works well from an unstable clock, but as a general rule, to prevent runaways, MCU clock switching should only be done from one stable clock to another stable clock. Once a PLL is set up, it needs time to stabilize and lock, so there must be some delay waiting for PLL lock, and the switch to the PLL clock should happen only after it has locked. In this respect clock_init() in vf610twr.c looks suspicious.
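To illustrate the rule above, here is a minimal sketch of "wait for lock, then switch". It is not the actual clock_init() code: PLL_LOCK_BIT, pll_wait_lock(), switch_to_pll() and the register pointers are hypothetical names, and the real lock bit position has to be taken from the Vybrid reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical position of the "PLL locked" status bit (assumption). */
#define PLL_LOCK_BIT (1u << 31)

/* Poll the PLL control register until the lock bit is set,
 * or give up after max_polls reads. */
static bool pll_wait_lock(const volatile uint32_t *pll_ctrl, unsigned max_polls)
{
    while (max_polls--) {
        if (*pll_ctrl & PLL_LOCK_BIT)
            return true;
    }
    return false;
}

/* Select the PLL as clock source only after it reports lock.
 * On timeout the clock selector is left untouched, so the core
 * keeps running from the current (stable) clock. */
static bool switch_to_pll(const volatile uint32_t *pll_ctrl,
                          volatile uint32_t *clk_sel,
                          uint32_t pll_source, unsigned max_polls)
{
    if (!pll_wait_lock(pll_ctrl, max_polls))
        return false;
    *clk_sel = pll_source;
    return true;
}
```

The point of the sketch is only the ordering: the write to the clock selector is never reached while the lock bit is clear.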
I'm also using the Timesys gcc 4.8.3 toolchain. I had hoped that your newly compiled u-boot.nand would work on my Tower. Unfortunately, or fortunately, it doesn't. I flashed it twice and did nand dumps to verify that it really went to flash and that the FCB is there at offset 0. Your new u-boot.nand doesn't show any signs of life. But your old u-boot.nand (www.timesys.com/vybrid/u-boot-nand.bin, compile date May 08 2014 - 11:56:21, file size 253488) works well! Do you know which gcc version the working U-Boot was compiled with? Perhaps there is also a difference between Tower cards? My Vybrid card is labeled SCH-27442 REV G1 and 700-27422 REV J. U-Boot reports Vybrid family revision 1.1.
A day ago I also tried zero-padding the u-boot.imx that came with my most recent copy of the Timesys SDK. It doesn't boot from NAND either. I remember that an older u-boot.imx padded with zeros booted fine from NAND and only complained about a missing SD card.
I was able to build a working image, but something is still wrong. The first thing I tried was changing the optimization level in the config.mk file; look for the OPTFLAGS = setting there. Changing it from -Os to anything else, like -O3, -O2 or -Os -Og, led to a linker error advising the use of the -fPIC option:
armv7l-timesys-linux-gnueabi-ld.bfd: arch/arm/cpu/armv7/libarmv7.o: relocation R_ARM_MOVW_ABS_NC against `a local symbol' can not be used when making a shared object; recompile with -fPIC
arch/arm/cpu/armv7/libarmv7.o: error adding symbols: Bad value
make: *** [u-boot] Error 1
So I tried adding -fPIC, and the image builds with it. With the original clock_init(), the DS-5 debugger stops at a BKPT instruction somewhere in DDR RAM, and some boot-time strings are printed to the UART. With a modified clock_init() it even works, and I'm able to boot the kernel and use the root FS in NAND. The only problem I see is that the target resets when I enter "boot" at the U-Boot command prompt. The strange thing is that after this reset, and also after a power-on reset, U-Boot is able to autoboot... If it matters, bootcmd is set to 'nand read 0x80010000 0x100000 0x300000; bootm 0x80010000'
So what do you think, is it a toolchain 4.8.3 issue?
A few words about the Linux fsl_nfc.c driver mods. By default you configure it for SW ECC. Since the SW ECC over 512-byte subpages used in Linux differs from the HW ECC over 2048-byte physical pages used in U-Boot, it is not possible to tftpboot a root FS image and nand write it from U-Boot. U-Boot and Linux must use the same ECC for flashing a NAND FS from U-Boot to work. HW ECC in the Linux fsl_nfc.c is broken, as you know, and SW ECC for some reason uses an ECC and page layout different from U-Boot's. I didn't find where and how Linux decides to split 2048-byte physical pages into 512-byte subpages, but I managed to fix HW ECC in the fsl_nfc.c driver. Unfortunately, once HW ECC in Linux started to work, my first attempt to flash the root FS from the old u-boot.nand somehow failed, and I started blaming U-Boot for it. This is why I dived into U-Boot's code and why I got you involved in these problems. Sorry. In fact the Timesys u-boot.nand dated May 08 2014 flashes everything quite well. What should be improved in U-Boot is the missing one or two extra backup FCBs, along with at least two U-Boot instances, for improved NAND boot reliability. The nandinit and nandwrite commands initialize only the primary FCB at offset 0, and FW1 and FW2 in the FCB at offset 0 both point to the same single U-Boot copy. This should be improved, shouldn't it?
Now back to my fsl_nfc.c mods. The thread "VF6xxx NFC (NAND) module clocking" was a big help. The Timesys Support patch was close, though with it the driver was still only printing ECC error messages and not updating the error counters as required, and there were also a bunch of false positives. Bill Pringlemeir was close to the truth when he pointed out the unneeded if(!(page%0x40)) in fsl_nfc_command(). I believe whoever coded these !(page%0x40) cases was perhaps worrying about manufacturer-marked bad blocks, whose markers are on every 0x40th page (true for the flash used on the Vybrid card). Maybe the real story is different, but those %0x40 checks have to be removed; I'm sure it is a bug. Once the buggy %0x40 code pieces are removed, there are still false ECC errors, caused by erased and not yet programmed flash pages. Of course HW ECC fails on these. When an uncorrectable error occurs, the driver has to check whether the page is erased and report an error only if it is not. See the attached fsl_nfc.c with fixes if you are interested.
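The erased-page check described above could look roughly like the sketch below. This is not the code from the attached fsl_nfc.c; page_is_erased() and ecc_failure_is_real() are hypothetical helper names, and the sketch assumes that an erased page reads back as all 0xFF in both the data and the OOB area.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every byte of buf is 0xFF, i.e. looks erased. */
static bool page_is_erased(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xFF)
            return false;
    }
    return true;
}

/* Decide whether a HW "uncorrectable" status is a real ECC failure.
 * An erased, never programmed page carries no valid ECC bytes, so
 * HW ECC fails on it; that case must not be counted as an error. */
static bool ecc_failure_is_real(bool hw_uncorrectable,
                                const uint8_t *data, size_t data_len,
                                const uint8_t *oob, size_t oob_len)
{
    if (!hw_uncorrectable)
        return false;
    return !(page_is_erased(data, data_len) &&
             page_is_erased(oob, oob_len));
}
```

Only when ecc_failure_is_real() returns true should the driver update its failure counter and report the error.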
There's one more problem to note. Micron, the manufacturer of the NAND flash on my Tower card, says that the bad block mark is the first spare location in a bad block. I tried to verify where in the NFC RAM buffer one should look for the first spare location. Do you think it should be at offset 2048? I doubt it. MQX has an interesting NAND example. Instead of reading the whole page with OOB to check whether a block is marked bad, MQX attempts to read just a few bytes from the OOB, and writes just a few bytes to mark a block as bad. According to the MQX NAND example, the bad block mark is at offset 2048+3. I did the following:
1) wrote 0x12345678 to the NFC RAM buffer and programmed it to flash with the NFC_SECSC register set to 2 (two bytes instead of four), then
2) read the bytes back from flash and got 0x1234FFFF.
The machine is little endian, so 0x12 is at offset +3. It looks like the first spare location is not at offset 2048 but at offset 2048+3. There must be some internal byte-swap hardware in the NFC that makes the high-order byte get programmed first. I think MQX is right, and Linux and U-Boot are wrong, about the bad block marker position with the Vybrid NFC controller.
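The byte arithmetic behind this experiment can be checked with a small sketch. It only models how a little-endian CPU lays out a 32-bit word in memory, it does not touch the NFC; le_byte_at() and surviving_offsets() are hypothetical helper names.

```c
#include <stdint.h>

/* Byte that a little-endian CPU stores at byte offset off (0..3)
 * of a 32-bit word. */
static uint8_t le_byte_at(uint32_t word, unsigned off)
{
    return (uint8_t)(word >> (8u * off));
}

/* Bitmask of byte offsets at which the read-back word still holds
 * the byte that was written (bit n set means offset +n survived). */
static unsigned surviving_offsets(uint32_t written, uint32_t read_back)
{
    unsigned mask = 0;
    for (unsigned off = 0; off < 4; off++) {
        if (le_byte_at(written, off) == le_byte_at(read_back, off))
            mask |= 1u << off;
    }
    return mask;
}
```

For the values from my test, surviving_offsets(0x12345678, 0x1234FFFF) gives 0xC: the two bytes that were actually programmed sit at offsets +2 and +3, with 0x12 at +3.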
Even if you don't agree about the first byte location, when reading an erased block with HW ECC you should notice a gap in the ECC byte locations. I think this is again the NFC's internal change of byte order. Experiments also show that ECC modes with an odd byte length use an even number of bytes in the NFC RAM buffer: the 45, 23 and 15 byte ECC modes use 46, 24 and 16 bytes respectively. This is easy to verify: erase a flash block and try reading it with all possible ECC modes. If my observations are correct, then the Linux driver's nand_ecclayout struct for HW ECC is not initialized properly. Due to the byte order swap, I think the first free spare byte should be not at offset 2 (the first byte past the 2-byte bad block marker) but at offset 0. With 45/46-byte ECC there are also two gaps in the free spare byte area; the OOB layout for 45-byte ECC should be as follows:
bytes 0,1 - free
bytes 2,3 - bad block mark
bytes 4-15 - free
bytes 16,17 - ECC
bytes 18,19 - free
bytes 20-63 - ECC
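The layout above can also be written down as a table and sanity-checked. This is a standalone sketch, not the kernel's nand_ecclayout; struct oob_range, oob_layout_ecc45 and oob_count() are hypothetical names, and the 64-byte OOB size is taken from the byte range in the list above.

```c
#include <stddef.h>
#include <stdint.h>

enum oob_use { OOB_FREE, OOB_BBM, OOB_ECC };

/* One contiguous range of OOB bytes, first and last inclusive. */
struct oob_range {
    uint8_t first;
    uint8_t last;
    enum oob_use use;
};

/* Proposed 64-byte OOB layout for the 45-byte ECC mode,
 * which occupies 46 bytes in the NFC RAM buffer. */
static const struct oob_range oob_layout_ecc45[] = {
    {  0,  1, OOB_FREE },
    {  2,  3, OOB_BBM  },
    {  4, 15, OOB_FREE },
    { 16, 17, OOB_ECC  },
    { 18, 19, OOB_FREE },
    { 20, 63, OOB_ECC  },
};

/* Total number of OOB bytes assigned to the given use. */
static unsigned oob_count(enum oob_use use)
{
    unsigned n = 0;
    size_t entries = sizeof oob_layout_ecc45 / sizeof oob_layout_ecc45[0];

    for (size_t i = 0; i < entries; i++) {
        if (oob_layout_ecc45[i].use == use)
            n += oob_layout_ecc45[i].last - oob_layout_ecc45[i].first + 1u;
    }
    return n;
}
```

Counting the ranges gives 46 ECC bytes, 2 bad block marker bytes and 16 free bytes, 64 in total, which matches the observation that the 45-byte mode uses 46 bytes in the buffer.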
Regards
Edward