Flash update sometimes not working

Question asked by Michael Schwager on Nov 2, 2014
Latest reply on Nov 3, 2014 by Michael Schwager

I have 2 custom boards, both with MK60DN512VLQ10 in 144-pin package, running MQX 4.1, and running everything through IAR on Windows 7.  Both boards are running the exact same binary.

I can run the flash swap example correctly on both of them.

However, one of the two board types is having trouble.  I've tried this on 6 different boards (3 of the working variety and 3 of the non-working variety) and the results are the same.

The problem almost smells like a hardware problem if seen on its own, but not when you see the full picture.


Setup as follows:

A non-main task runs a loop which uses the Processor Expert SPI slave driver and blocks (via _lwsem_wait_ticks(...)) on a semaphore which is posted by the SPI slave driver.  When this driver receives an SPI message from the host, it calls the callback, which posts the semaphore, and the loop continues.  Some time later within the loop, and based on the contents of the SPI message, an attempt is made to write to the Flash.  For debugging I have sprinkled calls to clone_application(...) in several places.


Board 1 (working board):

clone_application() from main_task runs fine

clone_application() from aforementioned non-main task runs fine

My own flash writing code using normal fseek, write, etc., based exactly on the example, runs fine.

There is a noticeable and seemingly appropriate delay when erasing the flash via FLASH_IOCTL_ERASE_FILE


Board 2 (non-working board):

clone_application() from main_task runs fine.

clone_application() from aforementioned non-main task runs fine before it calls _lwsem_wait_ticks(...), runs fine even from within the SPI callback (which is in the interrupt frame) before and after posting the semaphore, but fails after _lwsem_wait_ticks(...) returns.


IAR stack usage shows I'm not blowing the stack, and there appear to be no other errors.  clone_application() runs one or two times through its main loop and then fails with the write(...) call returning -1, i.e. the write command does not write all 64 bytes.  Sometimes it writes more bytes than other times.

I've noticed in this failing case that FLASH_IOCTL_ERASE_FILE actually only erases 2048 bytes (from 0x0040_0000 to 0x0040_0800) and the ioctl(...) call does not have the same delay as on the working board.  There is a 100% correlation between erase failing and write failing, but the byte counts don't match: erase only erases 2048 bytes, while write only gets on the order of 8-80 bytes before failing.  Sometimes it writes one full 64-byte buffer okay, then writes something less than 64 bytes, and then write(...) returns -1.  So something is interfering with both the erase and the write at different points.

I have disabled all other interrupt sources, i.e. external sensors.  In any case, the hardware is virtually identical between working and non-working boards.
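For anyone trying to reproduce the erase measurement, here is a minimal sketch of how the erased extent could be counted.  It assumes the flash region is memory-mapped and that erased bytes read back as 0xFF; the name count_erased_prefix is hypothetical and is not part of MQX or the flash driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MQX API): returns how many bytes at the
 * start of [base, base + len) read back as 0xFF, i.e. look erased.
 * Assumption: the region is readable through a plain pointer. */
size_t count_erased_prefix(const volatile uint8_t *base, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (base[i] != 0xFFu) {
            break;              /* first byte that is not erased */
        }
    }
    return i;
}
```

On the target, base would point at the start of the region being erased (0x0040_0000 here) and len would be the size that FLASH_IOCTL_ERASE_FILE was expected to clear.  A result of 2048 would match the partial erase described above.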


The main hardware difference is that the non-working board has far fewer GPIO pins connected, i.e. it doesn't have Ethernet and a bunch of debug GPIOs connected.


So what could possibly be causing Flash erase to fail and subsequently write to fail?

Why does write(...) return -1 rather than the actual number of bytes written?
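Whatever the reason for the -1, the failure point can be narrowed down by writing in small chunks and counting the bytes accepted before the first error.  The following is only a sketch under stated assumptions: write_fn is a hypothetical stand-in for the real write call, and failing_writer is a simulated device that starts failing once a byte budget is used up.  Neither is an MQX API.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's write call: returns the number
 * of bytes written, or a negative value on error. */
typedef long (*write_fn)(void *ctx, const void *buf, size_t len);

typedef struct {
    size_t written;   /* bytes accepted before the loop stopped */
    int    failed;    /* non-zero if the writer errored or made no progress */
} write_result;

/* Writes len bytes in pieces of at most chunk bytes, so that a failure
 * can be located to within one chunk. */
write_result write_all(write_fn fn, void *ctx, const void *buf,
                       size_t len, size_t chunk)
{
    write_result r = { 0, 0 };
    const unsigned char *p = (const unsigned char *)buf;

    if (chunk == 0) {
        chunk = 1;                      /* avoid a zero-length loop */
    }
    while (r.written < len) {
        size_t want = len - r.written;
        long n;

        if (want > chunk) {
            want = chunk;
        }
        n = fn(ctx, p + r.written, want);
        if (n <= 0) {                   /* error (-1) or no progress */
            r.failed = 1;
            break;
        }
        r.written += (size_t)n;
    }
    return r;
}

/* Simulated device for illustration only: ctx points to a size_t byte
 * budget; a request larger than the remaining budget fails with -1. */
long failing_writer(void *ctx, const void *buf, size_t len)
{
    size_t *budget = (size_t *)ctx;

    (void)buf;
    if (len > *budget) {
        return -1;
    }
    *budget -= len;
    return (long)len;
}
```

With a real device, fn would wrap the actual write(...) call on the flash file handle, and written would then show how far the transfer got before the -1, to within one chunk.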