A QE128-based device came back for maintenance after about five years in the field running 24/7, and I noticed it would no longer save its configuration to its internal Flash. Flash rewrites are uncommon on this device, since parameters change very rarely, if ever. (The MCU sits in a high-power RF area.)
I thought Flash corruption might be affecting code execution, so I loaded new firmware, but the problem remained. After some BDM debugging, I noticed the following:
The Flash cannot be erased and re-programmed unless I perform the erase twice (regardless of whether I program in between), with no significant delay between the attempts. So erase-erase-program and erase-program-erase-program both work. If there is any delay between the attempts (a second or more), the operation fails. I ended up patching the firmware to call the configuration save routine twice, which works around the problem for now.
But I'd like to know why this happens. I can understand that the Flash may be starting to fail and becoming harder to erase/program (e.g., needing more time for the erase), but the erase operation returns without errors even when it fails to erase. So how am I to know that it failed, short of checking that each and every byte reads $FF? If an error were reported, the firmware could at least keep retrying the erase up to a predefined maximum number of times before declaring the Flash unusable.
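To make the retry idea concrete, here is a rough sketch of an erase routine that ignores the reported status and decides success by reading the page back. It is not the actual firmware: the `erase_fn` hook, the 512-byte page size, the retry limit of 5, and the `sim_*` functions (a host-side stand-in for a page that only erases on the second attempt) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE          512u  /* assumed erase-page size */
#define MAX_ERASE_ATTEMPTS 5u    /* arbitrary retry limit (assumption) */

/* Hypothetical driver hook: erases the page at 'page' and returns true
   if the Flash controller reported no error. */
typedef bool (*erase_fn)(volatile uint8_t *page);

/* Returns true only if every byte of the page reads back as 0xFF. */
bool page_is_blank(const volatile uint8_t *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i] != 0xFFu) {
            return false;
        }
    }
    return true;
}

/* Erase, read back, and retry. Returns the number of attempts that were
   needed (1..MAX_ERASE_ATTEMPTS), or 0 if the page never read blank. */
unsigned erase_page_verified(volatile uint8_t *page, erase_fn erase)
{
    for (unsigned attempt = 1; attempt <= MAX_ERASE_ATTEMPTS; attempt++) {
        (void)erase(page);  /* reported status is not trusted; readback decides */
        if (page_is_blank(page)) {
            return attempt;
        }
    }
    return 0;
}

/* ---- Host-side simulation only (not real Flash access) ---- */
uint8_t  sim_page[PAGE_SIZE];
unsigned sim_calls;

/* Fills the simulated page with non-blank data and clears the call count. */
void sim_reset(void)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        sim_page[i] = 0x00u;
    }
    sim_calls = 0;
}

/* Reports success every time, but only really erases from the 2nd call on. */
bool sim_erase_weak(volatile uint8_t *page)
{
    sim_calls++;
    if (sim_calls >= 2u) {
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            page[i] = 0xFFu;
        }
    }
    return true;
}

/* Reports success but never erases anything. */
bool sim_erase_dead(volatile uint8_t *page)
{
    (void)page;
    return true;
}
```

With the simulated "weak" page, `erase_page_verified(sim_page, sim_erase_weak)` needs two attempts, matching the erase-erase behaviour described above; with the "dead" page it gives up after the retry limit and returns 0. The cost is a full readback of the page after every erase, which is exactly the byte-by-byte check I was hoping to avoid.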
By the way, the Flash clock (FDIV) is set near the high end, 200 kHz. Would a value closer to the low end (150 kHz) give better results over the long term?
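For anyone wanting to check the divider arithmetic, here is a small host-side sketch. It assumes the usual HCS08 relation, where the Flash clock is the bus clock divided by (DIV + 1), with an extra divide-by-8 when the PRDIV8 prescaler bit is set, and DIV is a 6-bit field. The function names are made up for illustration, and the bus clock is a parameter because the actual value depends on the board.

```c
#include <stdint.h>

/* Total divisor applied to the bus clock: (8 if PRDIV8 else 1) * (DIV + 1).
   DIV is treated as a 6-bit field (0..63). */
uint32_t flash_divisor(unsigned prdiv8, unsigned div)
{
    return (prdiv8 ? 8u : 1u) * ((div & 0x3Fu) + 1u);
}

/* Resulting Flash clock in Hz (integer division, so rounded down). */
uint32_t flash_clock_hz(uint32_t bus_hz, unsigned prdiv8, unsigned div)
{
    return bus_hz / flash_divisor(prdiv8, div);
}

/* Smallest DIV (i.e. fastest Flash clock) that keeps the clock at or
   below max_hz. Returns -1 if no 6-bit DIV value is large enough. */
int pick_div(uint32_t bus_hz, unsigned prdiv8, uint32_t max_hz)
{
    for (unsigned div = 0; div < 64u; div++) {
        /* Compare without dividing so truncation cannot hide an overshoot. */
        uint64_t limit = (uint64_t)max_hz * flash_divisor(prdiv8, div);
        if ((uint64_t)bus_hz <= limit) {
            return (int)div;
        }
    }
    return -1;
}
```

As an example with a hypothetical 8 MHz bus clock and PRDIV8 clear, `pick_div(8000000, 0, 200000)` gives DIV = 39 (exactly 200 kHz), while `pick_div(8000000, 0, 150000)` gives DIV = 53 (about 148 kHz). This only shows how to land at either end of the range; it says nothing about which end is kinder to aging Flash.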