Is there a chance of running into malfunctioning flash memory?

GlebPlekhotko
Contributor II

Hi, everyone.

Recently, I encountered an issue with the product I'm currently working on. The product uses part of the flash memory to store the user's data. It is organized as a very primitive file system in which each "file" occupies one page (512 bytes). The MCU is an LPC5528 running at 96 MHz.
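With one 512-byte page per "file", mapping between addresses and files is simple arithmetic. A minimal sketch, assuming the user region 0x0007B600 to 0x0007FFFF (0x4A00 bytes) used in the test code of this post; the macro and function names are hypothetical and not part of the SDK.

```c
#include <stdint.h>

/* Assumed layout, taken from the test code in this post. Names are
 * hypothetical. */
#define USER_REGION_BASE 0x0007B600u
#define USER_REGION_SIZE 0x00004A00u /* 18.5 KiB */
#define USER_PAGE_SIZE   512u        /* one "file" per page */

/* Number of 512-byte "files" that fit into the user region. */
static uint32_t user_page_count(void) {
    return USER_REGION_SIZE / USER_PAGE_SIZE;
}

/* Index of the "file" (page) that contains an absolute flash address. */
static uint32_t file_index_of(uint32_t addr) {
    return (addr - USER_REGION_BASE) / USER_PAGE_SIZE;
}

/* Byte offset of an address within its page (the region base is
 * page-aligned, so masking the low bits is enough). */
static uint32_t page_offset_of(uint32_t addr) {
    return addr & (USER_PAGE_SIZE - 1u);
}
```

With these values the region holds 37 files, and an address such as 0x0007B6C2 falls into file 0 at offset 0xC2 (194).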

At some point, one sample simply refused to boot. Investigation revealed that this was caused by a failure during initialization, which consists of creating certain "files" and populating them with data. More specifically, the "FLASH_VerifyProgram" function returned code 117, which the user manual describes as "kStatus_FLASH_CompareError" (see "Chapter 9 - Flash API").
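For readers less familiar with the SDK: numeric status values such as 117 encode a status group and a code within that group. A minimal sketch, assuming the MCUXpresso SDK convention `MAKE_STATUS(group, code) = group * 100 + code` from `fsl_common.h`, under which 117 decodes as group 1 (the flash driver) and code 17 (0x11). `SKETCH_MAKE_STATUS`, `status_group` and `status_code` are hypothetical local helpers, not SDK names.

```c
#include <stdint.h>

/* Local stand-in for the SDK's MAKE_STATUS macro (assumed to be
 * group * 100 + code). */
#define SKETCH_MAKE_STATUS(group, code) ((int32_t)((group) * 100 + (code)))

/* Split a status value into its group and its code within the group. */
static int32_t status_group(int32_t status) { return status / 100; }
static int32_t status_code(int32_t status)  { return status % 100; }
```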

Further investigation identified the byte at 0x0007B6C2 as the culprit. Its expected value was 0x00, but it actually read 0x02. The sequence used to write the data is shown below.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#include <fsl_iap.h>
#include <fsl_iap_ffr.h>

flash_config_t flashInstance;
uint8_t flashBuffer[512];
uint8_t flashCheckBuffer[512];

void flashTest(void)
{
  status_t result;
  uint32_t failedAddress = ~0u;
  uint32_t failedData = ~0u;

  result = FLASH_Init(&flashInstance);
  if (result != kStatus_Success) {
    while(1);
  }
  /* Read back raw data without ECC correction. */
  flashInstance.modeConfig.readSingleWord.readWithEccOff = 
    kFLASH_ReadWithEccOff;

  /* Test pattern: all zeros. */
  memset(flashBuffer, 0, sizeof flashBuffer);
  memset(flashCheckBuffer, 0, sizeof flashCheckBuffer);

  /* The user's data region occupies 0x0007B600 - 0x0007FFFF,
     18.5 KBytes (0x4A00 bytes) in total. */
  result = FLASH_Erase(&flashInstance, 0x0007B600, 0x4A00, 
                       kFLASH_ApiEraseKey);
  if (result != kStatus_Success) {
    while(1);
  }
   
  /* Program the first page of the region. */
  result = FLASH_Program(&flashInstance, 0x0007B600, 
                         flashBuffer, sizeof flashBuffer);
  if (result != kStatus_Success) {
    while(1);
  }

  result = FLASH_VerifyProgram(&flashInstance, 0x0007B600, 
                               sizeof flashBuffer, flashBuffer,
                               &failedAddress, &failedData);
  /* On the faulty sample verification fails with
     kStatus_FLASH_CompareError (117), so execution deliberately
     continues to the read-back below; halt only if verification
     unexpectedly passes. */
  if (result == kStatus_Success) {
    while(1);
  }

  result = FLASH_Read(&flashInstance, 0x0007B600, 
                      flashCheckBuffer, sizeof flashCheckBuffer);
  if (result != kStatus_Success) {
    while(1);
  }

  for (size_t byte = 0; byte < sizeof flashCheckBuffer; ++byte) {
    if (flashBuffer[byte] != flashCheckBuffer[byte]) {
      // Execution stops here: the byte at offset 0xC2
      // (address 0x0007B6C2) reads back as 0x02 instead of 0x00
      while(1);
    }
  }
}

Above, I completely erase the user region and perform a test write to the first page. Even though the data is all zeros, the byte at 0x0007B6C2 ends up with the value 0x02. If I try to write 0xA0 there, the result is 0xA2. It looks as if a single bit (mask 0x02) is corrupted and ends up set on every program attempt.
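The symptom can be pinned down by XOR-ing the expected and read-back bytes: 0x00 ^ 0x02 and 0xA0 ^ 0xA2 both give 0x02, i.e. the same single bit in both cases. A small sketch of a comparison that, unlike the loop in the test code, records every mismatching offset together with its XOR mask instead of halting at the first one; `find_mismatches` and `byte_mismatch` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One mismatching byte found when comparing written vs. read-back data. */
typedef struct {
    size_t  offset;  /* byte offset within the compared buffers */
    uint8_t flipped; /* expected ^ actual: set bits are the differing bits */
} byte_mismatch;

/* Compare two buffers and record up to max_out mismatches. Returns the
 * total number of mismatching bytes, which may exceed max_out. */
static size_t find_mismatches(const uint8_t *expected, const uint8_t *actual,
                              size_t len, byte_mismatch *out, size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        uint8_t diff = (uint8_t)(expected[i] ^ actual[i]);
        if (diff != 0) {
            if (count < max_out) {
                out[count].offset = i;
                out[count].flipped = diff;
            }
            ++count;
        }
    }
    return count;
}
```

A single entry with the same `flipped` mask for different written values points to one faulty bit rather than to random corruption.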

My questions:

  • How common are cases like this, given that the sample was not exposed to any stress such as overvoltage or overheating?
  • Is this permanent damage, or can it somehow be worked around in software? (Unlikely, I know.)

If the risk is significant, more effort must be put into guaranteeing the integrity of the data.
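One common way to add integrity checking is to store a checksum with each "file" and verify it whenever the file is read. A minimal sketch, assuming a hypothetical page layout in which the last four bytes of each 512-byte page hold a CRC-32 (IEEE polynomial, reflected form 0xEDB88320) of the remaining 508 bytes; the function and macro names are illustrative, not part of the SDK. A checksum only detects corruption; recovering the data would additionally need redundancy such as a second copy of each file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 508 payload bytes + 4-byte little-endian CRC. */
#define FILE_PAGE_SIZE    512u
#define FILE_PAYLOAD_SIZE (FILE_PAGE_SIZE - 4u)

/* Bitwise CRC-32 (IEEE 802.3). Table-free to keep the footprint small. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* XOR in the polynomial only when the shifted-out bit is 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store the payload CRC in the trailing 4 bytes before programming. */
static void page_seal(uint8_t page[FILE_PAGE_SIZE]) {
    uint32_t crc = crc32_ieee(page, FILE_PAYLOAD_SIZE);
    for (unsigned i = 0; i < 4u; ++i) {
        page[FILE_PAYLOAD_SIZE + i] = (uint8_t)(crc >> (8u * i));
    }
}

/* Return 1 if the stored CRC matches the payload read back from flash. */
static int page_is_intact(const uint8_t page[FILE_PAGE_SIZE]) {
    uint32_t stored = 0;
    for (unsigned i = 0; i < 4u; ++i) {
        stored |= (uint32_t)page[FILE_PAYLOAD_SIZE + i] << (8u * i);
    }
    return stored == crc32_ieee(page, FILE_PAYLOAD_SIZE);
}
```

CRC-32 detects every single-bit error, so a stuck bit like the one at offset 0xC2 would be reported when the file is opened rather than silently returning wrong data.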

GlebPlekhotko
Contributor II

@Alice_Yang, so far it is just a single occurrence. But I am just a developer, so I do not have many samples. I'm quite sure this is not a widespread issue, but it is still an interesting observation, especially because it was a "fresh" chip that had not seen any significant use, so the flash had no chance to "wear out". Also, it is a lab sample, so it was not exposed to harsh environments either.

@frank_m, thank you for sharing your experience. With LPC devices I'm not sure that the erased state of the flash memory is 0xFF. In fact, a freshly erased page reads as 0x00, although the API reports an ECC error in this case. Usually I just ignore that error or turn the option off. Maybe these zeros are produced by the code managing the flash memory; it is not clear.

Anyway, the goal of my note was to point out that there is a chance of running into this case, so that someone as unlucky as me may find confirmation of their suspicion when searching the Web for a way to work around it.

frank_m
Senior Contributor III

> With the LPC devices I'm not sure that the "erased" state of the flash memory is "0xFF". Actually, the freshly erased page is read as "0x00".

I only have personal experience with a small range of LPC devices, and the ones I know read 0xFF in the erased state. This would be the "default", resulting from the physical implementation of the cell.
However, I know other devices that return 0x00 in the erased state; one example is the Infineon XC228x MCUs my company uses in some devices. While hardly any vendor reveals such details, this is easily achieved with logical inverters.
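Because the erased read-back value differs between device families, code that checks for "empty" pages is more portable if it does not hard-code 0xFF or 0x00. A minimal sketch; `region_is_blank` is a hypothetical helper, and on the LPC55 raw reads of erased pages may additionally report ECC errors, as noted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every byte equals erased_value. The erased value is a
 * parameter because it is device-specific (0xFF on some parts, 0x00 on
 * others). */
static int region_is_blank(const uint8_t *data, size_t len,
                           uint8_t erased_value) {
    for (size_t i = 0; i < len; ++i) {
        if (data[i] != erased_value) {
            return 0;
        }
    }
    return 1;
}
```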

> Especially because it was actually a "fresh" chip, which did not undergo any significant usage, so the flash had no chances to "wear off". Oh, and it is a "lab" sample, so no harsh environments as well.  

As mentioned, perhaps it was pre-damaged to an extent that did not become apparent during factory testing (assuming parts are tested individually ...).

I would try to write that cell during the "normal" flashing process, i.e. not from IAP code, for example by placing constant data at that location in your code and flashing it with a debug pod.
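With a GCC-based toolchain, one way to follow this suggestion is to give a test pattern its own linker section so that the debug probe programs it together with the application. A sketch under assumptions: the section name `.user_data_test` and the array name are hypothetical, and the project's linker script (not shown) must place that section at 0x0007B600.

```c
#include <stdint.h>

/* Hypothetical section; the linker script must map it to 0x0007B600.
 * "used" keeps the otherwise unreferenced array from being discarded. */
__attribute__((section(".user_data_test"), used))
const uint8_t userDataTestPage[512] = {
    [0xC2] = 0xA0, /* same value as the IAP experiment; all other bytes 0 */
};
```

If the byte at 0x0007B6C2 still reads back as 0xA2 after programming through the debug probe, the problem is unlikely to be specific to the IAP path.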

frank_m
Senior Contributor III

> The further study pointed to the byte 0x0007B6C2 to be in charge of the issue. Its expected value was 0x00 while in fact it was 0x02.
...
> Despite the fact that the data is just all zeros, the byte 0x0007B6C2 obtains the 0x02 value. If I try to put the 0xA0 value there, the outcome will be 0xA2.

This seems to indicate that one bit in one Flash cell is damaged.
Take into account that the equilibrium ("unprogrammed") state of flash cells is 1, i.e. 0xFF per byte. It seems the cell/gate can no longer hold the programmed charge.
The working principle of flash is that you tunnel electrons, using a higher voltage, onto the floating gate of a MOSFET until its threshold voltage has shifted far enough to be read reliably. High-energy radiation (alpha particles, UV) can discharge it. Erasing and programming slowly wear out the insulating tunnel oxide, which usually takes many thousands of cycles... unless the cell was pre-damaged.
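This hypothesis can be checked against the reported numbers with a toy model: if one bit cannot hold its programmed state, it always reads back at its erased level. A sketch, assuming (as suggested above) an erased level of 1 per bit and a single faulty bit with mask 0x02; `model_readback` is a hypothetical helper, not a model of the actual LPC55 read path.

```c
#include <stdint.h>

/* Bits in stuck_mask cannot hold the programmed state and therefore read
 * back at their erased level; all other bits read back as written.
 * erased_level is the per-byte value of an erased cell. */
static uint8_t model_readback(uint8_t written, uint8_t stuck_mask,
                              uint8_t erased_level) {
    return (uint8_t)((written & (uint8_t)~stuck_mask) |
                     (erased_level & stuck_mask));
}
```

With an erased level of 0xFF and a stuck mask of 0x02, writing 0x00 yields 0x02 and writing 0xA0 yields 0xA2, matching both observations in the original post.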

I once had a similar issue with Cortex-M3 devices from a different manufacturer. One out of 5 prototypes consistently reported a correctable flash ECC error after IAP programming, which means the error was detected and corrected by the integrated ECC.
With thousands of such devices in the field now, I never heard of any issue relating to this problem.
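To illustrate what "detected and corrected by the integrated ECC" means, here is a textbook Hamming(7,4) single-error-correcting code. This is only a toy: the ECC scheme actually implemented in these flash controllers is not described in this thread and typically protects much wider words, but the principle of computing a syndrome that points at the bad bit and flipping it back is the same.

```c
#include <stdint.h>

/* Codeword position p (1-based) is stored in bit p-1.
 * Layout by position: 1=p1 2=p2 3=d1 4=p3 5=d2 6=d3 7=d4. */
static uint8_t bit_at(uint8_t code, unsigned pos) {
    return (uint8_t)((code >> (pos - 1u)) & 1u);
}

/* Encode the low 4 bits of nibble into a 7-bit codeword. */
static uint8_t hamming74_encode(uint8_t nibble) {
    uint8_t d1 = (nibble >> 0) & 1u, d2 = (nibble >> 1) & 1u;
    uint8_t d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    uint8_t p1 = d1 ^ d2 ^ d4; /* covers positions 1,3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4; /* covers positions 2,3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4; /* covers positions 4,5,6,7 */
    return (uint8_t)(p1 | p2 << 1 | d1 << 2 | p3 << 3 |
                     d2 << 4 | d3 << 5 | d4 << 6);
}

/* Decode a codeword, correcting at most one flipped bit.
 * *corrected is set to 1 if a bit had to be fixed, else 0. */
static uint8_t hamming74_decode(uint8_t code, int *corrected) {
    unsigned s1 = bit_at(code, 1) ^ bit_at(code, 3) ^ bit_at(code, 5) ^ bit_at(code, 7);
    unsigned s2 = bit_at(code, 2) ^ bit_at(code, 3) ^ bit_at(code, 6) ^ bit_at(code, 7);
    unsigned s3 = bit_at(code, 4) ^ bit_at(code, 5) ^ bit_at(code, 6) ^ bit_at(code, 7);
    /* The syndrome is the position of the flipped bit, or 0 if none. */
    unsigned syndrome = s1 | (s2 << 1) | (s3 << 2);
    if (syndrome != 0u) {
        code = (uint8_t)(code ^ (1u << (syndrome - 1u)));
    }
    *corrected = (syndrome != 0u);
    return (uint8_t)(bit_at(code, 3) | bit_at(code, 5) << 1 |
                     bit_at(code, 6) << 2 | bit_at(code, 7) << 3);
}
```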

Alice_Yang
NXP TechSupport

Hello @GlebPlekhotko 

Do all of your chips have this problem, or just one?

BR

Alice
