The following assumption recommends that the user count the number of corrected single-bit errors. Why do you think this is necessary from a safety perspective? And do you have any analysis results explaining why the user shall count them and take the required actions?
Hello Mr Park,
Some failure modes in the memory array can result in a critical failure that creates multi-bit errors. The ECC algorithm corrects single-bit errors and detects multi-bit errors.
Now imagine a critical failure (let's say all the bits are shifted by some number of positions in the memory array because of a column-muxing error): it is almost equivalent to having a random word written into memory. This random word may happen to be valid from an ECC point of view, or be just 1 bit away from a valid codeword (some analyses show this happens about 28% of the time). In that case the ECC reports no error, or only a corrected single-bit error, although it should report a real error because the data is not the real value.
In this scenario, single-bit corrections will be observed at a rate far higher than the normal noise level assumed in a normal environment.
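As a rough sanity check of the 28% figure, here is a minimal sketch. It assumes a (72,64) SEC-DED code (64 data bits, 8 check bits); the code dimensions are my assumption for illustration and are not stated above. For a linear code, a uniformly random word gives a uniformly random syndrome, so the share of random words that look like a single-bit error is simply the number of codeword bits divided by the number of possible syndromes. The function name is hypothetical.

```python
from fractions import Fraction


def secded_random_word_fractions(data_bits: int, check_bits: int):
    """Classify a uniformly random word under a SEC-DED code.

    Assumption: linear code with a full-rank parity-check matrix, so the
    syndrome of a random word is uniform over 2**check_bits values.
    Returns (valid, single_bit_correctable, detected_uncorrectable).
    """
    n = data_bits + check_bits          # total codeword length
    syndromes = 2 ** check_bits         # number of possible syndromes
    valid = Fraction(1, syndromes)      # zero syndrome: looks error-free
    single = Fraction(n, syndromes)     # syndrome matches one bit position
    detected = 1 - valid - single       # everything else is flagged
    return valid, single, detected


# Assumed (72,64) code: 64 data bits + 8 check bits.
valid, single, detected = secded_random_word_fractions(64, 8)
print(f"looks valid:          {float(valid):.4f}")
print(f"looks like SEC:       {float(single):.4f}")
print(f"detected multi-bit:   {float(detected):.4f}")
```

Under this assumption the SEC share is 72/256, which is in line with the roughly 28% quoted above. A different code size would give a different share.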
So if you look at the transient base failure rate in our FMEDA, then for a given overall memory size and a given duration you get the order of magnitude of SER events that is acceptable (usually quite low). If you now start reading while a critical failure is present, in theory 28% of all reads would lead to a SEC. Such a rate indicates a critical failure and should lead to some system remediation.
Different customers implement different thresholds based on their own experience, so I can't give an absolute value, but 100 to 1000 times the normal rate would be abnormal.
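To make the counting idea concrete, here is a minimal sketch of such a check. All names and numbers are hypothetical: the soft error rate, memory size, observation window, and the `min_events` floor are placeholders I picked for illustration, not values from our FMEDA. The only fixed fact used is the definition of FIT (1 FIT = 1 failure per 10^9 device-hours).

```python
def expected_sec_events(fit_per_mbit: float, mbits: float, hours: float) -> float:
    """Expected number of soft-error events in the observation window.

    1 FIT = 1 failure per 1e9 device-hours.
    """
    return fit_per_mbit * mbits * hours / 1e9


def is_abnormal(observed: int, expected: float,
                factor: float = 100.0, min_events: int = 3) -> bool:
    """Flag a SEC count far above the expected transient rate.

    factor:     multiple of the normal rate treated as abnormal
                (100 to 1000 per the guidance above; pick per system).
    min_events: hypothetical floor so that one isolated soft error is
                not flagged when the expected count is far below 1.
    """
    return observed >= min_events and observed > factor * expected


# Placeholder numbers, for illustration only.
expected = expected_sec_events(fit_per_mbit=1000.0, mbits=32.0, hours=1.0)

print(is_abnormal(observed=1, expected=expected))    # isolated event
print(is_abnormal(observed=50, expected=expected))   # burst of corrections
```

The floor is a design choice, not a requirement: with a very low expected count, any single correction already exceeds 100 times the normal rate, so some minimum number of events is needed before reacting.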
I hope this helps.
Antoine Dubois