Hi David,
The failing MCUs have different revisions (QTH1608K, QRU1606C, QTH1608L, QRZ1607L, ...), but there seems to be no general relation between the revision and the problem.
I also checked the unique serial number that is stored in the Test Block of the flash (at address 0x403C10), but I cannot find a relation there either (although I did not find any description of how to interpret these bits):
Without error:
1C12451E C0000034 0044AC4C 00000000
1C12451E C0000034 00405C1C 00000000
With error:
1C12451E C0000034 00449850 00000000
1C12451E C0000034 00485428 00000000
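To compare the ID words more systematically, here is a minimal host-side sketch in Python. It looks for bits that have one value in every good unit and the opposite value in every failing unit. The function name `separating_bits` is made up for this illustration, and only the third word of each dump is used, since the other three words are identical across all four samples.

```python
from functools import reduce

def separating_bits(good, bad, width=32):
    """Return a mask of bits that are constant within each group
    and have opposite values in the two groups."""
    full = (1 << width) - 1
    good_all1 = reduce(lambda a, b: a & b, good, full)      # 1 in every good unit
    good_all0 = ~reduce(lambda a, b: a | b, good, 0) & full  # 0 in every good unit
    bad_all1 = reduce(lambda a, b: a & b, bad, full)         # 1 in every failing unit
    bad_all0 = ~reduce(lambda a, b: a | b, bad, 0) & full    # 0 in every failing unit
    return (good_all1 & bad_all0) | (good_all0 & bad_all1)

# Third word of each ID dump listed above.
WITHOUT_ERROR = [0x0044AC4C, 0x00405C1C]
WITH_ERROR = [0x00449850, 0x00485428]

if __name__ == "__main__":
    mask = separating_bits(WITHOUT_ERROR, WITH_ERROR)
    print(f"separating bits: 0x{mask:08X}")
```

With only two samples per group, a bit that happens to separate the groups is expected by chance alone, so any hit would have to be confirmed against many more units before it means anything.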
I also did a couple of other checks:
- DMA is disabled
- Frequency of external 24 MHz crystal is OK
- System PLL and secondary PLL are OK
- RAM check OK
- Verified Code Flash
- Blank check before programming
Then I encountered some strange behavior. When I cool the MCU down with freeze spray, the application runs fine. When I let it warm up again, or heat the MCU with a hot air gun (to approx. 50 °C), the exception is triggered.
I have heard of similar issues where the on-chip flash memory (NVM) is corrupted by temperature-dependent bit flips. Those issues were caused by too high a clock speed when programming the MCU or by an out-of-range flash core supply voltage.
But I cannot resolve the issue on the failing MCUs even when I reprogram them at low speed and with a correct power supply (the internal voltage regulator is used to generate the 1.2 V core and flash voltage from the 3.3 V main supply).
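One way to check for temperature-dependent bit flips directly would be to read the flash back twice through the debugger, once with the MCU cooled and once warmed, and compare the two dumps on the host. Below is a minimal sketch, assuming both dumps are already available as raw bytes of equal length. The name `diff_dumps` and the `base_addr` parameter are only illustrative, and how the dumps are captured depends on the debug tool.

```python
def diff_dumps(cold, warm, base_addr=0):
    """Compare two flash readbacks byte by byte.

    Returns a list of (address, cold_byte, warm_byte, flipped_bit_mask)
    for every byte that differs between the two dumps.
    """
    if len(cold) != len(warm):
        raise ValueError("dumps must cover the same address range")
    return [
        (base_addr + offset, c, w, c ^ w)
        for offset, (c, w) in enumerate(zip(cold, warm))
        if c != w
    ]

if __name__ == "__main__":
    # Synthetic data for illustration only, not real readbacks.
    cold = bytes([0xFF, 0x12, 0x34])
    warm = bytes([0xFF, 0x10, 0x34])
    for addr, c, w, mask in diff_dumps(cold, warm, base_addr=0x1000):
        print(f"0x{addr:08X}: cold=0x{c:02X} warm=0x{w:02X} flipped=0x{mask:02X}")
```

An empty result at both temperatures would point away from corrupted flash contents and towards something else in the execution path.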
So I am starting to think this might be more of a hardware issue.
Regards