Sometimes LPC43XX ISP won't work until all power sources are cycled (including VBAT)

brich
Contributor III

We are now seeing a strange phenomenon in approximately 3 to 5% of our units in the field (out of around 1K units): units come back to us with completely unresponsive processors. The device behaves as if it has inadvertently entered deep sleep and triggered debugger lockout, like the unrecoverable devices described in other articles about debugger lockout, where no debug probe can attach. I have taken measurements with a spectrum analyzer and near-field EM probes and verified that there is no frequency content at all coming from the device; no clock frequencies are observed on the failed units. This leads me to believe we have a software issue that stomps on the clock configuration registers.

While I understand why the unit cannot be debugged under these circumstances, I do not understand why the built-in bootloader, using ISP (in our case over USB0) with lpcscrypt to download firmware, is not able to recover from this problem. The only way we can get ISP mode to work again is to completely remove all power sources from the LPC4367, including the connection to VBAT. This is easy to do in repair once the chassis has been opened, since we laid out the design on a DDR2 memory card. But our customers do not want to open the case and reinsert this card on their own. We also don't want units sent back for repair when this happens, since ISP was intended to recover from any situation that rendered the processor useless in the field.

My question is: is anyone aware of a hardware or software fix that guarantees ISP will still function, so that we can recover the device when it gets into this state? I am certain ISP was working on all units before they went out the door (that is how firmware was loaded onto them in the first place, using lpcscrypt), and once I remove the VBAT connection along with all other power and then plug it back in, both the regular application and ISP mode work again.

Below are the relevant portions of the schematics for the carrier board that receives our SODIMM module with the LPC4367 on it. Although the connector pinout will be unfamiliar to you, the signals map directly to the pins with the functionality described in the user manual. Thank you.

 


[Figure: Reset and ISP (reset button actuated with a ball-point pen or pencil)]
[Figure: Boot Mode Selection (AND gates powered from +5VSB, present with +3VSB)]
[Figure: MicroModule (UMOD) SODIMM power supplies showing VBAT circuitry]

ZhangJennie
NXP TechSupport

Hi

Has your application ever entered low-power mode?

Thanks,

Jun Zhang

brich
Contributor III

Not intentionally. It could be doing that because of a memory leak in the firmware, though. It is impossible to debug, since I cannot connect the debugger under these circumstances. What is especially troubling is that ISP won't work at all until all power sources are removed and restored. I suppose I could enable the wakeup pins and drive them from the reset switch hardware somehow, but that would require hardware modifications to all units in the field, which is not practical. And I won't know whether anything fixes it until a substantial number of users in the field install a firmware update, which could take years. I'm really at a loss with this one. Thank you for your reply, though; any other ideas would help. Thanks.
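Since the suspected cause is firmware that unintentionally lands in a low-power mode, one common defensive pattern is to require a two-step "arm, then request" sequence before any power-down register write, so that a wild pointer or stray jump into the sleep routine is less likely to succeed. The C sketch below shows only that pattern. Every name in it is hypothetical, and `hw_enter_deep_power_down()` is a stub standing in for whatever LPC43xx power-management write the application actually performs. At best this reduces the chance of an accidental entry; it does not address the ISP recovery problem itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical arming token: a value unlikely to appear by accident. */
#define LP_ARM_MAGIC 0xA5C3F00Du

/* Hypothetical state: set only by code paths that intend to sleep. */
static volatile uint32_t lp_arm_token = 0;
/* Counts refused requests so a firmware bug can be logged later. */
static volatile uint32_t lp_refused_count = 0;
/* Records whether the (stubbed) power-down was actually requested. */
static volatile bool lp_entered = false;

/* Stub for the real power-down register write (not an NXP API).
   On hardware this would program the power management controller. */
static void hw_enter_deep_power_down(void)
{
    lp_entered = true;
}

/* Step 1: the deliberate sleep path arms the request. */
void lp_arm(void)
{
    lp_arm_token = LP_ARM_MAGIC;
}

/* Step 2: only an armed request reaches the hardware. The token is
   cleared either way, so a later stray call cannot reuse it. */
bool lp_request_deep_power_down(void)
{
    bool armed = (lp_arm_token == LP_ARM_MAGIC);
    lp_arm_token = 0;
    if (!armed) {
        lp_refused_count++;
        return false;
    }
    hw_enter_deep_power_down();
    return true;
}
```

A request that arrives without a preceding `lp_arm()` is refused and counted, which also gives the application something to report if the stray path is ever hit.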

ZhangJennie
NXP TechSupport

I need to know whether the problem is caused by specific application code.

Does a new board also have this issue?

 

brich
Contributor III

There is no way to know for sure that this is caused by our application code, because I cannot debug it. The codebase is decades old and likely still has lingering bugs that we are continually fixing and issuing updates for. However, once the device gets into this state, it cannot be recovered with ISP unless all power is removed. This is something we had not anticipated based on the documentation provided by NXP.

We have had several new units returned to us, and several old ones as well. However, we have never seen this issue on a brand-new board in production at our facility. The problem does not seem to be limited to a particular customer or product age; it appears to occur at random. A few customers have told us they turn the product off on Friday and find it in this state when they return on Monday. We've tried to recreate this scenario but cannot.

In any case, can you think of a reason why ISP does not work under these circumstances? It was our understanding that ISP is supposed to be able to completely recover a failed unit in the field. We designed a "reset" switch similar to what you'd find on the back of a piece of network gear, and anticipated using it this way in case the product ever got into this bricked state. It was our impression that ISP provided this mechanism, especially since it invokes a ROM bootloader from a hardware reset. There must be something else in the silicon that ISP relies on that is not described in any of the user documentation provided.

ZhangJennie
NXP TechSupport


USB0 ISP mode is enabled during reset. Normally there is no need to power-cycle the target.
Did you measure the boot pins and reset pin while entering ISP mode?
Please measure the power consumption to check whether the LPC is in a low-power mode when this problem happens.
There is an erratum, RESET.2, that may be relevant. Please refer to it:
https://www.nxp.com.cn/docs/en/errata/ES_LPC436X_FLASH.pdf
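If a dedicated power analyzer is not available, one rough way to get the measurement requested above is to insert a small series resistor in the LPC4367 supply rail and measure the voltage across it with a multimeter; the current then follows from Ohm's law, I = V / R. The helper below is only a sketch of that arithmetic, with integer units chosen to avoid floating point. The shunt value and which rail it goes in are assumptions that depend on the board, and the reading includes any other load sharing that rail.

```c
#include <stdint.h>

/* Convert the voltage measured across a series shunt resistor into
   supply current using Ohm's law, I = V / R.
   shunt_uv:   voltage across the shunt, in microvolts.
   shunt_mohm: shunt resistance, in milliohms (must be nonzero).
   Returns the current in microamps (truncated). */
uint64_t shunt_current_ua(uint64_t shunt_uv, uint64_t shunt_mohm)
{
    /* 1 uV / 1 mOhm = 1e-3 A = 1000 uA, hence the factor of 1000. */
    return (shunt_uv * 1000u) / shunt_mohm;
}
```

For example, 50 mV across a 1 ohm shunt is `shunt_current_ua(50000, 1000)`, i.e. 50000 uA (50 mA). The result can then be compared with the datasheet figures for active and low-power modes.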

brich
Contributor III

The boot pins all measure what they should be for the device to boot into ISP over USB0, in preparation for lpcscrypt to download code. Measuring the device's power consumption is not easy; we do not have a way to measure it accurately. However, I can measure frequency content with a near-field probe hooked to a spectrum analyzer. When the device is in the described state, absolutely no clocks are running whatsoever. If I could measure power, I would expect it to be almost zero under these circumstances. So yes, I believe the device has somehow entered a low-power state, not by design but through an application bug. However, not being able to recover from this issue in the field is still not our fault.
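Since the boot pins are reported as correct while ISP still fails, it can help to record the levels measured at reset on each returned unit and compare them mechanically against the expected USB0 pattern, so a marginal pin is not overlooked. The C sketch below does only that comparison. The bit assignment (BOOT0 to BOOT3 as bits 0 to 3) and the `EXPECTED_USB0_PATTERN` value are assumptions for illustration; take the real pattern for the LPC4367 from the boot-mode table in the LPC43xx user manual.

```c
#include <stdint.h>

/* Measured logic levels at reset, one bit per boot pin:
   bit 0 = BOOT0, bit 1 = BOOT1, bit 2 = BOOT2, bit 3 = BOOT3.
   This bit assignment is an assumption for this sketch. */
typedef uint8_t boot_pins_t;

/* ASSUMPTION: BOOT3..BOOT0 = 0101 selects USB0. Confirm against the
   boot-mode table in the user manual for your exact part. */
#define EXPECTED_USB0_PATTERN ((boot_pins_t)0x5u)

/* Returns a bit mask of the pins whose measured level differs from
   the expected pattern; 0 means every pin matched. Bits above the
   four boot pins are ignored. */
uint8_t boot_pin_mismatch(boot_pins_t measured, boot_pins_t expected)
{
    return (uint8_t)((measured ^ expected) & 0x0Fu);
}
```

A nonzero result names the offending pins directly (for example, a mask of `0x1` points at BOOT0), which is easier to log across many returned units than a pass/fail note.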

The errata sheet confirms my suspicion. I had suspected a silicon problem from the start. It appears that the USB0 peripheral is indeed left in an unknown state after reset. I have also observed a side effect: when the unit is plugged into a PC, the PC reports that the last USB device has malfunctioned. This means the low-level USB enumeration handled by the on-chip ROM has not happened. It seems the only way this could happen is what is described in the errata. Even if our application is in deep sleep with its clock stopped due to a bug in the application code, a reasonable developer would assume the on-chip ROM ought to be able to wake the part up and load fresh firmware under any circumstance, even this one, because the mechanism is initiated by a hardware reset into a ROM bootloader. That's what it's there for! No one's application is completely free of bugs. That's why so much importance is placed on hardware recovery in the field. We thought we had found a solution to this problem in the LPC43XX, but it appears this feature was not ready for prime time. Another fail for the LPC43XX series. Add to that the frustration of not even being able to buy them right now. Texas Instruments seems to have quite good availability, though, seeing as they own their own fabs. Hmm...

This is very unfortunate and discouraging in terms of what the product claims to be able to do, and for the mess it has caused on our side. We designed a seemingly fail-safe back door into our product to prevent this exact scenario; however, because of an error in NXP's silicon (unknown to us at the time of product release), nearly 1K units in the field may eventually all need to be returned to our facility for what now amounts to a necessary hardware upgrade.

ZhangJennie
NXP TechSupport

Yes, this should be the issue addressed by the erratum.
