68HC11 not starting correctly

davidatkins1 · ‎02-14-2017

Hello everyone,

I have a product dating from the late 1990s using the 68HC11E0CFNE2 processor that has started to experience a very high drop-out rate in production. Swapping the processor for an identical device from another batch fixes the problem.

The processor is used in expanded mode (external EPROM and RAM). I’ve put a logic analyser on the bus. Soon after reset, I can see the processor put out the reset vector addresses 0xFFFE and 0xFFFF, and the EPROM responds with the correct start address (0xFECD), but the processor then jumps to 0xE000 and starts executing the code there.

Probing around with a scope shows no obvious problems – signal quality and timing look fine (setup & hold of the reset vector wrt ‘E’ falling are 250ns and 70ns respectively).

The problem parts have MC68HC11E0CFNE2, 0M28Z, QQKZ1610A written on the package. Swapping those for parts with QQGP1342, QQHY1423 & QQJU1440D in the last line fixes the problem.

Any help would be very much appreciated!

Regards,

David.

tonyp · ‎02-16-2017

As an aside, putting a JMP $FFFE there is incorrect, as $FFFE does not (and should not) contain an opcode but the address of one. The correct sequence to go where the reset vector points is LDX $FFFE followed by JMP ,X

Regardless, the fact that the sequence continues to $E003 suggests that your CONFIG register has most likely been set to enable internal ROM, so your external memory changes are completely ignored. It seems that some internal application (or garbage) is running.

BTW, you don't need any special equipment to deal with the CONFIG register. Simply put the board into Special Test Mode via MODA/MODB pins, and use some free app like JBug11 (for Windows) to read and possibly re-program the CONFIG register. If your CONFIG value is inside your S19 file, simply program it over whatever value is there, and try again to see if it works. (My guess is it will.)

Note: You do not need to put the board into Special Test Mode to program CONFIG, only to read it.
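
A minimal sketch of why an enabled internal ROM would make the CPU ignore what the external EPROM puts on the bus. It is only a model, not the real silicon: the ROM window ($D000-$FFFF, as on an E9-style die) and the leftover ROM contents are assumptions chosen to reproduce the symptom described in this thread.

```python
# Assumed internal ROM window (E9-style, 12 KB). The real window depends on
# which masked part the die actually is, which is not known here.
INTERNAL_ROM_WINDOW = range(0xD000, 0x10000)


def read_byte(addr, romon, internal_rom, external_mem):
    """Return the byte the CPU core sees for a read at addr.

    With ROMON set, reads inside the ROM window are satisfied internally,
    even though the address still appears on the external bus.
    """
    if romon and addr in INTERNAL_ROM_WINDOW:
        return internal_rom[addr]
    return external_mem[addr]


def reset_vector(romon, internal_rom, external_mem):
    """Fetch the 16-bit reset vector (high byte at $FFFE, low byte at $FFFF)."""
    hi = read_byte(0xFFFE, romon, internal_rom, external_mem)
    lo = read_byte(0xFFFF, romon, internal_rom, external_mem)
    return (hi << 8) | lo


# External EPROM contents as seen on the logic analyser in this thread.
external_eprom = {0xFFFE: 0xFE, 0xFFFF: 0xCD}
# Hypothetical leftover mask-ROM contents whose reset vector is $E000.
leftover_rom = {0xFFFE: 0xE0, 0xFFFF: 0x00}

print(hex(reset_vector(False, leftover_rom, external_eprom)))  # 0xfecd
print(hex(reset_vector(True, leftover_rom, external_eprom)))   # 0xe000
```

In this model the analyser still shows the EPROM answering with $FE and $CD, yet the core starts at $E000, which matches the observed behaviour.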

davidatkins1 · ‎03-02-2017

You were absolutely correct, Tony: the CONFIG register was corrupt! The processors that were ignoring the reset vector had the ROMON and EEON bits both incorrectly set. I also found some boards that were stuck in a reset loop, and those had the NOCOP bit incorrectly cleared, which enables the watchdog timer.

Do you have any idea how this corruption could occur? The software guys tell me they don’t touch the CONFIG register or the BPROT register that protects it. I guess it’s feasible that the chips we’re buying have this register set incorrectly at the factory. Has this sort of thing occurred before?

Thanks again Tony; your help has been very much appreciated!
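
For anyone checking their own boards, here is a small sketch that decodes a CONFIG byte into the bits discussed above. The bit positions (NOSEC = bit 3, NOCOP = bit 2, ROMON = bit 1, EEON = bit 0) are an assumption based on the usual E-series layout and should be verified against the data sheet. The example values are hypothetical; the actual values read from the faulty parts were not posted.

```python
# Assumed E-series CONFIG bit masks; check the data sheet for your exact part.
CONFIG_BITS = {"NOSEC": 0x08, "NOCOP": 0x04, "ROMON": 0x02, "EEON": 0x01}


def decode_config(value):
    """Return a dict mapping each CONFIG bit name to True/False."""
    return {name: bool(value & mask) for name, mask in CONFIG_BITS.items()}


def describe(value):
    """List the settings that would matter for a ROM-less expanded-mode design."""
    bits = decode_config(value)
    notes = []
    if bits["ROMON"]:
        notes.append("internal ROM enabled")
    if bits["EEON"]:
        notes.append("internal EEPROM enabled")
    if not bits["NOCOP"]:
        notes.append("COP watchdog enabled")
    return notes


# Hypothetical example values, for illustration only.
for value in (0x0C, 0x0F, 0x08):
    print(f"CONFIG=${value:02X}: {describe(value) or ['nothing unexpected']}")
```

Under these assumptions, a value such as $0F would show the "ignores the reset vector" symptom, and a value with NOCOP clear would show the reset loop if the software never services the watchdog.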

tonyp · ‎03-02-2017

Glad you got it solved!

(It's good practice to always program CONFIG to your expected value to avoid such problems.)

How could this have happened? One possibility:

It used to be the case (not sure if it still is) that some factory-masked parts (E9, for example) carrying firmware for some other client were re-purposed for one reason or another, perhaps because the firmware was found to be buggy and the customer no longer wanted those parts, only ones with the corrected firmware version. Instead of being thrown away, the parts were simply renamed to E0 (i.e., officially no ROM) and sold to the general public as ROM-less. Since the ROM contents are garbage to the new buyer, and an E0 part is not supposed to contain any specific ROM image, there is no problem.

Regarding the CONFIG register, my guess is that since the part actually has a ROM, CONFIG was set to start from there for the original use case. Apparently, it was left in that state before the part was 're-branded' and shipped as E0.

A similar thing happened with certain Flash based MCUs. If, for example, a 32KB MCU failed factory stress tests in the lower 16KB, the part was renamed to 16KB and the customer got some bonus memory that, however, is not guaranteed to work correctly based on published specifications.

davidatkins1 · ‎02-16-2017

Hi again Tony,

Thanks for getting back to me. To answer your questions:

I guess it’s possible that the CONFIG register is programmed incorrectly on the bad processors. Unfortunately, this is an old product and we don’t have a development system anymore so I can’t easily read the contents of this register.

The vectors for the COP, clock monitor, interrupts, timers, overflow etc are all initialised & correct and none point to $E000.

The jump to $E000 occurs immediately after the reset vector to $FECD is received (see attached plot).

We’ve tried a faulty processor in two different products and it doesn’t work in either of those. This is not completely conclusive as the original product and these other two share similar hardware and software designs. I’m trying to locate an official Motorola/Freescale evaluation board to try the processor in there, but this may take a while.

I’ve manually edited the hex file to add an extended-addressing JMP $FFFE instruction at location $E000, and the processor doesn't seem to respond to it at all – see attached scope shot. It shows (from left to right)…

- Reset rising after power-up (yellow trace)

- The processor puts out the reset vector of $FFFE, PROM responds with $FE

- The processor puts out the reset vector of $FFFF, PROM responds with $CD

- The processor jumps to $E000 where it finds the JMP $FFFE instruction ($7E, $FF, $FE)

- The processor continues to addresses $E003 and $E004, completely ignoring the JMP.

- The program counter then inexplicably goes to $FFFF!

BUS1 consists of the upper half of the address bus in the high-order byte and multiplexed address & data bus in the lower byte.

tonyp · ‎02-14-2017

* Is the failure after (a) having worked for some time, or (b) right away?

[If (a), how can you be sure the replacement is OK unless an equal amount of time has passed?]

* Are you overclocking?

* Are you using the chip in expanded or in special test mode?

* Is the current draw of the failed chip the same as the working one?

davidatkins1 · ‎02-15-2017

Thanks for getting back to me. To answer your questions…

1, The failure occurs immediately at power-up and is 100% reproducible – it happens every time. I’ve tried freezer spray and a heat gun on the processor and associated circuitry without any success.

2, The processor is a 2MHz part (so maximum of 8MHz crystal allowed) and we use a 7.3728MHz crystal. I’ve tried replacing the crystal with a signal generator and reducing the frequency down to 1MHz but the processor still won’t start.

3, The processor is used in normal expanded mode (so MODA and MODB are both high) with the reset vector address located in external EPROM. I’ve tried grounding the MODB pin to start the processor in special test mode, which has a different reset vector (0xBFFE & 0xBFFF). In this case, I see the EPROM put out the contents of these addresses and the processor then correctly jumps to this new location.

4, I’m not able to easily isolate the current to just the processor but the current taken by the whole circuit is broadly the same for a working and non-working chip (within a few mA).
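
The clock and mode figures in answers 2 and 3 can be cross-checked with a short sketch. It assumes the usual 68HC11 relationship of E clock = crystal frequency / 4 (consistent with "2MHz part, 8MHz crystal" above) and the standard MODB/MODA mode encoding. The bootstrap and single-chip entries are not mentioned in this thread and are included from memory for completeness.

```python
def e_clock_hz(xtal_hz):
    """E (bus) clock, assuming the CPU divides the crystal frequency by 4."""
    return xtal_hz // 4


# Keyed by (MODB, MODA) pin levels sampled at reset.
MODES = {
    (1, 1): "normal expanded",
    (1, 0): "normal single-chip",
    (0, 1): "special test",
    (0, 0): "special bootstrap",
}


def reset_vector_address(modb):
    """Reset vector location: $FFFE in normal modes, $BFFE in special modes."""
    return 0xFFFE if modb else 0xBFFE


print(e_clock_hz(7_372_800))                  # E clock in Hz for the 7.3728MHz crystal
print(MODES[(1, 1)], hex(reset_vector_address(1)))
print(MODES[(0, 1)], hex(reset_vector_address(0)))
```

With the 7.3728MHz crystal this gives an E clock of about 1.84MHz, inside the 2MHz rating, so overclocking is not a factor here.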

tonyp · ‎02-15-2017

Things I would try next:

* Is it possible the faulty ones have a wrongly programmed CONFIG register (specifically any of the ROMON, NOCOP, and EEON bits) either by default or by accident? Can you make sure both types have the exact same CONFIG contents?

* Since the faulty ones jump to $E000 can you check if any of the vectors are pointing that way (e.g., COP or NMI)?

(Is the jump to $E000 direct or after several instructions?)

(BTW, are all vectors initialized to something? Ideally, for debugging purposes, all unused vectors should point to slightly different addresses containing something like BRA * to more easily catch which one fires, if any.)

* Can you try the faulty chip with a different application (preferably both hardware and software)? Does it misbehave again?

* Assuming you have access to the source code of the application, can you put a

LDX $FFFE

JMP ,X

at $E000 and see whether you bypass the problem or somehow keep going back to $E000?
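
To illustrate why the two-instruction sequence is needed rather than a plain JMP $FFFE, here is a rough model of where the program counter ends up in each case. It is a sketch, not an emulator: only the addressing behaviour is modelled, and the opcode bytes in the comments for LDX and the indexed JMP are quoted from memory.

```python
def read_word(mem, addr):
    """Read a big-endian 16-bit word, as the 68HC11 stores addresses."""
    return (mem[addr] << 8) | mem[addr + 1]


def jmp_extended(operand):
    """JMP $hhll (bytes $7E hh ll): the PC becomes the operand itself."""
    return operand


def ldx_then_jmp_indexed(mem, operand):
    """LDX $hhll (bytes $FE hh ll) then JMP 0,X (bytes $6E $00).

    X is loaded with the word stored at the operand address,
    and the PC then becomes X.
    """
    x = read_word(mem, operand)
    return x


# Reset vector bytes from the EPROM, as reported in this thread.
mem = {0xFFFE: 0xFE, 0xFFFF: 0xCD}

print(hex(jmp_extended(0xFFFE)))               # 0xfffe
print(hex(ldx_then_jmp_indexed(mem, 0xFFFE)))  # 0xfecd
```

The plain JMP would start executing the vector bytes themselves as code, while the LDX / JMP ,X pair lands on the address the vector points to.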
