Help with MPC8347 (or MPC8343 MPC8349 possibly MPC8313) freezing while stepping using COP / JTAG

dknews · ‎07-25-2012

I'm having a problem with MPC8347 freezing and getting completely locked while debugging.

I'm using CW and a USB TAP, but I get the same lock-ups with a BDI2000. The lock-up happens at some very simple assembly instruction like "lwz r1,NNN(r2)". If I set a breakpoint somewhere else or let the code run freely, everything works.

Here is what I've checked:

- schematic looks correct (and 1:1 taken from the MPC8349 eval board)

- same schematic and code works on the other product - no issues

- CPU has latest revision

- JTAG works in terms of stopping on breakpoints and looking through the registers and memory

- lock-up happens only while stepping through the code

- JTAG connector was re-soldered directly to the vias at BGA - with the same results

- All JTAG and RESET lines are tested with the scope - and signal looks good

Please refer me to anything that might help.

Below is some of the information:

If I set a breakpoint inside the code of inline function rd_addr() @ 0x2A9C0 (as you could see it performs completely legitimate load into r4 from [r31]+0 – i.e. from the location of dobj at the end of the 128MB memory) and then do one step the CPU will stop working. If I put breakpoints in every line it fails to stop at address 0x2A9D0. If I put a single breakpoint at the address 0x2A9D8 it will do a correct read from address 0xA0082424 – allocated SRAM address from CS2.

I don’t see any exceptions coming from the DRAM controller; the memory is thoroughly tested and does not cause any issues in normal execution without the debugger. It's only when the debugger steps through the code, or even goes from breakpoint to breakpoint, that it eventually stops responding. To the best of my understanding the processor also stops running, since it no longer reacts to any external interrupts to the core. All three of HRESET, PRESET and JRESET remain high with no glitches…

- The value of MSR before entering critical section (the code I debug is within this section) is 0xA0B0, after entering it is 0x2030 since I clear EE and CE bits

- Watchdog is not programmed

- If I move from one statement to another by setting and clearing breakpoints, it works for a while; however, just as with single-stepping, the CPU might lock up if a single step is executed from the breakpoint. Even when the next breakpoint is set further down the code, the CPU can get into the same state where the debugger can no longer stop it

- It is not an issue with this particular board since every board from the batch behaves the same way; we have another very similar board with 8347 which we designed four years ago and this board also exhibits this behavior but infrequently enough to make debugging impossible

- Reset lines seem to be stable, so I don't suspect that the CPU is suddenly reset by an outside signal.

- All reset lines are pulled up to 3.3V via 4.7k

- The code I'm debugging runs inside a critical section. I read the MSR register and write it back with two bits, CE (critical interrupts) and EE (external interrupts), set to zero. I do not touch any flags in this register that are responsible for debugging. There are no divisions or floating point operations that could raise an exception.

- When I look at the TDO/TDI/TMS signals on the scope, it appears that once the PPC is in this state there is no attempt from the debugger to transmit a command over JTAG; it looks like the pattern that goes between the PPC and the debugger when the PPC is running
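The two MSR values quoted above are consistent with clearing exactly two bits. A minimal sketch in C, assuming the e300 MSR layout where EE is mask 0x8000 and CE is mask 0x0080 (the masks are inferred from the values in this post and should be checked against the e300 core manual); the helper name msr_enter_critical is hypothetical:

```c
#include <stdint.h>

/* Assumed e300 MSR bit masks; verify against the core manual. */
#define MSR_EE 0x00008000u  /* external interrupt enable */
#define MSR_CE 0x00000080u  /* critical interrupt enable */

/* Hypothetical helper: returns the MSR value with EE and CE cleared,
 * leaving every other bit (including debug-related ones) untouched. */
static uint32_t msr_enter_critical(uint32_t msr)
{
    return msr & ~(MSR_EE | MSR_CE);
}
```

With these masks, msr_enter_critical(0xA0B0) gives 0x2030, which matches the before and after values reported above.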

TomE · ‎07-26-2012

It seems to be something related to interrupts or exceptions.

Except the code you're debugging has interrupts disabled. Both EE and CE.

I would next suspect the memory management exceptions. MMU TLB reload is performed by exception-handler code.

I don't have the E300 core manual here (I have E200 and E500), so I can't check the details, but I'd suggest you read through the core manual and see what controls the MMU exceptions. It may be that you're not meant to disable CE when MSR[IR] and MSR[DR] are set. You've got MSR[ME] cleared too. That may be affecting the debugging.
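To make the MSR observation concrete, here is a small sketch in C that decodes the reported value 0x2030. It assumes the classic PowerPC MSR masks for ME (0x1000), IR (0x20) and DR (0x10); the function names are hypothetical and the masks should be confirmed in the e300 core manual:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed classic PowerPC MSR bit masks; verify for the e300. */
#define MSR_ME 0x00001000u  /* machine check enable */
#define MSR_IR 0x00000020u  /* instruction address translation */
#define MSR_DR 0x00000010u  /* data address translation */

/* True if both instruction and data translation are enabled. */
static bool msr_translation_on(uint32_t msr)
{
    return (msr & (MSR_IR | MSR_DR)) == (MSR_IR | MSR_DR);
}

/* True if machine check exceptions are enabled. */
static bool msr_machine_check_enabled(uint32_t msr)
{
    return (msr & MSR_ME) != 0;
}
```

Under those assumptions, 0x2030 has IR and DR set and ME clear, and ME is also clear in the earlier value 0xA0B0.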

Was your software written for a different version of the core? The exception handling under normal operations may be compatible between different cores, but the code needed to allow debugging to work may be different.

Can you debug code that has interrupts enabled? Can you debug the early initialisation code before IR and DR are set? Is the problem different with the MMUs on and off? Are you remapping virtual addresses in your code, or is the mapping "flat"?

Is there any difference in CPU initialisation between the products that you can debug and the ones you can't?

Are there options to configure the debugger for a specific core type?

The "CodeWarrior™ USB TAP Users Guide" document states:

"For PPC processors, placing the CPU into debug mode is just another interrupt. For
example: your code is in an interrupt epilogue and has just placed the return address into
SRR0 when a breakpoint occurs. The breakpoint causes the IP for the address of the
breakpoint to be written to SRR0, destroying your original return address. Stepping
through code which accesses SRR0 and SRR1 exhibits the same problem."

That isn't what is happening to you, but it is possible that some of your code isn't saving some of the registers that it should be saving, and that is what is interfering with the debugging.

Check some of these reports (thanks to Google):

http://www.ultsol.com/index.php/support/knowledge-base?view=category&layout=categorylist&task=lists&...

Good luck,

Tom

aivchenko · ‎07-26-2012

Hi, Tom,

I appreciate your answer! We are working on this issue with Dennis who asked the original question. He designed this hardware.

>I would next suspect the memory management exceptions. MMU TLB reload is performed by exceptions calling code.
I use the flat memory model with MMU enabled only to use caches.
Indeed, all freezes I've seen happened on instructions like lwz or stw which imply that this has something to do with the memory access.
The breakpoints are not placed in the exception handlers. They are in the routine _start() even before MMU is initialized and caches enabled. I actually tried both ways with and without caches enabled with the same result.

>Was your software written for a different version of the core? The exception handling under normal operations may be
>compatible between different cores, but the code needed to allow debugging to work may be different.
No, this is original 8347 code. Debugging works on certain boards and does not on the newer design. I'd suspect some JTAG issue, except that this part of the schematic was taken from the previous design and the JTAG lines are even shorter on the new board.

>Can you debug code that has interrupts enabled?
Tried both ways w/o any difference

>Can you debug the early initialisation code before IR and DR are set?
Sometimes the core freezes even in the startup routine by simply executing something like addi r2, r2, _SDA_BASE@l or blr in __init_registers(), even though LR has the right address... The same code would execute fine if I set up a breakpoint and run it continuously

>Is the problem different with the MMUs on and off?
No, I haven't found a difference

>Are you remapping virtual addresses in your code, or is the mapping "flat"?
Yes, it is flat. MMU is only used to be able to enable cache

>Is there any difference in CPU initialisation between the products that you can debug and the ones you can't?

I don't believe so. I can see the same RCW in both cases.

The CPUs have the same mask and the same U-Boot code (which is not executed anyway after the reset).

What makes things even more complicated is that I have one board from the older design which exhibits the same debugging issues as the newer one.

>Are there options to configure the debugger for a specific core type?
I tried two debuggers:

USBTAP with CW and BDI-2000 with gdb-remote

The code in both cases is compiled by the GCC which is a part of CW for PA 8.7 (gdb easily understands debug information in the code compiled by CW).

With BDI-2000 I can still access/change registers and communicate via JTAG even after it complains that the core is frozen.

One question which neither Freescale nor Abatron/UltSol tech support could answer is:

- How does the USB TAP or BDI-2000 find out that the core is frozen? Where should we look: at a JTAG interface issue, or at some combination of software and hardware interaction that causes the core to freeze?

TomE · ‎07-26-2012

> how USB TAP or BDI-2000 finds out that the core is frozen?

I don't think it can. With "good old fashioned simple CPUs" (think 8 and 16 bitters) the debug hardware had hooks deep into the CPU. These complicated ones can't do that. There are too many pipelines running for the debugger to be able to hook in simply. As the manual I quoted said:

"For PPC processors, placing the CPU into debug mode is just another interrupt."

I think the assumption is it never goes so badly wrong that the CPU loses control. The CPU has multiple critical, exception and debug traps, so it should always be able to catch any error - as long as all the error catcher code works properly, and as long as the memory and stack are OK (or the exception routines use their own stacks, which they should).

I'm comparing some manuals. The E200z6 (not the one you're using) has a "Hardware Debug Facility". Quoting from the manual:

“Debug Registers,” are also used for external debugging, but exceptions are not generated to running software. Debug events
enabled in the respective DBCR0–DBCR3 registers are recorded in the DBSR regardless of MSR[DE], and no debug interrupts are generated. Instead, the CPU enters debug mode when an enabled event causes a DBSR bit to become set."

When in this hardware Debug Mode, the debugger loads instructions directly into the CPU IR. It can force instructions to be executed without them having to be read from any memory.

That section isn't present in the E300 core manual, so it looks like it doesn't have that hardware support. It only has support for software debugging. In the section "10.1.6 Interrupt Vectors for Debugging" it lists DSI (00300), Trace (00D00) and Instruction Address Breakpoint (01300) interrupts. Make sure those vector areas aren't being used for some other purpose.
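One way to check this against a linker map is a small overlap test. The sketch below is in C and uses the three vector offsets listed above; the 0x100-byte slot size and the function names are assumptions, and the base argument is the vector base (0x00000000 or 0xFFF00000 on classic PowerPC, depending on MSR[IP]):

```c
#include <stdbool.h>
#include <stdint.h>

/* Vector offsets as listed in the manual section cited above. */
#define VEC_DSI    0x00300u  /* DSI */
#define VEC_TRACE  0x00D00u  /* Trace */
#define VEC_IABR   0x01300u  /* Instruction Address Breakpoint */
#define VEC_SIZE   0x100u    /* assumption: 256 bytes per vector slot */

/* True if [start, start+len) overlaps the slot at base+offset. */
static bool overlaps_slot(uint32_t base, uint32_t offset,
                          uint32_t start, uint32_t len)
{
    uint64_t slot_lo = (uint64_t)base + offset;
    uint64_t slot_hi = slot_lo + VEC_SIZE;      /* exclusive end */
    uint64_t end     = (uint64_t)start + len;   /* exclusive end */
    return len != 0 && start < slot_hi && slot_lo < end;
}

/* True if the region touches any of the three debug-related slots. */
static bool overlaps_debug_vectors(uint32_t base, uint32_t start, uint32_t len)
{
    return overlaps_slot(base, VEC_DSI, start, len) ||
           overlaps_slot(base, VEC_TRACE, start, len) ||
           overlaps_slot(base, VEC_IABR, start, len);
}
```

Feeding it the start address and size of each section from the map file would flag any code or data placed on top of those vectors.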

More ideas. Check your stack pointer. Make sure it isn't misaligned or running away. Since it isn't a hardware debugger, the software debugger might require memory to be reserved for its variables. Make sure none of your code is writing on the debugger's memory. That's a very common way of losing control of the debugger.
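A stack pointer check of this kind could look like the following C sketch. It assumes 8-byte stack alignment (as in the PowerPC EABI; other ABIs may require 16) and that the stack bounds are known from the linker script; the function name and parameters are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8-byte alignment; adjust for the ABI actually in use. */
#define STACK_ALIGN 8u

/* Hypothetical check: sp (the value read from r1) must be aligned
 * and lie within [stack_low, stack_high], both taken from the linker script. */
static bool sp_looks_sane(uint32_t sp, uint32_t stack_low, uint32_t stack_high)
{
    return (sp % STACK_ALIGN) == 0 && sp >= stack_low && sp <= stack_high;
}
```

The value of r1 read through the debugger at each breakpoint can be run through a check like this to spot a misaligned or runaway stack early.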

Check the presence of required pullups and pulldowns on all of the JTAG signals. You might have a "floating pin" that works on some boards but not on others.

Check "5.3.2.5 System I/O Configuration Register Low (SICRL)", "5.3.2.6 System I/O Configuration Register High (SICRH)" and "5.3.2.7 Debug Configuration" to make sure nothing has reprogrammed any pins the debugger is relying on.

> the same U-Boot code (which is not executed anyway after the reset).

How does that work? Does U-Boot load code into RAM and then reset in a different mode to directly execute the loaded code? Are you sure none of U-Boot's register programming is left as-is and not overridden?

Good luck.

Tom

EAlepins · ‎09-04-2012

Hi,

As a note, we also had a similar problem. The Abatron BDI2000/3000 gave us a "COP freeze" message as the CPU state when we put breakpoints around memory loads. It was on an MPC8349A. I don't remember the details, but I suspected some errors on the JTAG connections of our custom board (however, this was only an assumption). We were able to work around the problem by moving breakpoints to other locations (they were breakpoints put by automated tests). We were compiling our application with the WindRiver Diab C Compiler.

We never really understood the problem. But I now see it is probably a hardware bug in the MPC83xx itself since it is neither specific to one compiler, nor debugger, nor application, nor board...

Note: we never had such problem with MPC5554 (e200z6 core).

Étienne