LPC4357 EMC long-long rare fail

robertmorrison · ‎09-17-2019

I am using the LPC4357 208 PQFP device on a custom board. I have written a lot of working firmware for it, but recently found a rare failure mode that I traced back to the processor issuing a bad long long (64-bit) write cycle on the EMC bus. Traces are shown below. On the good access trace, you can see two long long writes, each of which the processor automatically breaks into two 32-bit accesses. In between are several diagnostic marker accesses (removing the diagnostic code has no effect on the failure mode). In the bad access trace, which is at the same address but only occurs after about 10000 test iterations, the second long long write occurs, but is preceded by a bogus 32-bit write to the same address. Here is the code that the emulator aborts on. Note that the hw_signal trigger is off-screen to the right on the bad access trace.

// here is the first long long write
*(long long *)FPGA_CHAN_DATA_H = data1;    // This is sent on second burst cycle
wait_short(2);
*FPGA_MODE = 0x4;     // swap longs, reset param cnt
// detect if first write fails
if ((*FPGA_NXP_TEST & 0xffffffff00000000) == 0)
{
    param_cnt = *FPGA_MODE;
    hw_signal();
}
*FPGA_MODE = 4;     // swap longs back
wait_short(2);
// hw_signal();        Uncomment this to show the good trace
*FPGA_MODE = 0x8;     // swap longs, reset param cnt

// here is the second long long write, it generates a bogus access about once
// every 10000 iterations.

*(long long *)FPGA_CHAN_DATA_H = data2;    // This is sent on second burst cycle
wait_short(10);
param_cnt = *FPGA_MODE;
// detect if bad write occurs
if ((*FPGA_NXP_TEST & 0xffffffff00000000) == 0)
{
    hw_signal();
    hw_signal();
    hw_signal();
}

Has anyone seen problems with EMC long-long accesses like this?

Alexis_A · ‎09-18-2019

Dear Robert,

You mention that the error appears after roughly 10000 iterations. Is there a specific memory region where the failure occurs, or is it a random location every time?

Let me know your findings.

Best Regards,

Alexis Andalon

robertmorrison · ‎09-18-2019

Hi Alexis,

One more important data point I forgot to mention: the long-long data writes and reads are aligned; address bits 2:0 are always zero for the transfer. I do see the low addresses change as expected while the transfer is taking place. The bogus 32-bit write goes to the same address as the first half of the actual 64-bit write.
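
For anyone who wants to double-check the alignment claim in their own test, a minimal sketch of the check follows. The helper name is made up for illustration and is not part of my firmware; it only tests that address bits 2:0 are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when address bits 2:0 are all zero,
 * i.e. the pointer is suitably aligned for a 64-bit access. */
static inline bool is_aligned_8(const volatile void *p)
{
    return ((uintptr_t)p & 0x7u) == 0u;
}
```

Calling it on the target address before each long-long write (and signalling on failure) would rule out an unexpected pointer value as the cause.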

Robert

bernhardfink · ‎09-20-2019

Can't think of a systematic issue in the chip design, must be a rare race condition in my opinion.

Are you sure that

LPC_EMC->STATICCONFIG0 = 0x00000088;

is the right setting for the 8-bit wide FPGA access? The two bits which are set here are:

PM Page mode.
In page mode the EMC can burst up to four external accesses. Therefore devices with asynchronous page mode burst four or higher devices are supported. Asynchronous page mode burst two devices are not supported and must be accessed normally.

0 Disabled (POR reset value).
1 Async page mode enabled (page length four).

PB Byte lane state
The byte lane state bit, PB, enables different types of memory to be connected. For byte-wide static memories the BLSn[3:0] signal from the EMC is usually connected to WE (write enable). In this case for reads all the BLSn[3:0] bits must be HIGH. This means that the byte lane state (PB) bit must be LOW.

16 bit wide static memory devices usually have the BLSn[3:0] signals connected to the UBn and LBn (upper byte and lower byte) signals in the static memory. In this case a write to a particular byte must assert the appropriate UBn or LBn signal LOW. For reads, all the UB and LB signals must be asserted LOW so that the bus is driven. In this case the byte lane state (PB) bit must be HIGH.
Remark: When PB is set to 0, the WE signal is undefined or 0. You must set PB to 1, to use the WE signal.

0 For reads all the bits in BLSn[3:0] are HIGH. For writes the respective active bits in BLSn[3:0] are LOW (POR reset value).
1 For reads the respective active bits in BLSn[3:0] are LOW. For writes the respective active bits in BLSn[3:0] are LOW.

In my opinion both bits should be 0 and not 1.
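
To make the bit positions explicit, here is a minimal sketch that decodes a STATICCONFIG value. The positions used (MW in bits 1:0, PM in bit 3, PB in bit 7) are as I read them from the register description and should be checked against the user manual; the macro, struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the EMC chapter of the user manual. */
#define EMC_STATICCONFIG_MW_MASK  0x3u       /* bits 1:0, memory width   */
#define EMC_STATICCONFIG_PM       (1u << 3)  /* bit 3, page mode         */
#define EMC_STATICCONFIG_PB       (1u << 7)  /* bit 7, byte lane state   */

/* Hypothetical decoded view of the fields discussed here. */
typedef struct {
    unsigned width_bits;      /* 8, 16 or 32; 0 for the reserved encoding */
    int      page_mode;       /* 1 if PM is set */
    int      byte_lane_state; /* 1 if PB is set */
} emc_static_cfg_t;

static emc_static_cfg_t emc_decode_staticconfig(uint32_t reg)
{
    emc_static_cfg_t cfg;

    switch (reg & EMC_STATICCONFIG_MW_MASK) {
    case 0u:  cfg.width_bits = 8u;  break;
    case 1u:  cfg.width_bits = 16u; break;
    case 2u:  cfg.width_bits = 32u; break;
    default:  cfg.width_bits = 0u;  break;  /* reserved encoding */
    }
    cfg.page_mode       = (reg & EMC_STATICCONFIG_PM) != 0u;
    cfg.byte_lane_state = (reg & EMC_STATICCONFIG_PB) != 0u;
    return cfg;
}
```

Decoding 0x00000088 this way gives an 8-bit bus with both PM and PB set, which are the two bits I am questioning.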

Would you be able to show the assembler code for this long-long write?

Regards,

Bernhard.

robertmorrison · ‎09-18-2019

Hi Alexis,

I've gotten quite a bit more data on the failure now. At this point it does look like the processor is definitely having problems doing long-long writes (and possibly reads as well). The address is fixed at 0x1c000xxx, which is in a bank of 1024 addresses used as a portal into an FPGA. I did discover a pretty easy (if quite a bit slower) workaround: issuing two long accesses in sequence with correctly shifted data never causes the failure, and I never see any extra writes with bad data.
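
A minimal sketch of that workaround is below. The function name is hypothetical, and the word order (low 32 bits at the lower address) is an assumption based on the little-endian layout a single 64-bit store would produce; it may need to be swapped depending on how the FPGA expects the two halves.

```c
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value as two separate 32-bit writes.
 * Assumes little-endian layout, i.e. low word at the lower address. */
static inline void fpga_write64_split(volatile uint32_t *dst, uint64_t value)
{
    dst[0] = (uint32_t)(value & 0xFFFFFFFFu);  /* low word, lower address */
    dst[1] = (uint32_t)(value >> 32);          /* high word, address + 4  */
}
```

The volatile qualifier is there so the compiler keeps two distinct stores in program order; it is still worth checking the disassembly to confirm that two single-word stores are emitted rather than one double-word store.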

The fail frequency is extremely variable even with no change to the code. Note that for this test no interrupts are enabled, which should rule out any interruptions to the test that might cause this fail. Sometimes I will get an extra access fail after 10 iterations, sometimes it's in the millions of iterations. The failure seems to be independent of the data transferred. The 4-layer board power/ground planes are heavily bypassed, and I see no indication of erratic operation for around 30000 lines of code with extensive testing of all data types. Only the long-long accesses seem to somehow get confused.

Some more system data: the chip is operating at a 200 MHz core frequency from an external oscillator running at 25 MHz. The EMC bus talks to only two devices, an SRAM and an FPGA, and I see no evidence of signal integrity issues; other tests have run for hundreds of billions of iterations without any failures. The EMC configuration is set as follows:

// Speed up the FPGA reads and writes (0x1c000000)
LPC_EMC->CONTROL = 0x00000001;          // turn off EMC reset
LPC_EMC->STATICCONFIG0 = 0x00000088;    // set for 8 bit bus, enable nwe, pagemode
LPC_EMC->STATICWAITRD0 = 0x00000004;    // UPDATE SET TO 4
LPC_EMC->STATICWAITPAG0 = 0x00000001;   // UPDATE SET TO 1
LPC_EMC->STATICWAITWR0 = 0x00000002;    // UPDATE SET TO 2
LPC_EMC->STATICWAITTURN0 = 0x00000003;  // 3 cycle turnaround

Robert
