LPC1788 HardFault recovery

aut · ‎03-20-2021

Dear All.

I'm working on a project that uses an LPC1788 with an external SDRAM, model IS42S32160F-7TLI (plus an 800x480 touchscreen display and two external flash memories, IS25LP512M-JLLP). The CPU clock is 120 MHz and the EMC clock is 60 MHz.

Occasionally, at random, the system resets while it is running. When I put a breakpoint in my HardFault_Handler() routine (which contains only a __no_operation() instruction), the debugger stops there. In this condition, the fault registers have the following values:

AFSR = 0x00000000
AIRCR= 0xFA050000
BFAR = 0x000A64FC
CCR = 0x00000200
CFSR = 0x00008200
HFSR = 0x40000000

The CFSR register shows the PRECISERR and BFARVALID bits set to 1. First question: is this a communication problem with the external SDRAM? I want to avoid the system reset. Second question: is there a way to return to the instruction following the one that generated the fault? Alternatively, is there a way to return to the entry point of the while(1){...} loop in the main() function?
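For reference, the register values quoted above can be decoded with a few bit masks. This is a minimal sketch, assuming the ARMv7-M fault status bit positions (PRECISERR is bit 9 and BFARVALID is bit 15 of CFSR, FORCED is bit 30 of HFSR). The helper names are hypothetical and not part of any NXP library.

```c
#include <stdint.h>

/* Bit positions as defined for the ARMv7-M System Control Block. */
#define CFSR_PRECISERR   (1UL << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1UL << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID   (1UL << 15)  /* BFAR holds the faulting address */
#define HFSR_FORCED      (1UL << 30)  /* a configurable fault was escalated */

/* Hypothetical helper: returns 1 when BFAR can be trusted as the
 * address of the access that caused a precise bus fault. */
static int bfar_is_meaningful(uint32_t cfsr)
{
    return (cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID);
}

/* Hypothetical helper: returns 1 when the HardFault is an escalated
 * BusFault, MemManage fault or UsageFault. */
static int hardfault_is_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}
```

With CFSR = 0x00008200 and HFSR = 0x40000000 both helpers return 1, so BFAR is the address of the failing data access and can be compared against the memory map.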

The external SDRAM settings are the following:

config.ChipSize = 512;
config.AddrBusWidth = 32;
config.AddrMap = 0;
config.CSn = 0;
config.DataWidth = 16;
config.TotalSize = 536870912UL;
config.CASLatency = EMC_NS2CLK(20); // CAS
config.RASLatency = EMC_NS2CLK(20+20); // RAS (RAS Latency = tRCD + tCAC)
config.Active2ActivePeriod = EMC_NS2CLK(63); // tRC
config.ActiveBankLatency = EMC_NS2CLK(14); // tRRD
config.AutoRefrehPeriod = EMC_NS2CLK(60); // tRFC
config.DataIn2ActiveTime = config.CASLatency+2; // tDAL
config.DataOut2ActiveTime = config.Active2ActivePeriod; // tAPR
config.WriteRecoveryTime = EMC_NS2CLK(14); // tWR, tDPL, tRWL, or tRDL
config.ExitSelfRefreshTime = EMC_NS2CLK(70); // tXSR
config.LoadModeReg2Active = EMC_NS2CLK(15); // tMRD
config.PrechargeCmdPeriod = EMC_NS2CLK(20); // tRP
config.ReadConfig = 1; // Command delayed strategy, using EMCCLKDELAY
config.RefreshTime = EMC_SDRAM_REFRESH(7); // tREF (refresh time = 64ms /8192 row = 7.8us)
config.Active2PreChargeTime = EMC_NS2CLK(42); // tRAS
config.SeftRefreshExitTime = EMC_NS2CLK(70);
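To cross-check the values above, the nanosecond figures can be converted to EMC clock cycles by hand. The sketch below is only an illustration: ns_to_clocks is a hypothetical helper, the actual definition of EMC_NS2CLK is not shown in this post, and some EMC timing registers store the cycle count minus one, which this helper does not handle.

```c
#include <stdint.h>

/* Hypothetical helper: smallest number of whole clock cycles that
 * covers t_ns nanoseconds at a clock of clk_hz hertz (rounds up). */
static uint32_t ns_to_clocks(uint32_t t_ns, uint32_t clk_hz)
{
    /* cycles = ceil(t_ns * clk_hz / 1e9); 64-bit math avoids overflow */
    uint64_t num = (uint64_t)t_ns * clk_hz;
    return (uint32_t)((num + 999999999ULL) / 1000000000ULL);
}
```

At an EMC clock of 60 MHz (16.67 ns per cycle), ns_to_clocks(20, 60000000) gives 2 cycles, 63 ns gives 4 cycles and 70 ns gives 5 cycles. Rounding up is the conservative choice, because a timing parameter that is too short violates the datasheet minimum.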

Many thanks

aut · ‎03-25-2021

The picture below shows the current EMC registers configuration.

Is it correct?

Thanks

aut · ‎03-25-2021

I'd like some clarification about the SDRAM refresh time settings.

I know there are two ways to refresh an SDRAM: burst refresh and distributed refresh. My first question is: does the LPC1788 allow using either one? If so, which is the better choice, and which EMC register selects one or the other?

My initial post has the SDRAM timing settings and the SDRAM datasheet (ISSI model IS42S32160F-7TLI). Are the timings correct?

Regarding the DYNAMICREFRESH timer, I set it to 7 us because the SDRAM datasheet gives 64000 us as the maximum refresh time for a single row. Since there are 8192 rows in total, one row refresh has to be issued at least every 64000/8192 = 7.8 us. I chose 7 instead of 8. Did I make a mistake?
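The arithmetic above can be written out as a small calculation. This is a sketch under stated assumptions: refresh_ticks is a hypothetical helper, and the idea that the refresh counter advances once every 16 EMC clocks is my reading of the EMC dynamic refresh register and should be checked against the LPC178x user manual. How EMC_SDRAM_REFRESH() maps its argument to the register is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: refresh counter value for distributed refresh.
 * tref_ms          total refresh window from the datasheet (e.g. 64)
 * rows             number of rows to refresh in that window (e.g. 8192)
 * emc_hz           EMC clock frequency in hertz
 * clocks_per_tick  EMC clocks per counter tick (assumed 16 here) */
static uint32_t refresh_ticks(uint32_t tref_ms, uint32_t rows,
                              uint32_t emc_hz, uint32_t clocks_per_tick)
{
    /* EMC clocks available per row, rounded down */
    uint64_t clocks_per_row = ((uint64_t)tref_ms * emc_hz) / (1000ULL * rows);

    /* Round down again so the real interval never exceeds the limit */
    return (uint32_t)(clocks_per_row / clocks_per_tick);
}
```

For 64 ms, 8192 rows and a 60 MHz EMC clock this gives 468 clocks per row and a counter value of 29, which corresponds to 464 clocks or about 7.73 us. Rounding down, as with choosing 7 us instead of 7.8 us, only makes the refresh slightly more frequent than required, which is the safe direction.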

Thanks

frank_m · ‎03-22-2021

> Is there a way to return at the next address that generated the fault? Alternatively, is there a way to return at the entry point of the while(1){...} loop in the main() function?

This is exactly not the purpose of a fault exception.

These fault handlers are supposed to deal with unexpected events and critical system failures, not to ignore bugs. The while (1) loop used as the default handler keeps the system in a safe state, preventing possible damage in an unmonitored system.

"Random faults" can also be caused by stack overflows, out-of-bound array accesses, or dangling pointers.

aut · ‎03-22-2021

Hi Frank,

thanks for your reply.

"This is exactly not the purpose of a fault exception"

I agree with you, but while I look for and fix the real cause of the problem, I'd like to try to avoid the system reset. My system currently runs in a real application where a software reboot is not well appreciated.

A stack overflow does not seem to me to be the cause of the problem. When the debugger stops at the HardFault_Handler() breakpoint, the stack usage is 24%. It could instead be something related to pointers, which I use heavily. Most of these pointers are stored in the external SDRAM. My idea is to move all pointers to the internal SRAM. Is this a good idea?

Regards

converse · ‎03-22-2021

"My idea is to move all pointers in the internal SRAM. Is this a good thinking?"

No. If you have a bad pointer (uninitialised, or overwriting memory, for example) all you'll do is move the problem somewhere else - not fix the problem.

There is no shortcut to this. If you want to resolve the resetting problem, you are going to have to roll your sleeves up and do some serious debugging. Start with finding the PC that causes the exception and work back from there:

- is the fault consistent or random?

- can you set a breakpoint just before the PC and take a look at variables/registers?

- is there a pattern to any (possible) corruption - do you recognise any of the data?

aut · ‎03-23-2021

Hi,

thanks for your reply.

You were right. I moved all the pointers from the external SDRAM to the internal SRAM, but nothing changed.

However, I'm not sure it is a software problem. It could also be a hardware problem. I'm not sure about the integrity of the SDRAM signals. As I said in a previous post, the problem occurs almost exclusively in the presence of high-power devices (inverter, AC motor).

Regards

frank_m · ‎03-24-2021

> As I said in a previous post, the problem occurs almost exclusively in presence of hi power devices (inverter, AC motor)

What do you mean by "presence"?

Are they galvanically connected in any way, or just nearby?

It might be helpful to watch the power supply with a scope. You could e.g. set a GPIO in the hardfault routine, to trigger/stop the scope.
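A marker pin for the scope can be as small as one register write. The following is only a sketch: scope_trigger is a hypothetical helper, and the register and pin used in the usage note are assumptions that depend on the board and on the CMSIS device header.

```c
#include <stdint.h>

/* Hypothetical helper: raise a marker pin through a "set"-type GPIO
 * register. On such a register, writing a 1 drives the pin high and
 * writing a 0 leaves the pin unchanged, so no read-modify-write is
 * needed inside the fault handler. */
static inline void scope_trigger(volatile uint32_t *set_reg, uint32_t pin_mask)
{
    *set_reg = pin_mask;
}
```

In the HardFault handler this would be called first, for example as scope_trigger(&LPC_GPIO1->SET, 1UL << 18), assuming the pin was configured as an output during startup and that the device header exposes the port in this form. The scope is then set to trigger on the rising edge of that pin with enough pre-trigger memory to see the supply rails just before the fault.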

aut · ‎03-24-2021

My board is connected via a wired RS485 bus to a Toshiba inverter that powers an asynchronous motor (the motor power can range from 5 to 15 kW). All devices (my board, the inverter and the motor) are very close to each other. The reboot occurs at random, almost exclusively when the motor is running, but there have been rare cases where it occurred with the motor stopped. As the motor power increases, the reboots become more frequent.

Regards

frank_m · ‎03-24-2021

As said, I would check/observe the power supply. Sounds like you have EMI issues.

>... via RS485 wired bus

Consider galvanic isolation. Possibly ground potential issues, caused by transverse currents.

But as said, I would observe it with a scope, and try to trigger the scope from the MCU error / hardfault. Managers want solid proof of the cause.

aut · ‎03-25-2021

Shielding the cables decreases the HardFault occurrences, but I also verified that the HardFault occurs (rarely) even when the inverter and motor are stopped.

converse · ‎03-23-2021

Well, I suppose it could be noise, but only you can determine that. Can you remove potential sources of noise or shield your hardware?

As I said, you need to roll your sleeves up and do some serious debugging. Start by working out if there is any pattern to the problem:

- is the PC that caused the fault at a common address (or range of addresses)?

- look at the stack and see if there is any obvious corruption

- try to set a breakpoint in the fault handler and have a good look around your data structures, looking for signs of data corruption

frank_m · ‎03-22-2021

As mentioned, other common issues are out-of-bounds accesses to arrays. For auto variables, this trashes the stack, i.e. return addresses in this case.

Or dangling pointers. This can also be pointers to stack variables. Accessing them beyond their context will have the same result.

Do you know this document? https://www.keil.com/appnotes/files/apnt209.pdf

You can check the SCB registers for the exact hardfault reasons. For many of the causes mentioned above, the fault register values might not be conclusive. You might need to add instrumentation code to find the root cause.
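One simple form of instrumentation is to validate a pointer before it is dereferenced and to log the caller when the check fails. The sketch below assumes the internal SRAM range quoted in this thread (0x10000000 to 0x1000FFFF) and, as further assumptions, an SDRAM mapped at 0xA0000000 with a size of 64 MB. All names are hypothetical.

```c
#include <stdint.h>

#define IRAM_BASE  0x10000000u  /* internal SRAM range quoted in this thread */
#define IRAM_SIZE  0x00010000u
#define SDRAM_BASE 0xA0000000u  /* assumption: EMC dynamic chip select 0 */
#define SDRAM_SIZE 0x04000000u  /* assumption: 64 MB for a 512 Mbit device */

/* Returns 1 when addr lies inside [base, base + size). */
static int addr_in(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Hypothetical check: does addr point into one of the RAM regions
 * where this application is expected to keep its data? */
static int addr_is_ram(uint32_t addr)
{
    return addr_in(addr, IRAM_BASE, IRAM_SIZE) ||
           addr_in(addr, SDRAM_BASE, SDRAM_SIZE);
}
```

Under these assumptions the reported BFAR value 0x000A64FC is in neither region, so a check like this placed before suspicious pointer accesses would flag the bad pointer at the point of use instead of at the fault.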

xiangjun_rong · ‎03-21-2021

Hi, Nico,

Did you put the stack in the SDRAM? If so, please move the stack to the internal SRAM.

It appears that reading/writing the SDRAM has an issue; please review the SDRAM timing.
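A basic write/read-back test can help to tell marginal SDRAM timing apart from a software bug. This is a minimal sketch, not a complete memory test: mem_pattern_test is a hypothetical helper, and the SDRAM base address used in the usage note is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill a region with an address-dependent
 * pattern, read it back, and return the number of mismatching words. */
static size_t mem_pattern_test(volatile uint32_t *base, size_t words,
                               uint32_t seed)
{
    size_t i;
    size_t errors = 0;

    /* Write pass: every word gets a value derived from its index */
    for (i = 0; i < words; i++)
        base[i] = seed ^ (uint32_t)(i * 0x9E3779B1u);

    /* Read pass: after the whole region has been written, verify it */
    for (i = 0; i < words; i++)
        if (base[i] != (seed ^ (uint32_t)(i * 0x9E3779B1u)))
            errors++;

    return errors;
}
```

On the target it could be run over an unused part of the SDRAM, for example mem_pattern_test((volatile uint32_t *)0xA0000000, words, seed) if the SDRAM is mapped there, in a loop with different seeds, once with the motor stopped and once with it running. A non-zero result points to the memory interface rather than to a stray pointer.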

BR

XiangJun Rong

xiangjun_rong · ‎03-22-2021

Hi, Nico,

FYI, from your description it appears that the stack is not corrupted. If that is the case, you can inspect the stack to find the PC value that triggered the hardfault. When the hardfault happens, the core automatically pushes eight registers onto the stack (R0-R3, R12, LR, PC and xPSR); this is the standard Cortex-M exception stacking, and the LPC1788 is a Cortex-M3.
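Reading the stacked PC can also be done in code, which helps when the fault is hard to catch with a debugger attached. The sketch below assumes the basic eight-word exception frame of a core without FPU (R0, R1, R2, R3, R12, LR, PC, xPSR, in that order from the stack pointer upwards). The record type and function name are hypothetical.

```c
#include <stdint.h>

/* Word offsets inside the frame pushed by the core on exception entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR
};

/* Hypothetical record, e.g. placed in a RAM section that is not
 * cleared at startup so it can be read back after the reset. */
typedef struct {
    uint32_t pc;    /* address of the faulting instruction */
    uint32_t lr;    /* return address of the interrupted function */
    uint32_t cfsr;  /* fault status at the time of the fault */
    uint32_t bfar;  /* faulting data address, if BFARVALID is set */
} fault_record_t;

/* sp must point to the stacked frame (MSP or PSP at exception entry). */
static void fault_capture(const uint32_t *sp, uint32_t cfsr, uint32_t bfar,
                          fault_record_t *out)
{
    out->pc   = sp[FRAME_PC];
    out->lr   = sp[FRAME_LR];
    out->cfsr = cfsr;
    out->bfar = bfar;
}
```

The handler itself needs a short assembly stub that tests bit 2 of the EXC_RETURN value in LR to select MSP or PSP and passes that pointer to fault_capture. That stub is toolchain specific and not shown here. The stored PC can then be looked up in the map or list file after the event.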

Hope it can help you

BR

Xiangjun Rong

aut · ‎03-23-2021

Hi Xiangjun,

Thanks for your reply.

I'll try the check you suggested, although it is not simple to catch the HardFault event with the debugger: it is random, and it happens almost exclusively when the system runs in a real working environment where the signals are heavily disturbed by the high-power inverter and the asynchronous AC motor.

Regards

aut · ‎03-22-2021

Hi XiangJun,

thanks for your reply.

No, I put the stack in the internal SRAM (addresses 0x10000000 - 0x1000FFFF). Attached you can find the modified IAR ".icf" file.

Regarding the SDRAM timing, my initial post lists the values currently in use. Attached you can find the SDRAM datasheet. Do you think there is something wrong?

Regards