LPC1788 HardFault recovering

aut
Contributor I

Dear All.

I'm working on a project that uses an LPC1788 with an external SDRAM (IS42S32160F-7TLI), an 800x480 touchscreen display and two external flash memories (IS25LP512M-JLLP). The CPU clock is 120 MHz and the EMC clock is 60 MHz.

Occasionally, at random while the system is running, a reset occurs. With a breakpoint in my HardFault_Handler() routine (which contains only a __no_operation() instruction), the debugger stops there. At that point, the fault registers hold the following values:

  • AFSR = 0x00000000
  • AIRCR = 0xFA050000
  • BFAR = 0x000A64FC
  • CCR = 0x00000200
  • CFSR = 0x00008200
  • HFSR = 0x40000000

The CFSR register shows the PRECISERR and BFARVALID bits set to 1. First question: is this a communication problem with the external SDRAM? I want to avoid the system reset. Second question: is there a way to return to the instruction following the one that generated the fault? Alternatively, is there a way to return to the entry point of the while(1){...} loop in the main() function?
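For readers following along, the register values above can be decoded with a few bit masks. The sketch below is only illustrative: the bit positions follow the ARMv7-M fault status register layout, while the address windows (512 KB on-chip flash, dynamic chip select 0 at 0xA0000000) and all function names are assumptions that should be checked against the LPC178x user manual.

```c
#include <stdint.h>

/* Bit positions from the ARMv7-M SCB fault status registers. */
#define CFSR_PRECISERR  (1u << 9)    /* BFSR: precise data bus error        */
#define CFSR_BFARVALID  (1u << 15)   /* BFSR: BFAR holds the fault address  */
#define HFSR_FORCED     (1u << 30)   /* HardFault escalated from another fault */

/* Assumed LPC1788 address windows; verify against the user manual. */
#define FLASH_END       0x0007FFFFu  /* assumption: 512 KB on-chip flash    */
#define SDRAM_CS0_BASE  0xA0000000u  /* assumption: dynamic chip select 0   */
#define SDRAM_CS0_END   0xAFFFFFFFu

typedef enum { REGION_FLASH, REGION_SDRAM_CS0, REGION_OTHER } region_t;

/* True when CFSR reports a precise bus fault with a valid BFAR. */
static int is_precise_bus_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID);
}

/* True when the HardFault was escalated from a configurable fault. */
static int is_forced_hardfault(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}

/* Classify a faulting address against the assumed windows above. */
static region_t classify_address(uint32_t addr)
{
    if (addr <= FLASH_END) {
        return REGION_FLASH;
    }
    if (addr >= SDRAM_CS0_BASE && addr <= SDRAM_CS0_END) {
        return REGION_SDRAM_CS0;
    }
    return REGION_OTHER;
}
```

Under these assumptions, CFSR = 0x00008200 decodes as a precise bus fault with a valid BFAR, HFSR = 0x40000000 as a forced HardFault, and BFAR = 0x000A64FC falls in neither the flash nor the SDRAM window, which may point to a bad pointer value rather than a failed SDRAM access.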

The external SDRAM settings are the following:

  • config.ChipSize = 512;
  • config.AddrBusWidth = 32;
  • config.AddrMap = 0;
  • config.CSn = 0;
  • config.DataWidth = 16;
  • config.TotalSize = 536870912UL;
  • config.CASLatency = EMC_NS2CLK(20); // CAS
  • config.RASLatency = EMC_NS2CLK(20+20); // RAS (RAS Latency = tRCD + tCAC)
  • config.Active2ActivePeriod = EMC_NS2CLK(63); // tRC
  • config.ActiveBankLatency = EMC_NS2CLK(14); // tRRD
  • config.AutoRefrehPeriod = EMC_NS2CLK(60); // tRFC
  • config.DataIn2ActiveTime = config.CASLatency+2; // tDAL
  • config.DataOut2ActiveTime = config.Active2ActivePeriod; // tAPR
  • config.WriteRecoveryTime = EMC_NS2CLK(14); // tWR, tDPL, tRWL, or tRDL
  • config.ExitSelfRefreshTime = EMC_NS2CLK(70); // tXSR
  • config.LoadModeReg2Active = EMC_NS2CLK(15); // tMRD
  • config.PrechargeCmdPeriod = EMC_NS2CLK(20); // tRP
  • config.ReadConfig = 1; // Command delayed strategy, using EMCCLKDELAY
  • config.RefreshTime = EMC_SDRAM_REFRESH(7); // tREF (refresh time = 64ms /8192 row = 7.8us)
  • config.Active2PreChargeTime = EMC_NS2CLK(42); // tRAS
  • config.SeftRefreshExitTime = EMC_NS2CLK(70);

Many thanks

16 replies

aut
Contributor I

The picture below shows the current EMC register configuration.

aut_0-1616689325414.png

Is it correct?

Thanks

aut
Contributor I

I'd like some clarification about the SDRAM refresh time settings.

I know there are two ways to refresh an SDRAM: burst refresh and distributed refresh. My first question is: does the LPC1788 support both? If so, which is the better choice, and which EMC register selects one or the other?

My initial post contains the SDRAM timing settings and the SDRAM datasheet (ISSI model IS42S32160F-7TLI). Are the timings correct?

Regarding the DYNAMICREFRESH timer, I set it to 7 us because the SDRAM datasheet gives 64000 us as the maximum refresh time for a single row. Since there are 8192 rows in total, one row has to be refreshed every 64000/8192 = 7.8 us at most. I chose 7 instead of 8. Did I make a mistake?
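The arithmetic above can be written out as a small sketch. It assumes that the EMC refresh field counts in units of 16 EMC clock cycles, which is my reading of the LPC178x user manual and should be verified there; the function names are hypothetical, and how the EMC_SDRAM_REFRESH macro maps its argument onto the register is not shown in this thread.

```c
#include <stdint.h>

/* Average interval between row refresh commands, in nanoseconds:
 * the refresh window tREF spread evenly over all rows. */
static uint32_t refresh_interval_ns(uint32_t tref_ms, uint32_t rows)
{
    return (uint32_t)(((uint64_t)tref_ms * 1000000u) / rows);
}

/* Convert the interval to a refresh field value.
 * Assumption: the field counts in units of 16 EMC clock cycles.
 * Both divisions round down, so refresh is never later than required. */
static uint32_t refresh_field(uint32_t interval_ns, uint32_t emc_clk_hz)
{
    uint64_t cycles = ((uint64_t)interval_ns * emc_clk_hz) / 1000000000u;
    return (uint32_t)(cycles / 16u);
}
```

With tREF = 64 ms, 8192 rows and a 60 MHz EMC clock, this gives an interval of 7812 ns and, under the 16-cycle assumption, a field value of 29. Rounding down, like choosing 7 us instead of 8 us, only makes the refresh slightly more frequent than strictly required.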

Thanks

frank_m
Senior Contributor III

> Is there a way to return to the instruction following the one that generated the fault? Alternatively, is there a way to return to the entry point of the while(1){...} loop in the main() function?

This is exactly not the purpose of a fault exception.

These fault handlers are supposed to deal with unexpected events and critical system failures, not to ignore bugs. The while (1) loop as the default handler keeps the system in a safe state, preventing possible damage in an unmonitored system.

"Random faults" can also be caused by stack overflows, out-of-bound array accesses, or dangling pointers.

 

aut
Contributor I

Hi Frank,

thanks for your reply.

"This is exactly not the purpose of a fault exception"

I agree with you, but while I look for the real cause of the problem, I'd like to avoid the system reset. My system is currently deployed in a real application where a software reboot is not well received.

A stack overflow does not seem to be the cause of the problem. When the debugger stops at the HardFault_Handler() breakpoint, stack usage is 24%. It could instead be related to pointers, which I use heavily. Most of these pointers are stored in the external SDRAM. My idea is to move all pointers to the internal SRAM. Is this a good idea?

Regards   

converse
Senior Contributor V

"My idea is to move all pointers in the internal SRAM. Is this a good thinking?"

No. If you have a bad pointer (uninitialised, or overwriting memory, for example) all you'll do is move the problem somewhere else - not fix the problem.

There is no shortcut to this. If you want to resolve the resetting problem, you are going to have to roll your sleeves up and do some serious debugging. Start with finding the PC that causes the exception and work back from there:

- is the fault consistent or random?

- can you set a breakpoint just before the PC and take a look at variables/registers?

- is there a pattern to any (possible) corruption - do you recognise any of the data?

 

aut
Contributor I

Hi,

thanks for your reply.

You were right. I moved all the pointers stored in the external SDRAM to the internal SRAM, but nothing changed.

However, I'm not sure it is a software problem. It could also be a hardware problem. I'm not sure about the integrity of the SDRAM signals. As I said in a previous post, the problem occurs almost exclusively in the presence of high-power devices (inverter, AC motor).

Regards

frank_m
Senior Contributor III

> As I said in a previous post, the problem occurs almost exclusively in the presence of high-power devices (inverter, AC motor).

What do you mean by "presence"?

Are they galvanically connected in any way, or just nearby?

It might be helpful to watch the power supply with a scope. You could e.g. set a GPIO in the hardfault routine, to trigger/stop the scope.
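A minimal sketch of the scope trigger idea follows, assuming a GPIO port with a write-1-to-set register and a pin that was already configured as an output during start-up. The helper name, the pin number and the register used in the comment are hypothetical; substitute whatever your device header and board provide.

```c
#include <stdint.h>

/* Drive one pin high through a write-1-to-set register.
 * set_reg: address of the port's SET register (assumed write-1-to-set).
 * pin:     bit number of the pin within the port, 0..31. */
static inline void scope_trigger(volatile uint32_t *set_reg, unsigned pin)
{
    *set_reg = (uint32_t)1u << pin;
}

/* Hypothetical use as the first statement of the fault handler
 * (register name and pin are assumptions, not taken from this thread):
 *
 *     void HardFault_Handler(void)
 *     {
 *         scope_trigger(&LPC_GPIO1->SET, 18u);
 *         for (;;) { }
 *     }
 */
```

With the scope set to trigger on the rising edge of that pin, the capture shows what the supply rails were doing just before the fault.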

aut
Contributor I

My board is connected via a wired RS485 bus to a Toshiba inverter that drives an asynchronous motor (rated from 5 to 15 kW). All devices (my board, the inverter and the motor) are very close to each other. The reboot occurs at random, almost exclusively when the motor is running, but there have been rare cases where it occurred with the motor stopped. As the motor power increases, so does the reboot frequency.

Regards

frank_m
Senior Contributor III

As said, I would check/observe the power supply. Sounds like you have EMI issues.

>... via RS485 wired bus

Consider galvanic isolation. Possibly ground potential issues, caused by transverse currents.

But as said, I would observe it with a scope, and try to trigger the scope from the MCU error / hardfault. Managers want solid proof of the cause.

aut
Contributor I

Shielding the cables reduces the HardFault occurrences, but I also verified that the HardFault occurs (rarely) even when the inverter and motor are stopped.

converse
Senior Contributor V

Well, I suppose it could be noise, but only you can determine that. Can you remove potential sources of noise or shield your hardware?

As I said, you need to roll your sleeves up and do some serious debugging. Start by working out if there is any pattern to the problem:

  • Is the PC that caused the fault at a common address (or range of addresses)?
  • Look at the stack and see if there is any obvious corruption.
  • Try to set a breakpoint in the fault handler and have a good look around your data structures for signs of data corruption.

frank_m
Senior Contributor III

As mentioned, other common issues are out-of-bounds accesses to arrays. For auto variables, this trashes the stack, i.e. the return addresses in this case.

Or dangling pointers. These can also be pointers to stack variables. Accessing them outside their scope will have the same result.

Do you know this document? https://www.keil.com/appnotes/files/apnt209.pdf

You can check the SCB registers for the exact hardfault reasons. For many of the above-mentioned causes, the fault register values might not be conclusive. You might need to add instrumentation code to find the root cause.
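One possible form of such instrumentation is a pair of guard words around a suspect buffer, checked periodically or from the fault handler. This is only a sketch: the structure, the guard value and the function names are hypothetical, and the buffer size is arbitrary.

```c
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu   /* arbitrary, easy to spot in a memory dump */

/* A buffer bracketed by guard words. The data size is a multiple of 4,
 * so no padding separates the data from the tail guard. */
typedef struct {
    uint32_t head_guard;
    uint8_t  data[32];
    uint32_t tail_guard;
} guarded_buf_t;

/* Set both guards and clear the payload. */
static void guarded_init(guarded_buf_t *b)
{
    b->head_guard = GUARD_WORD;
    memset(b->data, 0, sizeof b->data);
    b->tail_guard = GUARD_WORD;
}

/* Returns 1 while both guards are intact, 0 once either was overwritten. */
static int guarded_ok(const guarded_buf_t *b)
{
    return b->head_guard == GUARD_WORD && b->tail_guard == GUARD_WORD;
}
```

Calling guarded_ok() from the main loop narrows down when an overrun happened, even if the HardFault itself only shows up much later.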

xiangjun_rong
NXP TechSupport

Hi, Nico,

Is the stack placed in the SDRAM? If so, please move it to the internal SRAM.

It appears that there is a problem with reading/writing the SDRAM; please re-check the SDRAM timing.

BR

XiangJun Rong

xiangjun_rong
NXP TechSupport

Hi, Nico,

FYI, from your description, it appears that the stack is not corrupted. If that is the case, you can inspect the stack to find the PC value that triggered the hardfault. When the hardfault happens, the core automatically saves eight registers (R0-R3, R12, LR, PC and xPSR) onto the stack; this is the automatic register stacking of the Cortex-M3.
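As a sketch of that check: on exception entry an ARMv7-M core pushes eight words in the order R0, R1, R2, R3, R12, LR, PC, xPSR, so the stacked PC is the seventh word of the frame. The type and function names below are hypothetical, and obtaining the frame pointer (MSP or PSP, selected by bit 2 of the EXC_RETURN value in LR) needs a small toolchain-specific assembly stub that is not shown here.

```c
#include <stdint.h>

/* The eight words pushed automatically on exception entry (ARMv7-M). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Copy the stacked frame out of the stack.
 * sp: value of the stack pointer that was active when the fault was
 *     taken, i.e. the address of the stacked R0. */
static fault_frame_t decode_fault_frame(const uint32_t *sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                        sp[4], sp[5], sp[6], sp[7] };
    return f;
}
```

The pc field is the address to look up in the map or listing file. For a precise bus fault it is the instruction that performed the failing access.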

Hope it can help you

BR

Xiangjun Rong

xiangjun_rong_0-1616469665494.png

 

aut
Contributor I

Hi Xiangjun,

Thanks for your reply.

I'll try the check you suggested, although it is not easy to catch the HardFault with the debugger: it is random and happens almost exclusively in the real working environment, where signals are heavily disturbed by the high-power inverter and the asynchronous AC motor.
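One way around needing a debugger in the field might be to have the fault handler store the fault registers in a RAM area that is not cleared at start-up, and to read the record back after the reboot. The sketch below is an assumption-laden illustration: the record layout, magic value and function names are hypothetical, and placing the record in a no-init section is toolchain-specific (IAR's __no_init keyword is one option).

```c
#include <stdint.h>

#define FAULT_MAGIC 0xFA17C0DEu   /* arbitrary marker for a stored record */

typedef struct {
    uint32_t magic;   /* FAULT_MAGIC when a record is present   */
    uint32_t pc;      /* stacked PC at the time of the fault    */
    uint32_t lr;      /* stacked LR                             */
    uint32_t cfsr;    /* SCB CFSR snapshot                      */
    uint32_t hfsr;    /* SCB HFSR snapshot                      */
    uint32_t bfar;    /* SCB BFAR snapshot                      */
    uint32_t check;   /* XOR of all fields above                */
} fault_record_t;

static uint32_t fault_checksum(const fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr ^ r->hfsr ^ r->bfar;
}

/* Called from the fault handler before resetting. */
static void fault_record_save(fault_record_t *r, uint32_t pc, uint32_t lr,
                              uint32_t cfsr, uint32_t hfsr, uint32_t bfar)
{
    r->magic = FAULT_MAGIC;
    r->pc    = pc;
    r->lr    = lr;
    r->cfsr  = cfsr;
    r->hfsr  = hfsr;
    r->bfar  = bfar;
    r->check = fault_checksum(r);
}

/* Called at start-up: 1 if a consistent record survived the reset. */
static int fault_record_valid(const fault_record_t *r)
{
    return r->magic == FAULT_MAGIC && r->check == fault_checksum(r);
}

/* Called after the record has been reported or logged. */
static void fault_record_clear(fault_record_t *r)
{
    r->magic = 0u;
}
```

Collecting these records over several field failures would show whether the faulting PC and BFAR repeat, which helps separate a software bug from corruption caused by noise.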

Regards

aut
Contributor I

Hi XiangJun,

thanks for your reply.   

No, the stack is in the internal SRAM (addresses 0x10000000 - 0x1000FFFF). Attached is the modified IAR ".icf" file.

Regarding the SDRAM timing, the values currently in use are in my initial post. Attached is the SDRAM datasheet. Do you think something is wrong?

Regards  

 

 
