[LPC1853] How to boot from ext. memory (SPIFI and SRAM)


1,268 views
mariuszwlodarcz
Contributor I

Hi,

I have a problem that I am trying to solve.

 

Background:

+ I use Keil uVision with ARM-GCC;
+ I use SPIFI lib to get data from ext. flash (QSPI);
+ My HW setup:

[Image: pastedImage_1.png, hardware setup diagram]

+ Ext. Flash - icons, sounds, texts; Ext. RAM - frame buffers for TFT;

 

Problem:

+ My uC has internal flash and RAM, but the internal flash (used only for code) has run out.

+ I know that the LPC18xx has a boot ROM feature, but I can't use it in my scenario. An NXP engineer told me that code execution from ext. flash (via QSPI) is too slow for our application. That means I should execute my code from external RAM, and this code has to come from the external flash (via QSPI).

 

My proposal:

+ I need support for implementing this scenario:

      1) program the code and the GUI data (icons, sounds) into ext. flash memory, plus a small piece of code into internal flash;
      2) at uC startup, init the QSPI and EMC controllers and copy all code from ext. flash to ext. RAM;
      3) start executing from ext. RAM (with some critical parts in internal RAM);

+ Keep in mind that, at all times:

                           Ext. Flash is for icons, sounds, texts;

                           Ext. RAM is for frame buffers for TFT;

 

My questions:

+ How do I do this properly?

    I think I should set up the linker manually to do the following:
      * reserve a memory area for the executable code in ext. RAM (it is visible in our memory map at address 0x2000 0000); reserve only, because at startup there is no code in it;
      * set up the linker to place the initial code (the "small piece of code") in internal flash (e.g. in bank A);

      * this initial code should init the flash and RAM controllers and copy the code from ext. flash to ext. RAM;

      * start execution from ext. RAM;
+ Do you have any other ideas about how I could implement my proposal?
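
The copy step described above can be sketched roughly as follows. This is only a minimal illustration, assuming the external flash is readable through pointers (SPIFI in memory-mapped mode) and the EMC has already been initialised. The function name `relocate_image` and the linker symbols mentioned in the comment are hypothetical, not part of any NXP library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `words` 32-bit words from the load address (ext. flash) to the
 * execution address (ext. RAM), then read everything back to verify it.
 * Returns true only if the RAM contents match the flash contents.
 *
 * On target, the three arguments would typically come from linker-defined
 * symbols (hypothetical names), for example:
 *   extern uint32_t __extram_code_load[];   load address in ext. flash
 *   extern uint32_t __extram_code_start[];  execution address in ext. RAM
 *   extern uint32_t __extram_code_end[];
 */
bool relocate_image(const uint32_t *load, volatile uint32_t *exec, size_t words)
{
    /* Copy pass: flash -> RAM. */
    for (size_t i = 0; i < words; i++) {
        exec[i] = load[i];
    }

    /* Verify pass: a mismatch usually points to a wrong EMC/RAM setup. */
    for (size_t i = 0; i < words; i++) {
        if (exec[i] != load[i]) {
            return false;
        }
    }
    return true;
}
```

With GNU ld, the relocated code is normally linked at its execution address in RAM and given a separate load address in flash (`> RAM_REGION AT> FLASH_REGION`), so once the copy has succeeded the functions can simply be called; no special jump is required.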

3 Replies

915 views
bernhardfink
NXP Employee

Generally, I recommend working with the flash version, the LPC1857. It is a later silicon revision than the flash-less LPC1850, so quite a few bugs in the initial LPC1800 architecture have been fixed there.

However, in principle the LPC1850 would do the job as well: 

  • full binary image is in qSPI
  • start from qSPI
  • relocate (certain) code to external SDRAM
  • execute from SDRAM, fetch display data also from qSPI 

If you use the LPC1857, I would recommend partitioning it this way:

  • as much code as possible in internal flash, rest of the code image in qSPI
  • start from internal flash, relocate code from qSPI to external SDRAM
  • execute from internal flash and from external SDRAM, fetch display data also from qSPI 

This of course requires accurate "floor planning", also with regard to the stack and heap pointers.
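
One way to keep such a floor plan honest is to write it down as a table and check it for overlaps. The sketch below is only an illustration: the `region_t` type, the function names, and every address and size in `example_plan` are assumptions and must be checked against the user manual and the actual board.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the memory floor plan. */
typedef struct {
    const char *name;
    uint32_t base;  /* start address */
    uint32_t size;  /* length in bytes */
} region_t;

/* Example plan; all addresses and sizes here are assumed for illustration. */
static const region_t example_plan[] = {
    { "int_flash_code",      0x1A000000u, 0x00040000u },
    { "int_sram_stack_heap", 0x10000000u, 0x00008000u },
    { "sdram_code",          0x28000000u, 0x00100000u },
    { "sdram_frame_buffers", 0x28100000u, 0x00200000u },
};

/* Two half-open ranges [base, base + size) overlap if each starts before
 * the other ends. 64-bit arithmetic avoids wrap-around at 4 GB. */
static bool regions_overlap(const region_t *a, const region_t *b)
{
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return (a->base < b_end) && (b->base < a_end);
}

/* Returns true if no two regions in the plan overlap. */
bool floor_plan_is_valid(const region_t *plan, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (regions_overlap(&plan[i], &plan[j])) {
                return false;
            }
        }
    }
    return true;
}
```

A check like this can run in a host-side unit test, so a frame buffer that grows into the relocated code is caught before it reaches the target.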

There is one interesting thing to consider: the SPIFI has a small cache. This means that code snippets such as small for-loops are executed from this cache. If you execute from external RAM, no cache is involved. The SDRAM interface always loads 128 bits, so at least 8 x 16 bits are available to the ARM core, but that is less than the SPIFI cache.

So executing from SDRAM and managing the buffers in SDRAM means sharing the bus between these two tasks. Sometimes it is faster to execute from SPIFI and manage the buffers in SDRAM. It is hard to say where you will end up, but since you have these memories physically on your board anyway, you could change the configuration on the fly and test it.

Regards,

NXP Support Team


915 views
mariuszwlodarcz
Contributor I

Two points:

1) I have an LPC1853 with flash (512 kB int. flash);

2) I cannot switch to the LPC1857 (1 MB int. flash);

If you use the LPC1857, I would recommend partitioning it this way:

  • as much code as possible in internal flash, rest of the code image in qSPI
  • start from internal flash, relocate code from qSPI to external SDRAM
  • execute from internal flash and from external SDRAM, fetch display data also from qSPI 

This is similar to my proposal.

The question is not about the logic of the algorithm, but about how to implement this scenario.

I have an e-mail from NXP:

When you want to execute your program out of SPIFI flash, what we need to consider is the performance that you need.

When executing a program out of SPIFI flash, the CPU is able to execute about 5 to 8 million instructions per second, while running the same code out of SDRAM the CPU may execute about 70 to 90 million instructions per second (@180MHz).

How would you comment on this, Bernhard?

Regards:

Mariusz


915 views
bernhardfink
NXP Employee

Hi Mariusz,

For the LPC1853, of course, the same rule applies: put as much as possible into the internal flash.

Here is a measurement I did quite some time ago using the Coremark benchmark code. It applies to LPC1853 as well, because the memory subsystems are exactly the same. Max frequency is of course 180MHz for the LPC1800.


Board   | MCU     | Core | CPU Clock (MHz) | EMC Clock (MHz) | Code Execution Memory      | Coremark | Coremark/MHz | Note
MCB4300 | LPC4357 | M4   | 204             | 102             | Internal SRAM              | 438      | 2.15         |
MCB4300 | LPC4357 | M4   | 204             | 102             | Internal flash             | 415      | 2.04         | Flash access time = 6
MCB4300 | LPC4357 | M4   | 204             | 102             | Internal flash             | 404      | 1.98         | Flash access time = 9
MCB4300 | LPC4357 | M4   | 204             | 102             | qSPI, S25FL032P            | 232      | 1.14         | Specified for only 80MHz in quad SPI mode, but it works
MCB4300 | LPC4357 | M4   | 120             | 120             | 32-bit SDRAM, MT48C4M32 -6 | 98       | 0.82         |
MCB4300 | LPC4357 | M4   | 204             | 102             | 32-bit SDRAM, MT48C4M32 -6 | 84       | 0.41         |

Initialization functions are executed in internal SRAM

RW data for all tests was located in internal SRAM

All Coremark related RO code is located in the respective Code Execution Memory
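
The second Coremark column in the table appears to be the score divided by the CPU clock in MHz (for example 438 / 204 ≈ 2.15). A small sketch for reproducing that column and for comparing two rows, with hypothetical helper names:

```c
/* Coremark score normalised to the CPU clock (score per MHz). */
double coremark_per_mhz(double score, double cpu_clock_mhz)
{
    return score / cpu_clock_mhz;
}

/* How many times faster configuration A is than configuration B,
 * based on their raw Coremark scores. */
double relative_speed(double score_a, double score_b)
{
    return score_a / score_b;
}
```

Using the table values at 204 MHz, `relative_speed(232.0, 84.0)` gives about 2.76, meaning the qSPI row scores roughly 2.8 times higher than the SDRAM row, and `relative_speed(415.0, 84.0)` gives about 4.9 for internal flash versus SDRAM.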

That's the reality, anything else is theory. Performing 70 to 90 million instructions per second might be valid for the following use case: your executable code is 8 x 16 bits in total, so it fits into the buffers of the SDRAM interface. You load it once and then execute it forever without fetching any new code from the external memory.

Not very realistic, right? Getting something from the SDRAM takes time: you need to address the SDRAM, you fetch the code, and you may fetch data from a different SDRAM location. This means that your little 128-bit buffer cache becomes invalid immediately. Then there are breaks in the memory bus activity, caused by the AHB bus behavior, and so on.

All this causes a slowdown; on average you end up at the performance levels shown in the table above. It is indeed possible to write code which performs worse on the SPIFI and which may execute better from SDRAM. But the Coremark code seems to be a good average.

Using 32-bit SDRAM instead of 16-bit is not a quantum leap, as the interface always fetches 128 bits, either as 4 x 32 or as 8 x 16. So you only gain a little, because the 4 x 32 fetch has fewer breaks.
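
The 4 x 32 versus 8 x 16 arithmetic can be written down as a one-line helper. This is only an illustration of the numbers quoted above; the function name is hypothetical and the code does not model SDRAM timing.

```c
#include <stdint.h>

/* Number of data-bus transfers needed to move one burst of `burst_bits`
 * over a bus that is `bus_width_bits` wide (rounded up). */
uint32_t transfers_per_burst(uint32_t burst_bits, uint32_t bus_width_bits)
{
    return (burst_bits + bus_width_bits - 1u) / bus_width_bits;
}
```

For the 128-bit fetch this gives 4 transfers on a 32-bit SDRAM and 8 transfers on a 16-bit SDRAM. The amount of code delivered per fetch is the same in both cases, which fits the remark that the wider bus gains only a little.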

Taking the best of the different worlds is the best solution:

  • execute from internal flash and SRAM
  • use the SPIFI also for linear code execution
  • use SDRAM execution for the remaining part of the code
  • try to avoid placing variables in SDRAM if you are also executing from SDRAM. On average, this will result in worse performance than shown in the table above.
  • buffers and also const data can be in SDRAM, they are normally addressed in a linear way, so the standard 128-bit fetching of the SDRAM interface is an advantage there
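
With ARM-GCC, a split like the one listed above is usually expressed with section attributes plus matching output sections in the linker script. The sketch below is an assumption-laden illustration: the section names, the `PLACE_IN` macro, the example functions, and the 480x272 frame buffer size are all hypothetical, and the attributes only have an effect if the linker script maps these sections to the intended memories.

```c
#include <stddef.h>
#include <stdint.h>

/* Section names are hypothetical and must match the linker script.
 * On non-ARM hosts the macro expands to nothing, so the file still
 * builds for host-side testing. */
#if defined(__GNUC__) && defined(__arm__)
#define PLACE_IN(section_name) __attribute__((section(section_name)))
#else
#define PLACE_IN(section_name)
#endif

/* Small loop over linear data: a candidate for execution from SPIFI. */
PLACE_IN(".spifi_text")
uint32_t checksum32(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Code that does not fit in internal flash: copied to SDRAM at startup. */
PLACE_IN(".sdram_text")
uint32_t clamp_u32(uint32_t value, uint32_t lo, uint32_t hi)
{
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}

/* Large, linearly accessed buffer: placed in SDRAM (assumed panel size). */
PLACE_IN(".sdram_buffers")
uint16_t frame_buffer[480u * 272u];

/* Frequently accessed variable: left in the default data section, which
 * would normally be mapped to internal SRAM. */
volatile uint32_t tick_count;
```

Everything without an attribute stays in the default sections, which keeps the bulk of the code in internal flash and the variables in internal SRAM.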

Regards,

NXP Support Team
