[LPC1853] How to boot from ext. memory (SPIFI and SRAM)

mariuszwlodarcz · ‎08-30-2016

Hi,

I have a problem that I'm trying to solve.

Background:

+ I use Keil uVision with ARM-GCC;
+ I use the SPIFI lib to get data from ext. flash (QSPI);
+ HW setup: Ext. Flash holds icons, sounds, texts; Ext. RAM holds frame buffers for TFT.

Problem:

+ I have internal flash and RAM in my uC, but the internal flash (used only for code) has run out.

+ I know that the LPC18xx has a boot ROM feature, but in my scenario I can't use it. A guy from NXP told me that code execution from ext. flash (via QSPI) is too slow for our application. This means that I should execute my code from external RAM. This code has to come from external flash (via QSPI).

My proposal:

+ I need support for the following scenario:

      1) program code and data for GUI (icons, sounds) in ext. flash memory + small piece of code to internal flash;
      2) at startup of uC - init QSPI and EMC controller and copy all code from ext. flash to ext. RAM;
      3) start executing from ext. RAM (with some critical things in internal RAM);

+ We should keep in mind that, at all times:

      Ext. Flash is for icons, sounds, texts;
      Ext. RAM is for frame buffers for TFT;

My questions:

+ How do I do this properly?

    I think that I should set up the linker manually to do things like this:
      * reserve a memory area for code execution in ext. RAM (it is visible in our memory map at address 0x2000 0000); only reserve it, because at startup there is no code inside;
      * set up the linker to place the initial code ("small piece of code") in internal flash (e.g. in bank A);
      * this initial code should init the flash and RAM controllers, and copy the code from ext. flash to ext. RAM;
      * start execution from ext. RAM;

+ Do you have any other ideas about what I should do to implement my proposal?
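The copy step in the list above could look roughly like the following. This is only a minimal sketch under assumptions: `relocate_image` and `thumb_entry` are hypothetical helper names, not part of the SPIFI lib or any NXP driver, and the source and destination pointers would have to come from your own linker script once the SPIFI and the EMC have been initialised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a code image word by word, then read it back.
 * On the target, src would point into the memory-mapped ext. flash and
 * dst into ext. RAM (take both addresses from your linker script).
 * Returns 0 on success, -1 if the read-back does not match. */
int relocate_image(volatile uint32_t *dst, const uint32_t *src, size_t n_words)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < n_words; i++) {
        dst[i] = src[i];
    }
    /* Verify before jumping: a misconfigured RAM controller shows up here
     * instead of as a hard fault later. */
    for (size_t i = 0; i < n_words; i++) {
        if (dst[i] != src[i]) {
            return -1;
        }
    }
    return 0;
}

/* Cortex-M cores execute Thumb code only, so an address that is turned
 * into a function pointer by hand must have bit 0 set. */
uintptr_t thumb_entry(uintptr_t code_addr)
{
    return code_addr | (uintptr_t)1u;
}
```

If the relocated functions are linked with their run address in ext. RAM and their load address in ext. flash, they can simply be called by name after the copy, and the linker takes care of the Thumb bit. `thumb_entry` is only needed when the entry address is computed manually.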

bernhardfink · ‎08-30-2016

Generally I recommend working with the flash version LPC1857. It is a later silicon version than the flash-less LPC1850, so quite a few bugs in the initial LPC1800 architecture have been fixed there.

However, in principle the LPC1850 would do the job as well:

* full binary image is in qSPI
* start from qSPI
* relocate (certain) code to external SDRAM
* execute from SDRAM, fetch display data also from qSPI

If you use the LPC1857 I would recommend partitioning it this way:

* as much code as possible in internal flash, rest of the code image in qSPI
* start from internal flash, relocate code from qSPI to external SDRAM
* execute from internal flash and from external SDRAM, fetch display data also from qSPI

This of course requires accurate "floor planning", also with regard to the stack and heap pointers.
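One way to keep such a floor plan honest is to write the regions down as a table and check it for overlaps in a host-side test. The sketch below is an illustration only; `region_t`, `regions_overlap` and `plan_is_consistent` are hypothetical names, and the real bases and sizes have to come from your own memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the floor plan. */
typedef struct {
    const char *name;
    uint32_t    base;  /* address of the first byte */
    uint32_t    size;  /* size in bytes, must be > 0 */
} region_t;

/* Returns 1 if the two regions share at least one byte, else 0. */
int regions_overlap(const region_t *a, const region_t *b)
{
    assert(a->size > 0u && b->size > 0u);

    /* 64-bit arithmetic so that base + size cannot wrap around. */
    uint64_t a_end = (uint64_t)a->base + a->size; /* one past the end */
    uint64_t b_end = (uint64_t)b->base + b->size;

    return (a->base < b_end) && (b->base < a_end);
}

/* Returns 1 if no pair of regions in the plan overlaps, else 0. */
int plan_is_consistent(const region_t *plan, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (regions_overlap(&plan[i], &plan[j])) {
                return 0;
            }
        }
    }
    return 1;
}
```

A plan would list, for example, the relocated code area, each frame buffer, and the stack and heap as separate entries, so that a frame buffer growing into the code area is caught before it reaches the hardware.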

There is one interesting thing to consider: the SPIFI has a small cache. This means that code snippets like small for-loops are executed from this cache. If you execute from external RAM, there is no cache involved. The SDRAM interface always loads 128 bits, so there are at least 8 x 16 bits available for the ARM, but that's less than the SPIFI cache.

So executing from SDRAM and managing the buffers in SDRAM means sharing the bus between these two tasks. Sometimes it's faster to execute from SPIFI and manage the buffers in SDRAM. It is hard to say where you would end up, but if you have these memories physically on your board anyway, you could change the configuration on the fly and test it out.

Regards,

NXP Support Team

mariuszwlodarcz · ‎08-30-2016

Two points:

1) I have an LPC1853 with flash (512kB int flash);

2) I cannot switch to the LPC1857 (1MB int flash);

> If you use the LPC1857 I would recommend to partition it this way:
> as much code as possible in internal flash, rest of the code image in qSPI
> start from internal flash, relocate code from qSPI to external SDRAM
> execute from internal flash and from external SDRAM, fetch display data also from qSPI

This is similar to my proposal.

The question is not about the logic of the algorithm, but about how to implement this scenario.

I have a mail from NXP:

> When you want to execute your program out of SPIFI flash, what we need to consider is the performance that you need.
> When executing a program out of SPIFI flash the CPU is able to execute about 5 to 8 million instructions per second, while running the same code out of SDRAM the CPU may execute about 70 to 90 million instructions per second (@180MHz).

How would you comment on this, Bernhard?

Regards:

Mariusz

bernhardfink · ‎08-30-2016

Hi Mariusz,

for the LPC1853 of course the same rule applies: put as much as possible into the internal flash.

Here is a measurement I did quite some time ago using the Coremark benchmark code. It applies to LPC1853 as well, because the memory subsystems are exactly the same. Max frequency is of course 180MHz for the LPC1800.

| Board | MCU | Core | CPU Clock (MHz) | EMC Clock (MHz) | Code Execution Memory | Coremark | Coremark/MHz | Note |
|---|---|---|---|---|---|---|---|---|
| MCB4300 | LPC4357 | M4 | 204 | 102 | Internal SRAM | 438 | 2.15 | |
| MCB4300 | LPC4357 | M4 | 204 | 102 | Internal flash | 415 | 2.04 | Flash access time = 6 |
| MCB4300 | LPC4357 | M4 | 204 | 102 | Internal flash | 404 | 1.98 | Flash access time = 9 |
| MCB4300 | LPC4357 | M4 | 204 | 102 | qSPI, S25FL032P | 232 | 1.14 | Specified for only 80MHz in quad SPI mode, but it works |
| MCB4300 | LPC4357 | M4 | 120 | 120 | 32-bit SDRAM, MT48C4M32 -6 | 98 | 0.82 | |
| MCB4300 | LPC4357 | M4 | 204 | 102 | 32-bit SDRAM, MT48C4M32 -6 | 84 | 0.41 | |

* Initialization functions are executed in internal SRAM
* RW data was located in internal SRAM for all tests
* All Coremark related RO code is located in the respective Code Execution Memory

That's the reality, anything else is theory. Performing 70 to 90 million instructions per second might be valid for the following use case: your executable code size is 8 x 16 bits overall, so it fits into the buffers of the SDRAM interface. You load it once and then execute it forever without fetching any new code from the external memory.

Not very realistic, right? Getting something from the SDRAM takes time: you need to address the SDRAM memory, you fetch the code, and you may fetch data at a different SDRAM memory position. This means that your little 128-bit buffer cache becomes invalid immediately. Then there are breaks in the memory bus activity caused by the AHB bus behavior, etc.

All this causes a slowdown; on average you end up at the performance levels shown in the table above. It is indeed possible to write code which performs worse on the SPIFI and maybe better on the SDRAM. But the Coremark code seems to be a good average.
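To compare the rows of the table on an equal footing, the scores can be normalised to the CPU clock and expressed relative to internal SRAM. The two helpers below are a small sketch for that arithmetic; the function names are made up for this illustration.

```c
#include <assert.h>

/* Coremark score divided by the CPU clock in MHz. */
double coremark_per_mhz(double score, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return score / cpu_mhz;
}

/* Score as a fraction of a reference score, e.g. the internal SRAM row. */
double relative_score(double score, double reference)
{
    assert(reference > 0.0);
    return score / reference;
}
```

With the numbers from the table, `relative_score(232, 438)` gives roughly 0.53 for qSPI and `relative_score(84, 438)` gives roughly 0.19 for SDRAM at 204/102 MHz, both relative to internal SRAM.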

Using 32-bit SDRAM versus 16-bit is not a quantum leap, as the interface always fetches 128 bits: either 4 x 32 or 8 x 16. So you only win a little, because there are fewer breaks with the 4 x 32 fetching.
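The 4 x 32 versus 8 x 16 arithmetic can be written down as follows. This is just the calculation from the paragraph above as a sketch; the constant and function names are chosen for illustration and do not come from any driver.

```c
#include <assert.h>

/* Per the post: the SDRAM interface always fetches 128 bits at a time. */
enum { SDRAM_FETCH_BITS = 128 };

/* Number of bus transfers needed for one 128-bit fetch. */
unsigned beats_per_fetch(unsigned bus_width_bits)
{
    assert(bus_width_bits == 16u || bus_width_bits == 32u);
    return (unsigned)SDRAM_FETCH_BITS / bus_width_bits;
}

/* Number of 16-bit Thumb instructions that fit into one fetch. */
unsigned thumb16_per_fetch(void)
{
    return (unsigned)SDRAM_FETCH_BITS / 16u;
}
```

Either bus width delivers the same 8 x 16 bits of code per fetch; the 32-bit bus only needs half as many transfers to do so.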

Taking the best of the different worlds is the best solution:

* execute from internal flash and SRAM
* use the SPIFI also for linear code execution
* use SDRAM execution for the remaining part of the code
* try to avoid placing variables in SDRAM if you are also executing from SDRAM. On average this will cause worse performance than in the table above.
* buffers and also const data can be in SDRAM; they are normally addressed in a linear way, so the standard 128-bit fetching of the SDRAM interface is an advantage there

Regards,

NXP Support Team
