Hi,
I have a problem that I'm trying to solve.
Background:
+ I use Keil uVision with ARM-GCC;
+ I use SPIFI lib to get data from ext. flash (QSPI);
+ My HW setup:
+ Ext. flash - icons, sounds, texts; ext. RAM - frame buffers for TFT;
Problem:
+ I have internal flash and RAM in my uC, but the internal flash (used only for code) has run out.
+ I know that the LPC18xx has a boot ROM feature, but in my scenario I can't use it. A guy from NXP told me that code execution from ext. flash (via QSPI) is too slow for our application. That means I should execute my code from external RAM. This code has to come from external flash (via QSPI).
My proposal:
+ I need support for this scenario:
1) program the code and the GUI data (icons, sounds) into ext. flash memory + a small piece of code into internal flash;
2) at startup of the uC - init the QSPI and EMC controllers and copy all code from ext. flash to ext. RAM;
3) start executing from ext. RAM (with some critical things in internal RAM);
+ We should keep in mind at all times:
Ext. Flash is for icons, sounds, texts;
Ext. RAM is for frame buffers for TFT;
My questions:
+ How do I do this properly?
I think that I should manually set up the linker to do these kinds of things:
* reserve a memory area for code execution in ext. RAM (it is visible in our memory map at address 0x2000 0000) - only reserve it, because at startup there is no code inside;
* set up the linker to place the initial code ("small piece of code") in internal flash (e.g. in bank A);
* this initial code should init the flash and RAM controllers, and copy the code from ext. flash to ext. RAM;
* start execution from ext. RAM;
+ Do you have any other ideas about what I should do to achieve my proposal?
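The copy step in the list above could look roughly like the sketch below. It is only an illustration: the descriptor type `reloc_section_t` and the function `relocate_section` are hypothetical names, and on a real target the three fields would be filled from symbols exported by the linker script (load address in ext. flash, run address in ext. RAM, size), after the QSPI and EMC controllers have been initialized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one section that is copied at boot.
 * On a target the values would come from linker-script symbols. */
typedef struct {
    const uint32_t *load_addr;  /* where the image is stored (ext. flash) */
    uint32_t       *run_addr;   /* where the code will execute (ext. RAM) */
    size_t          size_bytes; /* assumed to be a multiple of 4 */
} reloc_section_t;

/* Copy the section word by word, then read it back and compare.
 * Returns 0 on success, -1 if the destination does not match. */
int relocate_section(const reloc_section_t *s)
{
    size_t words = s->size_bytes / sizeof(uint32_t);

    for (size_t i = 0; i < words; i++) {
        s->run_addr[i] = s->load_addr[i];
    }
    for (size_t i = 0; i < words; i++) {
        if (s->run_addr[i] != s->load_addr[i]) {
            return -1; /* ext. RAM not working or not yet configured */
        }
    }
    return 0;
}
```

The read-back check is a design choice: jumping into ext. RAM that was not written correctly (for example because the EMC timing is wrong) is much harder to debug than a failed copy reported from code that still runs in internal flash.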
Generally I recommend working with the flash version LPC1857. It is a later silicon version than the flash-less LPC1850, so quite a few bugs in the initial LPC1800 architecture have been fixed there.
However, in principle the LPC1850 would do the job as well:
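If you use the LPC1857 I would recommend to partition it this way:
as much code as possible in internal flash, rest of the code image in qSPI
start from internal flash, relocate code from qSPI to external SDRAM
execute from internal flash and from external SDRAM, fetch display data also from qSPI
This of course requires accurate "floor planning", also with regard to the stack/heap pointers.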
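As a rough illustration of what checking such a floor plan could mean, here is a small sketch that tests whether any two planned regions overlap. Everything in it is an assumption made for the example: the type `mem_region_t`, the function names, and whatever base addresses and sizes you feed it are placeholders, not values from a real board.

```c
#include <stddef.h>
#include <stdint.h>

/* One planned region, e.g. relocated code, a frame buffer, stack or heap. */
typedef struct {
    const char *name;
    uint32_t    base;  /* placeholder address, take it from your memory map */
    uint32_t    size;  /* size in bytes, assumed to be non-zero */
} mem_region_t;

/* Returns 1 if the two regions share at least one byte. */
static int regions_overlap(const mem_region_t *a, const mem_region_t *b)
{
    uint64_t a_end = (uint64_t)a->base + a->size; /* 64-bit to avoid wrap */
    uint64_t b_end = (uint64_t)b->base + b->size;
    return (a->base < b_end) && (b->base < a_end);
}

/* Returns the index of the first region that overlaps an earlier one,
 * or -1 if the plan is free of overlaps. */
int floor_plan_check(const mem_region_t *plan, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < i; j++) {
            if (regions_overlap(&plan[i], &plan[j])) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

A check like this can run once at startup or in a host-side unit test, so that growing the relocated code image does not silently run into a frame buffer.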
There is one interesting thing to consider: the SPIFI has a small cache. This means that code snippets like small for-loops are executed from this cache. If you execute from external RAM, there is no cache involved. The SDRAM interface always loads 128 bits, so there are at least 8 x 16 bits available for the ARM, but that's less than the SPIFI cache.
So executing from SDRAM and managing the buffers on the SDRAM means sharing the bus between these two tasks. Sometimes it's faster to execute from SPIFI and manage the buffers on the SDRAM. It's hard to say where you will end up, but if you have these memories physically on your board anyway, you could change the configuration on the fly and test it out.
Regards,
NXP Support Team
Two points:
1) I have LPC1853 with flash (512kB int flash);
2) I can not switch to LPC1857 (1MB int flash);
If you use the LPC1857 I would recommend to partition it this way:
as much code as possible in internal flash, rest of the code image in qSPI
start from internal flash, relocate code from qSPI to external SDRAM
execute from internal flash and from external SDRAM, fetch display data also from qSPI
It is similar to my proposal.
The question is not about the logic of the algorithm, but about how to implement this scenario.
I have a mail from NXP:
When you want to execute your program out of SPIFI flash, what we need to consider is the performance that you need.
When executing a program out of SPIFI flash the CPU is able to execute about 5..8 million instructions per second, while running the same code out of SDRAM the CPU may execute about 70..90 million instructions per second (@180MHz).
How would you comment on this, Bernhard?
Regards:
Mariusz
Hi Mariusz,
For the LPC1853 the same rule of course applies: put as much as possible into the internal flash.
Here is a measurement I did quite some time ago using the Coremark benchmark code. It applies to the LPC1853 as well, because the memory subsystems are exactly the same. Max frequency is of course 180MHz for the LPC1800.
Board | MCU | Core | CPU Clock (MHz) | EMC Clock (MHz) | Code Execution Memory | Coremark | Coremark/MHz | Note |
MCB4300 | LPC4357 | M4 | 204 | 102 | Internal SRAM | 438 | 2.15 | |
MCB4300 | LPC4357 | M4 | 204 | 102 | Internal flash | 415 | 2.04 | Flash access time = 6 |
MCB4300 | LPC4357 | M4 | 204 | 102 | Internal flash | 404 | 1.98 | Flash access time = 9 |
MCB4300 | LPC4357 | M4 | 204 | 102 | qSPI, S25FL032P | 232 | 1.14 | Specified for only 80MHz in quad SPI mode, but it works |
MCB4300 | LPC4357 | M4 | 120 | 120 | 32-bit SDRAM, MT48C4M32 -6 | 98 | 0.82 | |
MCB4300 | LPC4357 | M4 | 204 | 102 | 32-bit SDRAM, MT48C4M32 -6 | 84 | 0.41 | |
Initialization functions were executed from internal SRAM
RW data was located in internal SRAM for all tests
All Coremark-related RO code was located in the respective Code Execution Memory
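To make the table easier to compare, the following sketch recomputes the score per MHz and the score relative to internal SRAM from the Coremark and CPU clock values listed above. The type and function names (`coremark_row_t`, `coremark_per_mhz`, `relative_to`) are made up for this example; only the numbers are taken from the table.

```c
#include <stddef.h>

/* One measurement: where the code ran, CPU clock and Coremark score. */
typedef struct {
    const char *memory;
    double      cpu_mhz;
    double      coremark;
} coremark_row_t;

/* Values copied from the table above. */
const coremark_row_t coremark_rows[] = {
    { "Internal SRAM",                    204.0, 438.0 },
    { "Internal flash, access time = 6",  204.0, 415.0 },
    { "Internal flash, access time = 9",  204.0, 404.0 },
    { "qSPI, S25FL032P",                  204.0, 232.0 },
    { "32-bit SDRAM, CPU at 120MHz",      120.0,  98.0 },
    { "32-bit SDRAM, CPU at 204MHz",      204.0,  84.0 },
};

/* Score normalized to the CPU clock. */
double coremark_per_mhz(const coremark_row_t *r)
{
    return r->coremark / r->cpu_mhz;
}

/* Score of one row as a fraction of a reference row. */
double relative_to(const coremark_row_t *r, const coremark_row_t *ref)
{
    return r->coremark / ref->coremark;
}
```

With these numbers, qSPI reaches roughly 53% of the internal SRAM score, while SDRAM with the CPU at 204MHz reaches roughly 19%.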
That's the reality, anything else is theory. Performing 70...90 million instructions per second might be valid for the following use case: your executable code is 8 x 16 bits overall and therefore fits into the buffers of the SDRAM interface. You load it once and then execute it forever without fetching any new code from the external memory.
Not very realistic, right? Getting something from the SDRAM takes time: you need to address the SDRAM memory, you fetch the code, and you maybe fetch data from a different SDRAM memory position. This means that your little 128-bit buffer cache becomes invalid immediately. Then there are breaks in the memory bus activity, caused by the AHB bus behavior, etc.
All this causes a slowdown; on average you end up at the performance levels shown in the table above. It is indeed possible to write code which performs worse on the SPIFI, and the same code may execute better on the SDRAM. But the Coremark code seems to be a good average.
Using 32-bit SDRAM versus 16-bit is not a quantum leap, as the interface always fetches 128 bits, either 4 x 32 or 8 x 16. So you only win a little bit, because there are fewer breaks with the 4 x 32 fetching.
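The arithmetic behind "4 x 32 or 8 x 16" can be written down as a tiny sketch. The constant and function names are invented for the example; the only fact used is the 128-bit fetch size mentioned above.

```c
/* Fetch size of the SDRAM interface as stated above. */
#define EMC_FETCH_BITS 128u

/* Number of data-bus transfers needed for one 128-bit fetch.
 * Returns 0 for a width that does not divide 128 evenly. */
unsigned transfers_per_fetch(unsigned bus_width_bits)
{
    if (bus_width_bits == 0u || (EMC_FETCH_BITS % bus_width_bits) != 0u) {
        return 0u;
    }
    return EMC_FETCH_BITS / bus_width_bits;
}

/* Number of 16-bit halfwords (the smallest Thumb instruction size)
 * that one fetch delivers to the core. */
unsigned halfwords_per_fetch(void)
{
    return EMC_FETCH_BITS / 16u;
}
```

A 32-bit bus needs 4 transfers per fetch and a 16-bit bus needs 8, but in both cases the core gets the same 8 halfwords, which is why the wider bus only helps a little.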
Taking the best of the different worlds is the best solution:
Regards,
NXP Support Team