Speed comparison between on-chip SRAM and FlashRAM

1,428 Views
lpcware
NXP Employee
Content originally posted in LPCWare by JohnR on Tue Dec 11 17:59:43 MST 2012
Hi,

I am going to expose my ignorance again.

My Diolan board has an LPC4300 and an S29AL016J70TF1010 flash RAM. In the current system the M4 is running a pair of SPI-based ADCs and an SPI-based DAC using the SGPIO module.

Initially the code was run from on-chip SRAM and the speed was excellent. The 14-bit ADCs and 16-bit DAC could be serviced (the data read and a new value sent to the DAC) within about 1.5 microseconds.

Running exactly the same code from flash at 0x1c000000 took 20 microseconds.

Looking at the read specs for the flash, 55/70 nanoseconds, I suppose this is reasonable.
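
As a rough check of the numbers above, here is a minimal sketch in C. It assumes that every 16-bit read from the external flash costs at least one full access time and that no burst or page mode is used; the function names are made up for illustration and real bus overhead will make things worse.

```c
/* Ratio of the two service times reported above (flash vs. SRAM). */
static double slowdown(double flash_us, double sram_us)
{
    return flash_us / sram_us;
}

/* Upper bound on the fetch rate of an asynchronous parallel bus:
 * one read of bus_bytes per access time, ignoring all other overhead.
 * bytes per nanosecond times 1000 gives MB/s (10^6 bytes per second). */
static double peak_fetch_mbytes_per_s(double access_ns, unsigned bus_bytes)
{
    return (double)bus_bytes * 1000.0 / access_ns;
}
```

With the figures from this post, slowdown(20.0, 1.5) comes out at roughly 13x, and peak_fetch_mbytes_per_s(70.0, 2) gives a ceiling of about 28.6 MB/s for a 16-bit, 70 ns part.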

For some reason I had thought that code in flash was automatically (automagically?) copied to SRAM at startup, but I guess this is not the case.

Is there any way of getting this to run faster? Is there a way of moving code from flash to SRAM?

JohnR.


9 Replies

Content originally posted in LPCWare by nxp21346 on Wed Dec 26 11:04:00 MST 2012
The "flash accelerator" feature works to speed up the on-chip flash (which is 256 bits wide) on the LPC1800/LPC4300 parts with on-chip flash, but it does not affect external parallel flash performance. There is a feature in the bootloader to automatically copy the code to SRAM. To enable this feature, you need to add a header to the code which can be done with the Image Manager tool. http://www.lpcware.com/content/nxpfile/lpc18xx-43xx-image-manager

-Dave @ NXP

Content originally posted in LPCWare by JohnR on Wed Dec 19 06:22:27 MST 2012
Hi Phil,

Thanks again for your helpful comments.

I have got my ADC/DAC code working in M0 using on-chip SRAM and sending the ADC results to M4 for processing and display.

There is still a lot of M4 processing code yet to be written so your suggestions re placement are appreciated, as I suspect that the bottleneck will now be in M4.

John.





Content originally posted in LPCWare by PhilYoung on Tue Dec 18 09:30:05 MST 2012
The LPC4300s have a lot of flexibility in the external memory controller (EMC), but you need to configure it yourself.
By default at power-up the EMC operates from a slow clock. If you want to speed it up, you need to program the EMC to run from a faster clock and configure the timings for your specific flash type.
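
To give an idea of what "configure the timings" involves, here is a minimal sketch in C that converts a datasheet timing in nanoseconds into whole EMC clock cycles, rounding up. The function name and the example clock frequencies are assumptions for illustration only. How the cycle count is encoded in the EMC wait-state registers (for instance whether the register holds the count or the count minus one) has to be taken from the user manual.

```c
#include <stdint.h>

/* Number of whole EMC clock cycles needed to cover t_ns nanoseconds.
 * cycles = ceil(t_ns * f_kHz / 1e6); 64-bit math avoids overflow. */
static uint32_t emc_cycles_for_ns(uint32_t t_ns, uint32_t emc_clk_khz)
{
    uint64_t product = (uint64_t)t_ns * emc_clk_khz;
    return (uint32_t)((product + 999999u) / 1000000u);
}
```

For example, a 70 ns access time needs 8 cycles at an assumed 102 MHz EMC clock, but only 1 cycle at 12 MHz, so raising the clock only helps if the wait states are reprogrammed to match.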

Of course external flash will always be slower than running from SRAM, since there is no cache on the LPC devices, but you can improve things slightly if your flash supports burst mode access.

If you want to run from internal SRAM then it's up to you to define this when you build your code, and how you do it depends on which toolchain you are using.
If you are using the ARM MDK then you simply need to use a scatter file that specifies different load and execution regions, and the toolchain will include the necessary code relocation at boot time. With other toolchains you will do it in a different manner and MAY need to provide your own startup code for this. I don't use GCC so I'm not sure what support it provides; maybe others can help.

For Keil / ARM it's a simple matter. Here is a simple example I use that relocates the code to SRAM but leaves the startup code in flash, as there is no point in wasting SRAM on it. Most code required at runtime ends up in ER_IRAM at 0x10000200, but all the init code that only runs once stays in flash.

LR_IROM1 0x00000000 0x00040000  {    ; load region: everything is stored here
   ER_FLASH 0x00000000  { ; root region for startup code and vector table (runs in place)
      Startup_lpc43xx.o (RESET, +First)
      Startup_lpc43xx.o (FLASH_ENTRY)
      Startup_lpc43xx.o (SETREG)
      *.o(i.InitMemc)
      Startup_lpc43xx.o (+RO)
      *(InRoot$$Sections)    ; library code that must stay in a root region (scatter-loading copy/zero routines)
      System_LPC43xx.o(+RO)
      fpu_init.o(+RO)
   }
   ER_BootExportCode 0x10000000 0x180 {  ; fixed-address block, at most 0x180 bytes
      *.o(BootExportCode)
   }

   ER_IRAM 0x10000200 {   ; all remaining code and constants, copied to SRAM at boot
      .ANY(+RO)
   }

   RW_AHBRAM1 0x20000200 0x0000C000  {  ; RW data, 48K
      *.o(+RW +ZI)
      .ANY (+RW +ZI)
   }

   DRAM 0x28100000 0x01000000 {   ; NETHEAP section only
      Startup_lpc43xx.o (NETHEAP)
   }
   ER_BootExportData 0x20000000 UNINIT {  ; UNINIT: not initialized at startup
      *.o(BootExportData +First )
   }
}

regards

Phil.

Content originally posted in LPCWare by JohnR on Thu Dec 13 07:06:39 MST 2012
Thank you both for your very helpful replies. I will try the .fastcode option in the next few days and post the results then.

Also I will order one of the new Diolan 4357 boards.

John.

Content originally posted in LPCWare by starblue on Thu Dec 13 04:19:35 MST 2012
> add a separate "ram text" section to your linker and startup code

That's the correct way to do it.  The name of the section varies (.fastcode for GNU tools, .ramfuncs for TI; I haven't seen it for Keil so you might need to roll your own).

Apart from execution speed, you may also need it for routines that program or configure flash memory.

Content originally posted in LPCWare by mdittrich on Wed Dec 12 15:13:32 MST 2012

The flash-based 4300s still support external flash while being able to run from internal flash.  If you need that much storage, SPIFI is pretty convenient for data... I witnessed about a 4-6x slowdown running code from SPIFI, not the ~13x you saw with the 16-bit parallel flash.

I evaluated a 4350 on a dev board, a 4337 is going in my product because of the performance improvement.

MD

Content originally posted in LPCWare by mdittrich on Wed Dec 12 15:06:13 MST 2012
I'm no expert.

The "data" section (initialized data, i.e. anything that is not zero) is what is automatically copied from flash to SRAM at startup. This usually happens in the reset ISR; there is no magic to it. :) You should be able to find the reset ISR source and see what it is doing. It also clears the "bss" section (data initialized to zero).
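
For anyone who has not looked inside a reset handler, the copy and clear steps described above usually boil down to two small loops. This is only a sketch under stated assumptions: the helper names are made up, and the linker symbols mentioned in the comment are hypothetical, since the real names depend on your linker script and startup file.

```c
#include <stdint.h>

/* Copy an initialized section from its load address (flash)
 * to its run address (SRAM), one word at a time. */
static void copy_section(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++;
    }
}

/* Clear a zero-initialized section such as "bss". */
static void zero_section(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}

/*
 * On a real target the reset handler would call these before main()
 * with linker-defined symbols, for example (hypothetical names):
 *
 *   extern uint32_t __data_load, __data_start, __data_end;
 *   extern uint32_t __bss_start, __bss_end;
 *   copy_section(&__data_load, &__data_start, &__data_end);
 *   zero_section(&__bss_start, &__bss_end);
 */
```

Anything the linker places in the copied section, code included, ends up in SRAM this way.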

One hack you can use, with GCC at least, is to put a function into the data section, typically with "__attribute__((section(".data")))" in the function declaration.  Then the linker places the code of that function (but not necessarily the code or libraries that function calls!) in the data section, and the startup code loads it like any other initialized data.  Note that you might have to enable long calls (in GCC, -mlong-calls or the long_call attribute; otherwise whatever your compiler's equivalent is) to be able to jump between the off-chip and on-chip address ranges (and back), since they are further apart than a direct branch can reach (about +/-16 MB for a Thumb-2 BL).

If you get this working, you can keep fast code on chip and leave the slow stuff in off-chip flash.  If you have to do a lot of this, it makes sense to add a separate "ram text" section to your linker and startup code.
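
A separate "ram text" section could be marked in the source roughly like this. It is a sketch, not a drop-in solution: the macro name RAMFUNC, the section name ".fastcode" and the example routine are all assumptions, and the attribute does nothing useful unless the linker script gives that section a run address in SRAM and the startup code copies it there from flash.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__)
/* ".fastcode" is only a naming convention. long_call lets callers in
 * external flash reach the function at its distant SRAM address. */
#define RAMFUNC __attribute__((section(".fastcode"), noinline, long_call))
#else
#define RAMFUNC /* plain function on other targets, e.g. a host build */
#endif

/* Hypothetical time-critical routine: widen a 14-bit ADC sample
 * to a left-aligned 16-bit DAC code. */
RAMFUNC uint16_t adc14_to_dac16(uint16_t adc14)
{
    return (uint16_t)((adc14 & 0x3FFFu) << 2);
}
```

Only the functions tagged this way move to SRAM. Anything they call (library routines, for instance) stays where the linker put it unless it is tagged as well.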

But that flash is only a 16-bit part; a 32-bit bus would obviously be better.  Even though it is a 200 MHz part, the M4 in the 4300 is still a microcontroller, not a microprocessor: it has no instruction cache.  The "flash accelerators" cheat by using a wide (128-bit or wider) flash interface and a cache at the flash level.  There is no way an uncached 16-bit external interface could compete with that.  I would switch to a 4357 if you can.

MD

Content originally posted in LPCWare by JohnR on Wed Dec 12 14:55:32 MST 2012
Hi,

I too hope that the experts step in.

The data transfer itself is fine at present, since I am simply accumulating data locally. The problem is the speed of execution when running the SGPIO code from external flash.

In the new LPC43xx devices with on-chip flash, there is indeed a flash accelerator but for the processors needing external flash, one is out of luck it seems.

In the previous version of my M0/M4 project, the ADC handler (a single ADC at that time) ran on the M0 from code in SRAM that was copied from flash at startup, which is why I did not see the flash slowdown until now. The M4 code only displayed a subset of the data from the M0 and appeared adequately fast in this application.

I am now in the process of moving the new ADCs/DAC code presently in M4 to M0 and will see what happens. The problem will still remain with the M4 code that eventually has to do a lot of data massaging and communication.

John.





Content originally posted in LPCWare by mark03 on Wed Dec 12 10:44:02 MST 2012
I'm also curious to hear what the experts have to say.  In the meantime...

Would it be an option to use DMA and move data in the background?

In the ST Micro STM32F4 data, their "flash accelerator" feature is prominently advertised.  I guess it is just a cache of some kind?  And I wonder if the LPC43xx have something similar.  Does anyone know?