Speed of execution


2,612 Views
lpcware
NXP Employee
Content originally posted in LPCWare by noddy14 on Thu Nov 07 08:56:51 MST 2013
Hello

I am considering an LPC43xx device for a new project, but I have been unable to find information about the execution speed these devices can actually achieve.

The key requirement for the new project is to read three 18-bit SPI A/D converters simultaneously at 50 Mbit/s. Could the SGPIO be set up to do this?

Also, I know the hardware FPU only supports single precision. Can anyone say how fast software double precision is?

Finally, does anyone know if there is a programming guide or manual for the LPC43xx chips?

Thanks


12 Replies

Content originally posted in LPCWare by noddy14 on Mon Nov 11 08:20:40 MST 2013
Thank you Pacman and Starblue. Very helpful comments and suggestions.

Content originally posted in LPCWare by starblue on Mon Nov 11 06:39:22 MST 2013

Quote: nazoa
The data from the A/D comes in bursts, so the peak rate is very high but the average speed is a lot lower so I wondered if it could be done with an LPC43xx.



In that case maybe you could try to use SGPIO with DMA.
IIRC there is a thread where this is discussed for some radio application.
Somebody managed to do it, but it is rather difficult.

I can't use DMA, because I need to process data continuously and react with low latency.

Content originally posted in LPCWare by Pacman on Sat Nov 09 11:45:07 MST 2013
If starblue can handle 2 streams at 20 MHz using 50% of the M0 core, I am convinced that the M4 core will be able to handle 3 streams at 50 MHz.
Running from RAM is not difficult. I've done it in several ways:
Method 1: I made a linker script that places the code in a section which is copied to RAM at reset.
Method 2: I copied an assembly-language routine using C.
Method 3: I copied an assembly-language routine using assembly language.
Method 4: If using the M0, you can use the load_image(...) library function to copy code from the flash memory to SRAM and make the M0 core execute it.
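
As a rough illustration of Method 1, the copy at reset usually comes down to a word-by-word loop between two addresses that the linker script provides. The sketch below is only an assumption about how such a loop might look: the function name copy_to_ram and the linker symbols mentioned in the comment are hypothetical, not taken from any NXP startup file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy a code or data section from its load address (flash) to its
 * run address (RAM), one 32-bit word at a time.
 *
 * On a real target the three pointers would come from linker-script
 * symbols, for example (hypothetical names):
 *   extern uint32_t __ramcode_load[], __ramcode_start[], __ramcode_end[];
 *   copy_to_ram(__ramcode_load, __ramcode_start, __ramcode_end);
 */
void copy_to_ram(const uint32_t *load_start,
                 uint32_t *run_start,
                 const uint32_t *run_end)
{
    assert(run_end >= run_start);   /* section must not have negative size */

    while (run_start < run_end) {
        *run_start++ = *load_start++;
    }
}
```

With GCC, the functions to be relocated would typically be tagged with a section attribute that matches the linker script, and the copy has to run before any of them is called.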

If using the M4, most instructions take 1 clock cycle (you can load/store multiple registers using LDM/STM; those take 1 clk + 1 clk per register).
LDRx/STRx can be pipelined, so they'll take one clk per instruction.

On the M0 core, things are different. LDM/STM still take one clock cycle plus one clk per register, BUT you're restricted to incrementing the base register all the time.
LDRx/STRx always take 2 clock cycles on the M0.
On the M0, you can basically only use r0-r7 (which is a bit too tight sometimes).
...And on the M0, you can't use fancy stuff like...

add r0,r4,r8,lsr#8

...You'd have to do that in several operations using 3 clock cycles instead of just 1.

Also, keep in mind that the SGPIO data arrives least significant bit first, not most significant bit first.
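
If the A/D sends its result MSB first (an assumption, check the converter's datasheet), the word captured by the SGPIO would come out bit-reversed and has to be flipped in software. A minimal portable sketch follows; the function name reverse_bits is my own. On the M4, the CMSIS __RBIT intrinsic reverses all 32 bits in one instruction, and a right shift by (32 - nbits) then gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reverse the order of the lowest `nbits` bits of `value`.
 * Bit 0 of the input becomes bit (nbits - 1) of the output.
 * Bits above nbits in the input are ignored.
 */
uint32_t reverse_bits(uint32_t value, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32);

    uint32_t out = 0;
    for (unsigned i = 0; i < nbits; i++) {
        out = (out << 1) | (value & 1u);  /* move the current LSB into out */
        value >>= 1;
    }
    return out;
}
```

For an 18-bit sample, reverse_bits(raw, 18) swaps bit 0 with bit 17, bit 1 with bit 16, and so on. The loop is slow compared with a single RBIT, so it is only meant to show the operation, not to be used in the inner loop.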

Content originally posted in LPCWare by noddy14 on Sat Nov 09 08:16:51 MST 2013
Thank you very much for the replies. I wanted to run from flash for simplicity, but it may not be fast enough. The data from the A/D comes in bursts, so the peak rate is very high but the average speed is a lot lower so I wondered if it could be done with an LPC43xx.

Content originally posted in LPCWare by starblue on Fri Nov 08 12:00:22 MST 2013

Quote: nazoa
OK, thanks for your comments. Yes, it is the flash that I cannot find specified anywhere.



Look at the FLASHCFGx registers in the user manual.
For 200 MHz you need 9 wait states (so 10 cycles access time).
The factor of 3 I mentioned is for this setting.

It seems to be quite tolerant of overclocking: I inadvertently used 5 wait states for half a year without noticing any problems.
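
To put those wait-state numbers into time units, here is a small sketch of the arithmetic. The helper names are hypothetical, and the 50 ns figure is only what follows from "9 wait states at 200 MHz" above, not a datasheet value.

```c
#include <assert.h>
#include <stdint.h>

/* Flash access time in ns: (wait states + 1) CPU cycles at cpu_mhz. */
uint32_t flash_access_ns(uint32_t cpu_mhz, uint32_t wait_states)
{
    assert(cpu_mhz != 0);
    return ((wait_states + 1u) * 1000u) / cpu_mhz;
}

/* Smallest wait-state count whose access time covers access_ns. */
uint32_t min_wait_states(uint32_t cpu_mhz, uint32_t access_ns)
{
    /* Round the required cycle count up, then subtract the base cycle. */
    uint32_t cycles = (access_ns * cpu_mhz + 999u) / 1000u;
    return (cycles > 0u) ? cycles - 1u : 0u;
}
```

At 200 MHz, 9 wait states correspond to a 50 ns access and 5 wait states to 30 ns, which shows how far out of the recommended setting the accidental configuration was.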

Content originally posted in LPCWare by rocketdawg on Fri Nov 08 09:26:39 MST 2013
Some LPC43xx parts are flashless. Which part are you interested in?
Flash parts use a flash accelerator or cache to get very fast access out of flash.
But the actual throughput depends on the nature of the code, since the cache resources are finite.
You may want to look at chapter 4 of UM10562.pdf for the LPC408x. It should be very similar.

If you have timing-sensitive routines, you can always specify that certain functions should execute out of RAM (guaranteed fastest).
The only real bet is to look at the CMSIS code for the particular flash part and read how it configures the cache and wait states.

Three 50 Mbit/s SPI channels means moving a lot of data. Perhaps embed an ARM core in an FPGA.


Content originally posted in LPCWare by noddy14 on Fri Nov 08 08:32:09 MST 2013

OK, thanks for your comments. Yes, it is the flash that I cannot find specified anywhere.

Content originally posted in LPCWare by starblue on Fri Nov 08 08:20:58 MST 2013

Quote: nazoa
My concern is trying to understand the speed of execution, i.e. the number of clock cycles each instruction takes. Surely NXP  must know this!! It must be written down somewhere.



Since this concerns the core, it is documented by ARM rather than NXP: look for DDI0439, the Cortex-M4 Technical Reference Manual.


Quote: nazoa
Also, I want to know if the SGPIO can be configured to read 3 serial data streams simultaneously.



Yes.

I've done two in parallel from an AD7367-5 running at 20 MHz, receiving and processing four 14-bit values every 4us.
Managing the SGPIO and processing the values takes a little more than half the processing power of the Cortex M0 at 200MHz, running from internal RAM.

I would guess that with three streams at 50 MHz your processor will be quite stressed.

If you want the full performance you should put your program in internal RAM.
Internal flash is about three times slower.
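
A back-of-the-envelope cycle budget makes the comparison concrete. The sketch below uses hypothetical helper names and assumes a 200 MHz core clock; it also ignores conversion time and chip-select overhead, so it only describes the peak of a burst.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available in a time window of window_ns at cpu_mhz. */
uint32_t cycles_in_window(uint32_t window_ns, uint32_t cpu_mhz)
{
    return (uint32_t)(((uint64_t)window_ns * cpu_mhz) / 1000u);
}

/* Time in ns needed to clock `bits` bits at bit_rate_mhz Mbit/s. */
uint32_t word_time_ns(uint32_t bits, uint32_t bit_rate_mhz)
{
    assert(bit_rate_mhz != 0);
    return (bits * 1000u) / bit_rate_mhz;
}
```

For the two-stream case above, cycles_in_window(4000, 200) gives 800 cycles for each batch of four values, of which a little more than half is used. For three 18-bit streams at 50 Mbit/s, word_time_ns(18, 50) is 360 ns, which is 72 cycles at 200 MHz for three samples, or about 24 cycles per sample while a burst lasts.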

Content originally posted in LPCWare by noddy14 on Fri Nov 08 08:09:22 MST 2013
Thanks, I had asked Larry and seen the ARM document. However, it doesn't tell me anything about running with the NXP flash implementation and how many (if any) wait states are required, etc.

Content originally posted in LPCWare by LabRat on Fri Nov 08 07:33:51 MST 2013

Quote: nazoa
My concern is trying to understand the speed of execution, i.e. the number of clock cycles each instruction takes. Surely NXP  must know this!! It must be written down somewhere.



Larry Page knows this also: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html

M4 has a DWT unit, so you can read out cycles there.
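
A minimal sketch of how the DWT cycle counter could be used to measure a routine. The register addresses and bit positions are the architectural ARMv7-M ones (DEMCR, DWT_CTRL, DWT_CYCCNT); the struct dwt_regs_t and the function names are my own, and the registers are passed in as pointers only so the logic can also be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural ARMv7-M addresses (Cortex-M3/M4). */
#define DEMCR_ADDR          0xE000EDFCu
#define DWT_CTRL_ADDR       0xE0001000u
#define DWT_CYCCNT_ADDR     0xE0001004u
#define DEMCR_TRCENA        (1u << 24)  /* enables the DWT block */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* starts the cycle counter */

/* Pointers to the three registers involved (hypothetical helper type). */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} dwt_regs_t;

/* Enable trace, clear the counter, then start counting. */
void dwt_enable(const dwt_regs_t *r)
{
    assert(r && r->demcr && r->ctrl && r->cyccnt);
    *r->demcr  |= DEMCR_TRCENA;
    *r->cyccnt  = 0;
    *r->ctrl   |= DWT_CTRL_CYCCNTENA;
}

/* Elapsed cycles; unsigned subtraction handles one 32-bit wraparound. */
uint32_t dwt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the target, the struct would be filled with (volatile uint32_t *)DEMCR_ADDR and so on. Read *cyccnt before and after the code under test and pass both values to dwt_elapsed. Running the same routine from flash and from RAM this way gives a direct measurement of the difference.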

Content originally posted in LPCWare by noddy14 on Fri Nov 08 07:24:29 MST 2013
Well, the data will be processed and stored in memory for later transmission to the host.

My concern is trying to understand the speed of execution, i.e. the number of clock cycles each instruction takes. Surely NXP  must know this!! It must be written down somewhere.

Also, I want to know if the SGPIO can be configured to read 3 serial data streams simultaneously.


Content originally posted in LPCWare by wmues on Fri Nov 08 06:35:30 MST 2013
What about the requirements to store the data from the A/D converters in memory or file system?