How to increase the speed of consecutive EIM accesses on i.MX6S

Poussemousse
Contributor III

Hi,

We are using an i.MX6S to interface a 16-bit parallel asynchronous SRAM with a 45 ns access time.

Note that we cannot use it with burst access or page access.

EIM bus registers:

EIM_CS1GCR1 00010001

EIM_CS1GCR2 00000000

EIM_CS1RCR1 03000000

EIM_CS1RCR2 00000008

EIM_CS1WCR1 03000000

EIM_CS1WCR2 00000000
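
As a reading aid, the register values above can be decoded field by field. The following C sketch assumes the values are hexadecimal, and assumes bit positions as recalled from the EIM chapter of the i.MX 6 reference manual: CSEN at bit 0 and DSZ at bits 18:16 of EIM_CSnGCR1, RWSC at bits 29:24 of EIM_CSnRCR1, WWSC at bits 29:24 of EIM_CSnWCR1, and RBE at bit 3 of EIM_CSnRCR2. These positions should be checked against the manual for your part. The names eim_field, eim_cs1_summary and eim_cs1_decode are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits [msb:lsb] of a 32-bit register value. */
static uint32_t eim_field(uint32_t reg, unsigned msb, unsigned lsb)
{
    unsigned width = msb - lsb + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> lsb) & mask;
}

/* Fields of interest for an asynchronous SRAM on CS1. */
struct eim_cs1_summary {
    uint32_t csen; /* chip select enable              (assumed GCR1[0])     */
    uint32_t dsz;  /* data port size code             (assumed GCR1[18:16]) */
    uint32_t rwsc; /* read wait states                (assumed RCR1[29:24]) */
    uint32_t wwsc; /* write wait states               (assumed WCR1[29:24]) */
    uint32_t rbe;  /* byte enables asserted on reads  (assumed RCR2[3])     */
};

/* Decode the values quoted in the post, read as hexadecimal (assumption). */
static struct eim_cs1_summary eim_cs1_decode(void)
{
    struct eim_cs1_summary s;
    s.csen = eim_field(0x00010001u, 0, 0);   /* EIM_CS1GCR1 */
    s.dsz  = eim_field(0x00010001u, 18, 16); /* EIM_CS1GCR1 */
    s.rwsc = eim_field(0x03000000u, 29, 24); /* EIM_CS1RCR1 */
    s.wwsc = eim_field(0x03000000u, 29, 24); /* EIM_CS1WCR1 */
    s.rbe  = eim_field(0x00000008u, 3, 3);   /* EIM_CS1RCR2 */
    return s;
}
```

Under these assumptions the decode gives CSEN = 1, DSZ = 1, RWSC = 3, WWSC = 3 and RBE = 1, which would be consistent with an enabled 16-bit chip select using three wait states for reads and for writes.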

Our problem is that we have not yet been able to achieve consecutive accesses as fast as those described in the Reference Manual:

22.8.3 Asynchronous Read/Write Memory Accesses Timing Diagram

pastedImage_0.png

22.8.5 Consecutive Asynchronous Write Memory Accesses Timing Diagram

pastedImage_5.png

In our application, there is a rather long wait between consecutive CS activations.

The only way to get back-to-back accesses is to widen the bus, but the dead time is still there as soon as we perform more than one read or write.


To explain:

With a 16-bit configuration: reading or writing two different 16-bit words implies waiting about 250 to 280 ns between the two accesses.

The following pictures show the CS signal.

With a 32-bit configuration: reading two different 16-bit words through two consecutive accesses shows no dead time (90 ns = two back-to-back accesses to our 45 ns SRAM), but we then have to wait about 270 ns between two 32-bit words.

S4GMENU_Uboot_lecture_SRAM.png

With a 64-bit configuration: writing one 64-bit word (4 x 16-bit words) through four consecutive accesses shows no dead time (180 ns = 4 x 45 ns), but we then have to wait about 332 ns before the next 64-bit write.

S4GMENU_Uboot_ecriture_SRAM.png
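
To put the three measurements on a common scale, the effective data rate can be estimated from the active time and the dead time of each case. The C sketch below is only an illustration: the function name eim_throughput_mbps is hypothetical, and it assumes that the quoted waits (280 ns, 270 ns, 332 ns) are idle time added on top of the active access time, which the post does not state explicitly.

```c
/* Effective throughput in MB/s (10^6 bytes per second).
 * bytes:     payload moved per group of back-to-back accesses
 * active_ns: time spent in the back-to-back accesses themselves
 * gap_ns:    dead time before the next group (assumed to be additional) */
static double eim_throughput_mbps(unsigned bytes, unsigned active_ns,
                                  unsigned gap_ns)
{
    return (double)bytes * 1000.0 / (double)(active_ns + gap_ns);
}

/* Cases from the post, using the upper 16-bit figure of 280 ns. */
static double eim_rate_16bit(void) { return eim_throughput_mbps(2, 45, 280); }
static double eim_rate_32bit(void) { return eim_throughput_mbps(4, 90, 270); }
static double eim_rate_64bit(void) { return eim_throughput_mbps(8, 180, 332); }

/* Reference: 16-bit accesses every 45 ns with no dead time at all. */
static double eim_rate_ideal(void) { return eim_throughput_mbps(2, 45, 0); }
```

With these assumptions the three configurations come out at roughly 6.2, 11.1 and 15.6 MB/s, against roughly 44.4 MB/s if the 45 ns accesses were truly back to back.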

We are wondering whether internal bus transfers inside the i.MX6 could explain this problem:

Data goes from the AHB bus (memory system bus) to the SPB bus (which is connected to the EIM bus) through the AIPSTZ bridge.

Therefore, we have 45 ns of latency from the SRAM itself, which becomes 48 ns after synchronisation with the EIM clock, plus at best 82 ns (we have no idea where this comes from, maybe the SPB-to-EIM time), plus 3 hclk cycles (150 ns) for each pass through the AIPSTZ bridge.

Total: 48 + 82 + 150 = 280 ns (we have not figured out why we see 332 ns).
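
The budget above can be written out as a small C sketch, which makes the unexplained remainder explicit. The struct and function names are hypothetical, and the 50 ns hclk period is only inferred from the statement that 3 hclk cycles take 150 ns.

```c
#include <stdint.h>

/* Latency contributions per access, in nanoseconds, as estimated in the post. */
struct eim_latency_ns {
    uint32_t sram_sync;   /* 45 ns SRAM access, 48 ns after EIM clock sync */
    uint32_t unexplained; /* 82 ns of unknown origin                      */
    uint32_t bridge;      /* one pass through the AIPSTZ bridge           */
};

/* Bridge cost: number of hclk cycles times the hclk period. */
static uint32_t eim_bridge_ns(uint32_t hclk_cycles, uint32_t hclk_period_ns)
{
    return hclk_cycles * hclk_period_ns;
}

/* Sum of all estimated contributions. */
static uint32_t eim_latency_total(const struct eim_latency_ns *l)
{
    return l->sram_sync + l->unexplained + l->bridge;
}

/* Estimate from the post: 48 + 82 + 3 * 50 (inferred hclk period). */
static uint32_t eim_estimated_total(void)
{
    struct eim_latency_ns l = { 48u, 82u, eim_bridge_ns(3u, 50u) };
    return eim_latency_total(&l);
}

/* Part of a measured gap that the estimate does not account for. */
static uint32_t eim_unaccounted_ns(uint32_t measured_ns)
{
    return measured_ns - eim_estimated_total();
}
```

For the 332 ns measured in the 64-bit case, this leaves 52 ns that the estimate does not cover.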

We know that we can reduce the access time by staying on the AMBA bus, going from AHB to APB without passing through the AIPSTZ, by using the SDMA or the L2 cache (thanks to the MMU).

This solution raises three problems:

  • 1) It is tricky to set up.
  • 2) There is still dead time on the first pass from the AHB to the APB bus, and on each subsequent access.
  • 3) We have no way of knowing whether a transfer has started or finished.

Any help would be greatly appreciated; we are running out of ideas.

Poussemousse

1 Solution
Poussemousse
Contributor III

Hi everybody,

This problem has been fixed in the latest silicon revision, even though NXP/Freescale does not mention it anywhere!

So you can now use REV 1.3 with confidence, and almost reach the maximum data rate.

Discussion is closed.

4 Replies
Yuri
NXP Employee

Please take the following aspects of the performance issue into account.

The ARM architecture provides the block transfer instructions LDM / STM. Perhaps it also makes sense to use the NEON VLD / VST instructions for burst transfers.

To get maximum (internal bus) performance, all caches (instruction, data and L2) should be enabled. The MMU should also be configured and enabled (so that the data cache works).

The fastest accesses are expected to be the so-called back-to-back ones, where there are no gaps between transfers. For LDM / STM instructions, this means that only two 32-bit accesses will be performed back to back. Additional pauses for arbitration and bus turn-around may then be added.

For EIM accesses, due to an internal bus limitation, the bus turn-around time may be about 150 ns, because of the latency of going through a couple of PL301 crossbars and the AIPS peripheral bridge.
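
As an illustration of the LDM / STM suggestion, here is a minimal C sketch of a block copy that uses one LDMIA / STMIA pair per four 32-bit words when built for ARM state with GCC-style inline assembly, and a plain loop otherwise. The function names copy_block16 and eim_copy are hypothetical, and the sketch assumes word-aligned pointers. Whether the resulting bus accesses are actually back to back depends on the memory attributes and the interconnect, which this code does not control.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy one 16-byte block (four 32-bit words). */
static void copy_block16(volatile uint32_t *dst, const volatile uint32_t *src)
{
#if defined(__arm__) && !defined(__thumb__)
    /* One load-multiple and one store-multiple of four registers. */
    __asm__ volatile(
        "ldmia %[s], {r4-r7}\n\t"
        "stmia %[d], {r4-r7}\n\t"
        :
        : [s] "r"(src), [d] "r"(dst)
        : "r4", "r5", "r6", "r7", "memory");
#else
    /* Portable fallback so the sketch also builds on a host machine. */
    for (size_t i = 0; i < 4; i++)
        dst[i] = src[i];
#endif
}

/* Copy `words` 32-bit words, four at a time, then the remainder one by one. */
void eim_copy(volatile uint32_t *dst, const volatile uint32_t *src,
              size_t words)
{
    size_t i = 0;
    for (; i + 4 <= words; i += 4)
        copy_block16(dst + i, src + i);
    for (; i < words; i++)
        dst[i] = src[i];
}
```

On a target, dst or src would point into the EIM chip-select window (the address is board-specific and not given in this thread), and the CS signal can be observed on a scope to see how many accesses end up grouped.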

Have a great day,
Yuri

Poussemousse
Contributor III

Yuri,

Thank you for your quick answer.

Nevertheless, could you please explain the full EIM bus configuration needed to get the same behavior as in your figures 22-4 and 22-7?

If you look at them, you (Freescale) show multiple consecutive accesses without bus turn-arounds (at least none appear).

Could you provide such configuration files so that we can do the same on our target?

Best regards,

Poussemousse

Yuri
NXP Employee

The main parameter settings (RCSA = 1, RADVA = 2, …) are shown in figures 22-4 and 22-5. Also note that the figures serve mainly for demonstration purposes.

To get maximum performance, please use the ARM instructions LDM / STM.

Regards,

yuri.