Cortex-A9 ARM Errata 845369


MOW
Contributor IV

Apparently, on 1 April 2015 a workaround for ARM Cortex-A9 erratum #845369 was added to the public U-Boot repositories. The patch is also included in Freescale's most recent i.MX6x Android release, L5.0.0_1.0.0-ga, but nowhere else so far (in particular, not yet in Freescale's i.MX6x Yocto releases).

The ARM Errata mentions "Setting this bit could possibly result in a visible drop in performance for routines that perform intensive memory accesses, such as memset() or memcpy(). However, the workaround is not expected to create any significant performance degradation in most standard applications."

Brief tests on our i.MX6x-devices show that there is indeed a "visible drop in performance" for memset() and memcpy():

  • memset() performance drops down to <25% of its previous performance (yes, "down to" not "by")
  • memcpy() performance drops down to ~50% of its previous performance
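A rough way to reproduce this kind of measurement is sketched below. It is not the test used for the figures above: the function names, buffer sizes and the choice of timer are assumptions made for illustration, and the results will depend on buffer size relative to the cache sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; not monotonic, but
 * good enough for a rough comparison with and without the workaround). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Calling through volatile function pointers keeps the compiler from
 * removing the calls or replacing them with inline code. */
static void *(*volatile memset_fn)(void *, int, size_t) = memset;
static void *(*volatile memcpy_fn)(void *, const void *, size_t) = memcpy;

/* Hypothetical helper: memset() throughput in MiB/s, -1.0 on allocation failure. */
double memset_mib_per_s(size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;
    memset_fn(buf, 0, bytes);              /* touch all pages before timing */
    double t0 = now_seconds();
    for (int i = 0; i < reps; i++)
        memset_fn(buf, i & 0xff, bytes);
    double dt = now_seconds() - t0;
    free(buf);
    if (dt <= 0.0)
        dt = 1e-9;                         /* guard against timer resolution */
    return ((double)bytes * reps) / (1024.0 * 1024.0) / dt;
}

/* Hypothetical helper: memcpy() throughput in MiB/s, -1.0 on allocation failure. */
double memcpy_mib_per_s(size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);
    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset_fn(src, 1, bytes);              /* touch all pages before timing */
    memset_fn(dst, 0, bytes);
    double t0 = now_seconds();
    for (int i = 0; i < reps; i++)
        memcpy_fn(dst, src, bytes);
    double dt = now_seconds() - t0;
    free(src);
    free(dst);
    if (dt <= 0.0)
        dt = 1e-9;
    return ((double)bytes * reps) / (1024.0 * 1024.0) / dt;
}
```

Running both helpers on the same board, once booted with and once without the workaround and with a buffer well above the L2 cache size, should give a comparable before/after ratio.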

Has anybody gathered any more "real-world" experience with the performance impact of the workaround for erratum #845369 so far (e.g. impact on graphics or video performance, network performance, storage media, etc.)?

7 Replies

norishinozaki
Contributor V

Thanks Marc,

BR, N.S

norishinozaki
Contributor V

Hello,

We saw a 2x to 3x performance degradation in our application when we applied the workaround for this erratum.

Would we see the same effect if we used NEON for memcpy()?

BR.

NS.

MOW
Contributor IV

As I understand the description of the erratum, the workaround changes the behaviour of the L1 data cache, so it should affect all kinds of code that work on data that can be cached in the L1 cache, no matter what kind of instructions are used.

Yuri
NXP Employee

Hello,

 

It appears we do not have benchmark estimates of the impact of this issue. As mentioned in the erratum, the performance problem occurs in applications that perform very intensive memory accesses. Customers can try implementing their own memcpy() function and varying its parameters, such as burst length and start-address alignment.
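As a starting point for such experiments, the following is a minimal sketch of a copy routine with a tunable burst length that first aligns the destination to a cache line. The name memcpy_tunable and its burst parameter are hypothetical, the 32-byte line size is that of the Cortex-A9 L1 cache, and whether any particular setting recovers performance under the workaround is untested here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u  /* Cortex-A9 L1 cache line size in bytes */

/* Hypothetical experiment helper: copies n bytes, aligning the destination
 * to a cache line first and then copying in blocks of 'burst' bytes. */
void *memcpy_tunable(void *dst, const void *src, size_t n, size_t burst)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(burst > 0);

    /* Head: copy the bytes needed to reach a cache-line-aligned destination. */
    size_t head = (CACHE_LINE - ((uintptr_t)d % CACHE_LINE)) % CACHE_LINE;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: copy in bursts of the requested length. */
    while (n >= burst) {
        memcpy(d, s, burst);
        d += burst;
        s += burst;
        n -= burst;
    }

    /* Tail: copy whatever is left (possibly zero bytes). */
    memcpy(d, s, n);
    return dst;
}
```

The inner memcpy() calls could be swapped for hand-written LDM/STM or NEON loops; the point of the sketch is only to make burst length and alignment easy to vary while benchmarking.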

 

The following may be helpful:

 

https://community.freescale.com/docs/DOC-106467

  

Have a great day,

Yuri

MOW
Contributor IV

Hi Yuri

Thanks for your reply. If I understand the description of the erratum correctly, the workaround prevents the Cortex-A9 core from entering "read-allocate/streaming" mode, i.e. the CPU core will perform write allocations in the L1 data cache for all write accesses, even when unnecessary (e.g. for memset()/memcpy()), and will therefore cause lots of unnecessary bus traffic and cache thrashing for code that writes lots of full cache lines.

This will impact not only memset()/memcpy() but also quite a few drivers working with internal buffers, software audio and video codecs and renderers writing larger amounts of data to memory, probably some JVM garbage collectors, and other similar code in OSes, libraries and applications.

For the Cortex-A5 and A7 cores, ARM documents exactly when the core switches to "read allocate mode" (which is what the workaround would prevent):

<<<

To prevent this, the Bus Interface Unit (BIU) includes logic to detect when a full cache line has been written by the processor before the linefill has completed. If this situation is detected on three consecutive linefills, it switches into read allocate mode.

>>>
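To make the quoted rule concrete, here is a toy software model of it. It only illustrates the "three consecutive linefills" heuristic as described for the A5/A7; the type and function names are invented for this sketch, it says nothing about how the Cortex-A9 behaves, and the condition for leaving read allocate mode is not covered by the quote and is therefore not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FULL_LINE_THRESHOLD 3u  /* "three consecutive linefills" in the quote */

/* Hypothetical model state; not a description of real hardware registers. */
struct biu_model {
    unsigned consecutive_full_lines;
    bool read_allocate_mode;
};

void biu_model_init(struct biu_model *m)
{
    assert(m != NULL);
    m->consecutive_full_lines = 0;
    m->read_allocate_mode = false;
}

/* Feed one linefill event into the model. 'full_line_written' is true when
 * the whole cache line was written before the linefill completed.
 * Returns true if the model is in read allocate mode afterwards. */
bool biu_model_linefill(struct biu_model *m, bool full_line_written)
{
    assert(m != NULL);

    if (m->read_allocate_mode)
        return true;  /* exit condition is not in the quote, so not modelled */

    if (full_line_written) {
        m->consecutive_full_lines++;
        if (m->consecutive_full_lines >= FULL_LINE_THRESHOLD)
            m->read_allocate_mode = true;
    } else {
        m->consecutive_full_lines = 0;  /* the streak must be consecutive */
    }
    return m->read_allocate_mode;
}
```

In this model a memset() over a large buffer would switch modes after the third cache line, which is why blocking that switch hits bulk writes so hard.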

Unfortunately, neither ARM nor Freescale seems to document when the Cortex-A9 core would normally do this. Do you have any information on whether the A9 uses the same detection mechanism as the A5 and A7?

Kind regards,

Marc

ambika
Contributor II

Hi Marc,

We would like to know

1. Whether the performance issue is observed in Android Lollipop 5 release?

2. Whether Freescale is able to reduce the performance impact to an acceptable level?

3. Whether the same patch can be applied on Android KK 4.4.3? What will be the impact on performance level?

Best Regards,

Rabiammal

MOW
Contributor IV

Hi Rabiammal

Sorry for answering so late, but I have been off for some time. The patch for the erratum is very simple: it's nothing more than setting a single bit in a special configuration register, and it is fully documented, with the necessary code, in the referenced official ARM errata notice. Therefore it can easily be applied to any boot loader or even operating system.
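For illustration, a sketch of what such a patch looks like in C with GCC inline assembly follows. The register and bit used here (bit 22 of the CP15 diagnostic register, c15, 0, c0, 1) are an assumption based on the U-Boot workaround and should be checked against the ARM errata notice; the function names are made up for this example.

```c
#include <stdint.h>

/* ASSUMPTION: bit 22 of the CP15 diagnostic register (c15, 0, c0, 1),
 * as used by the U-Boot workaround. Verify against the errata notice. */
#define ERRATUM_845369_BIT (UINT32_C(1) << 22)

/* Pure helper: the value to write back, given the value that was read. */
uint32_t diag_reg_with_workaround(uint32_t diag)
{
    return diag | ERRATUM_845369_BIT;
}

#if defined(__arm__) && defined(__GNUC__)
/* Read-modify-write of the diagnostic register. This must run in a
 * privileged (secure) mode, e.g. early in the boot loader, and on
 * every core of a multi-core part. */
void apply_erratum_845369_workaround(void)
{
    uint32_t diag;

    __asm__ volatile("mrc p15, 0, %0, c15, c0, 1" : "=r"(diag));
    diag = diag_reg_with_workaround(diag);
    __asm__ volatile("mcr p15, 0, %0, c15, c0, 1" : : "r"(diag));
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

The register access is compiled only for 32-bit ARM targets; on other hosts just the bit-manipulation helper is built, which keeps the sketch compilable for a quick check.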

The performance impact depends on OS, driver and application code: any code that makes use of the kind of memory accesses mentioned above, directly or indirectly, will notice the performance impact. How much depends on the specific application; your mileage will vary depending on what you're doing.

As this is a bug in the Cortex-A9 itself, there is not much Freescale/NXP can do about it. So it's more a choice between "fast but maybe occasionally unstable", "stable but, depending on the code, slow", or "lots of manual tuning by each application developer to prevent the slow accesses". The upside is that this affects not only Freescale's/NXP's Cortex-A9 SoCs but those of all other manufacturers as well.

Regards,

Marc
