As for timing and cycle estimates for individual ARM Cortex-A9 NEON
instructions, please refer to section 3.1 (About instruction cycle timing)
of the “Cortex-A9 NEON Media Processing Engine Technical Reference Manual”:
“The complexity of the Cortex-A9 processor makes it impossible to guarantee precise
timing information with hand calculations. The timing of an instruction is often affected
by other concurrent instructions, memory system activity, and events outside the instruction
flow.”
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0409i/Babbgjhi.html
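Since the manual says exact timing cannot be calculated by hand, the practical approach is to measure on the target. Below is a minimal sketch of such a measurement, not a definitive benchmark. It assumes a POSIX environment (for example Linux on the i.MX6) and a GCC or Clang toolchain. The helper names now_ns and best_copy_time_ns are hypothetical, and memcpy only stands in for whatever NEON routine you actually want to time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic time in nanoseconds, using POSIX clock_gettime(). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: copy len bytes from src to dst `runs` times and
 * return the fastest run in nanoseconds. The minimum is the run least
 * disturbed by interrupts, other bus masters and cold caches.
 * Replace memcpy() with the routine under test.
 */
static uint64_t best_copy_time_ns(void *dst, const void *src,
                                  size_t len, unsigned runs)
{
    uint64_t best = UINT64_MAX;

    assert(dst != NULL && src != NULL && runs > 0);

    for (unsigned i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang syntax): keeps the copy from
           being optimized away or moved past the second timestamp. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        uint64_t t1 = now_ns();

        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling best_copy_time_ns(dst, src, size, 100) for several buffer sizes, and comparing the minimum against the spread between runs, gives an idea of how much the memory system and other activity affect the result.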
Next, according to Table 3.3 (VFP load and store instruction timing), the timings of
VLDM (load) and VSTM (store) are generally the same. Note, however, that data for 128-bit NEON
operations is loaded from and stored to the caches. Therefore, first, the caches should be
enabled, and second, the cache policy may affect the performance of cache-related operations.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388i/Babeaijd.html
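To check the "caches enabled" precondition, the ARMv7-A System Control Register (SCTLR) can be inspected. The sketch below only decodes a value that has already been read. Reading it requires privileged mode (mrc p15, 0, <Rt>, c1, c0, 0), so under Linux this would have to be done in kernel code or by the bootloader. The function names are hypothetical; the bit positions are the architectural ones for ARMv7-A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A SCTLR bit positions. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data and unified cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* True if the instruction cache is enabled. */
static bool icache_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_I) != 0;
}

/*
 * True if data accesses can actually be cached: the C bit must be set
 * and the MMU must be on. With the MMU off, ARMv7-A treats data
 * accesses as Strongly-ordered, so the C bit alone is not enough.
 */
static bool data_caching_effective(uint32_t sctlr)
{
    return (sctlr & SCTLR_C) != 0 && (sctlr & SCTLR_M) != 0;
}

/* Hypothetical guard to call before running a NEON benchmark. */
static void require_caches(uint32_t sctlr)
{
    assert(icache_enabled(sctlr));
    assert(data_caching_effective(sctlr));
}
```

Whether a given memory region is then cached write-back or write-through, or not at all, depends on the page table attributes, which this sketch does not look at.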
As for system bus arbitration, Chapter 45 [Network Interconnect Bus System
(NIC-301)] of the i.MX6 DQ Reference Manual states: “The NIC-301 default settings are configured by Freescale's
board support package (BSP), and in most cases should not be modified by the customer.
The default settings have gone through exhaustive testing during the validation of the part,
and have proven to work well for the part's intended target applications. Changes to the default
settings may result in a degradation in system performance.”
Have a great day,
Yuri