We are using the LS1046ARDB and timing some routines in our bare-metal boot code with ARM's PMU CCNT (Performance Monitoring Unit clock cycle counter). Basic tasks appear to consume far too many clock cycles: a variable increment in C takes 16 cycles, and a 32-byte memory copy using the standard memcpy implementation from the Linux kernel takes 7500 cycles. This leads to high timing measurements.
Are there any possible reasons why so many core clock cycles are seemingly being wasted?
Also, we imported the RCWs that are part of the QorIQ SDK into the QCVS PBI tool, but we are unable to understand which fields determine the core clock speed. From the LS1046A reference manual we concluded that C1_PLL_SEL sets the core clock; however, when we import the standard 1600 MHz RCW into QCVS, it shows C1_PLL_SEL as 1.067 GHz. Please clarify how the core clock is calculated from C1_PLL_SEL.
CGA_PLL1_RAT and C1_PLL_SEL together define the core clock. See Table 4-14, "RCW Field Descriptions".
For 1 GHz frequency and higher:
The value of CGA_PLL1_RAT field should be the required multiplication ratio and C1_PLL_SEL should be set to 4’b0000 (CGA_PLL1).
For example, to achieve a 1000 MHz core clock frequency with a reference clock frequency of 100 MHz, the ratio should be 10 (0xA) to lock CGA PLL1 at 1000 MHz, with C1_PLL_SEL=4’b0000 to select 1000 MHz as the core clock.
For less than 1 GHz operation:
The value of the CGA_PLL1_RAT field should be chosen so that CGA PLL1 locks at twice the required core clock frequency, and C1_PLL_SEL should be set to 4’b0001 (CGA_PLL1/2).
For example, to achieve an 800 MHz core clock frequency with a reference clock frequency of 100 MHz, the ratio should be 16 (0x10) to lock CGA PLL1 at 1600 MHz, with C1_PLL_SEL=4’b0001 to divide that down to an 800 MHz core clock.
NOTE: Not all ratios are supported due to frequency restrictions. Refer to the chip data sheet for the supported frequencies.
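As a rough illustration of the two cases above, here is a minimal C sketch of the calculation. The function name core_clock_hz and the macro names are hypothetical, and only the two C1_PLL_SEL encodings quoted in this thread (4’b0000 and 4’b0001) are handled; any other encoding has to be taken from Table 4-14.

```c
#include <stdint.h>

/* C1_PLL_SEL encodings quoted in this thread. Other encodings exist in
 * Table 4-14 but are not modelled in this sketch. */
#define C1_PLL_SEL_CGA_PLL1       0x0u  /* core clock = CGA_PLL1     */
#define C1_PLL_SEL_CGA_PLL1_DIV2  0x1u  /* core clock = CGA_PLL1 / 2 */

/* Hypothetical helper: returns the core clock in Hz, or 0 when the
 * selector is not one of the two encodings handled here. */
uint64_t core_clock_hz(uint64_t sysclk_hz, uint32_t cga_pll1_rat,
                       uint32_t c1_pll_sel)
{
    /* CGA PLL1 locks at reference clock x ratio. */
    uint64_t pll_hz = sysclk_hz * cga_pll1_rat;

    switch (c1_pll_sel) {
    case C1_PLL_SEL_CGA_PLL1:
        return pll_hz;
    case C1_PLL_SEL_CGA_PLL1_DIV2:
        return pll_hz / 2u;
    default:
        return 0;
    }
}
```

With a 100 MHz reference clock, core_clock_hz(100000000, 10, 0) reproduces the 1000 MHz example and core_clock_hz(100000000, 16, 1) reproduces the 800 MHz example. The sketch does not check whether a ratio is actually supported by the chip.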
Thanks for your response. We have come across Table 4-14 before.
For example, when we import rcw_1600_qspiboot.bin from the QorIQ SDK into the QCVS RCW tool, here are the values we get for CGA_PLL1_RAT and C1_PLL_SEL:
We expect a core clock frequency of 1600 MHz, since the file is named rcw_1600_qspiboot (elsewhere it is mentioned that the number in the filename corresponds to the core clock frequency). However, C1_PLL_CLOCK shows a value of 1.067 GHz. Is this an issue with the RCW or with the QCVS tool itself?
The PLL settings in rcw_1600_qspiboot.bin assume a system clock frequency of 100 MHz. In this case CGA_PLL1_RAT = 0b010000 (a ratio of 16) gives exactly 1600 MHz core frequency. The 1067 MHz (and 933 MHz) values correspond to a system clock frequency of 66.66 MHz. So you should change the system clock frequency in the tool from 66.66 MHz to 100 MHz to see the correct values.
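The arithmetic behind this can be checked with a small C sketch. The helper names pll_khz and implied_sysclk_khz are hypothetical and are not part of QCVS or the SDK; the second helper simply works backwards from a displayed frequency to the reference clock the tool must have assumed.

```c
#include <stdint.h>

/* Hypothetical helper: PLL output in kHz for a given reference clock
 * (kHz) and multiplication ratio. */
uint32_t pll_khz(uint32_t sysclk_khz, uint32_t ratio)
{
    return sysclk_khz * ratio;
}

/* Hypothetical helper: reference clock (kHz) implied by a displayed PLL
 * frequency and a known ratio. Integer division truncates. */
uint32_t implied_sysclk_khz(uint32_t displayed_khz, uint32_t ratio)
{
    return displayed_khz / ratio;
}
```

With the ratio of 16 from the RCW, pll_khz(100000, 16) gives 1600000 kHz, while pll_khz(66666, 16) gives 1066656 kHz, which is the roughly 1.067 GHz shown by the tool. Going the other way, implied_sysclk_khz(1067000, 16) gives 66687 kHz, which points at a 66.66 MHz system clock setting rather than 100 MHz.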
Thanks, I understand the RCW configuration now.
Going back to my first question, however: do you have any idea why basic operations are consuming so many clock cycles?