imx233 How to correctly initialise the CPU clock?

chrissmith
Contributor III

Here's hoping you can help!

I've a project that I was running under Linux on an (olimex) imx233 board that I need to run bare, to reduce the resource use.

When I run it bare (having done all the initialisation I thought necessary), it appears much slower, as though the CPU is not running at the desired 454 MHz.

To prove this, I wrote the following test program to initialise the imx23 CPU clock (clk_p) to 454 MHz, set clk_h to clk_p / 3, and toggle a GPIO pin at clk_h / 8. My last round of testing seemed to indicate that clk_p was actually at 42 MHz, but I confess I've gone round and round in my testing, and have kind of lost the plot now, so 42 could be wrong!

I'm currently doing this without JTAG until my serial JTAG arrives.

This code is not a million miles away from the u-boot code. This should compile with gcc and the imx-bootlets mach-mx23/includes/ files, and boot with imxbootlets.

Am I right in thinking that clk_p is the CPU clock?

Can someone confirm my findings and/or point out the error of my ways????

Yours, really stuck,

Chris

#include "regsdigctl.h"
#include "regsclkctrl.h"
#include "regspower.h"
#include "regstvenc.h"
#include "regspinctrl.h"
#include "regs.h"

#define LED_OLIMEX (1 << 1)

/* -------------------------------------------------------------------------- */

void delay_us (int us)
{
    volatile int start = HW_DIGCTL_MICROSECONDS_RD();

    while (HW_DIGCTL_MICROSECONDS_RD() - start < us) { }
}

/* -------------------------------------------------------------------------- */

int main(void)
{
    unsigned int value;

    /* Configure PIO for LED */
    BW_PINCTRL_MUXSEL4_BANK2_PIN01( 0x3 ); // Enable GPIO
    HW_PINCTRL_DOUT2_SET( LED_OLIMEX );
    HW_PINCTRL_DOE2_SET ( LED_OLIMEX );

    /* Enable PLL. */
    HW_CLKCTRL_PLLCTRL0_SET( BM_CLKCTRL_PLLCTRL0_POWER );
    delay_us(10000);

    // Change VDDD target to 1.550 V
    value  =  HW_POWER_VDDDCTRL_RD();
    value &= ~BM_POWER_VDDDCTRL_TRG;
    value |=  BF_POWER_VDDDCTRL_TRG(30);
    HW_POWER_VDDDCTRL_WR(value);
    delay_us(10000);

    HW_CLKCTRL_CLKSEQ_SET( BM_CLKCTRL_CLKSEQ_BYPASS_CPU ); // Use ref_xtal
    HW_CLKCTRL_FRAC_SET( BM_CLKCTRL_FRAC_CLKGATECPU   );   // Disable ref_cpu
    HW_CLKCTRL_FRAC_SET( BM_CLKCTRL_FRAC_CPUFRAC      );   // Leave CPUFRAC set to 19
    HW_CLKCTRL_FRAC_CLR( BF_CLKCTRL_FRAC_CPUFRAC(~19) );   // by going to max, then to 19
    HW_CLKCTRL_FRAC_CLR( BM_CLKCTRL_FRAC_CLKGATECPU   );   // Enable ref_cpu @  454.736 MHz

    // Set CLK_H to (CPU) CLK_P / 3
    HW_CLKCTRL_HBUS_SET(BM_CLKCTRL_HBUS_DIV);
    HW_CLKCTRL_HBUS_CLR( ((~3)&BM_CLKCTRL_HBUS_DIV) );
    //HW_CLKCTRL_HBUS_WR( BF_CLKCTRL_HBUS_DIV(3) );
    while( HW_CLKCTRL_HBUS_RD() & BM_CLKCTRL_HBUS_BUSY )  {}
    delay_us(10000);

    HW_CLKCTRL_CLKSEQ_CLR( BM_CLKCTRL_CLKSEQ_BYPASS_CPU ); // Use ref_cpu

    // Flash LED @ clk_h / 8
    // So rate should be clk_p / (3 * 8) = 454.736 / 24 = 18.94 MHz
    while(1) {
        if( HW_DIGCTL_HCLKCOUNT_RD() & 0x4  ) { // divide by 8
            HW_PINCTRL_DOUT2_SET( LED_OLIMEX );
        } else {
            HW_PINCTRL_DOUT2_CLR( LED_OLIMEX );
        }
    }
}

/* -------------------------------------------------------------------------- */

1 Solution
chrissmith
Contributor III

I've dumped out the CLKCTRL registers, even dumped out POWER and EMI registers to see what was going on.

There is no real difference between the Linux settings and mine.

I used SJTAG to dump the values out of my application, and noticed that I-Cache and D-Cache were disabled!

I was not configuring the MMU correctly (actually I was bypassing it). Today I corrected that, and now the code runs at the speed I'd expect.

Thank you for your assistance, you were most helpful!
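
For readers hitting the same symptom, here is a minimal sketch of the kind of MMU setup the fix above implies on an ARM926EJ-S core: a flat (virtual == physical) first-level table of 1 MiB sections, with only the RAM region marked cacheable and bufferable, followed by enabling the MMU and both caches through CP15. This is not the code used in this thread. The function names, the SDRAM base of 0x40000000 and the 64 MiB size are assumptions for illustration and must be checked against the board; if the code runs from on-chip RAM, that region would need the same attributes.

```c
#include <stdint.h>

#define SECTION_COUNT   4096u        /* 4 GiB address space / 1 MiB sections */
#define DESC_SECTION    0x2u         /* bits [1:0] = 0b10: section descriptor */
#define DESC_BIT4       (1u << 4)    /* written as 1 on ARMv5 */
#define DESC_BUFFERABLE (1u << 2)    /* B bit */
#define DESC_CACHEABLE  (1u << 3)    /* C bit */
#define DESC_AP_RW      (3u << 10)   /* AP = 0b11: read/write in all modes */

/* ASSUMPTION: external SDRAM at 0x40000000, 64 MiB. Board-specific. */
#define SDRAM_BASE_MB   0x400u
#define SDRAM_SIZE_MB   64u

/* Fill a flat first-level table (domain 0). Only SDRAM is cached;
 * everything else, including the peripheral space, stays uncached. */
static void build_flat_table(uint32_t *table)
{
    uint32_t mb;

    for (mb = 0; mb < SECTION_COUNT; mb++) {
        uint32_t desc = (mb << 20) | DESC_AP_RW | DESC_BIT4 | DESC_SECTION;

        if (mb >= SDRAM_BASE_MB && mb < SDRAM_BASE_MB + SDRAM_SIZE_MB)
            desc |= DESC_CACHEABLE | DESC_BUFFERABLE;
        table[mb] = desc;
    }
}

#if defined(__arm__) && !defined(__thumb__)
/* Target-only part: the table must be 16 KiB aligned and this must run
 * in a privileged mode. Hypothetical helper, not from the bootlets. */
static void mmu_and_caches_on(const uint32_t *table)
{
    uint32_t ctrl, zero = 0;

    __asm__ volatile ("mcr p15, 0, %0, c7, c7, 0" :: "r"(zero));  /* invalidate I+D caches */
    __asm__ volatile ("mcr p15, 0, %0, c8, c7, 0" :: "r"(zero));  /* invalidate TLBs */
    __asm__ volatile ("mcr p15, 0, %0, c2, c0, 0" :: "r"(table)); /* translation table base */
    __asm__ volatile ("mcr p15, 0, %0, c3, c0, 0" :: "r"(1u));    /* domain 0 = client */
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));
    ctrl |= (1u << 0) | (1u << 2) | (1u << 12);                   /* M, C and I bits */
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r"(ctrl) : "memory");
}
#endif
```

On the target, the table would typically be a static array declared with a 16 KiB alignment attribute, filled with build_flat_table() and then passed to mmu_and_caches_on() early in startup, before the timing-sensitive code runs.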

10 Replies
AnsonHuang
NXP Employee

Hi, Chris

     I think you can try the following:

     1. Use HW_DIGCTL_MICROSECONDS, which is a microsecond counter, instead of HW_DIGCTL_HCLKCOUNT, which is an HCLK counter, as the HCLK counter may be affected by auto-slow;

     2. Please check the ASM_ENABLE bit in HW_CLKCTRL_HBUS to see whether auto-slow mode is enabled. I am not sure whether the i.MX233 has the same function as the i.MX28.

chrissmith
Contributor III

I can confirm that HW_CLKCTRL_HBUS is completely set to zero, except for the clk_p-to-clk_h divisor value of 3, so auto_slow modes are off.

The fact that changing the clk_p-to-clk_h divisor alters the speed of the executing code is telling me something. I just don't know what yet.

Can anyone say why this would happen?  I am sure it will reveal my problem.

AnsonHuang
NXP Employee

Hi, Chris

     When the ARM core executes an instruction, it has to go over HBUS to fetch instructions and data, so the HBUS frequency affects the instruction execution time.

     My suggestion is the following: do NOT toggle the GPIO, just run a simple loop such as the one below and see how many HCLK counts the ARM needs to finish 10000000 additions. We also need to look at the assembly generated for while (i++ < 10000000); to see how many instructions it compiles to. Then we can roughly work out the ARM's frequency, assuming the ARM executes about one instruction per clock.

     start = HW_DIGCTL_HCLKCOUNT_RD();

     while (i++ < 10000000)
          ;

     end = HW_DIGCTL_HCLKCOUNT_RD();

chrissmith
Contributor III

Okay, done that test with the following code snippet (timeloop was hand written to be as simple as possible):

        ; start = HW_DIGCTL_MICROSECONDS_RD();
        ldr     r3, [pc, #1208] ;  <0x8001c0c0>
        ldr     r3, [r3]
        str     r3, [fp, #-20]

        ldr     r3, [pc, #-120] ; <looplen = 1000000>
        mov     r1, #0
timeloop:
        sub     r3, r3, #1
        teq     r1, r3
        bne     timeloop

        ; end = HW_DIGCTL_MICROSECONDS_RD();
        ldr     r3, [pc, #1176] ;  <0x8001c0c0>
        ldr     r3, [r3]
        str     r3, [fp, #-24]

The elapsed time (including getting the end time) is 70371us for 1,000,000 iterations of the timeloop.

Now, please check my logic, but for me this gives 14,210,399.1701 loop iterations per second, and therefore a CPU frequency (assuming 3 clock cycles per loop) of 42.63119751 MHz.

This test was done with clk_h = clk_p / 2 and not 3 as in my original post. Repeating with clk_h = clk_p / 3, the loop takes 105556us.
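
The arithmetic above can be checked on a PC with a short C sketch. The function names are hypothetical, the inputs are the figures quoted in this post, and the clk_h calculation assumes clk_p really is the 454.736 MHz that the test program aims for.

```c
#include <stdint.h>

/* Loop iterations per second for `iterations` loops taking `elapsed_us`. */
static double loops_per_second(uint32_t iterations, uint32_t elapsed_us)
{
    return (double)iterations * 1e6 / (double)elapsed_us;
}

/* Apparent core clock in MHz, assuming `cycles_per_loop` cycles per iteration. */
static double apparent_cpu_mhz(uint32_t iterations, uint32_t elapsed_us,
                               double cycles_per_loop)
{
    return loops_per_second(iterations, elapsed_us) * cycles_per_loop / 1e6;
}

/* clk_h cycles that elapse per iteration, given clk_p (MHz) and the
 * clk_p-to-clk_h divider. ASSUMES clk_p is actually running at clk_p_mhz. */
static double hclk_cycles_per_loop(uint32_t iterations, uint32_t elapsed_us,
                                   double clk_p_mhz, uint32_t hbus_div)
{
    double clk_h_mhz = clk_p_mhz / (double)hbus_div;

    return (double)elapsed_us * clk_h_mhz / (double)iterations;
}
```

With the measurements quoted here, apparent_cpu_mhz(1000000, 70371, 3.0) gives about 42.63, and hclk_cycles_per_loop() comes out at roughly 16 for both the divide-by-2 run (70371 us) and the divide-by-3 run (105556 us). In other words, the loop time scales with clk_h rather than with clk_p.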

Best regards

Chris

AnsonHuang
NXP Employee

What you want is to prove that the CPU is running at 454 MHz, right?

If so, we can make things easier: just set H_CLK to 1/3 of CLK_P and compare the H_CLK counter with the microsecond counter in the same DIGCTL module, or even with the RTC timer, and then we have the answer.

I am afraid the previous approach is not very accurate, sorry for that.

chrissmith
Contributor III

I've tried your request with the following code:

volatile unsigned int start = HW_DIGCTL_HCLKCOUNT_RD();
volatile unsigned int count = 0;

delay_us(1000000); // 1 second delay

count = HW_DIGCTL_HCLKCOUNT_RD() - start;

This gives me a count of 151596528 which, if clk_h = clk_p / 3, gives clk_p = (approximately) 454.789584 MHz, suggesting clk_p is set correctly.
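
The conversion from an HCLK count to clk_p can be written as a small helper, sketched here so it can be checked on a PC. The function name is hypothetical, and it assumes a 32-bit free-running counter, as the unsigned int reads above suggest.

```c
#include <stdint.h>

/* Estimate clk_p in Hz from two HCLK counter samples taken interval_us
 * microseconds apart, with clk_h = clk_p / hbus_div. */
static uint64_t estimate_clk_p_hz(uint32_t hclk_start, uint32_t hclk_end,
                                  uint32_t interval_us, uint32_t hbus_div)
{
    /* Unsigned subtraction gives the right tick count even if the
     * 32-bit counter wrapped once between the two samples. */
    uint32_t ticks = hclk_end - hclk_start;

    return (uint64_t)ticks * hbus_div * 1000000u / interval_us;
}
```

For the measurement above, estimate_clk_p_hz(0, 151596528, 1000000, 3) returns 454789584, matching the 454.789584 MHz figure.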

The question remains as to why my code runs as expected under Linux, but much slower on the bare chip?

For example, if I get my application running on Linux to flash an LED by writing to /sys/class/gpiox/gpio as it executes, it does so with a 0.04s period.

Running the same application directly on the chip, and flashing the LED by writing to HW_PINCTRL_DOUT2_xxx, does so with a period of 2.5s.

I can't explain why the performance would be so different, and I desperately need to get this to work.

Any ideas?

Thanks

Chris

AnsonHuang
NXP Employee

Hi, Chris

       Sorry, I don't have an i.MX233 board here to investigate this for you, so all I can do is offer my thoughts. It does not make sense for the bare chip to run slower than it does under Linux. As a next step, I think we can dump all the register values in the clkctrl module and compare them between Linux and the bare chip; hopefully that will show something wrong in the clock module configuration.

chrissmith
Contributor III

Proving 454MHz is my first test in investigating my execution speed problem.

My issue is that my project runs somewhere between 5 and 10 times faster when running on top of Linux, than when running on the bare chip - the opposite of what should happen.

Even a simple test like flashing the LED in a loop shows the same speed difference. The bare chip should be much faster!

Obviously I suspect my chip initialisation, and the first thing to look at is the CPU frequency. So far it looks slow.

I will try your latest suggestion, and report back!

Chris


chrissmith
Contributor III

Thanks for the advice.

As far as I can tell by looking at the code, bit 20 (AUTO_SLOW_MODE) of HW_CLKCTRL_HBUS is 0 and thus off (the default). There are other auto-slow modes available, so I will specifically set them all to OFF and report back to you later.
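
A dumped HW_CLKCTRL_HBUS value can be decoded along these lines. This is only a sketch: the macro and function names are hypothetical (the real headers provide their own BM_CLKCTRL_HBUS_* masks), bit 20 is taken from the paragraph above, and the DIV field position in bits [4:0] is an assumption to verify against the reference manual.

```c
#include <stdint.h>

#define HBUS_DIV_MASK        0x1Fu       /* ASSUMED: DIV field in bits [4:0] */
#define HBUS_AUTO_SLOW_MODE  (1u << 20)  /* bit 20, as stated above */

/* clk_p-to-clk_h divider held in a raw HBUS register value. */
static uint32_t hbus_div(uint32_t hbus)
{
    return hbus & HBUS_DIV_MASK;
}

/* Non-zero if the AUTO_SLOW_MODE bit is set in a raw HBUS register value. */
static int hbus_auto_slow_enabled(uint32_t hbus)
{
    return (hbus & HBUS_AUTO_SLOW_MODE) != 0;
}
```

A register that is zero apart from a divider of 3 reads as 0x00000003, for which hbus_div() returns 3 and hbus_auto_slow_enabled() returns 0.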

For info, I was specifically using the HCLK counter to measure the frequency of clk_h, so I could indirectly measure clk_p ( clk_h = clk_p / 3 ). As DIGCTL_MICROSECONDS counter is driven directly from the 24MHz, I can't use that to measure clk_p.

I've made a further discovery since my original post. I wrote a loop in assembly that toggled the LED pin, instead of using the HCLK counter, so that the execution speed of the ARM could be measured directly. Based on an average of 1.5 CPI for an ARM9 core, there should be 9 clock cycles between each change of LED state. Again measurements do not match the theory and, confusingly, if I double the clk_p-to-clk_h divide ratio from 3 to 6, the LED flash period DOUBLES !!

I was not expecting this. Why should clk_h affect the execution speed of a tight loop of ARM assembly code???

Really confused now.

Chris
