Here's hoping you can help!
I've a project that I was running under Linux on an (olimex) imx233 board that I need to run bare, to reduce the resource use.
When I run it bare (having done all the initialisation I thought necessary), it appears much slower, as though the CPU is not running at the desired 454 MHz.
To prove this, I wrote the following test program to initialise the imx23 CPU clock (clk_p) to 454 MHz, set clk_h to clk_p / 3, and toggle a GPIO pin at clk_h / 8. My last round of testing seemed to indicate that clk_p was actually at 42 MHz, but I confess I've gone round and round in my testing, and have kind of lost the plot now, so 42 could be wrong!
I'm currently doing this without JTAG until my serial JTAG arrives.
This code is not a million miles away from the u-boot code. This should compile with gcc and the imx-bootlets mach-mx23/includes/ files, and boot with imxbootlets.
Am I right in thinking that clk_p is the CPU clock?
Can someone confirm my findings and/or point out the error of my ways????
Yours, really stuck,
Chris
#include "regsdigctl.h"
#include "regsclkctrl.h"
#include "regspower.h"
#include "regstvenc.h"
#include "regspinctrl.h"
#include "regs.h"
#define LED_OLIMEX (1 << 1)
/* -------------------------------------------------------------------------- */
void delay_us (int us)
{
    /* Unsigned so the subtraction stays correct when the counter wraps. */
    volatile unsigned int start = HW_DIGCTL_MICROSECONDS_RD();
    while (HW_DIGCTL_MICROSECONDS_RD() - start < (unsigned int)us) { }
}
/* -------------------------------------------------------------------------- */
int main(void)
{
    unsigned int value;
    /* Configure PIO for LED */
    BW_PINCTRL_MUXSEL4_BANK2_PIN01( 0x3 ); // Enable GPIO
    HW_PINCTRL_DOUT2_SET( LED_OLIMEX );
    HW_PINCTRL_DOE2_SET ( LED_OLIMEX );
    /* Enable PLL. */
    HW_CLKCTRL_PLLCTRL0_SET( BM_CLKCTRL_PLLCTRL0_POWER );
    delay_us(10000);
    // Change VDDD to 1.550 V
    value = HW_POWER_VDDDCTRL_RD();
    value &= ~BM_POWER_VDDDCTRL_TRG;
    value |= BF_POWER_VDDDCTRL_TRG(30);
    HW_POWER_VDDDCTRL_WR(value);
    delay_us(10000);
    HW_CLKCTRL_CLKSEQ_SET( BM_CLKCTRL_CLKSEQ_BYPASS_CPU ); // Use ref_xtal
    HW_CLKCTRL_FRAC_SET( BM_CLKCTRL_FRAC_CLKGATECPU ); // Disable ref_cpu
    HW_CLKCTRL_FRAC_SET( BM_CLKCTRL_FRAC_CPUFRAC ); // Set every CPUFRAC bit (max value),
    HW_CLKCTRL_FRAC_CLR( BF_CLKCTRL_FRAC_CPUFRAC(~19) ); // then clear the bits not in 19, leaving CPUFRAC = 19
    HW_CLKCTRL_FRAC_CLR( BM_CLKCTRL_FRAC_CLKGATECPU ); // Enable ref_cpu @ 454.736 MHz
    // Set CLK_H to (CPU) CLK_P / 3
    HW_CLKCTRL_HBUS_SET(BM_CLKCTRL_HBUS_DIV);
    HW_CLKCTRL_HBUS_CLR( ((~3)&BM_CLKCTRL_HBUS_DIV) );
    //HW_CLKCTRL_HBUS_WR( BF_CLKCTRL_HBUS_DIV(3) );
    while( HW_CLKCTRL_HBUS_RD() & BM_CLKCTRL_HBUS_BUSY ) {}
    delay_us(10000);
    HW_CLKCTRL_CLKSEQ_CLR( BM_CLKCTRL_CLKSEQ_BYPASS_CPU ); // Use ref_cpu
    // Flash LED @ clk_h / 8
    // So rate should be clk_p / (3 * 8) = 454.736 / 24 = 18.95 MHz
    //
    while(1) {
        if( HW_DIGCTL_HCLKCOUNT_RD() & 0x4 ) { // bit 2 completes one cycle every 8 counts
            HW_PINCTRL_DOUT2_SET( LED_OLIMEX );
        } else {
            HW_PINCTRL_DOUT2_CLR( LED_OLIMEX );
        }
    }
}
/* -------------------------------------------------------------------------- */
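The target frequencies quoted in the comments can be checked on a host machine. This is only a sketch: it assumes the PLL runs at 480 MHz, that ref_cpu = PLL * 18 / CPUFRAC, and that no further CPU divider sits between ref_cpu and clk_p. The function names are invented for illustration and are not part of the imx-bootlets headers.

```c
#include <stdint.h>

/* Assumption: PLL output is 480 MHz and ref_cpu = PLL * 18 / CPUFRAC. */
#define PLL_HZ 480000000ULL

/* ref_cpu frequency in Hz for a given CPUFRAC value (integer division). */
uint64_t ref_cpu_hz(unsigned cpufrac)
{
    return PLL_HZ * 18ULL / cpufrac;
}

/*
 * Rate at which bit `bit` of the HCLK counter completes a full
 * high/low cycle, assuming clk_h = clk_p / hbus_div.
 */
uint64_t hclk_bit_hz(uint64_t clk_p_hz, unsigned hbus_div, unsigned bit)
{
    return clk_p_hz / hbus_div / (1ULL << (bit + 1));
}
```

With CPUFRAC = 19 and a divisor of 3, bit 2 of the counter should cycle at roughly 18.95 MHz, provided the polling loop is fast enough to see every change.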
I've dumped out the CLKCTRL registers, even dumped out POWER and EMI registers to see what was going on.
There is no real difference between the Linux settings and mine.
I used SJTAG to dump the values out of my application, and noticed that I-Cache and D-Cache were disabled!
I was not configuring the MMU correctly (actually I was bypassing it). Today I corrected that, and now the code runs at the speed I'd expect.
Thank you for your assistance, you were most helpful!
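The actual fix was not posted, so the following is only a sketch of the kind of check and change involved, assuming the ARM926EJ-S CP15 c1 control register layout (bit 0 = MMU, bit 2 = D-cache, bit 12 = I-cache). The helper names are hypothetical. The register access itself only builds for an ARM target in ARM (not Thumb) state and has to run in a privileged mode. As far as I know, the D-cache additionally needs the MMU enabled with page tables that mark the memory as cacheable, which is not shown here.

```c
#include <stdint.h>

/* ARM926EJ-S CP15 c1 control register bits (assumed layout). */
#define CP15_C1_M (1u << 0)   /* MMU enable     */
#define CP15_C1_C (1u << 2)   /* D-cache enable */
#define CP15_C1_I (1u << 12)  /* I-cache enable */

/* Pure helper: control register value with the I-cache bit set. */
uint32_t c1_with_icache(uint32_t c1)
{
    return c1 | CP15_C1_I;
}

/* Pure helper: returns 1 only if both cache enable bits are set. */
int c1_caches_enabled(uint32_t c1)
{
    return (c1 & (CP15_C1_I | CP15_C1_C)) == (CP15_C1_I | CP15_C1_C);
}

#if defined(__arm__)
/* Target-only: invalidate the I-cache, then enable it via CP15 c1. */
void enable_icache(void)
{
    uint32_t c1;
    __asm__ volatile ("mcr p15, 0, %0, c7, c5, 0" : : "r"(0) : "memory");
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(c1));
    c1 = c1_with_icache(c1);
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" : : "r"(c1) : "memory");
}
#endif
```

Reading c1 back over JTAG or from code and passing it to c1_caches_enabled() is a quick way to spot the condition described above.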


Hi, Chris
I think you can try the following:
1. Use HW_DIGCTL_MICROSECONDS, which is a microsecond counter, instead of HW_DIGCTL_HCLKCOUNT, which is the HCLK counter, because HCLK may be slowed down automatically;
2. Please check the ASM_ENABLE bit in HW_CLKCTRL_HBUS to see whether auto-slow mode is enabled. I am not sure whether the i.MX233 has the same function as the i.MX28.
I can confirm that HW_CLKCTRL_HBUS is completely set to zero, except for the clk_p-to-clk_h divisor value of 3, so auto_slow modes are off.
The fact that changing the clk_p-to-clk_h divisor alters the speed of the executing code is telling me something. I just don't know what yet.
Can anyone say why this would happen? I am sure it will reveal my problem.
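A register check like the one described can be written as a small decode of the HW_CLKCTRL_HBUS value. This is a sketch under assumptions: the divisor is taken to occupy bits 4:0 and AUTO_SLOW_MODE to be bit 20. Both should be checked against the reference manual, and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed HW_CLKCTRL_HBUS fields; verify against the reference manual. */
#define HBUS_DIV_MASK       0x0000001Fu  /* bits 4:0: clk_p-to-clk_h divisor */
#define HBUS_AUTO_SLOW_MODE (1u << 20)   /* bit 20: auto-slow master enable  */

/* Extract the clk_p-to-clk_h divisor from a raw register value. */
unsigned hbus_div(uint32_t hbus)
{
    return hbus & HBUS_DIV_MASK;
}

/* Returns 1 if the auto-slow master enable bit is set. */
int hbus_auto_slow_enabled(uint32_t hbus)
{
    return (hbus & HBUS_AUTO_SLOW_MODE) != 0;
}
```

A raw value of 0x00000003 decodes to a divisor of 3 with auto-slow off, which matches the state reported above.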


Hi, Chris
When the ARM core executes an instruction, it has to access HBUS to fetch the instruction and its data, so the HBUS frequency affects the instruction execution time.
My suggestion is the following: do NOT toggle the GPIO. Just run a simple loop such as the one below and see how many HCLK counts the ARM needs to finish 10000000 additions. We also need to look at the assembly generated for while (i++ < 10000000); to see how many instructions it compiles to. Then we can estimate the ARM's frequency roughly, assuming the ARM executes one instruction per clock.
start = HW_DIGCTL_HCLKCOUNT_RD();
while (i++ < 10000000)
    ;
end = HW_DIGCTL_HCLKCOUNT_RD();
Okay, done that test with the following code snippet (timeloop was hand written to be as simple as possible):
; start = HW_DIGCTL_MICROSECONDS_RD();
ldr r3, [pc, #1208] ; <0x8001c0c0>
ldr r3, [r3]
str r3, [fp, #-20]
ldr r3, [pc, #-120] ; <looplen = 1000000>
mov r1, #0
timeloop:
sub r3, r3, #1
teq r1, r3
bne timeloop
; end = HW_DIGCTL_MICROSECONDS_RD();
ldr r3, [pc, #1176] ; <0x8001c0c0>
ldr r3, [r3]
str r3, [fp, #-24]
The elapsed time (including getting the end time) is 70371us for 1,000,000 iterations of the timeloop.
Now, please check my logic, but for me this gives 14,210,399.1701 loop iterations per second, and therefore a CPU frequency (assuming 3 clock cycles per loop) of 42.63119751 MHz.
This test was done with clk_h = clk_p / 2 and not 3 as in my original post. Repeating with clk_h = clk_p / 3, the loop takes 105556us.
Best regards
Chris
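The arithmetic above can be reproduced on a host, and the same measurements can be re-expressed in clk_h cycles. The sketch assumes clk_p really is about 454.736 MHz; the function names are invented for illustration.

```c
#include <stdint.h>

/* Loop iterations per second, from an elapsed time in microseconds. */
double iterations_per_second(uint64_t iterations, uint64_t elapsed_us)
{
    return (double)iterations * 1e6 / (double)elapsed_us;
}

/*
 * Bus (clk_h) cycles spent per loop iteration, assuming
 * clk_h = clk_p / hbus_div.
 */
double hclk_cycles_per_iteration(uint64_t iterations, uint64_t elapsed_us,
                                 double clk_p_hz, unsigned hbus_div)
{
    double hclk_hz = clk_p_hz / (double)hbus_div;
    return (double)elapsed_us * 1e-6 * hclk_hz / (double)iterations;
}
```

Under that assumption both measurements (70371 us at a divisor of 2, 105556 us at a divisor of 3) work out to about 16 clk_h cycles per iteration. That suggests the loop time is tracking the bus clock rather than clk_p, as it would if every instruction fetch went out over the bus.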


What you want is to prove that the CPU is running at 454 MHz, right?
If so, we can make things easier: just set H_CLK to 1/3 of CLK_P, and compare the H_CLK counter with the microsecond counter in the same DIGCTL module, or even with the RTC timer, to work it out.
I am afraid the previous approach is not very accurate, sorry for that.
I've tried your request with the following code:
volatile unsigned int start = HW_DIGCTL_HCLKCOUNT_RD();
volatile unsigned int count = 0;
delay_us(1000000); // 1 second delay
count = HW_DIGCTL_HCLKCOUNT_RD() - start;
This gives me a count of 151596528 which, if clk_h = clk_p / 3, gives clk_p = (approximately) 454.789584 MHz, suggesting clk_p is set correctly.
The question remains as to why my code runs as expected under Linux, but much slower on the bare chip?
For example, if I get my application running on Linux to flash an LED by writing to /sys/class/gpiox/gpio as it executes, it does so with a 0.04s period.
Running the same application directly on the chip, and flashing the LED by writing to HW_PINCTRL_DOUT2_xxx, does so with a period of 2.5s.
I can't explain why the performance would be so different, and I desperately need to get this to work.
Any ideas?
Thanks
Chris
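The conversion from HCLK count to clk_p used above can be written as a small host-side helper. It is a sketch with a made-up function name, and it assumes clk_h = clk_p / hbus_div and that the 32-bit counter wrapped at most once during the interval.

```c
#include <stdint.h>

/*
 * Estimate clk_p in Hz from the number of HCLK counts observed over
 * interval_us microseconds, assuming clk_h = clk_p / hbus_div.
 */
uint64_t clk_p_from_hclk_count(uint32_t hclk_count, uint64_t interval_us,
                               unsigned hbus_div)
{
    return (uint64_t)hclk_count * hbus_div * 1000000ULL / interval_us;
}
```

For a count of 151596528 over one second with a divisor of 3, this gives 454789584 Hz. Taking the count as an unsigned difference of two counter reads, as in the snippet above, keeps it valid across a single wrap.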


Hi, Chris
Sorry that I don't have an i.MX233 board here to investigate this for you, so all I can do is offer my thoughts. It does not make sense that the bare chip runs slower than it does under Linux. As a next step, I think we can dump all the register values in the CLKCTRL module and compare them between Linux and the bare chip. Hopefully the comparison shows something wrong in the clock module configuration.
Proving 454MHz is my first test in investigating my execution speed problem.
My issue is that my project runs somewhere between 5 and 10 times faster when running on top of Linux, than when running on the bare chip - the opposite of what should happen.
Even a simple test like flashing the LED in a loop shows the same speed difference. The bare chip should be much faster!
Obviously I suspect my chip initialisation, and the first thing to look at is the CPU frequency. So far it looks slow.
I will try your latest suggestion, and report back!
Chris
Thanks for the advice.
As far as I can tell by looking at the code, bit 20 (AUTO_SLOW_MODE) of HW_CLKCTRL_HBUS is 0 and thus off (the default). There are other auto-slow modes available, so I will specifically set them all to OFF and report back to you later.
For info, I was specifically using the HCLK counter to measure the frequency of clk_h, so I could indirectly measure clk_p ( clk_h = clk_p / 3 ). As DIGCTL_MICROSECONDS counter is driven directly from the 24MHz, I can't use that to measure clk_p.
I've made a further discovery since my original post. I wrote a loop in assembly that toggled the LED pin, instead of using the HCLK counter, so that the execution speed of the ARM could be measured directly. Based on an average of 1.5 CPI for an ARM9 core, there should be 9 clock cycles between each change of LED state. Again measurements do not match the theory and, confusingly, if I double the clk_p-to-clk_h divide ratio from 3 to 6, the LED flash period DOUBLES !!
I was not expecting this. Why should clk_h affect the execution speed of a tight loop of ARM assembly code???
Really confused now.
Chris
