Separate task todo average - very slow

TVNAIDU · ‎11-14-2009

I have a separate task that polls a flag set every 1 ms by the PIT. When the flag is set, the task reads 24 ADC inputs, squares the values and stores them in accumulators. This is very slow (the whole thing takes more than 1 ms). Any idea why multiplication takes so long on the 52233 demo board? I have 24 LEDs and have to read 24 ADC inputs, and I have to store the squared values.

Select 1st LED using GPIO

val1 = Read ADC input

buf1 += val1 * val1;
accu1 += volt1 * val1;

........

.........

Select 24 LED using GPIO

val24 = Read ADC input

buf24 += val24 * val24;
accu24 += volt24 * val24;

vier_kuifjes · ‎11-14-2009

I think that everything shown here can just as well be done with integer arithmetic. There's no need to use floating point if it is not absolutely necessary. It just slows things down a lot.
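As a rough illustration of the integer-only approach, the sketch below centers a 12-bit reading around zero and accumulates the square and the voltage-current product in 64-bit integers. The names `channel_acc_t`, `center_sample` and `accumulate` are invented for this example, the 2048 midpoint is taken from the code posted in this thread, and the ADC register access is left out so the snippet also builds on a PC.

```c
#include <stdint.h>

/* Hypothetical per-channel accumulator, integer only. */
typedef struct {
    int64_t sum_sq;     /* running sum of current^2 */
    int64_t sum_power;  /* running sum of voltage * current */
} channel_acc_t;

/* Convert a raw 12-bit reading (0..4095) to a signed value around zero. */
static int32_t center_sample(uint16_t raw)
{
    return (int32_t)raw - 2048;
}

/* Add one voltage/current sample pair; no floating point involved. */
static void accumulate(channel_acc_t *acc, int32_t volt, int32_t curr)
{
    acc->sum_sq    += (int64_t)curr * curr;
    acc->sum_power += (int64_t)volt * curr;
}
```

On a 32-bit core without an FPU, integer multiplies and adds like these are typically far cheaper than software floating-point calls, although the exact cost depends on the compiler and its library.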

TVNAIDU · ‎11-15-2009

I declared most of the variables (buffers, accumulators and final currents) as globals instead of local variables, because only globals are visible in the debugger while debugging. Does that make any difference to CPU performance? Do globals take more memory than locals, since globals retain their value for the whole life of the program?

TVNAIDU · ‎11-15-2009

Marc:

I replaced all floats with int and I can see some improvement, but there is still a noticeable difference between running with the ADC readings and accumulator math commented out and running with them enabled.

kef · ‎11-14-2009

Pseudocode doesn't work! You should at least specify the variable types. Aren't they all floating point? If they are, then 120 FP operations (+, * and int to FP) in 1 ms means the average operation has to take less than 8.3 us. Isn't that OK?

TVNAIDU · ‎11-14-2009

Yes, they are all floating-point types. Does each FP operation (+, *) take 8.3 us? I have 48 of those plus 8 more additions for the totals, in addition to selecting the GPIO for each LED and reading each sample (56 * 8.3 us = 464.8 us, almost half of 1 ms). I have a print every half-second: if I comment out the code below, the print comes out very quickly; if I enable it, the print is very slow.

This is my separate task. The flag (get_curr_read_flag = TRUE) is set by the 1 ms PIT timer. The task looks for the flag being TRUE, resets it to FALSE, then reads one set of ADC samples, does the math and stores the results in accumulators. A local counter (measure_power_factor) counts to 500 for the half-second; when it reaches 500 (SAMPLES/2), I reset it to zero and do the power calculations. This is my code. The half-second print is at the very bottom. If I comment out all the FP operations (+, *), I get that print very quickly; if I enable them, it is very slow and I get a print only every 5 seconds.

#define SAMPLES 1000

TK_ENTRY(tk_adc_readings)
{
int err;
int temp;
int i;

float curr_buf1 =0;
float curr_buf2 =0;
float curr_buf3 =0;
float curr_buf4 =0;
float curr_buf5 =0;
float curr_buf6 =0;
float curr_buf7 =0;
float curr_buf8 =0;
float curr_buf9 =0;
float curr_buf10 =0;
float curr_buf11 =0;
float curr_buf12 =0;
float curr_buf13 =0;
float curr_buf14 =0;
float curr_buf15 =0;
float curr_buf16 =0;
float curr_buf17 =0;
float curr_buf18 =0;
float curr_buf19 =0;
float curr_buf20 =0;
float curr_buf21 =0;
float curr_buf22 =0;
float curr_buf23 =0;
float curr_buf24 =0;
float Board_Total =0;

while (!iniche_net_ready)
TK_SLEEP(1);

err = eg_init();

if( err == SUCCESS )
{
exit_hook(eg_cleanup);
}
else
{
dtrap(); // eg_init() shouldn't ever fail
}

for (;;)
{

//this get_curr_read_flag is set by the PIT 1ms timer, reset here

if (get_curr_read_flag == TRUE) {
   get_curr_read_flag = FALSE;

   //comment this temporarily - added this function code below
   //get_readings();

   VOLT_READ1;
   for(i=0; i < 0x22; i++)
   volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;
   volt1_accu += volt_buf1 * volt_buf1;

   CURR_READ1;
   for(i=0; i < 0x22; i++)
   curr_buf1 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
   curr1_accu += curr_buf1 * curr_buf1;
   power1_accu += volt_buf1 * curr_buf1;

   CURR_READ2;
   for(i=0; i < 0x22; i++)
   curr_buf2 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
    curr2_accu += curr_buf2 * curr_buf2;
   power2_accu += volt_buf1 * curr_buf2;

    CURR_READ3;
   for(i=0; i < 0x22; i++)
   curr_buf3 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
    curr3_accu += curr_buf3 * curr_buf3;
   power3_accu += volt_buf1 * curr_buf3;

     CURR_READ4;
   for(i=0; i < 0x22; i++)
   curr_buf4 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr4_accu += curr_buf4 * curr_buf4;
   power4_accu += volt_buf1 * curr_buf4;

       CURR_READ5;
   for(i=0; i < 0x22; i++)
   curr_buf5 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr5_accu += curr_buf5 * curr_buf5;
   power5_accu += volt_buf1 * curr_buf5;

       CURR_READ6;
   for(i=0; i < 0x22; i++)
   curr_buf6 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr6_accu += curr_buf6 * curr_buf6;
   power6_accu += volt_buf1 * curr_buf6;

       CURR_READ7;
   for(i=0; i < 0x22; i++)
   curr_buf7 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr7_accu += curr_buf7 * curr_buf7;
   power7_accu += volt_buf1 * curr_buf7;

       CURR_READ8;
   for(i=0; i < 0x22; i++)
   curr_buf8 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr8_accu += curr_buf8 * curr_buf8;
   power8_accu += volt_buf1 * curr_buf8;

curr_18_BT = curr_buf1+curr_buf2 +curr_buf3+ curr_buf4 + curr_buf5 + curr_buf6 + curr_buf7 + curr_buf8;
curr_18_AT += curr_18_BT * curr_18_BT;

   VOLT_READ2;
   for(i=0; i < 0x22; i++)
   volt_buf2 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;
       volt2_accu += volt_buf2 * volt_buf2;

       CURR_READ9;
   for(i=0; i < 0x22; i++)
   curr_buf9 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr9_accu += curr_buf9 * curr_buf9;
   power9_accu += volt_buf2 * curr_buf9;

       CURR_READ10;
   for(i=0; i < 0x22; i++)
   curr_buf10 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr10_accu += curr_buf10 * curr_buf10;
   power10_accu += volt_buf2 * curr_buf10;

    CURR_READ11;
   for(i=0; i < 0x22; i++)
   curr_buf11 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr11_accu += curr_buf11 * curr_buf11;
   power11_accu += volt_buf2 * curr_buf11;

       CURR_READ12;
   for(i=0; i < 0x22; i++)
   curr_buf12 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr12_accu += curr_buf12 * curr_buf12;
   power12_accu += volt_buf2 * curr_buf12;

       CURR_READ13;
   for(i=0; i < 0x22; i++)
   curr_buf13 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr13_accu += curr_buf13 * curr_buf13;
   power13_accu += volt_buf2 * curr_buf13;

       CURR_READ14;
   for(i=0; i < 0x22; i++)
   curr_buf14 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr14_accu += curr_buf14 * curr_buf14;
   power14_accu += volt_buf2 * curr_buf14;

       CURR_READ15;
   for(i=0; i < 0x22; i++)
   curr_buf15 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr15_accu += curr_buf15 * curr_buf15;
   power15_accu += volt_buf2 * curr_buf15;

       CURR_READ16;
   for(i=0; i < 0x22; i++)
   curr_buf16 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr16_accu += curr_buf16 * curr_buf16;
   power16_accu += volt_buf2 * curr_buf16;

curr_9_16_BT = curr_buf9+curr_buf10 +curr_buf11+ curr_buf12 + curr_buf13 + curr_buf14 + curr_buf15 + curr_buf16;
curr_9_16_AT += curr_9_16_BT * curr_9_16_BT;

   VOLT_READ3;
   for(i=0; i < 0x22; i++)
   volt_buf3 =(int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;
      volt3_accu += volt_buf3 * volt_buf3;

      CURR_READ17;
   for(i=0; i < 0x22; i++)
   curr_buf17 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr17_accu += curr_buf17 * curr_buf17;
   power17_accu += volt_buf3 * curr_buf17;

      CURR_READ18;
   for(i=0; i < 0x22; i++)
   curr_buf18 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr18_accu += curr_buf18 * curr_buf18;
   power18_accu += volt_buf3 * curr_buf18;

      CURR_READ19;
   for(i=0; i < 0x22; i++)
   curr_buf19 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr19_accu += curr_buf19 * curr_buf19;
   power19_accu += volt_buf3 * curr_buf19;

      CURR_READ20;
   for(i=0; i < 0x22; i++)
   curr_buf20 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
       curr20_accu += curr_buf20 * curr_buf20;
   power20_accu += volt_buf3 * curr_buf20;

      CURR_READ21;
   for(i=0; i < 0x22; i++)
   curr_buf21 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr21_accu += curr_buf21 * curr_buf21;
   power21_accu += volt_buf3 * curr_buf21;

      CURR_READ22;
   for(i=0; i < 0x22; i++)
   curr_buf22 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr22_accu += curr_buf22 * curr_buf22;
   power22_accu += volt_buf3 * curr_buf22;

      CURR_READ23;
   for(i=0; i < 0x22; i++)
   curr_buf23 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr23_accu += curr_buf23 * curr_buf23;
   power23_accu += volt_buf3 * curr_buf23;

      CURR_READ24;
   for(i=0; i < 0x22; i++)
   curr_buf24 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;
      curr24_accu += curr_buf24 * curr_buf24;
   power24_accu += volt_buf3 * curr_buf24;

   curr_17_24_BT = curr_buf17+curr_buf18+curr_buf19+curr_buf20+curr_buf21+curr_buf22+curr_buf23+curr_buf24;
   Board_Total = curr_18_BT + curr_9_16_BT + curr_17_24_BT;
   Board_Total_AT += Board_Total * Board_Total;
   curr_17_24_AT += curr_17_24_BT * curr_17_24_BT;

              measure_power_factor ++;
  }

//for every Half-second calculate powers and reset this flag

         if (measure_power_factor == (SAMPLES/2)) {
         //calculate RMS currents and voltages and powers
  //calc_powers();
             measure_power_factor = 0;
        printf("********* Half-second-print ****************************\n");

        }

//eg_loop(); // will block if anything needs to be done

tk_yield(); // give up CPU in case it didn't block

eg_wakes++;

if (net_system_exit)
break;
}

TK_RETURN_OK();
}

kef · ‎11-14-2009

I don't know how long FP operations should take on your chip, but a few microseconds doesn't sound too bad. You say you have 48 FP operations. Let's count them:

for(i=0; i < 0x22; i++)
volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;

^^ volt_buf1 is not defined. I guess it is also FP. If so then you have 34 integer->float conversions.

for(i=0; i < 0x22; i++)
curr_buf1 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;

^^ 34 conversions int->float . 34+34=68

curr1_accu += curr_buf1 * curr_buf1;
power1_accu += volt_buf1 * curr_buf1;

68 + 4 = 72

for(i=0; i < 0x22; i++)
curr_buf2 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;

72 + 34 = 106

... and so on. I found over 1000 FP operations. To complete your routine in time (faster than 1 ms), the average FP operation should take less than 1 us. Without a math coprocessor that would require something like a 200 MHz CPU.
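The budget arithmetic in this exchange can be written down as a small helper. This is only a sketch: the function names are made up for the example, and the 60 MHz clock used in the note below is the figure the original poster gives elsewhere in the thread.

```c
#include <stdint.h>

/* Time available per operation, in nanoseconds, if `ops` operations
   must fit into one period of `period_us` microseconds. */
static uint32_t budget_ns_per_op(uint32_t period_us, uint32_t ops)
{
    return (period_us * 1000u) / ops;
}

/* CPU cycles available per operation at a given core clock. */
static uint32_t cycles_per_op(uint32_t cpu_hz, uint32_t period_us, uint32_t ops)
{
    uint64_t cycles_per_period = (uint64_t)cpu_hz * period_us / 1000000u;
    return (uint32_t)(cycles_per_period / ops);
}
```

With a 1000 us period, 120 operations leave about 8333 ns each and 1000 operations leave 1000 ns each. At 60 MHz, `cycles_per_op(60000000, 1000, 1000)` gives 60 cycles per operation, which has to cover the operation itself plus all loads, stores and loop overhead.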

TVNAIDU · ‎11-14-2009

Sorry for not giving much information about the CPU. I am using a CF 52233 running at 60 MHz. All variables are defined as floats; I don't actually need the conversion to int, I just forgot to take it out. This part has no math coprocessor. At the very least I want to read the ADC and accumulate the values; I can have another task do the power calculations. Can't I at least sample and store within 1 ms?

kef · ‎11-14-2009

Why are you repeating something like volt_buf1 = (int).... 34 times:

for(i=0; i < 0x22; i++)
volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;

TVNAIDU · ‎11-14-2009

I have to take 24 ADC readings from 24 devices and also 3 voltages, one across each board (eight devices per board). That is 24 instantaneous current readings and 3 voltage readings, made by selecting each device with CURR_READ1, CURR_READ2, and so on. There is only one ADC pin; I select a different device and then read the ADC, for 27 ADC reads in total. After every read I accumulate the reading. When I reach half of my samples, I take all the accumulators, average them and take the square root to get the value. That part runs when the count reaches half; right now it is commented out in the next if condition (power_calc).

I select each device, wait a little (for (i=0; i < 0x22; i++)), then read the ADC and accumulate. I accumulate the square of the reading and also the product of the current reading and that board's voltage reading.

kef · ‎11-14-2009

for(i=0; i < 0x22; i++)
volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;

The above does "read ADRSLT, do simple math, convert to float and save to volt_buf1" 34 times.

A small delay followed by a single "read ADRSLT, do simple math, convert to float and save to volt_buf1" would be

for(i=0; i < 0x22; i++) ;
volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;

or

for(i=0; i < 0x22; i++) {}
volt_buf1 = (int)((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 407 ) / 2048;
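The difference between the two forms can be checked on a PC with a counter standing in for the ADC read. The sketch below is illustrative only; `read_count` and both function names are invented for the example.

```c
/* Counts how often the "read" statement runs in each form.
   read_count stands in for one ADC register read. */
static int reads_without_semicolon(void)
{
    int i, read_count = 0;
    for (i = 0; i < 0x22; i++)
        read_count++;               /* this statement IS the loop body */
    return read_count;
}

static int reads_with_empty_body(void)
{
    int i, read_count = 0;
    for (i = 0; i < 0x22; i++) {}   /* empty body: delay only */
    read_count++;                   /* runs once, after the loop */
    return read_count;
}
```

The first function returns 34 and the second returns 1. Note that an optimizing compiler is allowed to remove an empty delay loop altogether; declaring the loop counter `volatile` is one common way to keep it.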

TVNAIDU · ‎11-15-2009

Even if the power calculations take more time, that is OK, but I want the samples to be taken within 1 ms; if they are delayed, I will lose accuracy. Maybe I should split the work into two tasks? But then syncing the two becomes an issue, which is why I keep everything in one task: read the samples and calculate every half-second, with the whole thing processed within 1 ms. The total slot is 1 ms. R = read ADC and do math, S = store/accumulate, C = calculate (last slot only). The first 499 samples are read and stored only; after the last read (the 500th), the processing is done: the values are averaged, square-rooted and printed.

first sample:

|<-------------------------------------- 1ms --------------------------------------->|

|R1S1R2S2R3S3...........................................................................................R24S24->|

.....

......

500th sample:

|<-------------------------------------- 1ms --------------------------------------->|

|R1S1R2S2R3S3.................................................................R24S24. C1C2C2......C24->|

The flag is set by the 1 ms PIT. When this task sees the flag set, it clears the flag, reads the ADC, does the math and stores the samples.
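One variation on the flag scheme described here is to let the timer interrupt increment a counter instead of setting a flag, so the task can tell when it has fallen behind by more than one tick. The sketch below is only an assumption about how that could look; `tick_count`, `pit_isr` and `take_pending_ticks` are hypothetical names, not part of the posted code or of the RTOS.

```c
#include <stdint.h>

/* Incremented by the (hypothetical) 1 ms PIT interrupt handler.
   volatile because it changes outside the normal program flow. */
static volatile uint32_t tick_count = 0;

/* Number of ticks the sampling task has already dealt with. */
static uint32_t ticks_handled = 0;

/* Stand-in for the body of the PIT interrupt handler. */
static void pit_isr(void)
{
    tick_count++;
}

/* Called from the task loop: returns how many 1 ms ticks are pending
   and marks them as handled. A result above 1 means ticks were missed. */
static uint32_t take_pending_ticks(void)
{
    uint32_t now = tick_count;               /* single 32-bit read */
    uint32_t pending = now - ticks_handled;  /* unsigned math handles wrap */
    ticks_handled = now;
    return pending;
}
```

With a plain TRUE/FALSE flag, two timer ticks that arrive before the task runs look the same as one, so late samples go unnoticed; a counter at least makes the overrun visible.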

TVNAIDU · ‎11-15-2009

I am printing 48 variables (24 final RMS currents and 24 powers) on the console (UART) every half-second. Do the prints slow down the CPU? I would like to keep these prints so the end user can see the values. Do prints degrade CPU performance?
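A rough way to judge the cost of console output is to work out how long the characters take on the wire. The helper below is a sketch under stated assumptions: 8N1 framing (10 bits per character), and a baud rate and line length that are guesses, since neither is given in the thread.

```c
#include <stdint.h>

/* Microseconds needed to shift `chars` characters out of a UART,
   assuming 8N1 framing (10 bits on the wire per character). */
static uint32_t uart_tx_time_us(uint32_t chars, uint32_t baud)
{
    return (uint32_t)(((uint64_t)chars * 10u * 1000000u) / baud);
}
```

For example, 48 values at an assumed 12 characters each is 576 characters, and `uart_tx_time_us(576, 115200)` gives 50000 us, i.e. 50 ms of line time. Whether the CPU is blocked for that long depends on whether the console driver polls or uses interrupts and buffering.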

TVNAIDU · ‎11-15-2009

Kef:

How did you get 34? Is that 0x22? The for loop is there only to give some delay between selecting the device with the macro (CURR_READ1) and reading the ADC input; for (i=0; i < 0x22; i++); is for delay only. My actual readings number 27 (24 current readings and 3 voltage readings). Then I have accumulators for current and power, which just square and save; when the total count reaches SAMPLES/2 (half a second), I take the average and the square root to get the value.

That for loop is just for delay. I have 27 ADC readings, but only a few + and * for each reading.

kef · ‎11-15-2009

TVNAIDU,

I'm surprised you don't know what 0x means in C and what the difference is between

for(i=0; i < 0x22; i++) and for(i=0; i < 0x22; i++);

22 - decimal number

022 - octal, equals 18 in decimal numeral system

0x22 - hexadecimal, equals 34 in decimal numeral system . See http://en.wikipedia.org/wiki/Hexadecimal
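The three spellings can be checked with a few lines of C. This is only an illustration; the variable and function names are invented for the example.

```c
#include <stdio.h>

/* The same digits "22" written with three different prefixes. */
static const int as_decimal = 22;    /* no prefix: decimal */
static const int as_octal   = 022;   /* leading 0: octal, 2*8 + 2 */
static const int as_hex     = 0x22;  /* leading 0x: hexadecimal, 2*16 + 2 */

/* Print all three values in decimal. */
static void print_literals(void)
{
    printf("%d %d %d\n", as_decimal, as_octal, as_hex);
}
```

Calling `print_literals()` prints `22 18 34`.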

Since for-loops in your code were for delays, just add {} after each for loop and see how that improves performance.

And please get some decent C book.

TVNAIDU · ‎11-15-2009

It was my fault for not putting a semicolon at the end of the for loop; because of that, the ADC reading happened 34 times. I corrected it and I can see a lot of improvement now. But every half-second I calculate the powers (average the accumulators and take the square root) and clear all the accumulators, and every minute (a count of 120 of this half-second power calculation routine) I print the power values on screen. The minute prints are not accurate: the first one comes after 1 min + 10 s, the second after 1 min + 30 s, the next after 1 min + 20 s, and so on. In the power calculation routine I do the averaging and square root for 28 readings; maybe that is causing the delay? I need all samples to be taken and calculated within that 1 ms, and it looks like it is getting stretched.

ipa · ‎11-16-2009

Hi TVNAIDU,

I think your algorithm is too crowded. For your job of measuring a number of voltages and currents, expressions like:

curr_buf19 = (int) ((((MCF_ADC_ADRSLT7&0x7FF8)>>3) - 2048) * 100) / 2048;

repeated 0x22 times make no sense. What is expected in your case is to sample all waveforms at regular time intervals over a period, apply the above relation only once per sample, and save the result in a buffer at a specified index. The rest of the computation should be done over that buffer.

It would be wise to develop the algorithm first for only one voltage and one current; when you get good results you can extend it to all channels.

One word of caution: the number of samples taken over a period is related to the accuracy of the measurement. If you sample currents from various devices, you may have to take into account the crest factor of those signals.

Regards,

Ipa

TVNAIDU · ‎11-19-2009

I commented out all the conversions. Now I just read the ADC, store the value in a buffer, square it, multiply current by voltage for the power, and add these to the accumulators. I am taking 420 samples in 420 ms. As I get close to the maximum number of samples, the power accumulator value exceeds four bytes (each accumulator is an integer and holds up to 4 bytes, but for power 4096 * 4096 * 420 = 0x1a4000000). Usually it won't go up to the peak value, but I took the worst case.
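The worst-case figure in this post can be reproduced, and compared against the 32-bit limit, with a short helper. The function names are made up for the example; the 4096 and 420 values are the ones quoted in the post.

```c
#include <stdint.h>

/* Worst-case accumulator value: `samples` products, each at most
   max_code * max_code. Computed in 64 bits so it cannot wrap here. */
static uint64_t worst_case_sum(uint32_t max_code, uint32_t samples)
{
    return (uint64_t)max_code * max_code * samples;
}

/* Nonzero if the worst case no longer fits in an unsigned 32-bit value. */
static int overflows_32bit(uint32_t max_code, uint32_t samples)
{
    return worst_case_sum(max_code, samples) > UINT32_MAX;
}
```

`worst_case_sum(4096, 420)` is 0x1A4000000, which needs 33 bits, so a 64-bit accumulator (`int64_t` or `uint64_t`) is one way out. If the readings are first centered around zero so that their magnitude is at most 2048, the worst case for a single channel drops to 0x69000000, which still fits in 32 bits.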

TVNAIDU · ‎11-17-2009

ipa:

I have to do that conversion immediately when I read the ADC. I get ADC values between 0 and 4096, but I have to map them onto a sine wave: 0 means -100, 2048 is the center point of the sine wave (zero) and 4096 means +100 in the current case; in the voltage case the values are between -407 and +407. If I convert right there I get the scaled value straight away, since I have to take the samples continuously, 420 times in 420 ms, and then do the power calculations on all the accumulators. If I store the raw values without converting, I will lose the sharpness.
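For what it is worth, scaling by a constant factor commutes with the RMS calculation, so one alternative is to accumulate squares of the centered raw codes and apply the 100/2048 (or 407/2048) factor once at the end. The sketch below illustrates that idea under assumptions: `isqrt64` and `rms_scaled` are hypothetical helpers, and an integer square root is used only to avoid floating point.

```c
#include <stdint.h>

/* Integer square root (floor), classic bit-by-bit method. */
static uint32_t isqrt64(uint64_t x)
{
    uint64_t res = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* RMS in engineering units from a sum of squared, centered raw codes.
   full_scale is the value that corresponds to a raw deviation of 2048
   (100 for current, 407 for voltage in this thread). */
static uint32_t rms_scaled(uint64_t sum_sq_raw, uint32_t samples,
                           uint32_t full_scale)
{
    uint32_t rms_raw = isqrt64(sum_sq_raw / samples);
    return (rms_raw * full_scale) / 2048u;
}
```

For a constant raw deviation of 1024 over 420 samples, `rms_scaled(1024ULL * 1024 * 420, 420, 100)` gives 50. Scaling only once at the end also avoids the rounding that happens when every sample is first reduced to the range -100..+100.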

TVNAIDU · ‎11-14-2009

Thanks Marc and kef. Let me change all float variables to int and try.

TVNAIDU · ‎11-14-2009

For now I have commented out some of the accumulation math and the for loop (I added the for loop just to give a delay between selecting the device with GPIO and reading the ADC input) to see whether it executes faster. When I enable just the for loop, I can see the slowdown. Maybe I should declare a function for the delay and call it everywhere I have the for loop, i.e. use the function call below instead of the for() loop 27 times?

void small_delay(void)
{
    volatile int i; /* local counter; volatile so the empty loop is not optimized away */

    for (i = 0; i < 0x22; i++) {}
}

instead of

for(i=0; i < 0x22; i++){}

and then call small_delay() wherever I have the for loop?
