My processor runs with a 40 MHz bus clock.


I'm converting arrays of uint32_t to float, writing the results to separate arrays. Each array holds at most 128 elements.


I verified that my application takes more than 6 ms per loop and narrowed the problem down to the code below, which takes 6 ms to run on its own:


for ( i = 0; i < currentBufferSize && i < nSamples; i++ )
{
    FsamplesA[i] = (float)samplesA[i]*3.3/4096.0 - fZeroCurrent;
    FsamplesB[i] = (float)samplesB[i]*3.3/4096.0 - fZeroCurrent;
    FsamplesC[i] = (float)samplesC[i]*3.3/4096.0 - fZeroCurrent;
}



So each assignment involves a cast, a multiplication, a division, and a simple offset subtraction.
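To spell out the types involved, here is a minimal sketch of one conversion written two ways. In standard C, unsuffixed literals such as 3.3 and 4096.0 have type double, so unless the compiler is told otherwise, the multiply, divide, and subtract in the first version are evaluated in double and only the final result is narrowed to float. The Cortex-M4F FPU handles single precision only, so double arithmetic would go through software routines. The function names below are hypothetical and not from my application, and I have not confirmed that this accounts for the timing.

```c
#include <stdint.h>

/* As in the loop above: 3.3 and 4096.0 are double literals, so the
   (float) cast is promoted back to double for the arithmetic. */
float convert_double_literals(uint32_t raw, float zero)
{
    return (float)raw * 3.3 / 4096.0 - zero;
}

/* Variant with f-suffixed literals: every operand is float, so the
   arithmetic stays in single precision. The scale factor is a constant
   expression, so the compiler can fold the division at compile time. */
float convert_float_literals(uint32_t raw, float zero)
{
    const float scale = 3.3f / 4096.0f;
    return (float)raw * scale - zero;
}
```

The two versions can differ in the last bits of the result, because the first one rounds to float only once at the end.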


I calculated that it takes 624 instruction cycles to compute and store just a single float buffer element!
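For reference, the arithmetic behind that estimate can be sketched as follows. It assumes the core is clocked at the same 40 MHz as the bus, that all 128 elements of each of the three buffers are processed, and that the full 6 ms is spent in the loop. The function and parameter names are illustrative only.

```c
#include <stdint.h>

/* Approximate cycles spent per converted element, given the clock rate,
   the measured loop time in microseconds, and the amount of data. */
uint32_t cycles_per_element(uint32_t clock_hz, uint32_t loop_us,
                            uint32_t elements_per_buffer,
                            uint32_t buffer_count)
{
    /* 64-bit intermediate so clock_hz * loop_us cannot overflow. */
    uint64_t total_cycles = (uint64_t)clock_hz * loop_us / 1000000u;
    return (uint32_t)(total_cycles / (elements_per_buffer * buffer_count));
}
```

With cycles_per_element(40000000u, 6000u, 128u, 3u) this gives 240000 total cycles over 384 elements, or about 625 cycles per element, in line with the figure above.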


Does this seem right for a Cortex-M4F?


