Thank you @danielchen
These questions are interesting because they're multi-dimensional and dependent on how much control you want over the process.
I just did some research and the answer isn't obvious - I would probably recommend building a test application and trying out different methods to find out which method is fastest as well as the most accurate (see below).
The approach *I* would try, after characterizing (timing) the two examples you listed, would be to break "scaleFactor" into high and low 32-bit parts, do the floating-point multiplication on each part and, finally, add the products together after they have been converted from floats to 64-bit integers.
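// Split the 64-bit scale factor into 32-bit halves and convert each to float.
float scaleFactorHigh = (float)(scaleFactor >> 32);
float scaleFactorLow = (float)(scaleFactor & 0xFFFFFFFF);
// Convert before dividing, in case A and B are integers (A / B would truncate).
float abRatio = (float)A / (float)B;
// Scale each half, convert back to 64-bit integers, and recombine.
uint64_t result = ((uint64_t)(scaleFactorHigh * abRatio) << 32) +
(uint64_t)(scaleFactorLow * abRatio);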
The big issue that I can see with this approach is which version of the M4 "VCVT" instruction (convert float to int) the compiler uses. Plain "VCVT" truncates the product (rounds toward zero) while "VCVTR" rounds it using the rounding mode set in the FPSCR (which is what you want in this case). That could introduce an error. I have made all types float until calculating "result" because there seems to be an inefficiency when multiplying floats and integers together.
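The truncate-versus-round difference can be seen in plain C without looking at the generated instructions, because a C cast from float to an integer type always discards the fraction. The following is a minimal sketch; the function names to_u32_truncate and to_u32_round are made up for illustration, and the rounding helper (round half up, non-negative inputs only) is just one simple way to do it, not necessarily what you would use on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a plain cast, which C defines as truncation toward zero. */
uint32_t to_u32_truncate(float x)
{
    assert(x >= 0.0f && x < 4294967296.0f); /* must fit in 32 bits */
    return (uint32_t)x;
}

/* Hypothetical helper: round to nearest (halves go up) without using libm.
 * Truncate first, then bump the result if the discarded fraction is >= 0.5. */
uint32_t to_u32_round(float x)
{
    assert(x >= 0.0f && x < 4294967296.0f); /* must fit in 32 bits */
    uint32_t t = (uint32_t)x;
    float frac = x - (float)t; /* exact: large floats have no fraction */
    if (frac >= 0.5f) {
        t++;
    }
    return t;
}
```

For a product of 2.75, the plain cast gives 2 while the rounding helper gives 3, so every converted term can be off by almost one count if truncation is used.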
If speed were of the absolute essence along with absolutely accurate values, I would write the above statements in assembler, making sure that the errors that occur in the conversion from integers to floats are minimized and that no cycle-costly instructions are used.
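For the accuracy half of the test application, something like the following could be run on a PC before timing anything on the M4. It is only a sketch under assumptions that are mine, not from your question: A and B are taken to be uint32_t with 0 < B and A <= B (so the ratio is at most 1), and the names scale_split_float, scale_reference and abs_diff are made up. The reference uses exact integer arithmetic, so the difference is the error of the float method.

```c
#include <assert.h>
#include <stdint.h>

/* The split-float method described in this post (assumes a <= b, b != 0). */
uint64_t scale_split_float(uint64_t scaleFactor, uint32_t a, uint32_t b)
{
    assert(b != 0 && a <= b);
    float high = (float)(scaleFactor >> 32);
    float low = (float)(scaleFactor & 0xFFFFFFFFu);
    float ratio = (float)a / (float)b;
    /* Each cast truncates, so fractions of both products are dropped. */
    return ((uint64_t)(high * ratio) << 32) + (uint64_t)(low * ratio);
}

/* Exact floor(scaleFactor * a / b) using only 64-bit integers.
 * scaleFactor = q*b + r, so scaleFactor*a/b = q*a + (r*a)/b.
 * With a <= b < 2^32 neither q*a nor r*a can overflow 64 bits. */
uint64_t scale_reference(uint64_t scaleFactor, uint32_t a, uint32_t b)
{
    assert(b != 0 && a <= b);
    uint64_t q = scaleFactor / b;
    uint64_t r = scaleFactor % b;
    return q * a + (r * a) / b;
}

/* Absolute difference of two unsigned values. */
uint64_t abs_diff(uint64_t x, uint64_t y)
{
    return (x > y) ? (x - y) : (y - x);
}
```

Trying scaleFactor = 3 << 32 with A = 1 and B = 2 is instructive: the high product is 1.5, the cast drops the 0.5, and after the shift the result is short by 2^31. So the fraction lost from the high half matters far more than the one lost from the low half, which is worth measuring before deciding whether this approach is accurate enough.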