We're running Linux on an i.MX53 board whose CPU runs at 800 MHz. With some kernel builds the boot log reports:
Calibrating delay loop... 531.66 BogoMIPS (lpj=2658304)
With other builds it reports:
Calibrating delay loop... 795.44 BogoMIPS (lpj=3977216)
After some hours I tracked the difference down. The BogoMIPS figure is derived from the execution time of the __delay() function in arch/arm/lib/delay.S, which disassembles to this:
c01415e8 <__delay>:
c01415e8:	e2500001 	subs	r0, r0, #1
c01415ec:	8afffffd 	bhi	c01415e8 <__delay>
c01415f0:	e1a0f00e 	mov	pc, lr
The version above runs at "800 BogoMIPS", which works out to two clocks per loop iteration: one instruction per clock, with the taken bhi costing no extra cycle. Impressive!
If the code is instead shifted by one 32-bit word, so that it starts at c01415e4 and the loop no longer begins on an 8-byte boundary, it runs at "533 BogoMIPS", i.e. three clocks per iteration. Something is adding a cycle per loop and preventing full-speed execution.
The ARM manuals give plenty of detail on instruction timing and on when instructions can and cannot be dual-issued, but I can find nothing that explains why execution speed should depend on the four-byte alignment of the instructions.
Does anyone know of a reference in the manuals that explains this?
This is measuring "Bogus MIPS", so normally the value wouldn't matter, but we pass "lpj" (loops per jiffy) on the kernel command line to skip the calibration and boot faster, so the value has to come out the same for every kernel we build.
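For reference, this is the kind of boot configuration we use. The console and root arguments here are made up for illustration; only the lpj= parameter is the real point:

```shell
# U-Boot example: pin lpj so the kernel skips delay-loop calibration.
# A kernel whose __delay lands on the fast alignment needs lpj=3977216;
# one on the slow alignment needs lpj=2658304 -- hence the problem.
setenv bootargs "console=ttymxc0,115200 root=/dev/mmcblk0p2 lpj=3977216"
```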