we have a long running loop that just counts towards a given value. This results in following assembler code:
With this loop we observe different execution times. Most of the time executing the loop takes 2 processor cycles, we guess that the branch is correctly predicted and an entry in the branch cache is available resulting in executing the branch instruction in zero cycles. For the first few execution cycles we need 3 processor cycles, we guess that the branch is correctly predicted but no branch cache entry is available. So far everything seems fine.
But sometimes we observe that the execution will continously take 3 processor cycles. The loop is running for quite some time and after an interrupt is handled the long execution times can occur. It seems like the branch cache is not generating an entry resulting in the BLS always taking one cycle.
Is there any known problem with branche cache or is there a state where the branch cache is not updating itself? We have the branch, data and instruction cache enabled and no flushes occur before the "error".
If anyone has an idea where the behavoir is coming from, we appreciate any help.