Hi guys.
I'm working on an embedded hypervisor targeting the p3041/p4080. The HV prototype starts an
RTOS on just a single CPU. I noticed that the latency of any user program running in the RTOS
under the HV is about 3x higher than the latency of the same program running in the RTOS on bare metal.
For example, I wrote a simple program like this:
#include <stdio.h>
#include <time.h>

int main(void)
{
    volatile int i = 1000000000;
    struct timespec tm;
    unsigned long long start, end;

    printf("%s: start benchmarking\n", __func__);
    for (i = 0; i < 1000; i++) {
        clock_gettime(CLOCK_REALTIME, &tm);
        /* 64-bit arithmetic so tv_sec * 1e9 cannot overflow a 32-bit long */
        start = tm.tv_sec * 1000000000ULL + tm.tv_nsec;
        while (i > 0)
            i--;
        clock_gettime(CLOCK_REALTIME, &tm);
        end = tm.tv_sec * 1000000000ULL + tm.tv_nsec;
        printf("1000000000-- in %llu nanosecs\n", end - start);
    }
    return 0;
}
This does not cause any TLB misses or any other HVPRIV exceptions, so I expect the times
reported by this program on the bare-metal RTOS to be the same as the times reported by the
same program on the PV RTOS under my HV prototype. However, the PV RTOS under the HV shows 3x higher latency:
1500 nanoseconds in the PV case vs. 500 nanoseconds in the bare-metal case.
What could be the reason for this behavior? I have the decrementer interrupt routed to HV mode and then
forward it to the guest manually. However, the decrementer is configured to fire 1000 times per second, and
the HV adds no more than 2 bus cycles to the total processing time of each decrementer interrupt, so it adds
about 2000 bus cycles per second. There must be some other source of such a large latency. Does anybody have any ideas on this?
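The decrementer estimate above can be written out as a small calculation. This is only a sketch: the function names are invented for illustration, and the bus frequency passed in is a placeholder, since the post does not state it.

```c
#include <stdio.h>

/* Extra hypervisor cycles spent per second on decrementer forwarding. */
unsigned long long hv_cycles_per_second(unsigned ticks_per_second,
                                        unsigned extra_cycles_per_tick)
{
    return (unsigned long long)ticks_per_second * extra_cycles_per_tick;
}

/* Share of all bus cycles taken by that overhead.
 * bus_hz is an assumed input; use the platform's real bus clock. */
double hv_overhead_fraction(unsigned long long cycles_per_second, double bus_hz)
{
    return (double)cycles_per_second / bus_hz;
}
```

With the figures from the post, `hv_cycles_per_second(1000, 2)` gives 2000 cycles per second. At a hypothetical 600 MHz bus clock that is roughly 3 parts per million of the available cycles, which is far too small to account for a 3x difference.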