We recommend running such tests in kernel space or in TFA/U-Boot.
We ran the following test code in TFA (EL3) on the 1046ardb; the average time for a single read of cntvct_el0 appears to be a few ns:
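uint64_t tsc1, tsc2, tsc;

/* Counter value before the loop */
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc1));
for (int i = 0; i < 10000; i++) {
    /* The read being measured */
    asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
}
/* Counter value after the loop */
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc2));
/* Elapsed counter ticks for the 10000 reads */
printf("sub: %llu\n", (unsigned long long)(tsc2 - tsc1));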
clock_gettime and similar high-precision APIs are implemented in the kernel, and they also end up reading cntvct_el0.