Nick
Interrupt latency is nominally 12 cycles for the Cortex-M7 (with zero wait state memory)
Consider this code:
static __interrupt void _PIT_Interrupt(void)
{
    TOGGLE_TEST_OUTPUT();
    TOGGLE_TEST_OUTPUT();
    __asm("nop");
    ... 256 times in all
    __asm("nop");
    TOGGLE_TEST_OUTPUT();
    TOGGLE_TEST_OUTPUT();
    WRITE_ONE_TO_CLEAR(PIT_TFLG0, PIT_TFLG_TIF); // clear pending interrupts
    TOGGLE_TEST_OUTPUT();
    TOGGLE_TEST_OUTPUT();
    _SYNCHRONISATION_DATA_BARRIER();
    TOGGLE_TEST_OUTPUT();
    TOGGLE_TEST_OUTPUT();
}
which converts to assembler as
LDR.N R3, [PC, #0x1fc]
MOVS R2, #8
STR R2, [R3]
STR R2, [R3]
NOP x 256
..
LDR.N R1, [PC, #0x14]
MOVS R0,#1
STR R2, [R3]
STR R2, [R3]
STR R0, [R1]
STR R2, [R3]
STR R2, [R3]
DSB
STR R2, [R3]
STR R2, [R3]
BX LR
Since the PIT can output its trigger on PIT_TRIGGERxx this gives a good method to measure the behavior (600MHz operation with 150MHz IPG and 75MHz PER clock - code and interrupt vectors in ITC):

A is the PIT_TRIGGERxx output pulse when the PIT fires
A-B is about 65ns
B-C is about 56ns
C-D about 152ns
D-E about 56ns
E-F about 160ns
F-G about 56ns
G-H about 64ns
H-I about 56ns
which gives some interesting insight into the operation.
First of all, it takes about 56ns to toggle an output (maximum toggle rate 18MHz - I have the GPIO in slow mode and not fast mode here), so one can subtract this value from the A-B latency to get a time to enter the interrupt of about 10ns (even allowing for measurement resolution it is close to the theoretical value).
C-D has 256 NOPs inserted. The Cortex-M7 executes 2 NOPs each cycle (1.2G NOPs/s at 600MHz), so each takes 1/2 a clock and the 256 take about 210ns.
E-F is the first big surprise, since the harmless-looking STR R0, [R1] takes 160ns to complete (whereas we have already seen that writes to the GPIO take about 56ns each). This suggests that the write to the PIT's flag register is in fact rather slow (taking as long as about 200 instructions).
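As a rough cross-check of the figures above, the measured times can be converted to core clock cycles. This is only a minimal sketch: the helper names are hypothetical and the 600MHz clock is taken from the measurement setup described above.

```c
#include <stdint.h>

#define CORE_CLOCK_MHZ 600u   /* core clock of the measurement setup */

/* Convert a measured time in ns to core clock cycles (rounded to nearest). */
static uint32_t ns_to_cycles(uint32_t ns)
{
    return (ns * CORE_CLOCK_MHZ + 500u) / 1000u;
}

/* Time in ns for a run of NOPs when 2 NOPs are executed per cycle
   (rounded to nearest ns). */
static uint32_t dual_issue_nops_ns(uint32_t nops)
{
    uint32_t cycles = (nops + 1u) / 2u;
    return (cycles * 1000u + CORE_CLOCK_MHZ / 2u) / CORE_CLOCK_MHZ;
}

/* Remove the cost of the GPIO marker write from a measured interval. */
static uint32_t without_toggle_ns(uint32_t measured_ns, uint32_t toggle_ns)
{
    return (measured_ns > toggle_ns) ? (measured_ns - toggle_ns) : 0u;
}
```

With these, ns_to_cycles(56) gives 34 cycles for one GPIO toggle, ns_to_cycles(160) gives 96 cycles for the interval containing the PIT flag write, dual_issue_nops_ns(256) gives 213ns for the NOP run, and without_toggle_ns(65, 56) gives the 9ns (about 10ns) entry time.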
Finally, note that the DSB adds a delay since it does not allow instruction processing to continue until previous data writes have been synchronised (completed). If this is not done, the interrupt tends to re-enter as the flag is in fact still pending when the BX LR is executed.
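The clear-then-barrier pattern can be sketched as follows. This is not the uTasker implementation: the stub variable, the DATA_SYNC_BARRIER() macro and the handler name are hypothetical stand-ins so that the sketch also compiles off-target. On the real part, PIT_TFLG0 comes from the device header and the barrier is a real DSB.

```c
#include <stdint.h>

/* Stand-ins so that the sketch compiles off-target (assumptions). */
#ifndef PIT_TFLG0
static volatile uint32_t pit_tflg0_stub;   /* hypothetical stand-in for the PIT flag register */
#define PIT_TFLG0 pit_tflg0_stub
#endif
#ifndef PIT_TFLG_TIF
#define PIT_TFLG_TIF 0x00000001u           /* timer interrupt flag (write 1 to clear) */
#endif
#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7M__)
#define DATA_SYNC_BARRIER() __asm__ volatile ("dsb" ::: "memory")
#else
#define DATA_SYNC_BARRIER() __asm__ volatile ("" ::: "memory") /* compiler barrier only */
#endif

void pit_interrupt_handler(void)
{
    /* ... the interrupt's actual work ... */

    PIT_TFLG0 = PIT_TFLG_TIF;   /* write-one-to-clear: the slow peripheral write */

    /* Wait until the write has completed before returning, so that the
       flag is no longer pending when BX LR is executed. */
    DATA_SYNC_BARRIER();
}
```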
This shows that the instruction processing can be fast (1.2G instructions per second), but also that peripheral accesses can be SLOW. These are not to be underestimated in peripheral interrupts since they can greatly reduce the practical rate at which such interrupts can operate.
To your question - what can cause jitter?
Firstly the above has no jitter when:
- there is no higher priority interrupt either running when the PIT triggers or pre-empting it when it starts
- the global interrupts are not masked when the PIT fires
Therefore, to ensure low latency make sure that global interrupts are never masked (use interrupt levels instead of global masking to protect critical regions in code) and give the critical interrupt the highest priority so that it can always pre-empt other interrupts. The worst case would be that it needs to wait for the first interrupt to push its registers before it is taken (2 x the normal latency of about 20ns).
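On the Cortex-M7 the usual mechanism for level-based masking is the BASEPRI register: when it is non-zero, exceptions whose priority value is numerically equal to or greater than it are blocked, while more urgent ones can still pre-empt. The sketch below only models that rule so that it can be run anywhere. All names are hypothetical and 4 implemented priority bits are an assumption. On the target, the model variable would be replaced by the CMSIS __get_BASEPRI() / __set_BASEPRI() accessors.

```c
#include <stdint.h>
#include <stdbool.h>

#define PRIO_BITS 4u             /* assumption: 4 implemented priority bits */

static uint32_t basepri_model;   /* stand-in for the BASEPRI register (0 = no masking) */

/* Priority levels occupy the upper bits of the 8 bit priority field. */
static uint32_t prio_to_reg(uint32_t level)
{
    return (level << (8u - PRIO_BITS)) & 0xFFu;
}

/* Block interrupts of priority 'mask_from_level' (>= 1) and less urgent.
   Returns the previous value so that nested regions can restore it. */
static uint32_t enter_critical(uint32_t mask_from_level)
{
    uint32_t prev = basepri_model;
    uint32_t requested = prio_to_reg(mask_from_level);

    /* Only ever tighten the mask, never loosen it in a nested region. */
    if ((requested != 0u) && ((prev == 0u) || (requested < prev))) {
        basepri_model = requested;
    }
    return prev;
}

static void exit_critical(uint32_t prev)
{
    basepri_model = prev;
}

/* Model of the hardware rule: would an interrupt of this level be blocked? */
static bool irq_is_masked(uint32_t level)
{
    return (basepri_model != 0u) && (prio_to_reg(level) >= basepri_model);
}
```

With the critical interrupt at level 0 and critical regions entered with enter_critical(1), the level 0 interrupt is never blocked, whereas a global mask would delay it for the length of the region.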
Also avoid use of floating point registers in interrupt routines if possible, otherwise additional FPU register pushing is needed.
The above is in ideal conditions - no caching needed and zero wait state memory.
Where it can go quite horribly wrong is when:
1. Interrupt vectors are not stored in ITC but in a different memory. Then that memory's wait states kick in and the latency multiplies.
2. The stack is not in DTC and so each register push takes longer due to that memory's speed.
3. Interrupt vectors are in QSPI flash - if they are in cache it is fast (same as in ITC) but if they are not a QSPI flash read burst is needed to load them and BIG additional delays can result.
4. Same for the interrupt routine itself. If it is not in ITC it will be fast if it happens to be in cache but each time it is kicked out of cache due to other code being loaded it has to be re-fetched and thus BIG potential jitter.
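With a GCC-style toolchain, one common way to keep an interrupt routine and its data out of cached/external memory is to place them in dedicated sections that the linker script maps to ITC and DTC. The following is only a sketch: the section names .itc_code and .dtc_data and all identifiers are hypothetical, and they only have an effect if the linker script and start-up code (which must copy the code and initialise the data) handle sections of those names.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__ELF__)
/* Hypothetical section names - they must match the project's linker script. */
#define IN_ITC __attribute__((section(".itc_code"), noinline))
#define IN_DTC __attribute__((section(".dtc_data")))
#else
#define IN_ITC
#define IN_DTC
#endif

/* Data touched by the interrupt, intended for zero wait state DTC. */
IN_DTC static volatile uint32_t tick_count = 0;

/* Time-critical handler, intended to execute from ITC so that it never
   depends on cache contents or QSPI flash fetches. */
IN_ITC void fast_interrupt_handler(void)
{
    tick_count++;
}

uint32_t get_tick_count(void)
{
    return tick_count;
}
```

The vector table and the stack need the same treatment, which is normally done in the linker script and start-up code rather than in C attributes.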
General note: a lot of peripherals have internal triggers (generating DMA or interrupts) that can be connected to XBAR outputs for visibility on pin outputs (see https://www.youtube.com/watch?v=zNWIG-O7ZW0&list=PLWKlVb_MqDQEOCnsNOJO8gd3jDCwiyKKe&index=6). These are useful for timing measurements since the REAL interrupt reference in time is visible/measurable. Just measuring GPIOs in the interrupt itself will miss delays due to waiting for other interrupts and such.
Regards
Mark
[uTasker project developer for Kinetis and i.MX RT]
Contact me by personal message or on the uTasker web site to discuss professional training or product development requirements