I believe that I have resolved this issue. Thank you for the pointers from everyone above. I would appreciate a peer review of this information below, as I think it may be useful for other people.
This is based on experiment and data gleaned from ARM via various back doors.
The original problem was that two timers were enabled to free run and connected to the same hardware signal. When this signal changed state an edge-triggered capture occurred which generated two simultaneous interrupts, by design.
The NVIC interrupt chaining was expected to resolve this and run the two ISRs back to back.
In the application occasional samples were “missed”. This was observed because application code would take the last read value and subtract it from the current (most recent) read value of the capture event and output this to the serial port for debugging. With fast occurring interrupts this number would flick between two values, one double the other. This suggests that the code doing the calculation was not seeing every new sample (signalled by a flag set in the ISR) because more than one interrupt occurred between reads. This was expected, as the purpose of the testing was to find the limit of in-time execution for risk characterisation.
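To illustrate why a skipped sample shows up as a doubled number, here is a minimal sketch, assuming a free-running 32-bit capture counter. The function name and the 500-count period are my own inventions for illustration, not values from the application.

```c
#include <stdint.h>

/* Hypothetical helper: period between two captures of a free-running
 * 32-bit timer. Unsigned subtraction stays correct across one wrap. */
uint32_t capture_delta(uint32_t previous, uint32_t current)
{
    return current - previous;
}
```

With captures at 1000, 1500 and 2000 counts, reading every sample gives capture_delta(1000, 1500) = 500. If the 1500 sample is never seen by the calculation, the next result is capture_delta(1000, 2000) = 1000, exactly double.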
However, to check this, a counter incremented in the ISR was added to detect multiple interrupts between calculations. It never incremented more than once between reads, which suggested that an entire interrupt was missing.
Further debugging using the Keil IDE in break mode and stepping the application showed that TMR32, interrupt 0x22, was executing with TMR16, interrupt 0x20, pending. The expected behaviour is that interrupt chaining would see 0x20 as pending and switch to it at the end of (not during) 0x22. However, stepping through the code showed that execution actually returned to the main application (thread mode) rather than dealing with the pending and enabled 0x20. After several hundred lines of code stepping, the interrupt finally re-fired and entered 0x20 to handle it.
As this is not the behaviour defined in the manual, I created this thread and did more digging.
--------------------------------------------------------------------------------------------------------------------------------------------------------------
The stated behaviour is as follows, in my own phrasing with the aim to clarify the behaviour.
The M0+ core has 4 interrupt levels. If an interrupt occurs and is enabled in one of these levels then thread mode is interrupted and exception processing begins.
The highest priority (aka lowest number) interrupt level begins execution. If another higher priority interrupt occurs and is enabled, then exception processing moves to service that, leaving the previous routine to be completed. When the new interrupt completes, processing returns to the first one, and then eventually back to thread mode.
Within each interrupt level there can be many sources, and each of the four levels is therefore known as a group. Within each group each interrupt source has its own number, assigned by the vendor for the function such that a natural priority exists – the smaller the interrupt source number the more priority it has for processing.
In the case of two interrupts occurring at the same level (aka within the same group) the NVIC handles the highest priority first. In my case if 0x22 and 0x20 occur together in group 0 then 0x20 will be processed in preference to 0x22.
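The selection rule described above can be sketched as a small host-side model. This is not code that touches the NVIC; the function name, the 64-entry priority table and the use of 0x20 and 0x22 as plain indices are assumptions made for illustration. It only models which pending interrupt is chosen, not chaining.

```c
#include <stdint.h>

#define NUM_SOURCES 64u

/* Host-side model, not NVIC code: return the pending source that would be
 * taken next, or -1 if nothing is pending. A lower priority value wins;
 * equal priorities are resolved in favour of the lower source number. */
int next_pending(uint64_t pending_mask, const uint8_t priority[NUM_SOURCES])
{
    int best = -1;
    for (unsigned n = 0; n < NUM_SOURCES; ++n) {
        if ((pending_mask & ((uint64_t)1 << n)) == 0u) {
            continue;                       /* source n is not pending */
        }
        /* Strict '<' keeps the lower number when priorities are equal. */
        if (best < 0 || priority[n] < priority[best]) {
            best = (int)n;
        }
    }
    return best;
}
```

With both 0x20 and 0x22 pending at the same level the model picks 0x20. Giving 0x20 a numerically larger (less urgent) priority value makes it pick 0x22 instead.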
A feature of the NVIC addresses multiple interrupts active within a group. Because entering and leaving thread mode is inefficient the ARM design seeks to avoid this overhead of unstacking and then restacking to leave and then return to the same point by tail chaining.
In this process, as the interrupt in hand finishes the NVIC checks for other interrupts pending within the group. If there is one (with enough priority – this is important) then processing switches to it directly before returning to the original thread. At the end of that interrupt’s handling, and assuming no further (suitable) interrupts occur, processing returns to thread mode.
The subtle and critical statement is about the priority of the pending interrupt in the group that could cause tail-chaining. This must be a higher priority than the completing interrupt for the process to occur. The reason for this design is not explained by Arm, but is obvious after some thought:
Imagine 4 interrupts in a group, 1 – 4. 1 is the highest and 4 the lowest priority based on their numbers.
Assume 2 is in process when 4 then 1 occur. At the end of the current interrupt (2) what should happen next? Without prioritising the interrupts this would be 4 and then 1, but because 1 can’t be pending we will not discover it until we exit the interrupt routine and resume thread mode, where the outstanding group interrupt would be re-triggered. This means the group priority is lost. Therefore, to address this, the chaining seems to be designed so that at the tail end only a higher priority within the group can cause the tail chain; otherwise execution returns to thread mode.
This subtle feature is not mentioned by the majority of internet based advice. What it means is that two interrupts in the same group will not chain if the second pending one is lower priority (higher interrupt number) than the one currently being actioned.
---------------------------------------------------------------------------------------------------------------------------------------------------------
In the debug tracing I made the reverse situation seemed to occur with the higher interrupt number (0x22) taking priority over 0x20.
To verify what was happening I added code to put a single character into the serial port buffer (at 115200 baud 11 bits per character) with a while wait until the character was gone to ensure it was sent. I then called this with a different character at various points of the application. I sent 1 when the 0x20 interrupt was started, 2 for 0x22 starting and then 3 at the start of the calculation code, 4 at the end of this and then a <CR> at the end of main() so I got CRLF on my terminal at a known point.
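It is worth working out what each blocking trace character costs, since the wait adds time to whatever code sends it. A minimal sketch of the arithmetic, using the 115200 baud and 11 bits per character stated above (the function name is hypothetical):

```c
#include <stdint.h>

/* Time to shift out one character, in nanoseconds.
 * The integer division truncates any fractional nanosecond. */
uint32_t uart_char_time_ns(uint32_t baud, uint32_t bits_per_char)
{
    return (uint32_t)(((uint64_t)bits_per_char * 1000000000ull) / baud);
}
```

uart_char_time_ns(115200, 11) gives 95486 ns, so each marker holds up its caller for roughly 95 us. That is not small compared with short interrupt intervals, so the trace may itself shift the timing it is reporting.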
Next I modified the timer setup so that the 0x20 interrupt fired on the rising edge of the input and the 0x22 on the falling edge, creating a gap of 100 us between the interrupts. Running this gave the following typical output:
| 123412 |
| 123412 |
| 123142 |
| 12 |
| 132412 |
From the last line we can see that there is enough time between 1 and 2 for a return to thread mode. In the highlighted line (123142) we can see that the interrupt fires during the calculation, and in most lines the interrupts fire several times in each main() loop. When I reversed the order of the interrupts I saw 1 and 2 swap places as expected.
Next I reverted to using the same edge to trigger both Capture interrupts and this time the result was
| 312412 |
| 312412 |
| 312412 |
| 123412 |
| 123412 |
| 123412 |
1 and 2 are now never separated and are always processed as 1 and then 2, which is the order that the NVIC should use. There should be no chaining based on my interpretation of the pending priority, and proving that remains a task. Again we can see the calculation interrupted, and that can explain why there are two time periods counted with only one apparent interrupt call: it's a bug in my code, because the interrupt changes the values midway through the calculation. My mistake.
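One way to stop the interrupt changing the values midway through the calculation is to take a consistent snapshot before doing the subtraction. The following is only a sketch under my own assumptions: the variable and function names are hypothetical, it assumes a single core where the ISR always runs to completion before the main loop continues, and it is not the code from my application.

```c
#include <stdint.h>

/* Shared between the capture ISR and the main loop (hypothetical names). */
static volatile uint32_t g_capture_prev;
static volatile uint32_t g_capture_now;
static volatile uint32_t g_capture_count;   /* incremented once per ISR */

/* What the capture ISR would do with each new timer capture value. */
void capture_isr_update(uint32_t captured)
{
    g_capture_prev = g_capture_now;
    g_capture_now  = captured;
    g_capture_count++;
}

/* Main-loop side: copy both values, and retry if an ISR ran during the
 * copy, so the subtraction never mixes values from two different updates. */
uint32_t read_period(void)
{
    uint32_t before, prev, now;
    do {
        before = g_capture_count;
        prev   = g_capture_prev;
        now    = g_capture_now;
    } while (before != g_capture_count);
    return now - prev;
}
```

An alternative is to disable interrupts briefly around the copy (for example with the CMSIS __disable_irq() and __enable_irq() intrinsics). The retry loop avoids adding interrupt latency, at the cost of occasionally copying twice.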
What is important, and news to me, is that the Keil debugger, in break mode, was showing information that was not synchronous - that is to say that the NVIC was shown as 0x22 in action and 0x20 pending which should cause a chain with no return to thread processing until the 0x20 exception was completed. This did not happen as noted originally.
To my mind the best explanation for that is that the pending interrupt was from later in the execution sequence, i.e. in the future of the shown information and not in the same interrupt process as the one shown in action. So the 0x22 finished and returned to thread execution, and then the 0x20 shown as pending (but not actually an interrupt yet in the trace) fired and returned execution to the exception handler. This is shown as a sequence below:
INT2 occurs - INT2 processing starts - INT2 ends - Normal processing resumes - INT1 fires - INT1 processing starts
but the break in the INT2 processing shows the pending INT1 from the end of the line above even though it has not happened yet.
It may be a race condition for the debugger vs the hardware interrupt handling.
I'd appreciate a second opinion - and if I'm correct I hope this is useful to someone else!
and if I'm not......