I think I know what's going on.
But first, "2 to 8 times per day" is what percentage (or parts-per-million) of the timer expiries? How infrequent is this?
And what is the expiry rate of the Slice Timers? Since I have your Slice Timer setup code I should be able to work that out myself, except there's nothing in the Slice Timer chapter or the Clocking chapter that actually says which clock goes into the Slice Timer! At least not where I can quickly find it. I expect to find a "clock tree diagram" in the manual like there is in most other ones, but in this manual - nada, zip.
So I'm guessing it is the XLB clock, probably running at 100MHz. For a 32-bit counter running over its full range (2^32 counts) that means it times out at 0.023Hz, or every 42.9 seconds. That works out to about 2011 expiries per day, so "2 to 8 per day" is 0.1% to 0.4% of them.
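Here is that arithmetic as a small C sketch, in case you want to plug in the real clock once you find it. The 100MHz figure is still my guess, and the function names are made up for this post.

```c
/* ASSUMPTION: the Slice Timer is clocked at 100 MHz (my guess at the XLB
   clock, not a figure confirmed from the manual). */
#define SLT_CLOCK_HZ_GUESS 100000000.0

/* Counts in one full wrap of a 32-bit counter (2^32). */
#define SLT_FULL_RANGE_COUNTS 4294967296.0

/* Seconds between expiries for a given number of counts per period. */
double slt_period_seconds(double counts_per_period)
{
    return counts_per_period / SLT_CLOCK_HZ_GUESS;
}

/* Number of expiries in a 24-hour day (86400 seconds). */
double slt_expiries_per_day(double counts_per_period)
{
    return 86400.0 / slt_period_seconds(counts_per_period);
}

/* Percentage of expiries that fail, given a failure count per day. */
double slt_failure_percent(double failures_per_day, double counts_per_period)
{
    return 100.0 * failures_per_day / slt_expiries_per_day(counts_per_period);
}
```

Calling slt_failure_percent() with 2 and 8 failures per day and SLT_FULL_RANGE_COUNTS gives roughly the 0.1% and 0.4% quoted above. If the real clock is different, only the one #define changes.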
How's my guessing going so far?
So here's what I think is happening. The timer expires and sets the "ST" bit. That works its way through the interrupt controller, where it is gated with the masks and combined with the other interrupts and priority encoders to make the interrupt request. At an appropriate point in the CPU's execution, this causes the CPU to start the exception sequence, perform the "Space Read" (the interrupt acknowledge cycle) on the bus and get the vector from the interrupt controller. Then it runs the service routine.
Meanwhile the hardware interrupt request is still active.
Most interrupt service routines read the status, clear the request, and then have a whole lot of work to do, accessing whatever data caused the interrupt, pushing it into ring buffers and triggering threads. So the code guarantees a minimum time between when the request is TOLD to go away and when the interrupt routine returns.
In your service routine, though, the last thing (really, literally the LAST thing) the code does is to launch a write back to the Slice Timer to make the interrupt request go away. On these CPUs, a write cycle to a peripheral can take 10 to 20 CPU clocks to execute. The Slice Timer looks to be in a "faster clock domain" than the usual slow peripherals, but the write is still going to take a while.
When the write executes, the hardware request is removed, and that removal has to ripple back through the interrupt controller and go away BEFORE the CPU has executed the RTE and restored its IPL.
If that hasn't happened yet you'll get THE Spurious Interrupt.
In the usual parlance, I think you need a "Write Barrier Instruction" after the write to "MCF_SLT_SSR0".
Or maybe swapping that instruction and "++counter_wraps" will make the problem go away.
Remember this CPU is "lightly superscalar" and can do some operations in parallel, or at least one-per-clock.
To try and prove this without having to wait for a day I'd suggest dropping the slice timer timeout so it expires at least 1000 times faster. Or more. That should make it possible to see if the problem is still happening in a few seconds.
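As a rough guide to what to expect from the speed-up, here is a sketch of the numbers. It assumes the same guessed 100MHz clock and, more shakily, that the failure percentage stays the same at the faster rate (it may well not, since the cache behaviour changes). The function names are made up, and how the count gets programmed into the timer is left to your existing setup code.

```c
#include <stdint.h>

/* ASSUMPTION: 100 MHz Slice Timer clock (a guess, not from the manual). */
#define SLT_TEST_CLOCK_HZ_GUESS 100000000.0

/* Counts per period that make the timer expire `speedup` times faster than a
   full 32-bit wrap. A speedup of 0 or 1 returns the largest 32-bit count. */
uint32_t slt_counts_for_speedup(uint32_t speedup)
{
    if (speedup <= 1u) {
        return 0xFFFFFFFFu;
    }
    return (uint32_t)(4294967296ULL / speedup);
}

/* Expected seconds between failures if `failure_percent` of expiries fail.
   failure_percent must be greater than zero. */
double slt_seconds_between_failures(uint32_t counts_per_period,
                                    double failure_percent)
{
    double period_s = (double)counts_per_period / SLT_TEST_CLOCK_HZ_GUESS;
    return period_s * (100.0 / failure_percent);
}
```

With a speedup of 1000 the period drops to about 43 milliseconds, and at 0.1% to 0.4% the expected gap between spurious interrupts is somewhere around 11 to 43 seconds. A bigger speedup shortens that proportionally.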
So why is this intermittent? Why does the "declspec" make it happen more? I suspect that has something to do with the instruction that was interrupted (unlikely) and/or the caches.
If the caches are flushed, then the ISR takes a while to read in the instructions before it can execute them. If the ISR code is still in the instruction cache when the interrupt happens, maybe it can get from the last instruction to the interrupt return faster, and fast enough to trip the hazard. The cache lines are flushed "randomly" so maybe about one in 500 times the cache line with the service routine got "lucky" and didn't get overwritten.
The execution speed of "++counter_wraps" will depend on whether that variable is still in the data cache or whether it has to be fetched from main RAM. Ditto "lucky cache preservation".
The "declspec" inserts an instruction that takes FOUR CPU clocks, and during that time the CPU can probably load up its instruction pipeline further than when that instruction isn't there. A "NOP" (6 clocks) or one or more "TPF" instructions might have the same effect.
I think the proper fix for your problem is to add a "NOP" after the write to MCF_SLT_SSR0, or to read back that register (to guarantee write completion). The syntax for that is usually just "MCF_SLT_SSR0;" as long as it is "volatile". Note that on this CPU "NOP" isn't a "No Operation". "TPF" is the real no-op; "NOP" is a "Pipeline Flush".
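Something like the following is what I have in mind. It is only a sketch: I don't have your header here, so the register is faked with a plain variable when MCF_SLT_SSR0 isn't already defined, the ST bit value is an assumption to check against the manual, and the function name is made up. It combines both suggestions (clear the request first, then read it back) and leaves off the interrupt "declspec" so it compiles anywhere.

```c
#include <stdint.h>

/* ASSUMPTION: stand-in so this compiles without the vendor header. On the
   target, MCF_SLT_SSR0 comes from your existing register definitions. */
#ifndef MCF_SLT_SSR0
static volatile uint32_t fake_slt_ssr0;
#define MCF_SLT_SSR0 fake_slt_ssr0
#endif

/* ASSUMPTION: position of the "ST" flag in the status register. Check it
   against the manual or your header before using it. */
#define SLT_SSR_ST_ASSUMED 0x01000000u

volatile uint32_t counter_wraps;

/* Hypothetical service routine body. */
void slice_timer0_isr_sketch(void)
{
    /* Tell the timer to drop its interrupt request as early as possible. */
    MCF_SLT_SSR0 = SLT_SSR_ST_ASSUMED;

    /* Read the register back. Because it is volatile the compiler must
       perform the read, and the read cannot complete until the write
       before it has reached the peripheral. */
    (void)MCF_SLT_SSR0;

    /* The remaining work now adds margin before the return from exception. */
    ++counter_wraps;
}
```

The "(void)" cast is just there to keep the compiler from warning about an unused value; the bare "MCF_SLT_SSR0;" form does the same job. If you would rather use the "NOP" route, it goes in the same place as the read-back.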
Tom