Hello bigmac, I tried those pragmas on another function that I was having trouble with (not a function used as a task, because I have had no problems with those), but it didn't work.
Pragmas tried:
#pragma NO_ENTRY
#pragma NO_EXIT
#pragma NO_FRAME
#pragma NO_RETURN
(All at once)
My main initialization function is called ezTasker_run; in main() I call it like this:
__asm JMP ezTasker_run;
trying to avoid stacking the return address, but even with those pragmas the function still starts like this:
ezTasker_run:
00000012 A7FE [2] AIS #-2
... REST OF ASM
Even calling this function normally:
ezTasker_run();
resulted in the same AIS #-2 at function startup, with the call being done with a standard JSR.
So my only way to deal with it is to reset the stack pointer on entry.
Any advice to make it work?
Does putting those pragmas above a function affect only that function, or all functions from that line onward?
My RTOS prototype is finished and I haven't had problems with extra stacking inside functions, although I may encounter them in the future. Thanks a lot for the tips. Anyway, I like having one task handle everything about the same problem, rather than splitting a problem's resolution into multiple tasks when it's not needed.
I think my approach is more cooperative than preemptive, because even though I have a context switch, it only happens when the task decides to do so; there's no timer interrupt triggering it.
Hello CompilerGuru, I think I made a mess in the first post, but let me explain myself.
Every time a context switch is triggered (SWI), the stack must be empty, because my context switch saves only the CPU registers, not the stack pointer: once I have pulled and saved all registers from the stack, the stack must be empty. That means no temp variables were stacked and the task will continue to run nicely the next time it's given the CPU. If the stack is found not empty, the task is flagged and never runs again, just as a precautionary measure. That's why SWI must be triggered with an empty stack, so I don't have to worry about lost temp variables; any temp stack usage must be cleared before SWI.
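To make the empty-stack rule concrete, here is a minimal host-side sketch of the check in plain C. The names ez_task_state_t and ez_check_stack_empty, and the idea of passing in the "empty" SP value as a parameter, are assumptions for illustration only; the real check lives in the SWI handler and is not shown in this thread.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-task state; the field name is illustrative only. */
typedef struct {
    bool faulted;   /* set when a switch was attempted with a non-empty stack */
} ez_task_state_t;

/*
 * sp_after_pull: stack pointer value once the SWI frame (the CPU registers)
 *                has been pulled from the stack and saved.
 * empty_sp:      stack pointer value that corresponds to an empty stack
 *                (assumption: known to the scheduler).
 * Returns true when the task may be scheduled again.
 */
bool ez_check_stack_empty(ez_task_state_t *task,
                          uint16_t sp_after_pull,
                          uint16_t empty_sp)
{
    if (sp_after_pull != empty_sp) {
        task->faulted = true;   /* leftover temporaries: never run this task again */
        return false;
    }
    return true;
}
```

With this shape, a task that yields with two bytes still stacked (for example sp_after_pull = 0x00FE against empty_sp = 0x0100) would be flagged, while one that yields cleanly would not.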
An alternative is, as you said, to also save the whole stack, but I would prefer not to, because it implies a slower context switch and extra RAM (a task-specific amount) for each task. It would definitely still be less RAM than a preemptive RTOS, because, as you know, context switching is done at controlled points, instead of having to reserve enough dedicated RAM to save the stack if caught at a deep stack level.
So far I have a working RTOS: tasks can trigger a context switch and can go to sleep for some milliseconds and then wake up. I also have implemented a very simple event system, 8 max events per task (you guessed right, each one is a bit of a byte), and also a very useful event wait timeout feature. If no event happens within a certain time, a "timeout event" is triggered, which is very nice for making SCI, SPI, IIC drivers.
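The event scheme described above could be sketched roughly like this in portable C. Everything here is hypothetical: the type ez_events_t, the function names, and in particular the choice to reserve the top bit of the byte for the timeout event are assumptions, not the actual ezTasker code.

```c
#include <stdint.h>

#define EZ_EVT_TIMEOUT 0x80u   /* assumption: top bit reserved for the timeout event */

typedef struct {
    volatile uint8_t pending;  /* one bit per event */
    uint8_t  wait_mask;        /* events the task waits for; 0 = not waiting */
    uint16_t timeout_ms;       /* ms left before a timeout is raised; 0 = no timeout */
} ez_events_t;

/* Start waiting for any event in `mask`, with an optional timeout in ms. */
void ez_wait_begin(ez_events_t *e, uint8_t mask, uint16_t timeout_ms)
{
    e->pending   &= (uint8_t)~EZ_EVT_TIMEOUT;  /* drop any stale timeout */
    e->wait_mask  = mask;
    e->timeout_ms = timeout_ms;
}

/* Called by a driver or ISR to signal one or more events. */
void ez_post(ez_events_t *e, uint8_t bits)
{
    e->pending |= bits;
}

/* Called once per millisecond from the system tick. */
void ez_events_tick_1ms(ez_events_t *e)
{
    if (e->wait_mask != 0u && e->timeout_ms > 0u) {
        if (--e->timeout_ms == 0u) {
            e->pending |= EZ_EVT_TIMEOUT;
        }
    }
}

/* Scheduler side: returns the events that wake the task, or 0 to keep waiting. */
uint8_t ez_poll(ez_events_t *e)
{
    uint8_t ready;

    if (e->wait_mask == 0u) {
        return 0u;
    }
    ready = (uint8_t)(e->pending & (e->wait_mask | EZ_EVT_TIMEOUT));
    if (ready != 0u) {
        e->pending   &= (uint8_t)~ready;  /* consume the delivered events */
        e->wait_mask  = 0u;
        e->timeout_ms = 0u;
    }
    return ready;
}
```

A driver task would call ez_wait_begin() before yielding, and on wake-up inspect the returned bits to tell a real event from EZ_EVT_TIMEOUT.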
My main motivation is the very easy to understand code that a full preemptive RTOS enables.
Here's my working classic 16x2 LCD Driver task:

sleep() macro is:
#define sleep(a) currentTask->sleep = a; __asm swi;
Needless to say, the currentTask->sleep = a assignment makes the scheduler wait that many milliseconds before giving the task the CPU again; meanwhile all other tasks continue running.
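One thing to watch with a two-statement macro like this: in "if (x) sleep(5);" only the assignment is guarded by the if, and the swi always executes. A common way around that is the do { ... } while (0) wrapper, sketched below for a host compiler. The names ez_tcb_t, ez_yield and SLEEP_MS are made up for the example, and ez_yield() is only a stand-in for __asm swi; so that the code compiles off-target.

```c
#include <stdint.h>

/* Hypothetical task control block with only the field the macro touches. */
typedef struct {
    uint16_t sleep;   /* ms to sleep before the task is runnable again */
} ez_tcb_t;

ez_tcb_t  demo_task;
ez_tcb_t *currentTask = &demo_task;

unsigned yield_count = 0;   /* host-side instrumentation only */

/* Stand-in for `__asm swi;` (assumption: on the target this is the SWI). */
void ez_yield(void)
{
    yield_count++;
}

/* Single-statement form: safe inside if/else without braces. */
#define SLEEP_MS(a)                   \
    do {                              \
        currentTask->sleep = (a);     \
        ez_yield();                   \
    } while (0)
```

With this form, "if (0) SLEEP_MS(5);" neither sets the sleep counter nor yields, which is what a reader of the call site would expect.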
There's a special "system task", and when it detects that it's the only one in RUN state (not sleeping nor waiting for an event) it runs __asm wait; until a system tick interrupt happens. That system tick gets serviced every millisecond and, as you'd guess, decrements the .wait of each task by one. This saves about 1/3 of the microcontroller's power consumption in my demo application; that's between about 2.5 and 3 mA.
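As a rough illustration of the tick and idle logic, here is a host-side sketch in C. The names ez_sleeper_t, ez_sleep_tick and ez_all_sleeping are hypothetical, the counter is called sleep to match the macro above, and event waiting is left out to keep it short.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-task sleep counter; 0 means the task is runnable. */
typedef struct {
    uint16_t sleep;   /* ms remaining */
} ez_sleeper_t;

/* Body of the 1 ms system tick: count down every sleeping task. */
void ez_sleep_tick(ez_sleeper_t tasks[], uint8_t n)
{
    uint8_t i;
    for (i = 0; i < n; i++) {
        if (tasks[i].sleep > 0u) {
            tasks[i].sleep--;
        }
    }
}

/*
 * System-task check: true when no user task is runnable, i.e. the point
 * where the system task would execute `__asm wait;` until the next tick.
 */
bool ez_all_sleeping(const ez_sleeper_t tasks[], uint8_t n)
{
    uint8_t i;
    for (i = 0; i < n; i++) {
        if (tasks[i].sleep == 0u) {
            return false;
        }
    }
    return true;
}
```

The system task would loop on ez_all_sleeping() and only drop into wait when it returns true, so the CPU wakes once per millisecond at most while every task is asleep.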
Anyway, there's still the danger of trying to switch context with a non-empty stack, and even though the context switch detects it and flags the task, I'm eager to learn more about the compiler and ways to make it treat those specific functions differently, or any pseudo-instruction to safely empty the stack (of course, to be called prior to __asm SWI; ).
As for your r = (a * 1.2) + (b*2.2) + (c*3.3), I'm thrilled about the heavy load that this expression would generate on an 8-bit CPU, but I think it may not be a problem, following this reasoning:
0. <- Prior entering we have an empty stack
1. r = (a * 1.2) + (b*2.2) + (c*3.3);<- Heavy temp variable stacking, lots of CPU time
2. <- Heavy processing finished, stack dumped, empty again
3. __asm swi;<- Enters with empty stack, no problem here !
But in this case we may be in trouble:
0. <- Prior entering we have an empty stack
1. r = (a * 1.2) + (b*2.2) + (c*3.3);<- Heavy temp variable stacking, lots of CPU time
2. <- Compiler saw that the last term is used in line 4, it decides not to dump it from stack
3. __asm swi;<- Enters with remaining variable in stack, STACK NOT EMPTY
4. r += (c*3.3);
Any advice would be very much appreciated. Thanks !