Hi, @Maciek
I apologize for my late response. I have investigated your model, and I am not quite sure how you obtained the results displayed on the scope from the profiler block. What I did was create a new model and use FreeMASTER to check the results that the profiler block returned, as in the image below:

Comparing the results from the profiler block with those from the PIL profiler functionality, you are right: the values differ because the PIL profiler introduces a lot of overhead. I have already opened a ticket to get this resolved. For the moment, I suggest using profiler blocks because they are more accurate. I would also suggest changing the compiler optimization flag from -O1 to -O0 for better accuracy when measuring the execution time of a portion of code.
When using the profiler block, the generated code for your subsystem/model is placed between function calls that read tick values from the LPIT timer.
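
To make the idea concrete, here is a rough sketch of that pattern. It is not the actual generated code: read_ticks, subsystem_step and profile_subsystem are hypothetical names, and the timer is simulated by a plain counter (one tick per read) so that the snippet can run on a PC instead of reading the real LPIT.

```c
#include <stdint.h>

/* Hypothetical stand-in for the LPIT counter: each read advances it by one
   tick so the sketch runs on a host PC. Real code would read the timer. */
static uint32_t fake_ticks = 0u;
static uint32_t read_ticks(void) { return fake_ticks++; }

/* Hypothetical placeholder for the generated subsystem code.
   It is modelled as costing exactly 100 ticks. */
static void subsystem_step(void) { fake_ticks += 100u; }

/* Profiler-block pattern: one timer read right before the code
   and one right after it, with nothing else in between. */
uint32_t profile_subsystem(void)
{
    uint32_t start = read_ticks();
    subsystem_step();
    uint32_t end = read_ticks();
    return end - start; /* unsigned subtraction tolerates counter wrap-around */
}
```

In this simulation the measured value is the cost of the subsystem plus the single tick spent on the first timer read, so the measurement overhead stays small and constant.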

But when using the PIL profiling functionality, there are many calls before the LPIT timer is actually read, because the code for PIL is generated differently from a usual model build. For example, below you can see the functions used to profile the code in PIL for the model that you've sent me. The profilerStart and profilerEnd functions call other functions in cascade to actually take the measurement and then upload the results to Simulink. So we can observe the overhead introduced by these cascading function calls.
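
A small host-side simulation can show why the extra call layers inflate the measurement. Everything in it is an assumption made for illustration: the layer functions, the tick costs (20 ticks per layer, 100 ticks for the code under test) and the counter itself are made up, and they do not reproduce the real PIL call chain.

```c
#include <stdint.h>

static uint32_t ticks = 0u;                    /* simulated free-running counter */
static void spend(uint32_t n) { ticks += n; }  /* models n ticks of CPU work */

/* Reading the timer costs one tick in this model. */
static uint32_t timer_read(void) { spend(1u); return ticks; }

/* Hypothetical wrapper layers: each one does some bookkeeping
   before the timer is finally reached. */
static uint32_t layer2(void) { spend(20u); return timer_read(); }
static uint32_t layer1(void) { spend(20u); return layer2(); }

/* Hypothetical code under test, modelled as 100 ticks. */
static void code_under_test(void) { spend(100u); }

/* Direct measurement: timer reads sit right next to the code. */
uint32_t measure_direct(void)
{
    uint32_t start = timer_read();
    code_under_test();
    return timer_read() - start;
}

/* Cascaded measurement: the closing timer read is only reached after
   passing through the wrapper layers, so their cost gets counted too. */
uint32_t measure_cascaded(void)
{
    uint32_t start = layer1();
    code_under_test();
    return layer1() - start;
}
```

With these made-up costs, the direct version reports the 100 ticks of the code plus one timer read, while the cascaded version additionally reports the 40 ticks spent in the two wrapper layers on the way to the closing timer read.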

Hope this helps! If you have any questions, feel free to ask.
Best regards,
Stefan.