Hi Dong,
I found a solution to this problem and I am going to share it here. The problem I saw in my setup is that when the algorithm is activated and the motor has just started turning (typically under an open-loop control strategy), the error angle (the arctan of the ratio between the back-EMF on the gamma and delta axes) is really unreliable, because the back-EMF on the delta axis is still approximately zero (and it is the denominator of the back-EMF ratio!). This leads to an error angle that saturates randomly. Since this is the input to the angle tracking observer (ATO), the PI controller therein integrates a spurious estimated speed during, let's say, the first 50 ms. As a consequence, the downstream integrator that outputs the estimated angle also provides a spurious angle. By the time the back-EMF on the delta axis increases and the error angle becomes meaningful, it is too late: the whole algorithm (back-EMF observer + ATO) has settled into a wrong stable state, with the estimated angle shifted by 180 deg.
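To make the wind-up mechanism concrete, here is a minimal sketch of a generic angle tracking observer step (PI controller followed by an integrator). It is not NXP's implementation; the struct, the function name `ato_step` and any gain values are hypothetical and only illustrate how a saturated error angle leaves a residual speed in the PI integrator even after the error goes back to zero.

```c
#include <math.h>

#define PI_F 3.14159265f

/* Hypothetical, generic ATO state: not taken from any NXP library. */
typedef struct {
    float kp;     /* proportional gain (assumed value chosen by the user) */
    float ki;     /* integral gain (assumed value chosen by the user) */
    float ts;     /* sample period in seconds */
    float integ;  /* integral part of the PI controller, rad/s */
    float speed;  /* estimated speed, rad/s */
    float angle;  /* estimated angle, rad, wrapped to [-pi, pi) */
} ato_t;

/* One ATO step: the PI controller turns the error angle into a speed,
 * then the speed is integrated into the estimated angle. */
static void ato_step(ato_t *a, float err)
{
    a->integ += a->ki * err * a->ts;       /* accumulates any spurious error */
    a->speed = a->kp * err + a->integ;     /* PI output = estimated speed */
    a->angle += a->speed * a->ts;          /* integrate speed into angle */

    /* Wrap the angle into [-pi, pi). */
    while (a->angle >= PI_F) a->angle -= 2.0f * PI_F;
    while (a->angle < -PI_F) a->angle += 2.0f * PI_F;
}
```

If the error angle sits near its saturation value for a few hundred samples, `integ` holds a non-zero speed that is still there when the error drops back to zero, so the angle estimate keeps drifting.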
See from this screenshot what happens:

So my solution was to force the error angle to zero as long as the magnitude of the back-EMF on the delta axis is below some threshold (in my case a threshold of 750 mV works fine), in order to avoid the spurious behaviour that you can see during the first 50 ms of the recording I took using FreeMASTER.
I can't say why the engineers at NXP did not have this sort of problem with this algorithm. Maybe there is a cleaner solution to this, perhaps in the way the algorithm is used? But I could not find one in their reference projects, so this is my workaround. An even simpler solution would be to activate the back-EMF observer only after a delay (50 ms in my case). This algorithm should not be executed while the motor is stationary, which is basically the case during this short initial interval.
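The delay-based variant could be sketched as a simple tick counter called once per control-loop iteration. Everything here is hypothetical: the 10 kHz loop rate, the type `startup_delay_t` and the function names are assumptions for illustration, and only the 50 ms figure comes from my setup.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_LOOP_HZ      10000u  /* assumed control-loop rate */
#define OBSERVER_DELAY_MS    50u     /* delay that worked in my case */
#define OBSERVER_DELAY_TICKS ((CONTROL_LOOP_HZ * OBSERVER_DELAY_MS) / 1000u)

/* Hypothetical helper state: counts control-loop ticks since startup. */
typedef struct {
    uint32_t ticks;
} startup_delay_t;

/* Call when the open-loop startup begins. */
static void startup_delay_reset(startup_delay_t *d)
{
    d->ticks = 0u;
}

/* Call once per control-loop iteration. Returns true once the delay has
 * elapsed, i.e. when the back-EMF observer and ATO may be executed. */
static bool startup_delay_elapsed(startup_delay_t *d)
{
    if (d->ticks < OBSERVER_DELAY_TICKS) {
        d->ticks++;
        return false;
    }
    return true;
}
```

In the control loop you would run the observer only when `startup_delay_elapsed()` returns true. The threshold on the delta-axis back-EMF is probably more robust, since a fixed delay depends on how fast the open-loop ramp is.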
dumitrupopa, marekmusak, MatejPacha, do you think my workaround makes sense? Have you ever had this problem?
Luca