System running in AArch32 mode.
There are 8 cores in the system. In the test, 7 cores (core 1~7) execute the following:
1) atomicInc a
2) read b
3) write c [cpuId]
4) goto 2
Variables a, b and c (c is an array) are all in the same cache line.
At the same time, core 0 reads the atomic variable a:
1) atomicRead a
2) if a != 7 goto 1
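For anyone who wants to try this, the test above can be sketched roughly in C11 as below. This is only a sketch under assumptions, not the original test code: C11 atomics stand in for atomicInc/atomicRead, pthreads stand in for the cores (the threads are not pinned to cores here), the cache line is assumed to be 64 bytes, c is assumed to have one entry per core, and the `stop` flag and the name `run_shared_line_test` are made up so the workers can be joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_CORES   8    /* core 0 polls, cores 1..7 are workers */
#define NUM_WORKERS 7
#define CACHE_LINE  64   /* assumed line size, check the core's TRM */

/* a, b and c packed together so they share one (assumed) cache line. */
struct shared_line {
    atomic_int   a;             /* incremented once by each worker */
    volatile int b;             /* read in step 2 */
    volatile int c[NUM_CORES];  /* written in step 3, indexed by cpuId */
};
static_assert(sizeof(struct shared_line) <= CACHE_LINE,
              "a, b and c must fit in one cache line");

static alignas(CACHE_LINE) struct shared_line s;
/* Hypothetical stop flag, kept in its own line, so the sketch can finish. */
static alignas(CACHE_LINE) atomic_int stop;

static void *worker(void *arg)
{
    int cpu_id = (int)(intptr_t)arg;

    atomic_fetch_add(&s.a, 1);        /* step 1: atomicInc a */
    while (!atomic_load(&stop)) {     /* step 4: goto 2 */
        int v = s.b;                  /* step 2: read b */
        s.c[cpu_id] = v;              /* step 3: write c[cpuId] (value is an assumption) */
    }
    return NULL;
}

/* Plays the role of core 0; returns the value of a that ended the poll loop. */
int run_shared_line_test(void)
{
    pthread_t t[NUM_WORKERS];

    atomic_store(&s.a, 0);
    atomic_store(&stop, 0);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)(i + 1));

    while (atomic_load(&s.a) != NUM_WORKERS)  /* 1) atomicRead a, 2) if a != 7 goto 1 */
        ;
    int seen = atomic_load(&s.a);

    atomic_store(&stop, 1);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(t[i], NULL);
    return seen;
}
```

Calling `run_shared_line_test()` from `main` should return 7. If the poll loop never exits, that matches the dead loop described in this post.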
Test result: core 0 falls into a dead loop.
Investigation shows that atomicInc is invoked 7 times and all of the calls returned.
The expected value of a is 7, but sometimes we get 5.
And there are the following discoveries:
1) add some delay (a for loop running a certain number of times) in steps 2 and 3, and the issue is gone.
2) move variable b to another cache line, and the issue is gone.
3) when the issue occurs, add some delay in steps 2 and 3 of the 1st and 2nd tasks, and the issue is gone.
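For discovery 2, one possible way to express "move b to another cache line" in C11 is with `alignas`, as sketched below. The 64-byte line size, the struct name `split_layout` and the helper `line_of` are assumptions for illustration only; the real test may place the variables differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_CORES  8
#define CACHE_LINE 64   /* assumed line size, check the core's TRM */

/* a and c stay together in the first line, b is pushed to the next line. */
struct split_layout {
    alignas(CACHE_LINE) atomic_int   a;
    volatile int                     c[NUM_CORES];
    alignas(CACHE_LINE) volatile int b;
};

/* Index of the (assumed) cache line that a given struct offset falls into. */
static size_t line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Compile-time check that a and b really land in different lines. */
static_assert(offsetof(struct split_layout, a) / CACHE_LINE !=
              offsetof(struct split_layout, b) / CACHE_LINE,
              "a and b must not share a cache line");
```

With this layout the reads of b in step 2 no longer touch the line that holds a, which is the configuration where the issue was not seen.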
Has anyone met a similar issue before?