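As converse said, Cortex-M3/M4 devices are more complex than simple 8-bit MCUs. Instruction execution time is not predictable to this degree: flash wait states, the pipeline, and bus contention all affect how many cycles a given instruction sequence takes.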
> I used the ctimer1 to get delay, but I get same result. I set prescale value equal 180-1, matchvalue equal 1 to delay 1 millisecond,
I suppose you meant microsecond, not millisecond.
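To make the arithmetic explicit, here is a small sketch that computes the tick period from the timer clock and the prescale register value. The 180 MHz timer clock is my assumption, guessed from the prescale value of 180-1; it is not stated in the question. `tick_period_ns` is a hypothetical helper, not a vendor API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: tick period in nanoseconds.
 * Assumes the counter advances once every (prescale_reg + 1) timer clocks,
 * which is how a "write N-1 to divide by N" prescaler behaves. */
static uint32_t tick_period_ns(uint32_t timer_clk_hz, uint32_t prescale_reg)
{
    assert(timer_clk_hz > 0u);

    /* 64-bit intermediate so the multiplication cannot overflow */
    uint64_t ns = ((uint64_t)prescale_reg + 1u) * 1000000000ull / timer_clk_hz;

    /* Clamp rather than wrap if the result does not fit in 32 bits */
    return (ns > UINT32_MAX) ? UINT32_MAX : (uint32_t)ns;
}
```

With an assumed 180 MHz clock, a prescale value of 180-1 gives a 1000 ns tick, so one tick is a microsecond. A millisecond tick would need a prescale value of 180000-1.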
Use a timer, toggle a GPIO pin in the timer interrupt, and measure the difference between the expected and the actual delay (with an oscilloscope or logic analyzer). Then reduce the timer count to make up for the interrupt latency.
Higher-priority interrupts could still throw you off.
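The correction step can be sketched as plain arithmetic. This is only an illustration under my own assumptions: `compensated_count` is a hypothetical helper, the measured delay comes from your scope, and the tick period is whatever your prescaler gives you. To correct for latency that is shorter than a tick, the timer has to run with a finer tick than 1 µs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adjust a programmed timer count by the measured error.
 * programmed_count: count currently loaded into the match register
 * measured_ns:      delay actually observed on the GPIO pin
 * target_ns:        delay you wanted
 * tick_ns:          duration of one timer tick */
static uint32_t compensated_count(uint32_t programmed_count,
                                  uint32_t measured_ns,
                                  uint32_t target_ns,
                                  uint32_t tick_ns)
{
    assert(tick_ns > 0u);

    int64_t tick = (int64_t)tick_ns;
    int64_t error_ns = (int64_t)measured_ns - (int64_t)target_ns;

    /* Round the error to the nearest whole tick, for either sign */
    int64_t error_ticks = (error_ns >= 0)
        ? (error_ns + tick / 2) / tick
        : -((-error_ns + tick / 2) / tick);

    int64_t corrected = (int64_t)programmed_count - error_ticks;

    /* Keep the result in a usable range: at least one tick */
    if (corrected < 1) corrected = 1;
    if (corrected > (int64_t)UINT32_MAX) corrected = (int64_t)UINT32_MAX;
    return (uint32_t)corrected;
}
```

For example, with a 10 ns tick, a programmed count of 100 (1000 ns target) and 1120 ns measured on the pin, the error is 12 ticks and the corrected count is 88. This only removes the constant part of the latency; jitter from higher-priority interrupts remains.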
Do you have a good justification for an exact 1 µs delay? You most probably don't need that accuracy.