The OP is probably running under the debugger; otherwise they wouldn't know that the device was taking that exception. Without a debugger, they would need fairly capable fault-handling software to trap and report it.
I would suggest that they set a breakpoint in the Illegal Instruction handler, and then inspect/dump/copy the stack frame. That will give the program counter of where the illegal instruction was fetched from. That address will either point to real code or it won't. If it does, and the instruction at that address now reads back properly, then the initial fetch of that instruction failed.
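To make that decision concrete, here is a minimal sketch in C of the check I have in mind once the program counter has been pulled out of the stack frame. Everything named here is an assumption for illustration: `classify_fault`, the `fault_kind` values, and the idea that the code region is one contiguous block described by `code_base` and `code_size`. The "expected" opcode would come from the build's listing or the programmed image, and `opcode_now` is whatever the debugger reads back at that address after the fault.

```c
#include <stdint.h>

/* Hypothetical classification of an illegal-instruction fault. */
typedef enum {
    FAULT_PC_OUTSIDE_CODE, /* saved PC does not point into the code region */
    FAULT_BAD_FETCH,       /* PC is in code and memory now reads back correctly,
                              so the original fetch must have failed */
    FAULT_CORRUPT_CODE     /* PC is in code but memory really holds the wrong opcode */
} fault_kind;

/* pc              : program counter taken from the exception stack frame
 * code_base/size  : assumed single contiguous code region (from the linker map)
 * opcode_now      : opcode read back at pc after the fault
 * opcode_expected : opcode the listing/image says should be at pc */
fault_kind classify_fault(uint32_t pc,
                          uint32_t code_base, uint32_t code_size,
                          uint16_t opcode_now, uint16_t opcode_expected)
{
    /* Range check written so that pc - code_base cannot wrap around. */
    if (pc < code_base || (pc - code_base) >= code_size) {
        return FAULT_PC_OUTSIDE_CODE;
    }
    if (opcode_now == opcode_expected) {
        return FAULT_BAD_FETCH;
    }
    return FAULT_CORRUPT_CODE;
}
```

A `FAULT_BAD_FETCH` result is the interesting one here: the code in memory is fine, so the suspicion falls on the bus or the clock at the moment of the fetch.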
It looks to be a clocking problem of some sort.
Is this happening to one unit only or all new production?
Can they change the code to wait a lot longer after the device claims it has PLL Sync before switching the clock? If that fixes it, then there is most likely a stability problem with the crystal or the PLL.
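As a rough sketch of that experiment in C, assuming nothing about the actual part: the three hooks in `clock_hooks` are hypothetical stand-ins for the real register accesses (read the PLL lock/sync bit, delay, select the PLL as the system clock), and `settle_ms` is the extra wait to experiment with. The `sim_*` stubs at the bottom are only there so the logic can be compiled and tried on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hooks; replace with the real register accesses. */
typedef struct {
    bool (*pll_locked)(void);        /* reads the PLL lock/sync status bit */
    void (*delay_ms)(uint32_t ms);   /* busy-wait or timer-based delay */
    void (*select_pll_clock)(void);  /* switches the system clock to the PLL */
} clock_hooks;

/* Wait for lock, then wait an extra settle_ms, then re-check lock before
 * switching. Returns false if lock never arrives or drops during the settle. */
bool switch_to_pll_after_settle(const clock_hooks *hw,
                                uint32_t lock_timeout_ms,
                                uint32_t settle_ms)
{
    uint32_t waited_ms = 0;

    while (!hw->pll_locked()) {
        if (waited_ms >= lock_timeout_ms) {
            return false;            /* PLL never reported lock */
        }
        hw->delay_ms(1);
        waited_ms++;
    }

    hw->delay_ms(settle_ms);         /* the "wait a lot longer" part */

    if (!hw->pll_locked()) {
        return false;                /* lock dropped: itself a sign of instability */
    }

    hw->select_pll_clock();
    return true;
}

/* Host-side stand-ins so the sketch can be compiled and tried off-target. */
static bool     sim_locked   = true;
static uint32_t sim_delay_ms = 0;
static int      sim_switches = 0;

static bool sim_pll_locked(void)   { return sim_locked; }
static void sim_delay(uint32_t ms) { sim_delay_ms += ms; }
static void sim_select(void)       { sim_switches++; }

static const clock_hooks sim_hooks = { sim_pll_locked, sim_delay, sim_select };
```

On the real board, a call such as `switch_to_pll_after_settle(&hooks, 100, 500)` with a deliberately generous settle time would be the first thing to try. If the exception goes away, shorten the delay to see where it comes back.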
Tom