> Do you have any thoughts as to what would be preventing the crystal from oscillating?
Plenty. Let's go back to the beginning.
MARGIN TEST
I originally suggested that you MARGIN TEST the crystal. I don't think you've done that. If the margin is wrong, nothing else you do will be reliable until you fix it. I'd like to get my original suggestions out of the way before going on to anything else.
The test is very simple. The Data Sheet for the crystal says the ESR is "60 ohms max". To "margin test" with a margin of "3", replace R1 (the 10 ohm resistor) with 180 ohms. Will it work now? You may need to switch it on and off A LOT, but if it runs at all, it is OK. If it won't run, it has failed the "margin test". Keep reducing the resistor (150, 100, 80, 50 etc) to see what the margin is. If it fails at 180, change the capacitors. Try a SMALLER one on the input to the CPU (EXTAL) as that should increase the gain of the loop.
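If it helps to keep track of the results, here is a rough sketch of the bookkeeping. It assumes "margin" means the test resistor divided by the data sheet's maximum ESR, which is how I've used it above (180 / 60 = 3). The function names and the example results are made up for illustration; put in your own measurements.

```python
ESR_MAX_OHMS = 60.0  # "60 ohms max" from the crystal Data Sheet


def margin(series_ohms, esr_max_ohms=ESR_MAX_OHMS):
    """Margin as used in this post: test resistor / maximum ESR."""
    return series_ohms / esr_max_ohms


def highest_passing(results):
    """results maps resistor value (ohms) -> True if the oscillator started.

    Returns the largest resistor that still started, or None if none did.
    """
    passing = [ohms for ohms, started in results.items() if started]
    return max(passing) if passing else None


if __name__ == "__main__":
    # Resistor values suggested in the post.
    for ohms in (180, 150, 100, 80, 50):
        print(f"{ohms:>4} ohm -> margin {margin(ohms):.2f}")

    # HYPOTHETICAL bench results, only to show the usage.
    example = {180: False, 150: False, 100: True, 80: True, 50: True}
    best = highest_passing(example)
    if best is not None:
        print(f"largest value that started: {best} ohm, margin {margin(best):.2f}")
```

Remember that "started" means it ran at all after switching on and off A LOT, not that it ran the first time.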
> but according to what you noted, the 18pF caps and 7pF pin impedance in series should give a load capacitance of ~4pF, correct?
No. Two 18pF in series gives 9pF. Two 7pF in series gives 3.5pF. Those two pairs in parallel give the sum, or 12.5pF. Then R1 and R2 and the pads and tracks on them add more capacitance, as do the crystal pads and all the tracks. You should measure the track capacitance on an unloaded board with a professional meter or bridge.
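The arithmetic above can be checked with a few lines. This is only a sketch of the sum described in the previous paragraph; `stray_pf` is a placeholder I've added for the track and pad capacitance, which is unknown until you measure it, so it defaults to zero.

```python
def series(*caps_pf):
    """Capacitors in series: reciprocal of the sum of reciprocals."""
    return 1.0 / sum(1.0 / c for c in caps_pf)


def parallel(*caps_pf):
    """Capacitors in parallel simply add."""
    return sum(caps_pf)


def load_capacitance(external_pf=18.0, pin_pf=7.0, stray_pf=0.0):
    """Load seen by the crystal, following the sum described in the post.

    stray_pf is an ASSUMED extra term for tracks and pads; measure it.
    """
    external_pair = series(external_pf, external_pf)  # two 18pF -> 9pF
    pin_pair = series(pin_pf, pin_pf)                 # two 7pF  -> 3.5pF
    return parallel(external_pair, pin_pair, stray_pf)


if __name__ == "__main__":
    print(f"external pair: {series(18.0, 18.0):.1f} pF")
    print(f"pin pair:      {series(7.0, 7.0):.1f} pF")
    print(f"load:          {load_capacitance():.1f} pF plus strays")
```

Whatever you measure for the tracks and pads goes on top of the 12.5pF, so the real figure will be higher.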
OSCILLATOR
Now try to find out what the oscillator is doing when it is working, when it isn't, and look for differences.
See if you can find a high VOLTAGE oscilloscope probe. One of the ones that goes to 1000V and is 100:1 ratio. The reason to do this is that the normal probes may have a 10MOhm input impedance, but they have pretty high capacitance that will affect the crystal more than you'd want. Otherwise, measure the crystal pins with a normal probe, but use a 1k resistor in series with the probe. You can work out from the frequency and the probe capacitance specs what it will do to the signal. Anyway, I'm more interested in the average voltage on the pins.
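To see what "work it out from the frequency and the probe capacitance" looks like, here is a minimal sketch. It treats the 1k resistor and the probe capacitance as a simple RC low-pass and ignores the probe's 10MOhm resistance. The crystal frequency and probe capacitance below are ASSUMED placeholders, not your values; take them from your crystal and from the probe's spec sheet.

```python
import math


def reactance_ohms(freq_hz, cap_farads):
    """Magnitude of a capacitor's impedance at a given frequency."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)


def cutoff_hz(series_ohms, cap_farads):
    """-3 dB frequency of the series resistor plus probe capacitance."""
    return 1.0 / (2.0 * math.pi * series_ohms * cap_farads)


def amplitude_ratio(freq_hz, series_ohms, cap_farads):
    """Fraction of the AC amplitude that reaches the probe tip."""
    fc = cutoff_hz(series_ohms, cap_farads)
    return 1.0 / math.sqrt(1.0 + (freq_hz / fc) ** 2)


if __name__ == "__main__":
    CRYSTAL_HZ = 8e6      # ASSUMPTION: replace with your crystal frequency
    PROBE_CAP_F = 12e-12  # ASSUMPTION: replace with your probe's spec
    SERIES_OHMS = 1e3     # the 1k series resistor suggested above

    print(f"probe reactance at crystal frequency: "
          f"{reactance_ohms(CRYSTAL_HZ, PROBE_CAP_F):.0f} ohms")
    print(f"RC cutoff with 1k in series: "
          f"{cutoff_hz(SERIES_OHMS, PROBE_CAP_F) / 1e6:.1f} MHz")
    print(f"amplitude seen on the scope: "
          f"{amplitude_ratio(CRYSTAL_HZ, SERIES_OHMS, PROBE_CAP_F):.2f} of actual")
```

The series resistor attenuates the AC signal you see, but it doesn't change the DC average, which is what matters for the next step.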
So when it is working properly, the average voltage on XTAL and EXTAL should be "in the middle somewhere". When it isn't working - well measure it, think about it and tell me. Either it is trying to oscillate and can't get going, or the oscillator is disabled because the CPU is in the wrong mode somehow.
When it isn't working, see if you can kick it into life by pulling EXTAL up and down a bit by shorting to VDD and VSS via some resistors. That should show if the oscillator is disabled. If that works, try reducing the 1M resistor. That is there to try and get the amplifier "DC Balanced". Theoretically, simple crystal oscillators can't start until there's some NOISE to amplify, that the crystal then filters to its operating frequency. Maybe your power supplies and startup are "too quiet" (very unlikely, this is only theory).
When it is locked up can you get it running by feeding an external oscillating signal into EXTAL?
RESET
Are you relying on the internal power-on-reset circuit or do you have an external one? Add an external one, or manually control RSTIN and see if that changes anything. See if a locked-up unit will start running if you cycle RSTIN. Do you have a debug connector on the unit? It should have RSTIN connected to it. Do you have an external pullup on RSTIN (as recommended)? Is RSTIN at a different voltage when it is locked up? What is RSTOUT doing when it is locked up?
What's the voltage on the TEST pin (working and non-working)? Do you have a pull-down on it? Add one. Ditto JTAG_EN and TRST (measure and pull).
POWER CONNECTIONS
Now that we've got the crystal and reset factors out of the way, it's time to check power. This chip only has one power supply (some of the other ones have lots more that need to be sequenced properly).
Do you have all the VSS and VDD pins connected? And VDDPLL and VSSPLL? And VDDA, VSSA, VRH, VRL?
How much bypass do you have on the boards? What's the difference in power supplies between your production testing, bench testing and as the customers use them?
POWER SEQUENCING
What supplies the 3.3V? Are there any OTHER supplies on the board, supplying 5V or 12V to anything? Measure their power-on sequences relative to each other. Which one comes up first? Is there any "leakage" between power domains - meaning do you have any 5V-driven signals connecting to any CPU pins where the 5V might come up and start driving a CPU pin before its 3.3V is up?
What is the difference in the sequencing between a "working power on", a "non working power on" and "the second one that works"? Add BIG capacitors to different supplies (to slow them down) and see if it gets better or worse.
Can any circuitry drive CPU I/O pins above 3.3V? This is called "Injection Current" and can cause lockups. Read the Data Sheet and search for "injection" to get the figures, but these only apply when the CPU is powered.
How FAST is the power supply? Is it a switcher or linear? Does it have "slow start"? Is it Automotive (powered by a switch from a 12V car battery, so it comes up in microseconds), a bench supply that comes up fast or one that comes up slow? Is it a linear-transformer "Wall Wart" that turns on slowly sometimes and faster other times, depending on when in the mains cycle it was switched on?
Is it powered from USB?
Experiment with slow and fast power-on sequencing and ramping. Add series resistors and large electrolytics so the power comes up slowly and see if it makes any difference. Power it from a bench supply with a switch so it turns on fast.
You know that a fast off/on gets it working. Measure VDD with an oscilloscope and see how low it has to go to NOT recover.
Let me know the results of all of the above tests. With luck, you'll find the problem and fix it halfway through the above list.
Tom