OK - I can now answer part of my own question.
Two threads on the same core will NOT cause a problem when accessing a PCIe BAR, as long as the BAR is mapped as "Device Memory." Device Memory is uncached and very similar to Strongly Ordered memory; it also guarantees that each access is atomic (an interrupt will wait until the access in progress is complete). See "ARM Cortex-A Series: Programmer's Guide v4.0", specifically chapter 10, "Memory Ordering". The ldr/str/ldm/stm instructions all fall under this atomic behavior.
What I need NXP to answer: if two threads on different cores are doing ldr/str/ldm/stm to the PCIe BAR, what (if anything) arbitrates the accesses in HW? Caches aren't involved (Device Memory). Is there something in the "MPCore Platform" or AXI/AHB? Obviously, if the answer is "nothing", then SW has to be the mechanism.
I'm primarily concerned that if 2+ cores are performing PCIe BAR reads at the same time they can wreak havoc (e.g. an I2C read consists of a write followed by a read, and if that pair is not atomic with respect to competing threads you're almost guaranteed to read back incorrect data).
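To make the concern concrete, here is a minimal sketch of the SW fallback I have in mind if HW does not arbitrate: a lock held across the whole write+read pair. Everything in it is an assumption for illustration. `struct fake_bar`, its `index`/`window` layout, and `bar_indexed_read()` are hypothetical names and do not come from any NXP register map, and the "BAR" is ordinary memory standing in for the device so that the code can be run anywhere. It uses a C11 `atomic_flag` spinlock; in a real Linux driver I would expect to use the kernel's own spinlock and MMIO accessors instead.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register layout (an assumption, not a real device):
 * an index register selects which word the following read returns. */
struct fake_bar {
    volatile uint32_t index;      /* the "write" half of the transaction */
    volatile uint32_t window[4];  /* stands in for device-side state */
};

/* One lock shared by every thread/core that touches this BAR. */
static atomic_flag bar_lock = ATOMIC_FLAG_INIT;

static void bar_lock_acquire(void)
{
    /* Spin until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(&bar_lock, memory_order_acquire))
        ;
}

static void bar_lock_release(void)
{
    atomic_flag_clear_explicit(&bar_lock, memory_order_release);
}

/* Perform the write+read pair as one unit. Without the lock, another
 * core could overwrite 'index' between the two steps and this caller
 * would read back the other core's data. */
uint32_t bar_indexed_read(struct fake_bar *bar, uint32_t idx)
{
    uint32_t val;

    bar_lock_acquire();
    bar->index = idx & 3u;          /* step 1: select (write) */
    val = bar->window[bar->index];  /* step 2: read back */
    bar_lock_release();

    return val;
}
```

Each individual ldr/str may well be atomic on its own; the lock is only there to keep the two-step sequence from being interleaved with another core's sequence. If HW turns out to arbitrate single accesses but not sequences, I believe something like this is still needed for any multi-access transaction.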
Thanks in advance!