The i.MX6Q PCIe EP/RC Validation and Throughput page seems misleading when it says "cache is enabled" or "cache is not enabled". In both cases, the L1 and L2 caches are enabled. I believe the actual difference is how the MMU maps the iATU region in the i.MX6 address space. On Linux, that means using either ioremap(), which maps the region as Device memory ("cache is not enabled"), or ioremap_cache(), which maps the region as Cacheable ("cache is enabled"). I was able to replicate the performance results by switching between the two in some test code I wrote.
Unfortunately, the cacheable mapping creates coherency issues that you have to handle yourself with explicit cache flushes and invalidates. The endpoint driver code in pcie-imx6.c does not do this, and I have not tried to address it in my test code.
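To make the coherency problem concrete, here is a toy user-space model in C. It is only a sketch, not kernel code, and nothing in it comes from pcie-imx6.c: the struct toy_window type and every function name are made up for illustration. It assumes a single word of memory behind the window and a one-entry write-back cache in front of it on the CPU side, which is far simpler than the real L1/L2 behaviour, but it shows why a flush is needed after CPU writes and an invalidate before CPU reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: one word of memory behind the PCIe window,
 * plus a one-entry write-back cache on the CPU side. */
struct toy_window {
    uint32_t ram;    /* what the PCIe peer reads and writes */
    uint32_t line;   /* the CPU's cached copy */
    bool     valid;  /* cached copy is present */
    bool     dirty;  /* cached copy is newer than ram */
};

/* CPU store through a cacheable mapping: lands in the cache only. */
static void cpu_write(struct toy_window *w, uint32_t v)
{
    w->line = v;
    w->valid = true;
    w->dirty = true;
}

/* CPU load: served from the cache when valid, otherwise filled from ram. */
static uint32_t cpu_read(struct toy_window *w)
{
    if (!w->valid) {
        w->line = w->ram;
        w->valid = true;
        w->dirty = false;
    }
    return w->line;
}

/* Peer accesses bypass the CPU cache entirely. */
static uint32_t peer_read(const struct toy_window *w)
{
    return w->ram;
}

static void peer_write(struct toy_window *w, uint32_t v)
{
    w->ram = v;
}

/* Flush (clean): write dirty cached data back so the peer can see it. */
static void cache_clean(struct toy_window *w)
{
    if (w->valid && w->dirty) {
        w->ram = w->line;
        w->dirty = false;
    }
}

/* Invalidate: drop the cached copy so the next CPU read refetches. */
static void cache_invalidate(struct toy_window *w)
{
    w->valid = false;
    w->dirty = false;
}
```

In this model, a cpu_write() is invisible to peer_read() until cache_clean() runs, and a peer_write() is invisible to cpu_read() until cache_invalidate() runs. With a Device mapping there is no cached copy at all, so neither step is needed, which is the trade-off against the higher throughput of the cacheable mapping.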
-Carl