* Two i.MX6Q SD boards: one is used as the PCIe RC, the other as the PCIe EP. They are connected by two mini-PCIe to standard-PCIe adaptors, two PEX cable adaptors, and one PCIe cable.
* The link between RC and EP is set up with each board running on its own internal 125 MHz reference clock.
* On the EP system, the EP can access the reserved DDR memory (default address: 0x40000000) of the RC system through the PCIe link between EP and RC.
Use mem=768M on the RC kernel command line to reserve the DDR range 0x4000_0000 ~ 0x4FFF_FFFF for the EP access tests.
(Example of the RC command line, as printed at boot: Kernel command line: noinitrd console=ttymxc0,115200 mem=768M root=/dev/nfs nfsroot=10.192.225.216:/home/r65037/nfs/rootfs_mx5x_10.11,v3,tcp ip=dhcp rw)
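The reserved range follows from simple address arithmetic. The sketch below assumes that DDR starts at 0x10000000 and that the board carries 1 GB of DDR; neither value is stated above, so treat both as assumptions about the i.MX6Q SD board. Under those assumptions, mem=768M leaves the top 256 MB untouched by the kernel.

```python
# Assumptions (not stated in the text above): DDR base and total DDR size.
DDR_BASE = 0x1000_0000      # assumed start of DDR on the i.MX6Q
DDR_SIZE = 1024 << 20       # assumed 1 GB of DDR fitted on the board
KERNEL_MEM = 768 << 20      # from mem=768M on the kernel command line


def reserved_range(ddr_base, ddr_size, kernel_mem):
    """Return (first, last) address of the DDR left unused by the kernel."""
    first = ddr_base + kernel_mem
    last = ddr_base + ddr_size - 1
    return first, last


first, last = reserved_range(DDR_BASE, DDR_SIZE, KERNEL_MEM)
size_mb = (last - first + 1) >> 20
print(f"reserved: 0x{first:08X} .. 0x{last:08X} ({size_mb} MB)")
```

With a different DDR size or base, the reserved window moves accordingly, so the mem= value would need to be adjusted to keep 0x40000000 free.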
| | ARM core as bus master, cache disabled | ARM core as bus master, cache enabled | IPU as bus master (DMA) |
| Data size in one write TLP | 8 bytes | 32 bytes | 64 bytes |
| Write speed | ~109 MB/s | ~298 MB/s | ~344 MB/s |
| Data size in one read TLP | 32 bytes | 64 bytes | 64 bytes |
| Read speed | ~29 MB/s | ~100 MB/s | ~211 MB/s |
IPU used as the bus master (DMA)
Summary of the PCIe throughput measured with the IPU as the bus master:
Write speed is about 344 MB/s.
Read speed is about 211 MB/s.
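As a rough cross-check, the IPU figures can be estimated from the frame size and the average frame time printed by the IPU test tool in the test log of this article (19019 us for the read test, 11751 us for the write test, each moving one 1024x1024 RGB4 frame through the PCIe window). This is a sketch under two assumptions that the article does not state: RGB4 uses 4 bytes per pixel, and the whole frame time is spent on the PCIe transfer.

```python
import struct


def fourcc_to_str(code):
    """Decode a little-endian FOURCC value such as 0x34424752."""
    return struct.pack("<I", code).decode("ascii")


def throughput_mib_s(width, height, bytes_per_pixel, frame_time_us):
    """Frame bytes divided by frame time, in MiB per second."""
    nbytes = width * height * bytes_per_pixel
    return nbytes / (1 << 20) / (frame_time_us / 1e6)


FORMAT = 0x34424752        # "foramt" value printed in the log
BYTES_PER_PIXEL = 4        # assumption: RGB4 is a 32-bit-per-pixel format

# The 1024x1024 side of each task is the buffer at paddr 0x1000000.
read_speed = throughput_mib_s(1024, 1024, BYTES_PER_PIXEL, 19019)
write_speed = throughput_mib_s(1024, 1024, BYTES_PER_PIXEL, 11751)

print(fourcc_to_str(FORMAT))
print(f"read  ~{read_speed:.0f} MiB/s")
print(f"write ~{write_speed:.0f} MiB/s")
```

The estimates land close to, but not exactly on, the summary figures, which is expected since the summary may come from other runs or a slightly different unit convention.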
ARM core used as the bus master (define EP_SELF_IO_TEST in the pcie.c driver)
Write speed: ~300 MB/s.
Read speed: ~100 MB/s.
Cache is enabled.
PCIe EP: Starting data transfer...
PCIe EP: Data transfer is successful, tv_count1 54840us, tv_count2 162814us.
PCIe EP: Data write speed is 298MB/s.
PCIe EP: Data read speed is 100MB/s.
According to the log, when the cache is enabled each write TLP carries about 4 times as much data, and each read TLP about 2 times as much data, as when the cache is disabled.
| | Cache is disabled | Cache is enabled |
| Data size in one write TLP | 8 bytes | 32 bytes |
| Write speed | ~109 MB/s | ~298 MB/s |
| Data size in one read TLP | 32 bytes | 64 bytes |
| Read speed | ~29 MB/s | ~100 MB/s |
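To make the "4 times" and "2 times" statement concrete, the following minimal sketch computes the payload and speed ratios directly from the table values. The dictionary layout and the function name are illustrative only.

```python
# Values copied from the table above: (cache disabled, cache enabled).
RESULTS = {
    "write": {"tlp_bytes": (8, 32), "speed_mb_s": (109, 298)},
    "read": {"tlp_bytes": (32, 64), "speed_mb_s": (29, 100)},
}


def gains(results):
    """Return {direction: (TLP payload ratio, speed ratio)}."""
    out = {}
    for direction, row in results.items():
        tlp_off, tlp_on = row["tlp_bytes"]
        spd_off, spd_on = row["speed_mb_s"]
        out[direction] = (tlp_on / tlp_off, spd_on / spd_off)
    return out


for direction, (tlp_gain, speed_gain) in gains(RESULTS).items():
    print(f"{direction}: TLP payload x{tlp_gain:.0f}, speed x{speed_gain:.2f}")
```

The speed ratios do not track the payload ratios exactly, which suggests that TLP payload size is not the only factor behind the measured difference.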
Cache is disabled.
PCIe EP: Starting data transfer...
PCIe EP: Data transfer is successful, tv_count1 149616us, tv_count2 552099us.
PCIe EP: Data write speed is 109MB/s.
PCIe EP: Data read speed is 29MB/s.
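The logged speeds can be related to the two timer values (tv_count1 for the write phase, tv_count2 for the read phase). The log does not state the transfer size or the exact formula used by the driver. The sketch below assumes a 16 MiB transfer per direction and a KiB-times-1000 scaling with integer division, chosen only because they reproduce the four logged figures; they are not confirmed by the source.

```python
# Assumption: 16 MiB moved per direction. This size is NOT in the log; it is
# picked because it reproduces the speeds the driver printed.
ASSUMED_BYTES = 16 * 1024 * 1024


def speed_mb_s(elapsed_us, nbytes=ASSUMED_BYTES):
    """Integer 'MB/s' figure: KiB transferred * 1000 / elapsed microseconds."""
    return (nbytes // 1024) * 1000 // elapsed_us


# (tv_count1, tv_count2) copied from the two logs above.
LOGS = {
    "cache enabled": (54840, 162814),
    "cache disabled": (149616, 552099),
}

for label, (write_us, read_us) in LOGS.items():
    print(f"{label}: write {speed_mb_s(write_us)} MB/s, "
          f"read {speed_mb_s(read_us)} MB/s")
```

Independently of the assumed size, the ratio of the two timer values gives the write-to-read speed ratio for each run, since both phases move the same amount of data.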
A simple method to connect the i.MX6 PCIe EP and RC
View of the whole solution:

HW materials:
2 x i.MX6Q SD boards, 2 x Mini PCIe to STD PCIe adaptors, and one SATA2 data cable.
The Mini PCIe to standard PCIe adaptor used here is described at the following URL:
http://www.bplus.com.tw/Adapter/PM2C.html

How to build it:
Signal connections
There are two adaptors: one is named A, the other is named B.
A B
TXM <----> RXM
TXN <----> RXN
RXM <----> TXM
RXN <----> TXN
A1 connected to B3
A2 connected to B4
A3 connected to B1
A4 connected to B2
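The wiring above is a crossover: each transmit signal on one adaptor goes to the receive signal on the other. The following minimal sketch checks such a wiring table for mistakes before soldering. It assumes that the M and N suffixes name the two polarities of a differential pair; which of the pins 1 to 4 carries which signal is not stated above, so the pin table is only checked for symmetry.

```python
# Wiring copied from the list above: adaptor A signal -> adaptor B signal.
SIGNALS = {"TXM": "RXM", "TXN": "RXN", "RXM": "TXM", "RXN": "TXN"}
# Pin-level wiring copied from the list above: A pin -> B pin.
PINS = {1: 3, 2: 4, 3: 1, 4: 2}


def is_symmetric(wiring):
    """True if 'x -> y' always comes with 'y -> x'."""
    return all(wiring.get(b) == a for a, b in wiring.items())


def is_crossover(signals):
    """True if every TX meets an RX of the same polarity suffix."""
    for a_sig, b_sig in signals.items():
        a_dir, a_pol = a_sig[:2], a_sig[2:]
        b_dir, b_pol = b_sig[:2], b_sig[2:]
        if {a_dir, b_dir} != {"TX", "RX"} or a_pol != b_pol:
            return False
    return is_symmetric(signals)


print("signals ok:", is_crossover(SIGNALS))
print("pins ok:", is_symmetric(PINS))
```

A straight-through cable (TX wired to TX) would fail the first check, which is the most likely wiring mistake with this kind of hand-made link.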

Connect the cable to the adaptor.
Connect the SATA2 data cable to Mini PCIe to STD PCIe adaptor (A)

Connect the SATA2 data cable to Mini PCIe to STD PCIe adaptor (B)

NOTE:
* Keep the cable as short as possible. Our cable is about 12 cm long.
* Connect the shield wire of the SATA2 cable to GND on both boards.
* Boot the PCIe EP system before booting the PCIe RC system.
Based on imx_3.0.35 mainline; the patch and the IPU test tools are attached.
NOTE:
* How to use the IPU tests:
Unzip xxx.zip, then run xxx_r.sh for the read tests and xxx_w.sh for the write tests.
Tests log:
EP:
root@freescale ~/pcie_ep_io_test$ ./pcie-r.sh
pass cmdline 14, ./pcie_ipudev_test.out
new option : c
frame count set 1
new option : l
loop count set 1
new option : i
input w=1024,h=1024,fucc=RGB4,cpx=0,cpy=0,cpw=0,cph=0,de=0,dm=0
new option : O
640,480,RGB4,0,0,0,0,0
new option : s
show to fb 0
new option : f
output file name ipu1-1st-ovfb
new option : ÿ
show_to_buf:0, input_paddr:0x1000000, output.paddr0x18800000
====== ipu task ======
input:
foramt: 0x34424752
width: 1024
height: 1024
crop.w = 1024
crop.h = 1024
crop.pos.x = 0
crop.pos.y = 0
output:
foramt: 0x34424752
width: 640
height: 480
roate: 0
crop.w = 640
crop.h = 480
crop.pos.x = 0
crop.pos.y = 0
total frame count 1 avg frame time 19019 us, fps 52.579000
root@freescale ~/pcie_ep_io_test$ ./pcie-w.sh
pass cmdline 14, ./pcie_ipudev_test.out
new option : c
frame count set 1
new option : l
loop count set 1
new option : i
input w=640,h=480,fucc=RGB4,cpx=0,cpy=0,cpw=0,cph=0,de=0,dm=0
new option : O
1024,1024,RGB4,0,0,0,0,0
new option : s
show to fb 1
new option : f
output file name ipu1-1st-ovfb
new option : ÿ
show_to_buf:1, input_paddr:0x18a00000, output.paddr0x1000000
====== ipu task ======
input:
foramt: 0x34424752
width: 640
height: 480
crop.w = 640
crop.h = 480
crop.pos.x = 0
crop.pos.y = 0
output:
foramt: 0x34424752
width: 1024
height: 1024
roate: 0
crop.w = 1024
crop.h = 1024
crop.pos.x = 0
crop.pos.y = 0
total frame count 1 avg frame time 11751 us, fps 85.099140
root@freescale ~$ ./memtool -32 01000000=deadbeaf
Writing 32-bit value 0xDEADBEAF to address 0x01000000
RC:
Before running "./memtool -32 01000000=deadbeaf" on the EP:
root@freescale ~$ ./memtool -32 40000000 10
Reading 0x10 count starting at address 0x40000000
0x40000000: 00000000 00000000 00000000 00000000
0x40000010: 00000000 00000000 00000000 00000000
0x40000020: 00000000 00000000 00000000 00000000
0x40000030: 00000000 00000000 00000000 00000000
After running "./memtool -32 01000000=deadbeaf" on the EP:
root@freescale ~$ ./memtool -32 40000000 10
Reading 0x10 count starting at address 0x40000000
0x40000000: DEADBEAF 00000000 00000000 00000000
0x40000010: 00000000 00000000 00000000 00000000
0x40000020: 00000000 00000000 00000000 00000000
0x40000030: 00000000 00000000 00000000 00000000
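The memtool logs show that a write to 0x01000000 on the EP appears at 0x40000000 on the RC. The minimal sketch below expresses that mapping as a base-plus-offset translation. The two base addresses come from the logs; the 16 MB window size and the helper name ep_to_rc are assumptions made for illustration only.

```python
EP_WINDOW_BASE = 0x0100_0000   # EP-side address written with memtool
RC_TARGET_BASE = 0x4000_0000   # RC-side reserved DDR where the value shows up
WINDOW_SIZE = 0x0100_0000      # assumption: 16 MB window, not stated in the logs


def ep_to_rc(ep_addr):
    """Translate an EP-side window address to the RC-side DDR address."""
    offset = ep_addr - EP_WINDOW_BASE
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError(f"0x{ep_addr:08X} is outside the assumed window")
    return RC_TARGET_BASE + offset


for addr in (0x0100_0000, 0x0100_0010):
    print(f"EP 0x{addr:08X} -> RC 0x{ep_to_rc(addr):08X}")
```

Under this mapping, a write to EP address 0x01000010 would be expected at RC address 0x40000010, which can be checked with the same memtool commands.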