We are currently trying to set up PCIe communication between an Artix-7 FPGA and the i.MX6Q.
The CPU sits on a Q7 module from MSC that is connected to an eval board with a PCIe 1.0 switch. The FPGA is plugged into the PCIe slot of the eval board.
The Linux kernel is 3.0.35.Q7_IMX6-13.12.01.
Because of the BAR length limitation of the i.MX6, we decided to add the LogiCORE AXI CDMA to the FPGA and let it write to a preallocated memory region in the i.MX6 DDR.
We first tested the communication and software on an x86 system (Ubuntu 12.04, kernel 3.2.0-60) and then tried to port it 1:1 to the i.MX6 system.
However, the communication does not seem to work the same way. Writes to the FPGA configuration registers mapped through BAR0 work, but when the CDMA writes to the i.MX6 DDR the data seems to get lost somewhere.
The FPGA initiates the transfer, as we can see in ChipScope, but the buffer on the i.MX6 side remains unchanged. Additionally, the MSI capability seems to be interpreted differently on the i.MX6, and we need MSIs in our final setup.
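For context, the host-side part of this scheme looks roughly as follows (a sketch only; names and sizes are illustrative, not our actual driver code):

```c
/* Illustrative sketch of the host-side buffer setup (not our exact driver). */
#include <linux/pci.h>
#include <linux/dma-mapping.h>

#define RX_BUF_SIZE (1024 * 1024)

static void *rx_virt;        /* CPU view of the buffer */
static dma_addr_t rx_bus;    /* bus address the CDMA writes to */

static int setup_rx_buffer(struct pci_dev *pdev)
{
	rx_virt = dma_alloc_coherent(&pdev->dev, RX_BUF_SIZE, &rx_bus,
				     GFP_KERNEL);
	if (!rx_virt)
		return -ENOMEM;
	/* rx_bus is then written into the CDMA destination-address register
	 * through BAR0, and the CDMA masters MemWr TLPs to that address. */
	return 0;
}
```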
PC log of the lspci hex dump and config dump:
02:00.0 Memory controller: Xilinx Corporation Device 7042
00: ee 10 42 70 07 04 10 00 00 00 80 05 10 00 00 00
10: 00 00 cf df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 00 00 00
40: 01 48 23 00 08 00 00 00 05 60 85 00 0c 30 e0 fe
50: 00 00 00 00 b9 41 00 00 00 00 00 00 00 00 00 00
60: 10 00 02 00 29 80 28 00 16 29 00 00 12 f4 03 00
70: 40 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
02:00.0 Memory controller: Xilinx Corporation Device 7042
Subsystem: Xilinx Corporation Device 0007
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 47
Region 0: Memory at dfcf0000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] MSI: Enable+ Count=1/4 Maskable- 64bit+
Address: 00000000fee0300c Data: 41b9
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 1, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB
Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: pciDriver
i.MX6 dump:
03:00.0 Memory controller: Xilinx Corporation Device 7042
00: ee 10 42 70 46 05 10 00 00 00 80 05 08 00 00 00
10: 00 00 10 01 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
40: 01 48 23 00 08 00 00 00 05 60 85 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 02 00 29 80 64 00 10 28 00 00 12 f4 03 00
70: 00 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
03:00.0 Memory controller: Xilinx Corporation Device 7042
Subsystem: Xilinx Corporation Device 0007
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin ? routed to IRQ 502
Region 0: Memory at 01100000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] MSI: Enable+ Count=1/4 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 1, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: pciDriver
Kernel modules: pciDriver
Is there anything obvious we have to take care of to get PCIe bus-master communication running?
Maybe something we need to check in the iATU settings because of the PCIe switch on our evaluation board?
Volker
We have not received your response yet and will close the discussion in 3 days. If you still need help, please feel free to reply with an update to this discussion.
Thanks,
Yixing
Volker
Has your issue been resolved? If yes, we are going to close the discussion in 3 days. If you still need help, please feel free to reply with an update to this discussion.
Thanks,
Yixing
It is better to avoid DMA addresses that are remapped by the ATU. Do you have a TLP capture from a PCIe protocol analyzer that could help pinpoint the issue?
Sorry for the late reply, I wasn't notified about responses to my post.
We finally got the test running. It turned out to be a problem with mapping memory between kernel space and user space (ARM behaves differently from x86 here: dma_mmap_coherent vs. remap_pfn_range).
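For anyone hitting the same issue: the fix amounts to implementing the driver's mmap handler with dma_mmap_coherent instead of remap_pfn_range. A sketch, assuming the buffer was allocated with dma_alloc_coherent (names are illustrative, not our exact code):

```c
/* Illustrative sketch, not our exact driver code. */
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/dma-mapping.h>

static struct device *dma_dev;  /* &pdev->dev, saved at probe time */
static void *buf_virt;          /* from dma_alloc_coherent() */
static dma_addr_t buf_bus;      /* bus address the CDMA writes to */

static int pcidrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
	/* remap_pfn_range() happened to work on x86, but on ARM the
	 * user-space mapping must carry the same (uncached, coherent)
	 * attributes as the kernel mapping of the buffer;
	 * dma_mmap_coherent() sets these up correctly on both. */
	return dma_mmap_coherent(dma_dev, vma, buf_virt, buf_bus,
				 vma->vm_end - vma->vm_start);
}
```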
Unfortunately, the data transfer bandwidth on the i.MX6 was not as good as we had expected. We generated a data stream within the FPGA from a clock counter. By subtracting the first transferred data word from the last, we calculated the transfer speed, so the calculation should not depend on any CPU latencies. On the i.MX6 we got 270 MB/s, while we got ~400 MB/s on an x86 system with one PCIe 2.0 lane.
Is there any obvious reason (apart from the maximum payload size of 128 bytes on the i.MX6) why the transfer rate is so low? As the maximum payload size on the FPGA side is currently limited to 256 bytes anyway, it should not make a very big difference compared to the x86 system.
We are also looking to connect the i.MX6 to an Artix-7 through PCIe. From the posts above it looks like you got this to work with ASPM disabled:
"LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+"
Is this the case? Have you been able to get PCIe working between the i.MX6 and Artix-7 with ASPM enabled? We would like to use the ASPM power management modes, but I have not seen conclusive evidence that PCIe ASPM works reliably with the i.MX6 acting as an RC.
Thanks,
Tom
As we currently don't care about power consumption, we haven't looked into this yet. Our biggest concern is data transfer bandwidth.